Peer-Review Record

ASK-ViT: A Model with Improved ViT Robustness through Incorporating SK Modules Using Adversarial Training

Electronics 2022, 11(20), 3370; https://doi.org/10.3390/electronics11203370

by Youkang Chang *, Hong Zhao and Weijie Wang

Reviewer 1:

Nyothiri Aung

Reviewer 2:

Yusuf Perwej

Reviewer 3:

Thanh-Phong Dao

Submission received: 27 September 2022 / Revised: 16 October 2022 / Accepted: 17 October 2022 / Published: 19 October 2022

(This article belongs to the Section Artificial Intelligence)

Round 1

Reviewer 1 Report

This paper studies the robustness of the ViT model against adversarial example attacks and proposes the ASK-ViT model, which improves robustness by introducing the SK module. Specifically, the authors present an ASK-ViT model that adaptively adjusts the size of the perceptual field, extracts multi-scale spatial information from the features, and enhances the model's robustness against different adversarial attacks.
The paper is well written, the contributions are well defined, and, most importantly, the novelty enriches the existing body of knowledge in the field. However, I do have a few comments that could help you improve your manuscript.
- Add the paper outline at the end of the introduction.
- What are the metric and unit of measurement in Tables 4, 5, 6, and 7? Currently you refer to it as 'performance'; you need to define which metric you have used.
- Highlight the best result in bold in Tables 4, 5, 6, and 7 so the reader can easily see the best-performing model.
- Some related references are missing:
[1] Messina, Nicola, Giuseppe Amato, Andrea Esuli, Fabrizio Falchi, Claudio Gennaro, and Stéphane Marchand-Maillet. "Fine-grained visual textual alignment for cross-modal retrieval using transformer encoders." ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 17, no. 4 (2021): 1-23.
[2] Chen, Haoyuan, et al. "GasHis-Transformer: A multi-scale visual transformer approach for gastric histopathological image detection." Pattern Recognition 130 (2022): 108827.
[3] Zhang, Wenyin, Yong Wu, Bo Yang, Shunbo Hu, Liang Wu, and Sahraoui Dhelim. "Overview of multi-modal brain tumor mr image segmentation." In Healthcare, vol. 9, no. 8, p. 1051. MDPI, 2021.

Author Response

Point 1. Add the paper outline at the end of the introduction.

Response 1: We thank the reviewer for their comments. We have added an outline of the paper at the end of the introduction, in lines 75-79.

Point 2. What are the metric and unit of measurement in Tables 4, 5, 6, and 7? Currently you refer to it as 'performance'; you need to define which metric you have used.

Response 2: In Tables 4, 5, 6, and 7, we use accuracy as the evaluation metric. The metrics we used are defined in lines 324-330 of the paper.
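For readers of this record, a minimal sketch of classification accuracy is given here as an illustration only. It assumes the usual definition (the percentage of samples whose predicted label equals the true label); the exact definition used in the paper is the one in the lines cited above. The function name `accuracy_percent` and the toy labels are hypothetical.

```python
def accuracy_percent(predictions, labels):
    """Return the percentage of predictions that match the true labels.

    Assumed definition: 100 * (number of correct predictions) / (number of samples).
    The same function can be applied to predictions on clean inputs or on
    adversarial inputs to compare the two.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one sample is required")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return 100.0 * correct / len(labels)


# Hypothetical toy data: four samples, three predicted correctly.
toy_predictions = [0, 1, 2, 2]
toy_labels = [0, 1, 1, 2]
toy_accuracy = accuracy_percent(toy_predictions, toy_labels)
```

Under this assumed definition the unit is percent, which would answer the reviewer's question about the unit of measurement.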

Point 3. Highlight the best result in bold in Tables 4, 5, 6, and 7 so the reader can easily see the best-performing model.

Response 3: We thank the reviewer for their comments. We have put the best results in bold in Tables 4, 5, 6, and 7.

Point 4. Some related references are missing:

[1] Messina, Nicola, Giuseppe Amato, Andrea Esuli, Fabrizio Falchi, Claudio Gennaro, and Stéphane Marchand-Maillet. "Fine-grained visual textual alignment for cross-modal retrieval using transformer encoders." ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 17, no. 4 (2021): 1-23.

[2] Chen, Haoyuan, et al. "GasHis-Transformer: A multi-scale visual transformer approach for gastric histopathological image detection." Pattern Recognition 130 (2022): 108827.

[3] Zhang, Wenyin, Yong Wu, Bo Yang, Shunbo Hu, Liang Wu, and Sahraoui Dhelim. "Overview of multi-modal brain tumor mr image segmentation." In Healthcare, vol. 9, no. 8, p. 1051. MDPI, 2021.

Response 4: We have added relevant literature to the paper, such as references 3, 4, and 9.

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper is well defined and the parameter settings are good.

However, the conclusion is not elaborate enough and needs a broader discussion.

Author Response

Response to Reviewer 2 Comments

Point 1. This paper is well defined and the parameter settings are good.

However, the conclusion is not elaborate enough and needs a broader discussion.

Response 1: We thank the reviewer for their comments. We have expanded the conclusions, elaborated on them in more detail, and added a comparison chart of how the different methods extract features in Section 4.4.1.

Author Response File: Author Response.pdf

Reviewer 3 Report

This study investigates the robustness of the ViT model against adversarial example attacks and proposes the ASK-ViT model, which improves robustness by introducing the SK module.

Comments: This paper presents details of the materials, method, simulated experiments, and comparison. It can be considered for possible publication after improvement.

1. All section titles should be written in full form and should not be abbreviated.

2. Section 3: the paragraph describing Figure 1 should be placed before the figure, and the details of the figure must be explained.

3. Section 4.3, line 316: “The parameters of the model were optimized during training using the Adam optimizer”. The authors are advised to explain why optimization is needed and how an optimum set is determined.

Comments for author File: Comments.pdf

Author Response

Response to Reviewer 3 Comments

This study investigates the robustness of the ViT model against adversarial example attacks and proposes the ASK-ViT model, which improves robustness by introducing the SK module.

Comments: This paper presents details of the materials, method, simulated experiments, and comparison. It can be considered for possible publication after improvement.

Point 1: All section titles should be written in full form and should not be abbreviated.

Response 1: We thank the reviewer for their comments. We have revised this, and the section titles are now written in full form.

Point 2: Section 3: the paragraph describing Figure 1 should be placed before the figure, and the details of the figure must be explained.

Response 2: We have moved the description of the figure to before Figure 1 and described the flow of the neural network.

Point 3: Section 4.3, line 316: “The parameters of the model were optimized during training using the Adam optimizer”. The authors are advised to explain why optimization is needed and how an optimum set is determined.

Response 3: We have rewritten the description of the parameter settings in lines 332-339 of the paper and explained what needs to be optimized.

To address the question of how the optimal set is determined, it should be noted that the Adam optimizer combines first-moment and second-moment estimates of the gradient. These estimates are updated automatically each time the gradient is updated, so the effective update for each parameter changes dynamically, and its value depends on the learning rate, weight decay, optimizer epsilon, and momentum coefficients.
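The update described in this response can be sketched as follows. This is a minimal NumPy illustration of the standard Adam rule with bias correction, not the authors' training code: the function names, the toy quadratic objective, and the hyperparameter values (`lr`, `beta1`, `beta2`, `eps`) are assumptions chosen for illustration, and weight decay is omitted for brevity.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one standard Adam update and return the new (param, m, v).

    m is the running first-moment (mean) estimate of the gradient,
    v is the running second-moment (uncentered variance) estimate,
    t is the 1-based step counter used for bias correction.
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)  # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


def minimize_toy_quadratic(steps=2000, lr=0.01):
    """Minimize the hypothetical objective f(w) = sum(w**2) with Adam."""
    w = np.array([5.0, -3.0])
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        grad = 2.0 * w  # analytic gradient of sum(w**2)
        w, m, v = adam_step(w, grad, m, v, t, lr=lr)
    return w
```

The sketch shows why the per-parameter step is described as dynamic: the gradient is divided by the square root of its own running second moment, so the step size adapts to the recent gradient history instead of being fixed by the learning rate alone.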

Author Response File: Author Response.pdf
