Article
Peer-Review Record

Revisiting the Transferability of Few-Shot Image Classification: A Frequency Spectrum Perspective

Entropy 2024, 26(6), 473; https://doi.org/10.3390/e26060473
by Min Zhang 1,2, Zhitao Wang 2 and Donglin Wang 2,*
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 22 April 2024 / Revised: 27 May 2024 / Accepted: 29 May 2024 / Published: 29 May 2024
(This article belongs to the Section Multidisciplinary Applications)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This study examines the performance decline in few-shot image classification (FSIC) from a frequency spectrum perspective. It introduces the Frequency Spectrum Mask (FRSM) method, which enhances transferability by mitigating the impact of non-causal frequencies, and reports significant improvements across nine datasets.
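
For readers unfamiliar with frequency-domain masking, the general idea can be sketched as follows. This is not the authors' FRSM: the review does not describe how their mask is chosen, so the fixed low-pass mask and all function names here (`dft2`, `idft2`, `radial_low_pass_mask`, `apply_frequency_mask`) are hypothetical stand-ins. The sketch uses a naive transform from the standard library to stay dependency-free; a real pipeline would use an FFT routine.

```python
import cmath


def dft2(img):
    """Naive 2-D discrete Fourier transform of a list-of-lists image."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(
                img[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
                for y in range(h)
                for x in range(w)
            )
            for v in range(w)
        ]
        for u in range(h)
    ]


def idft2(spec):
    """Inverse of dft2; returns only the real part of each pixel."""
    h, w = len(spec), len(spec[0])
    return [
        [
            sum(
                spec[u][v] * cmath.exp(2j * cmath.pi * (u * y / h + v * x / w))
                for u in range(h)
                for v in range(w)
            ).real / (h * w)
            for x in range(w)
        ]
        for y in range(h)
    ]


def radial_low_pass_mask(h, w, radius):
    """Hypothetical fixed mask: keep frequencies within `radius` of DC."""
    mask = []
    for u in range(h):
        fu = min(u, h - u)  # wrap-around distance from the zero frequency
        row = []
        for v in range(w):
            fv = min(v, w - v)
            row.append(1.0 if fu * fu + fv * fv <= radius * radius else 0.0)
        mask.append(row)
    return mask


def apply_frequency_mask(img, mask):
    """Transform the image, zero out the masked-off frequencies, transform back."""
    spec = dft2(img)
    masked = [
        [spec[u][v] * mask[u][v] for v in range(len(spec[0]))]
        for u in range(len(spec))
    ]
    return idft2(masked)
```

With an all-ones mask the image is returned unchanged up to rounding, and with `radius=0` only the zero-frequency term survives, so every pixel becomes the image mean. A method such as FRSM would presumably replace the fixed radial rule with a mask chosen to suppress the frequencies it deems non-causal.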

Additional Comments:

1) Why were parts (a) and (b) of Figure 3 not combined? Combining these could clarify the related details and enhance the figure’s explanatory power.

2) The baseline comparisons used in the study appear outdated. Updating these with more recent methods, such as the following, could strengthen the validation of the FRSM method’s effectiveness:

[a] Graph Complemented Latent Representation for Few-Shot Image Classification. TMM '23

[b] Graph Neural Networks With Triple Attention for Few-Shot Learning. TMM '23

3) There is an error in the citation of LEO in Table 1. Please correct the reference to ensure accuracy and reliability of the citations.

Overall, the vision of the work is impressive, but the actual scope of work appears somewhat limited.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The paper investigates the problem of few-shot classification and focuses on understanding the transferability of extracted features across datasets through a frequency-based analysis.

The idea is interesting and the results indeed validate the authors' claims. Furthermore, the paper is well written and the use of the English language is overall correct.

Some comments that could enhance the quality of the paper are the following:

The reference to “LEO” in Section 2.1 is missing.

It is not clear how causality is introduced. How are causal and non-causal components identified, i.e., how do different components get assigned to either the “causal” or “non-causal” components?

What is the impact of the support sample set size?

In Section 3.1, the problem of few-shot classification is described as a two-step process, i.e., (i) a pre-training phase and (ii) a test-tuning phase. One could argue that pre-training is an intelligent initialization of the model, so all the effort is focused on the fine-tuning step and the subsequent evaluation. In that sense, the discussion “The pre-trained model undergoes re-learning based on the few labeled images S at each gradient step and subsequently undergoes testing on the unlabeled images Q.” lacks sufficient detail.

The following text appears to be missing something: “Because zAi represents the magnitude of the frequency.”

The discussion of how Fig. 1 was created should be expanded with information such as the type of layer used as input to the t-SNE, the feature dimensionality, the number of training examples, and the stability of performance, as some indicative examples.

In Section 1, it is stated that “Figure 2 showcases the average amplitudes of eigenfrequencies across four testing datasets using a pre-trained model”.

Please provide more details regarding the characteristics of the actual signal where the eigendecomposition takes place, i.e., is the eigenanalysis applied

 

Provide results supporting the following assertion: “For example, consider a scenario where the dog images in the training data feature grass backgrounds, this scenario poses a challenge during the testing phase when encountering the dog image with a water background, as the inconsistent background information hampers recognition.”

Comments on the Quality of English Language

Use of the English language is overall correct.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
