Article
Peer-Review Record

GeoViewMatch: A Multi-Scale Feature-Matching Network for Cross-View Geo-Localization Using Swin-Transformer and Contrastive Learning

Remote Sens. 2024, 16(4), 678; https://doi.org/10.3390/rs16040678
by Wenhui Zhang 1, Zhinong Zhong 1, Hao Chen 1,2,* and Ning Jing 1,2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 1 January 2024 / Revised: 7 February 2024 / Accepted: 12 February 2024 / Published: 14 February 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper introduces a new lightweight neural network for cross-view geo-localization using Swin-Transformer and contrastive learning. The proposed method effectively addresses the challenges of establishing accurate and robust dependencies between street-view images and remote sensing images by extracting multi-scale features and employing contrastive learning. Comprehensive experiments demonstrate its superiority over state-of-the-art methods. Generally speaking, this paper is well-organized and easy to follow; however, there are still some concerns that should be improved.

1. The Abstract lacks brevity and conciseness, with repetitive explanations of the proposed method for cross-view geo-localization. It could benefit from a more streamlined approach to highlight the paper’s main contributions.

2. The Introduction provides comprehensive information about the challenges and methods in cross-view geo-localization, but it could benefit from more concise and focused content. It may be streamlined by removing redundancies and structuring the text to emphasize the specific contributions of the paper and the gap it addresses in the existing research.

3. The Related Work section could be improved by focusing on the key points and contributions of the various related methods without delving into extensive descriptions of each approach, thereby providing a clearer outline of the existing methods and their limitations. Additionally, the focus should be on the relevance of these methods to the current study and how they contribute to addressing the identified gaps in the literature.

4. The Methodology section should explicitly highlight the key innovations of the proposed method. Additionally, the text could be enhanced by providing a clearer explanation of the Swin-Geo module and how it operates to extract multi-scale features.

5. In the experimental comparison, it is necessary to include comparisons with other Transformer-based methods to make the experimental results more persuasive.

6. The Conclusion section could benefit from a more comprehensive and contextualized summary of the contributions and significance of the proposed method. Specific details such as the improvements in performance and efficiency need to be more clearly emphasized and linked to the broader significance of the findings in the field of cross-view geo-localization. Additionally, the authors should discuss potential areas for further exploration based on the study’s outcomes.

Author Response

I would like to extend my heartfelt gratitude to the reviewers for their insightful comments and suggestions. Based on your feedback, I have carefully revised the manuscript. A detailed description of the modifications made can be found in the attached document. Once again, thank you for your valuable input.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

In this paper, the authors present GeoViewMatch: A Multi-Scale Feature-Matching Network for Cross-View Geo-Localization Using Swin-Transformer and Contrastive Learning. This paper can be accepted, but there are several issues that need to be fixed.

1. Figure 2 does not show I_k and the street-view images I_p as inputs; please label I_k and I_p on the figure.

2. How are the weights shared, why are they shared, and what is the time complexity of achieving this?

3. What is the meaning of the projection head?

4. Figure 2 is not logical: how can the output be a loss function?

5. I think the results are good, but the authors could add a statistical analysis such as the Friedman test.


Author Response

I would like to extend my heartfelt gratitude to the reviewers for their insightful comments and suggestions. Based on your feedback, I have carefully revised the manuscript. A detailed description of the modifications made can be found in the attached document. Once again, thank you for your valuable input.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This article suggests improvements in matching street-view images with remote sensing images. The improvements in accuracy are not of significant magnitude, but the improvements in the speed of the algorithm are significant; the article does not do a good job of stating this. The method is really comparable to the previous best results in significantly less time. Since only one test is presented, it is very hard to justify whether this is truly a remarkable improvement of the technology; more testing is needed to make the claim statistically valid. The article has issues with singular/plural agreement in more places than I care to point out, and strange English issues such as those shown below should be caught by the authors. I therefore suggest a minor rewrite and that an excellent English reviewer edit the article. I still have some reservations about stating improvements that are not statistically significant and are based on two examples; all one can say is that there appears to be reason to continue this line of research.
Line 22 - define what you mean by a GPS-tagged RS image: is it an orthophoto, what is the accuracy of the GPS, what is the elevation model, etc.?
Line 29 - the success is a function of the quality of the coordinates of the street-view image; this needs to be addressed.
Line 33 - define "egocentric".
Line 40 - what is "high dimensional"?
Line 82 - "swin" has become "swim"; please search and be consistent with the correct term.
Lots of places - I expect a primary author's name, but instead terms like SAFA, CDTE, CVFT, etc. are used in discussing a reference; I do not understand this way of doing this. They seem to be names of software, but this needs to be explained.
Line 202 - "inputes"?? - maybe "inputs".
Line 244-245 - "datasets" is only needed once.
Line 249 - "Fellow"??? Probably "Following".
Line 264 - same as Line 249.
Line 269 - same as Line 249.
Line 271 - how is this threshold determined?
Line 273 - a reference for PyTorch is needed.
Line 279 - "blocks" should be lower case.
Line 287-290 - how did you determine these settings?
Line 321-322 - how can you say these are significant improvements when they are less than 2%?
Line 333 - no italics needed.
Line 333-334 - why were these values selected?
Line 342-343 - as with Line 321-322, are these really significant improvements?
Line 354-355 - are top-1 and top-1% the same thing?
Line 373 - missing "Fig." or "Table" in front of "4".
Table 4 - "wirh bitch" is an incorrect term.
Line 386 - barely more than zero improvement.
Line 388-389 - same as Line 386.
Line 390-397 - same as Line 386.
Line 409 - "values areas" makes no sense.
Line 446 - "values regions" makes no sense.
Line 458 - "fearures" should be "features".
Line 479 - level is level.

Comments on the Quality of English Language

There are issues with plural/singular agreement in many places and very obvious misspellings; a thorough English review and edit is required.

Author Response

I would like to extend my heartfelt gratitude to the reviewers for their insightful comments and suggestions. Based on your feedback, I have carefully revised the manuscript. A detailed description of the modifications made can be found in the attached document. Once again, thank you for your valuable input.

Author Response File: Author Response.pdf
