Article
Peer-Review Record

RTV-SIFT: Harnessing Structure Information for Robust Optical and SAR Image Registration

Remote Sens. 2023, 15(18), 4476; https://doi.org/10.3390/rs15184476
by Siqi Pang 1, Junyao Ge 1, Lei Hu 1, Kaitai Guo 1, Yang Zheng 1, Changli Zheng 2, Wei Zhang 2 and Jimin Liang 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 30 July 2023 / Revised: 3 September 2023 / Accepted: 8 September 2023 / Published: 12 September 2023

Round 1

Reviewer 1 Report

In this study, the authors propose a novel optical and SAR image registration method based on Relative Total Variation (RTV) and the Scale-Invariant Feature Transform (SIFT), named RTV-SIFT.

Comments/Suggestions

1.      It is suggested to add the details of the datasets used in this study.

2.      Some recent, relevant feature-based studies are not covered in the literature review of the manuscript. A few of them are:

a.       Forero, M. G., Mambuscay, C. L., Monroy, M. F., Miranda, S. L., Méndez, D., Valencia, M. O., & Gomez Selvaraj, M. (2021). Comparative Analysis of Detectors and Feature Descriptors for Multispectral Image Matching in Rice Crops. Plants, 10(9), 1791. https://doi.org/10.3390/plants10091791

b.      Sharma, S. K., Jain, K., & Shukla, A. K. (2023). A Comparative Analysis of Feature Detectors and Descriptors for Image Stitching. Applied Sciences, 13(10), 6015. https://doi.org/10.3390/app13106015

3.      Page 5, Section 2.2: the Harris operator was used to detect feature points. Why were other feature point detection algorithms not considered? For literature on feature detectors and descriptors, see the references listed in comment 2.

4.      Page 7, Section 3: it is suggested to present the experimental environment in a table instead of text.

5.      Section 3.1, CMN and CMR: established technical terms for these metrics already exist in the literature and could be used instead. See the references listed in comment 2.

6.      Section 3.1: why was execution time not considered for evaluation?
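As an illustration of comment 6, execution time could be reported per stage with a small timing harness. The sketch below is only a minimal example under stated assumptions: the stage names and the placeholder stage functions are hypothetical stand-ins for detection, description, and matching, not the authors' implementation.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage, log):
    """Accumulate wall-clock seconds spent inside the block under log[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log[stage] = log.get(stage, 0.0) + (time.perf_counter() - start)


def run_pipeline(stages, data):
    """Run each (name, function) stage in order and record its run time."""
    timings = {}
    for name, fn in stages:
        with timed(name, timings):
            data = fn(data)
    return data, timings


# Hypothetical placeholder stages; a real pipeline would call the actual
# feature detection, description, and matching routines here.
stages = [
    ("detection", lambda x: sorted(x)),
    ("description", lambda x: [v * v for v in x]),
    ("matching", lambda x: sum(x)),
]

result, timings = run_pipeline(stages, [3, 1, 2])
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

Running every compared method through the same harness on the same image pairs would make the per-stage times directly comparable.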

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

This manuscript presents a novel registration method for optical and SAR images. The proposed method, called RTV-SIFT, uses the proposed RTV-Harris feature point detector to extract feature points. A feature point descriptor is then constructed on EPCE. Finally, the POED between feature points is used to achieve fine feature point matching. The experiments show that the approach achieves strong registration performance on optical and SAR data. Nevertheless, several issues need to be addressed:

1. All abbreviations should be spelled out at their first occurrence; please check the entire text.

2. The layout of the figures requires some adjustments. For example, the placement of Figure 2 confused me, interrupting the continuity of the narrative of the contribution to this paper, and Figure 2 is referenced in Section 2, where it might have been more appropriately placed. The placement of Figure 3 also interrupts the continuity of the content, etc.

3. The use of t and i as superscripts for WTV and WIV in Equations (1) and (2) may cause ambiguity; it is recommended that WTV and WIV be used directly.

4. On page 6, line 14, the number 136 in “136-dimensional” should be explained.

5. H1 in Equation (10) needs to be explained again in the description of the equation. How POED bridges coarse and fine alignment also needs to be explained.

6. The experimental data used in the paper are from so-called highly structured regions, and it might be more convincing to add some experimental results in weakly structured regions.

7. Formulas for the repeatability rate and Scat are missing. It is suggested that all 11 pairs of optical-SAR images used in the experiment be displayed.

8. As a multimodal image registration method, it is not enough to review and compare the improved methods based on SIFT. It is recommended to add comparative experiments with mainstream multimodal image registration methods.
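Regarding comment 7, one commonly used definition of the repeatability rate counts the feature points of one image that reappear within a pixel tolerance in the other image after mapping through the ground-truth transform, divided by the smaller of the two point counts. The sketch below implements that generic definition only as an example; the tolerance `eps`, the greedy one-to-one assignment, and the toy translation are assumptions, and the manuscript may define the metric differently.

```python
import math


def repeatability(points_a, points_b, transform, eps=1.5):
    """Fraction of points repeated across two images (generic definition).

    points_a, points_b: lists of (x, y) feature points.
    transform: ground-truth mapping from image A coordinates to image B.
    eps: distance tolerance in pixels (assumed value).
    """
    if not points_a or not points_b:
        return 0.0
    used = set()  # indices in points_b already paired (greedy one-to-one)
    repeated = 0
    for p in points_a:
        x, y = transform(p)
        for j, (qx, qy) in enumerate(points_b):
            if j not in used and math.hypot(x - qx, y - qy) <= eps:
                used.add(j)
                repeated += 1
                break
    return repeated / min(len(points_a), len(points_b))


# Toy example: image B is image A shifted 10 pixels along x.
def shift(p):
    return (p[0] + 10.0, p[1])


pts_a = [(0.0, 0.0), (5.0, 5.0), (20.0, 20.0)]
pts_b = [(10.0, 0.0), (15.0, 6.0), (50.0, 50.0), (60.0, 60.0)]
rate = repeatability(pts_a, pts_b, shift)
print(f"repeatability = {rate:.3f}")
```

In this toy case two of the three points in image A have a counterpart within the tolerance, so the rate is 2/3.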

 

English writing needs further improvement

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

This paper presents RTV-SIFT, a new feature point extraction approach suitable for optical and SAR image registration. In the step of feature point detection, this approach first smooths the input images with an iterative RTV smoothing method, and then detects feature points with the multi-scale Harris algorithm. In the step of feature point description, the approach extracts the enhanced phase consistent edge (EPCE) map and computes the histogram of gradient orientation as the descriptor. Finally, the registration result is obtained with a coarse-to-fine matching strategy.

  The designs of the method are consistent with the motivations. However, the experiment is insufficient to verify the effectiveness of the proposed method.

 1. The scope of the testing dataset is too limited. Only 8/11 pairs of images are used to evaluate the performance of the feature point detection/description. Note that the public OS dataset (the literature [34] in this paper) contains thousands of aligned pairs. All these pairs can be used to evaluate both the detection and description accuracy. Some data augmentation can be performed on the dataset if the authors would like to evaluate the registration performance under scale or rotation changes.

 2. The comparison methods do not represent the state of the art among existing hand-crafted optical-SAR registration approaches. As introduced in Sections 3.4 and 3.6, the newest comparison methods (OS-SIFT and PCG-SIFT) were published in 2018. Therefore, some recent approaches should be included in the comparison experiment. Note that there are some recent hand-crafted optical-SAR registration approaches with public code, for example:

(1) Li J, Xu W, Shi P, et al. LNIFT: Locally normalized image for rotation invariant multimodal feature matching[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 1-14.

(2) Ye Y, Zhu B, Tang T, et al. A robust multimodal remote sensing image registration method and system using steerable filters with first-and second-order gradients[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2022, 188: 331-350.

 3. The discussion of learning-based methods is inadequate. The only related statement is in lines 30-31, that is, “these methods are not widely applicable due to the difficulty in acquiring sufficient multimodal remote sensing images with ground truth for training and testing.” However, there are already some public datasets (such as reference [34] in this paper) suitable for training learning-based models. Therefore, the authors should discuss “why the proposed method is still valuable even though the learning-based models can be constructed based on the existing datasets.” Supporting literature or experimental results are needed.

 4. In Section 3.3, the number of layers N is selected according to the accuracy on the final test images. This setting may be unfair because the hyperparameters of the comparison methods are not tuned on the test images. More evaluation results on other datasets (as discussed in comment 1) would avoid such unfairness.

 5. The time consumption of the proposed approach and the comparison methods should be given. It will be more helpful if the paper can report the time for every step of the proposed approach.

 6. According to the statement in line 165, the symbol “&” in Equation (6) denotes the pixel-wise “AND” operation. However, the edge maps M and H contain floating point numbers. What are the results of the logical operation “AND” when performing it on the floating-point numbers?

 7. In Table 2, different methods may achieve the same best results on some criteria. For example, several methods obtain 100% accuracy on CMR. All the best results should be represented in a bold style.
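The ambiguity raised in comment 6 can be made concrete with a small example. In Python, the bitwise `&` operator is not defined for floating-point values, so a pixel-wise “AND” on real-valued edge maps has to mean something else, for instance binarising both maps first or taking the pixel-wise minimum. The sketch below contrasts these two readings; the threshold `tau`, the helper names, and the toy maps M and H are assumptions for illustration, and which operation (if either) the manuscript intends is not stated here.

```python
def thresholded_and(m, h, tau=0.5):
    """Binarise both maps at tau, then take a pixel-wise logical AND."""
    return [[1.0 if (a > tau and b > tau) else 0.0 for a, b in zip(rm, rh)]
            for rm, rh in zip(m, h)]


def fuzzy_and(m, h):
    """Pixel-wise minimum: a soft analogue of AND for values in [0, 1]."""
    return [[min(a, b) for a, b in zip(rm, rh)] for rm, rh in zip(m, h)]


# Toy floating-point edge maps (hypothetical values).
M = [[0.9, 0.2], [0.6, 0.0]]
H = [[0.8, 0.7], [0.4, 0.3]]

# Bitwise AND is undefined for floats and raises TypeError.
try:
    M[0][0] & H[0][0]
    bitwise_supported = True
except TypeError:
    bitwise_supported = False

hard = thresholded_and(M, H)  # binary map: both responses above tau
soft = fuzzy_and(M, H)        # real-valued map: weaker of the two responses
print(bitwise_supported, hard, soft)
```

The two readings give different outputs on the same input (a binary mask versus a real-valued map), which is why the definition of “&” in Equation (6) matters for reproducibility.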

The paper is clearly organized and well written.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I am not satisfied with the response to comment #5. The comment was not about citing the articles. It was about terminology: these metrics already have established names in the literature, so why use a different name for the same thing?

I am satisfied with the authors' responses to the remaining comments.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

All my concerns have been addressed.

Author Response

Thank you very much for your comments. They helped us a great deal in improving the article!
