Peer-Review Record

Multi-Swin Mask Transformer for Instance Segmentation of Agricultural Field Extraction

Remote Sens. 2023, 15(3), 549; https://doi.org/10.3390/rs15030549
by Bo Zhong 1,2, Tengfei Wei 1,2,*, Xiaobo Luo 1, Bailin Du 2,3, Longfei Hu 2, Kai Ao 2, Aixia Yang 2 and Junjun Wu 2
Submission received: 2 December 2022 / Revised: 22 December 2022 / Accepted: 10 January 2023 / Published: 17 January 2023

Round 1

Reviewer 1 Report

This study looked into the application of the MSMT method for agricultural field extraction. Overall, the subject of the manuscript is very interesting and timely given the overall movement toward digital agriculture. The manuscript seems to be well prepared, with sound methodology and scientific presentation of the results. I think that this manuscript can be published after addressing some minor comments. The detailed comments are as follows:

1. There are several expressions that are colloquial rather than written language. These should be changed to written form. Some examples include lines 36-37, "bunch" in line 331, "a lot" in several places, and so forth.

2. There are also many typos that should be corrected. Examples are "1) edge (Edge?)" in line 43, "watered" in line 59, and others in lines 238, 337, 341, and so on. The authors should check these carefully.

3. The terms "verification" and "validation" are used interchangeably in the Methods section and Figure 1. They should be unified to one term for consistency.

4. Figures 3 and 4: There is no patch merging process in the overall architecture, although three different resolution levels were used in this study. Shouldn't this process be included in the second through the fourth stages? All acronyms in Figure 4 should be fully explained in the caption.

5. The performance of the MSMT was compared with that of previous models in this study. More information about, or description of, the previous models (R-CNN, HTC, Swin-T & S) should be added to the text.

6. Figure 10: From the visual presentations, i.e., the figures, I can see that the newly developed model performed better than the performance indexes suggest. One thing that does not make sense to me: can additional segmentations be made during the merging process, as presented in Figure 10c? Comparing the blue and red segments in Figure 10c, new segmented shapes, including curved ones, appeared in the lower left section (i.e., the red one, supposedly the merged one).

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

In Figure 3, I do not understand what a "Partch Partition" means. Is this a misspelling? 

On line 363, I do not understand what a "bto" means. Is this also a misspelling?

Overall, I found the paper relatively easy to read and understand. I found the results important for actual application of machine learning to the important task of analyzing agricultural images from remote sensing. 

I found the result particularly interesting that small improvements in metrics reflected large improvements in actual usefulness by removing many errors that would be frustrating in actual application.

While the results have a narrow scope, I believe they will be of great interest to specialists in the field and to practitioners in the application area, and are thus worthy of publication.

Especially in the introduction, I found the writing stilted and the grammar off-key. I would encourage the authors to bring in an editor to review the writing and offer suggestions on grammar improvements.

 

The impact of the work would be greatly enhanced by making the code and models available to the community through GitHub or GitLab or another public code repository, and I strongly encourage the authors to submit their code and models to the community, for others to build upon.

Author Response

We sincerely thank you for your affirmation of our manuscript, the significance of the "field extraction" project, the model improvement and the research results. Your affirmation has given us the motivation to continue our in-depth research. We have corrected the misspellings ("Partch Partition", "bto") in the manuscript. We are very sorry for our carelessness, and thank you very much for finding these errors. As for the introduction, we have adjusted the sentences in several places and hope that this new version of the manuscript is better.

Reviewer 3 Report

In this paper, an improved Mask2Former model is proposed, combining the scale distribution of the fields with the multi-scale idea. Good results were achieved. However, some details should be further discussed, especially in Methods and Results. Here are the detailed comments.

 

1.      Abstract: lines 10 to 13 should be scaled back appropriately (one sentence for background description).

2.      Keywords: the first keyword should be your research field. Please add it.

3.      The introduction seems light and should be expanded. Some articles on the application of CNNs in remote sensing should be discussed in lines 63 to 76: 10.3390/rs14163892, 10.3390/rs14081877.

4.     You said: “The preprocessed images are divided into the training set and the verification set according to 6:1, and the numbers of the processed images for the training set and verification set are 2950: 601 respectively”. The ratios 6:1 and 2950:601 are not consistent. Besides, why was this ratio chosen? And is there no testing set?

5.      Please describe the model and explain its necessity in detail. The authors should reference this paper and see Figure 6 in it for the description (network parameters): https://doi.org/10.1016/j.autcon.2022.104698.

6.      Figure 9 is too large; I suggest rearranging it.

7.      For the Mask R-CNN network, the mIoU should be reported instead of the F1 score and AP, because it is more reasonable to consider false positives and false negatives, as the authors described in the formula. Besides, this paper should be discussed (https://doi.org/10.1016/j.autcon.2022.104689), which contains comprehensive evaluation indicators (mIoU).

8.      The conclusion of a manuscript should be written so that readers understand exactly what was investigated, rather than as a simple statement of the work.

9.      What is the detection speed of the proposed model?

 

10.   Please revise the language of the manuscript; some errors were found.
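
The arithmetic behind comments 4 and 7 can be sketched briefly. The snippet below is only an illustration, not the authors' code: the image counts 2950 and 601 come from the sentence quoted in comment 4, while the function names and the pixel counts `tp`, `fp`, `fn` are hypothetical values chosen for the example. IoU and F1 are written in their standard forms from true positives, false positives and false negatives; mIoU would be the mean of this IoU over classes.

```python
def split_ratio(n_train: int, n_val: int) -> float:
    """Training-to-validation ratio expressed as x in 'x:1'."""
    return n_train / n_val


def iou(tp: int, fp: int, fn: int) -> float:
    """Intersection over union: TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 score: 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


# Comment 4: the counts quoted from the manuscript.
ratio = split_ratio(2950, 601)
print(f"stated counts give {ratio:.2f}:1")

# What a 6:1 split of the same total would look like (rounded to whole images).
total = 2950 + 601
n_val_6_to_1 = round(total / 7)
n_train_6_to_1 = total - n_val_6_to_1
print(f"a 6:1 split of {total} images: {n_train_6_to_1}:{n_val_6_to_1}")

# Comment 7: hypothetical counts, NOT taken from the manuscript.
tp, fp, fn = 80, 10, 10
print(f"IoU = {iou(tp, fp, fn):.3f}, F1 = {f1_score(tp, fp, fn):.3f}")
```

With the quoted counts the split is about 4.91:1, i.e. closer to 5:1 than to 6:1. For a single class the two metrics are linked by F1 = 2·IoU / (1 + IoU), so they rank results in the same order but IoU is always the lower of the two whenever there are any errors.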

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 3 Report

  • The current version is excellent and recommended for publication.
