Article
Peer-Review Record

LLAM-MDCNet for Detecting Remote Sensing Images of Dead Tree Clusters

Remote Sens. 2022, 14(15), 3684; https://doi.org/10.3390/rs14153684
by Zongchen Li 1,†, Ruoli Yang 1,†, Weiwei Cai 2, Yongfei Xue 1,*, Yaowen Hu 1 and Liujun Li 3
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 4 July 2022 / Revised: 28 July 2022 / Accepted: 29 July 2022 / Published: 1 August 2022

Round 1

Reviewer 1 Report

Review

Remote Sensing: 1825552

Manuscript Title: LLAM-MDCNet for Detecting Remote Sensing Images of Dead Tree Clusters

Article

Authors: Zongchen Li, Ruoli Yang, Weiwei Cai, Yongfei Xue, Yaowen Hu, Liujun Li

Suggested revision: Minor

Comments for Authors:

The manuscript presents an interesting new object detection method, the Longitude Latitude cross Attention multi-path dense composite network (LLAM-MDCNet), for detecting dead tree clusters in UAV remote sensing images. The multi-directional LLAM improves the representation of high-level semantic feature information, enhancing the extraction capability for rich-information regions. The topic of the manuscript suits the scientific objectives and is of high interest for the readers of the journal.

Please refer to the following suggestions and comments:

Minor comments:

Line 23: rich-information regions

Lines 76-79: not the same text font and size

Line 146: „Considering the above problems, in this paper we present a new …“

Line 196: „To address the above problems, in this paper we propose the Multipath Dense …“

Line 198: the overall structure shown in Figure 10? Not Figure 1?

Line 216: Please reformulate sentence

Line 218: „can significantly characterize the contrast of the color features of healthy trees“

Line 226: Sentence beginning with „And“, please reformulate

Line 232: „the redundand detection frames are removed by using the non-maximal …“

Same at line 181

Line 283-285: different text font and size

Line 298: is shown in equation (4)

Line 404: we used DTI Mavic 3

Figure 7 caption: Example images from the aerial dead tree clusters dataset: (a) images without dead and old trees: (b) images including dead and old trees.

Line 420: repeated word „paper“, please reformulate: „all the performed experiments are conducted …“

Line 525: Please check reference „yolov4“

Figure 8 Caption: (b) LLAM MDCN; (c) DenseNet (Capital letters)

Line 618: „The reason lies in ..“

Figure 10: Please consider only (a) and (b) in the figure and the descriptions included in caption

Line 660: „In future ..“

Lines 663-665: please reformulate sentence

Author Response

Response Letter

Dear Editor, Dear reviewers

Thank you for your letter dated July 28. We were pleased to learn that our work was rated as potentially acceptable for publication in the journal, subject to adequate revision. We thank the reviewers for the time and effort they have put into reviewing the previous version of the manuscript. Their suggestions have enabled us to improve our work. Based on the instructions provided in your letter, we have uploaded the file of the revised manuscript. Accordingly, we have also uploaded a copy of the original manuscript with all the changes highlighted using the track changes mode in MS Word. Appended to this letter is our point-by-point response to the comments raised by the reviewers. The comments are reproduced and our responses are given directly afterward in a different color (blue). We would also like to thank you for allowing us to resubmit a revised copy of the manuscript.

 

Comments from the editors and reviewers:

-Reviewer 1

Reviewer#1, Comment1: Line 23: rich-information regions.

Author Response: Thank you very much for your question. According to your guidance, we have made corresponding modifications. The revised sentence is as follows:

The network's multipath structure can substantially increase the extraction of underlying and semantic features to enhance its extraction capability for rich-information regions.

 

Reviewer#1, Comment2: Lines 76-79: not the same text font and size.

Author Response: Thank you very much for your question. According to your tips, we have unified the fonts in the corresponding positions.

 

Reviewer#1, Comment3: Line 146: “Considering the above problems, in this paper we present a new …”

Author Response: Thank you very much for your question. According to your tips, we have finished modifying the sentence to make our expression clearer. The revised sentence is as follows:

Considering the above problems, in this paper we present a new Longitude latitude cross attention-multipath dense composite network (LLAM-MDCNet) to overcome the difficulties of low detection accuracy due to the mixed distribution of adjacent different classes, redundant feature interference, and high differences in object scales. Our contributions are summarized as follows:

 

Reviewer#1, Comment4: Line 196: “To address the above problems, in this paper we propose the Multipath Dense …”

Author Response: Thank you very much for your question. According to your prompt, we have made corresponding modifications. The revised sentence is as follows:

To address the above problems, in this paper we propose the Multipath Dense Network (MDCN) to extract remote sensing image features; the overall structure is shown in Figure 4.

 

Reviewer#1, Comment5: Line 198: the overall structure shown in Figure 10? Not Figure 1?

Author Response: Thank you very much for your question. According to your prompt, we have corrected the referenced drawing annotation number.

 

Reviewer#1, Comment6: Line 216: Please reformulate sentence.

Author Response: Thank you very much for your question. We have rewritten the above sentence according to your request. The revised sentence is as follows:

Further, low-level texture features and high-level semantic information are better combined using AugFPN for up-sampling to fuse their features.

 

Reviewer#1, Comment7: Line 218: “can significantly characterize the contrast of the color features of healthy trees”

Author Response: Thank you very much for your question. According to your suggestion, we have made corresponding modifications. The revised sentence is as follows:

The low-level texture features can significantly characterize the contrast of the color features of healthy trees, and the high-level semantic features can express the overall trend of irregular dead tree clusters.

 

Reviewer#1, Comment8: Line 226: Sentence beginning with “And”, please reformulate.

Author Response: Thank you very much for your question. According to your tips, we have rewritten the above sentence. The rewritten sentence is as follows:

Simultaneously, the fully connected layer is decomposed into two sub-fully connected layers using singular value decomposition to speed up the computation of the fully connected layer and significantly reduce the computations.
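For context, the decomposition described in this revised sentence can be sketched as follows. This is a minimal NumPy illustration of the standard truncated-SVD factorization of a fully connected layer's weight matrix; the shapes, rank `k`, and variable names are hypothetical, not taken from the manuscript's implementation:

```python
import numpy as np

def decompose_fc(W, k):
    """Approximate an m x n fully connected weight matrix W by two
    smaller matrices via truncated SVD, so that W @ x ~= W2 @ (W1 @ x)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = Vt[:k, :]           # k x n: first sub-fully-connected layer
    W2 = U[:, :k] * s[:k]    # m x k: second sub-layer, columns scaled by singular values
    return W1, W2

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))   # hypothetical FC weight matrix
x = rng.standard_normal(512)
W1, W2 = decompose_fc(W, k=64)
y_approx = W2 @ (W1 @ x)              # two small multiplies instead of one large one
```

Replacing one m x n multiply with an m x k and a k x n multiply reduces the cost from m·n to k·(m + n) multiply-accumulates, which is a substantial saving when k is small relative to m and n.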

 

Reviewer#1, Comment9: Line 232: “the redundand detection frames are removed by using the non-maximal” same at line 181.

Author Response:

Thank you very much for your question. According to your tips, we have rewritten the above sentence. The rewritten sentence is as follows:

Lastly, the redundant detection frames are removed using non-maximum suppression, so as to obtain the final object detection results of forest remote sensing images for dead and old tree clusters.
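For reference, the non-maximum suppression step mentioned here follows the standard greedy algorithm, which can be sketched as below. This is a minimal NumPy sketch under common conventions (corner-format boxes, a fixed IoU threshold); it is not the manuscript's actual code:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression.
    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns the indices of the boxes that are kept."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]            # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top-scoring box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]  # discard heavily overlapping boxes
    return keep
```

Each iteration keeps the highest-scoring remaining box and suppresses every box whose overlap with it exceeds the IoU threshold, which is what removes the redundant detection frames.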

 

Reviewer#1, Comment10: Line 283-285: Different text font and size.

Author Response: Thank you very much for your question. According to your prompt, we have changed the different parts of the font to the same font.

 

Reviewer#1, Comment11: Line 298: Is shown in equation (4).

Author Response: Thank you very much for your question. According to your prompt, we have corrected the serial number of the referenced formula.

 

Reviewer#1, Comment12: Line 404: We used DTI Mavic 3.

Author Response: Thank you very much for your question. According to your prompt, we have rewritten the sentence. The rewritten sentence is as follows:

First, we used a DJI Mavic 3 (1080p, 60 FPS) to capture 9766 images in different states at an altitude of about 90 m.

 

Reviewer#1, Comment13: Figure 7 caption: Example images from the aerial dead tree clusters dataset: (a) images without dead and old trees: (b) images including dead and old trees.

Author Response: Thank you very much for your question. According to your prompt, we have made corresponding modifications. The revised drawing notes are as follows:

Figure 3 Example images from the aerial dead tree clusters dataset: (a) images without dead and old trees; (b) images including dead and old trees.

 

Reviewer#1, Comment14: Line 420: repeated word “paper”, plase reformulate: “all the performed experiments are conducted”

Author Response: Thank you very much for your question. According to your prompt, we have rewritten the sentence. The rewritten sentence is as follows:

To ensure that the experiments in this paper are valid and fair, all the experiments are conducted in the same environment and use the exact same hyperparameters in the model.

 

Reviewer#1, Comment15: Line 525: Please check reference “yolov4”.

Author Response: Thank you very much for your question. According to your tips, we have corrected the incorrectly formatted reference.

 

Reviewer#1, Comment16: Figure 8 Caption: (b) LLAM MDCN; (c) DenseNet (Capital letters).

Author Response: Thank you very much for your question. According to your suggestion, we have made corresponding modifications to the drawing annotation.

Figure 8 Class activation maps (CAM): (a) Original picture; (b) LLAM-MDCNet; (c) DenseNet.

Reviewer#1, Comment17: Line 618: “The reason lies in ..”

Author Response: Thank you very much for your question. According to your prompt, we have made corresponding modifications. The revised sentence is as follows:

The reason is that while MDCN substantially increases the semantic features that the network can extract, the two parallel feature extraction networks are able to complement and correct each other during backpropagation.

 

Reviewer#1, Comment18: Figure 10: Please consider only (a) and (b) in the figure and the descriptions included in caption.

Author Response: Thank you very much for your question. According to your suggestion, we have made corresponding modifications to the picture. The revised drawing notes are as follows:

Figure 10 (a) The image size is too large and there is a lot of noise; (b) the viewing angle is completely parallel to the ground.

 

Reviewer#1, Comment19: Line 660: “In future ..”

Author Response: Thank you very much for your question. According to your prompt, we have corrected the above sentence. The revised sentence is as follows:

In the future, we will make the collected dataset publicly available to other researchers who study forest fire prevention, which is also a major contribution of this paper. We will also focus on addressing the limitations of our work in this paper.

 

Reviewer#1, Comment20: Lines 663-665: Please reformulate sentence.

Author Response: Thank you very much for your question. According to your prompt, we have made corresponding modifications. The revised sentence is as follows:

An effective image pre-processing algorithm is needed to improve the performance of detecting blurred and noisy images. Simultaneously, more datasets are required to improve the algorithm’s accuracy and performance so that it can play a more important role in forest fire prevention and control as well as ecosystem protection.

 

Reviewer 2 Report

The manuscript by Li et al. is devoted to the development of methods for revealing dead trees in images, which can be important for the protection of forests against fire. The results of the manuscript seem to be interesting; I have only minor remarks and questions.

1. It is interesting: can hyperspectral or multispectral imaging be used for revealing dead trees? I suppose the use of hyperspectral or multispectral imaging for this problem should be discussed.

   2. Section 3.1 seems to be more suitable in “2. Materials and methods”. It should be corrected.

   3. Data acquisition should be described in more detail.

Author Response

Response Letter

Dear Editor, Dear reviewers

Thank you for your letter dated July 7. We were pleased to learn that our work was rated as potentially acceptable for publication in the journal, subject to adequate revision. We thank the reviewers for the time and effort they have put into reviewing the previous version of the manuscript. Their suggestions have enabled us to improve our work. Based on the instructions provided in your letter, we have uploaded the file of the revised manuscript. Accordingly, we have also uploaded a copy of the original manuscript with all the changes highlighted using the track changes mode in MS Word. Appended to this letter is our point-by-point response to the comments raised by the reviewers. The comments are reproduced and our responses are given directly afterward in a different color (blue). We would also like to thank you for allowing us to resubmit a revised copy of the manuscript.

 

Comments from the editors and reviewers:

 

-Reviewer 2

Reviewer#2, Comment1: It is interesting: Can hyperspectral or multispectral imaging be used for revealing dead trees? I suppose using the hyperspectral or multispectral imaging for this problem should be discussed.

Author Response:

Thank you for your comments! You made us think about the significance of hyperspectral images for detecting dead wood clusters. We have added some ideas to the "Conclusion" section. The added content has been highlighted in the manuscript, as follows:

Hyperspectral imaging is a fine-grained technique capable of capturing and analyzing point-by-point spectra over a spatial area. Because unique spectral "features" can be detected at different spatial locations of individual objects, it can detect visually indistinguishable substances. Classification and detection of ground objects using hyperspectral or multispectral images is a typical application of computer vision technology in remote sensing. However, we believe that hyperspectral images are not applicable to the method in this paper for the following reasons: (1) Hyperspectral images typically have a large number of channels (far more than the three channels of RGB images), of which only part of the information is useful. As a consequence, these channels must be filtered before being input to the network. (2) Since hyperspectral images contain large and dense semantic information, deep learning methods for them typically employ a shallow structure. This results in a smaller receptive field, which is inconsistent with our original idea of using global contextual information. In conclusion, the use of hyperspectral images for dead tree cluster detection is a promising direction because hyperspectral images have richer details than RGB images; for now, however, a proven network is still required to harness them.

Reviewer#2, Comment2: “Section 3.1 seems to be more suitable in “2. Materials and methods”. It should be corrected.”

Author Response: Thank you for your suggestion. Your proposed article structure has been adopted.

Reviewer#2, Comment3: Data acquisition should be described in more detail.

Author Response:

Thank you for your suggestion. We added the location of the collected data and the reasons for selecting these data, and introduced the conversion of data format in detail. The new contents are as follows:

Further, in order to cover as many forest states as possible, we collected forest stand types including coniferous, deciduous broad-leaved, and evergreen broad-leaved forests from the Xiaoxing'an Ling in northeastern China, the Qin Ling in central China, the Zepu jinhuyang National Forest Park in northwestern China, and the Hengduan Mountain Range in southwestern China, mainly using climatic zones as divisions. These data are first cropped to an appropriate size and then precisely labeled by experts in the field of forestry. These images are then converted into VOC format (the same as our open-source format, for more convenient comparison with other target detection networks). Before being input to LLAM-DRNet, our VOC-format data is converted into YOLO format for easier reading.
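The VOC-to-YOLO label conversion mentioned above follows a standard recipe: corner coordinates are rewritten as an image-normalized center point, width, and height. A minimal sketch is shown below; the XML element paths follow the usual VOC layout, and the class list and function names are illustrative assumptions, not the authors' actual script:

```python
import xml.etree.ElementTree as ET

def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a VOC corner-format box to YOLO's normalized center format."""
    x_c = (xmin + xmax) / 2.0 / img_w
    y_c = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return x_c, y_c, w, h

def convert_voc_file(xml_path, class_names):
    """Parse one VOC annotation file and return YOLO label lines."""
    root = ET.parse(xml_path).getroot()
    img_w = int(root.find("size/width").text)
    img_h = int(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls = class_names.index(obj.find("name").text)
        b = obj.find("bndbox")
        box = voc_box_to_yolo(
            float(b.find("xmin").text), float(b.find("ymin").text),
            float(b.find("xmax").text), float(b.find("ymax").text),
            img_w, img_h)
        lines.append(f"{cls} " + " ".join(f"{v:.6f}" for v in box))
    return lines
```

Each output line is `class x_center y_center width height`, with all four coordinates normalized to [0, 1], which is the format YOLO-family detectors read directly.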

Author Response File: Author Response.doc
