Article
Peer-Review Record

Assessing the Impact of the Loss Function and Encoder Architecture for Fire Aerial Images Segmentation Using Deeplabv3+

Remote Sens. 2022, 14(9), 2023; https://doi.org/10.3390/rs14092023
by Houda Harkat 1,2,*, José M. P. Nascimento 1,3, Alexandre Bernardino 4 and Hasmath Farhana Thariq Ahmed 5
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 13 February 2022 / Revised: 21 March 2022 / Accepted: 13 April 2022 / Published: 22 April 2022

Round 1

Reviewer 1 Report

Generally, the authors presented a processing chain (see Figure 1) for wildfire detection using an existing deep learning model, i.e., Deeplabv3+. Therefore, the novelty should be highlighted.

 

  1. The title is too broad; deep learning includes many methods and models, so the Deeplabv3+ method should be included.
  2. The proposed method should be highlighted by pointing out the main innovations.
  3. The chapter numbering has many problems, especially in Chapter 2.
  4. The presentation could be better; e.g., the wavy lines in Figure 3 should be removed.
  5. The third chapter mainly compares five existing models; some comparative experiments with other traditional methods can be added.
  6. In Table 8, some data are missing.
  7. How about the generalization performance when increasing the ratio of training samples, e.g., using 10%, 20%, … of the training samples?
  8. The conclusion should analyze the limitations of the proposed method and the future research directions of the current method.
  9. Rewrite the author contributions.
  10. A comparison of different loss functions can be added.

Author Response

  • Generally, the authors presented a processing chain (see Figure 1) for wildfire detection using an existing deep learning model, i.e., Deeplabv3+. Therefore, the novelty should be highlighted.

Thanks for the comment. A brief description highlighting the novelty and contribution of the paper was added in the introduction section.

The main contributions of this paper include the following:

 

-    An in-depth analysis of the impact of different loss functions, combined with different encoder architectures, over different types of aerial images is performed. Firefront_Gestosa and FLAME are two datasets of aerial images covering different scenarios. The first set contains very few fire pixels, less than 1% of the final dataset. The second set contains a higher ratio of fire pixels than the first one, but it includes several different images of the same view. Usually, aerial datasets yield segmentation results that are very low in comparison with the performance attained in this paper.

-    Deeplabv3+ parameters are fine-tuned to train a model that efficiently segments aerial images with a limited flame area. Moreover, choosing an adequate encoder architecture combined with a proper loss function reduces the false negatives (FN) and boosts the intersection over union (IoU) and BF score (see the metric sketch after this list).

-    A private labeled set of aerial fire pictures, named the Firefront_Gestosa dataset, has been used in the experiments. The labeling task for such aerial profiles is challenging, since the smoke sometimes fully covers the flames. Wrongly labeled data will induce a misleading trained classifier. Firefighters are most interested in localizing the exact GPS positions of flames, so as to promptly start an intervention that limits the propagation. With huge smoke clouds, it is impossible to visually localize the flame positions from the ground or the air.
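For concreteness, the following minimal Python sketch (ours, added for illustration; the helper functions are hypothetical and not from the paper) shows the two quantities the second contribution refers to: IoU as an overlap metric, and a soft Dice loss as one example of a loss that copes better than plain cross-entropy with masks where fire pixels are under 1% of the data:

```python
import numpy as np

def iou(pred, target, eps=1e-7):
    # Intersection over Union between two binary masks.
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)

def dice_loss(prob, target, eps=1e-7):
    # Soft Dice loss: overlap-based, so the ~99% background pixels
    # do not swamp the gradient as they do in plain cross-entropy.
    inter = (prob * target).sum()
    return 1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps)

# Toy 4x4 frame in which a single pixel is fire (heavy class imbalance).
target = np.zeros((4, 4)); target[0, 0] = 1.0
prob = np.full((4, 4), 0.05); prob[0, 0] = 0.9
print(iou(prob > 0.5, target > 0.5))  # 1.0: prediction matches the mask
print(dice_loss(prob, target))        # ~0.32: the soft loss still trains
```

The BF (boundary F1) score additionally matches predicted and ground-truth boundary pixels within a small distance tolerance; it is omitted here for brevity.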

 

  • The title is too broad; deep learning includes many methods and models, so the Deeplabv3+ method should be included.

 

Thanks for the comment. In order to make the subject of the work clearer, the title has been changed to ‘Assessing the Impact of the Loss Function and Encoder Architecture for Fire Aerial Images Segmentation Using Deeplabv3+’.

 

  • The proposed method should be highlighted by pointing out the main innovations.

A brief description highlighting the novelty and contribution of the paper was added in the introduction section, namely, the use of Deeplabv3+ to train a model to segment aerial images of flames. For this purpose, the parameters were fine-tuned. Additionally, the encoder architecture and the loss function were chosen to enhance the results. The experiments were conducted over two datasets: Firefront_Gestosa, which has been labeled and will be made public in the near future, and the FLAME dataset.

 

 

  • The chapter numbering has many problems, especially in Chapter 2.

 

A deep revision of the manuscript has been conducted, and the numbering of the chapters has been corrected. Moreover, the duplicated title ‘Dataset’ was replaced by ‘Description’.

 

  • The presentation could be better; e.g., the wavy lines in Figure 3 should be removed.

 

We have enhanced the visibility of the figure. We understand that the wavy lines make the figure large while its inner components remain small and hard to see. However, we could not remove the wavy lines, since they indicate the points of feature injection between the three modules.

 

  • The third chapter mainly compares five existing models; some comparative experiments with other traditional methods can be added. In Table 8, some data are missing.

 

Thanks for the comment. In Table 8, some cells are empty because the corresponding papers do not use the same metrics that we opted for. For the same reason, we did not compare the results with other traditional methods: the mismatch in the datasets used would make it inadequate to draw a global conclusion from the model comparison. To explain this, we added the following paragraph to the paper:

We note, however, that in Table 8 some cells do not present results, since the original works do not use the same metrics that have been adopted in this work.

 

  • How about the generalization performance when increasing the ratio of training samples, e.g., using 10%, 20%, … of the training samples?

 

This is a very good point. For segmentation purposes, aerial images are considered limited datasets, since the area of flames within the pictures is very small. This is why we included the Corsican data, which have a frontal view of fire and hence more pixels labeled as fire. The Firefront_Gestosa dataset is private, with limited content, so we did not analyse the performance when increasing the ratio of training samples. This work concerns the design of a well-trained network to segment aerial profiles similar to Firefront_Gestosa. In future work, with a bigger dataset, we will consider doing this. This point is included in the outlook of the work in the conclusions:

 

The aerial images used are considered a limited set of data, since the number of pixels labeled as fire is lower than usual. Hence, we have reinforced the dataset with Corsican pictures that have a frontal view of fire. The Firefront_Gestosa dataset is private, with limited content. A performance analysis as a function of the training-sample ratio could be conducted; however, that would be better suited to a larger dataset, so it will be considered in future work. Besides, we will also consider a bigger labeled set of Firefront_Gestosa pictures.

 

  • The conclusion should analyze the limitations of the proposed method and
    the future research direction of the current method.

 

This point is included in the outlook of the work in conclusions:

 

The aerial images used are considered a limited set of data, since the number of pixels labeled as fire is lower than usual. Hence, we have reinforced the dataset with Corsican pictures that have a frontal view of fire. The Firefront_Gestosa dataset is private, with limited content. A performance analysis as a function of the training-sample ratio could be conducted; however, that would be better suited to a larger dataset, so it will be considered in future work. Besides, we will also consider a bigger labeled set of Firefront_Gestosa pictures.

 

 

  • Rewrite the author contributions. A comparison of different loss functions can be added.

Yes, thank you for the suggestion. This point was considered when rewriting the author contributions.

The main contributions of this paper include the following:

 

-    An in-depth analysis of the impact of different loss functions, combined with different encoder architectures, over different types of aerial images is performed. Firefront_Gestosa and FLAME are two datasets of aerial images covering different scenarios. The first set contains very few fire pixels, less than 1% of the final dataset. The second set contains a higher ratio of fire pixels than the first one, but it includes several different images of the same view. Usually, aerial datasets yield segmentation results that are very low in comparison with the performance attained in this paper.

-    Deeplabv3+ parameters are fine-tuned to train a model that efficiently segments aerial images with a limited flame area. Moreover, choosing an adequate encoder architecture combined with a proper loss function reduces the false negatives (FN) and boosts the intersection over union (IoU) and BF score.

-    A private labeled set of aerial fire pictures, named the Firefront_Gestosa dataset, has been used in the experiments. The labeling task for such aerial profiles is challenging, since the smoke sometimes fully covers the flames. Wrongly labeled data will induce a misleading trained classifier. Firefighters are most interested in localizing the exact GPS positions of flames, so as to promptly start an intervention that limits the propagation. With huge smoke clouds, it is impossible to visually localize the flame positions from the ground or the air.

Author Response File: Author Response.docx

Reviewer 2 Report

The topic of the work is relevant.

The article may be published in the current version. 

Author Response

  • The topic of the work is relevant. The article may be published in the current version.

 

Thank you for your valuable comments.

Reviewer 3 Report

This paper addressed the challenge of detecting fire pixels and warning the firefighters as soon as possible, so that they can handle the problem more quickly, by implementing an on-site detection system that detects fire pixels in real time in the given scenario. The main goal of this work is to create a model that can properly segment fire images captured in aerial datasets with extremely small flame regions. The subject of this paper is important for wildfire studies. The proposed method was shown to improve the detection of fire pixels in real time. It was clearly described, and the results are reasonable when compared with other methods.

 

Some specific comments

L.216- pourcentage  ---  percentage

L.228- Figure 3- Spacial Pyramid Pooling ----  - Spatial Pyramid Pooling

L.279- by the flowing formula: ---   by the following formula:

L.418- detecting small-scall objects  ---  detecting small-scale objects

L.429- Table 6- 1,0181      0,6178      0,8745  ------  1.0181      0.6178      0.8745 (change , (comma) to . (period) throughout the table. The same in Tables 5, 6, 7.)

L.454- BF score of 87,81% and 95,66%  ---  BF score of 87.81% and 95.66%

 

Author Response

  • This paper addressed the challenge of detecting fire pixels and warning the firefighters as soon as possible, so that they can handle the problem more quickly, by implementing an on-site detection system that detects fire pixels in real time in the given scenario. The main goal of this work is to create a model that can properly segment fire images captured in aerial datasets with extremely small flame regions. The subject of this paper is important for wildfire studies. The proposed method was shown to improve the detection of fire pixels in real time. It was clearly described, and the results are reasonable when compared with other methods.

 

Thank you for your valuable comments.

 

  • L.216- pourcentage --- percentage

 

Corrected, thank you.

 

  • L.228- Figure 3- Spacial Pyramid Pooling --- Spatial Pyramid Pooling

 

Corrected, thank you.

 

  • L.279- by the flowing formula: --- by the following formula:

 

Corrected, thank you.

 

 

  • L.418- detecting small-scall objects --- detecting small-scale objects

 

Corrected, thank you.

 

 

  • L.429- Table 6- 1,0181      0,6178      0,8745 --- 1.0181      0.6178      0.8745 (change , (comma) to . (period) throughout the table; the same in Tables 5, 6, 7.)

 

Corrected, thank you.

 

  • L.454- BF score of 87,81% and 95,66% --- BF score of 87.81% and 95.66%

 

Corrected, thank you.

Reviewer 4 Report

In this paper, an existing deep learning architecture, Deeplabv3+, using four distinct loss functions and five different models as backbones, is applied to fire aerial image segmentation.

Despite the apparent efforts made by the authors, the paper turns out to be poorly organized, very confusing, and hard to read. Therefore, the necessary reorganization of the paper must be done both at the level of content and the level of presentation. Some main critical points are as follows.

I suggest describing the architecture used first by moving subsection 2.2.1 (Deeplabv3+), starting at line 217, and the next subsection 2.2.2 (Loss function) before the current section 2, which starts at line 123. Section 2.2.1 should also be expanded. Note the wrong numbering of the subsections.

The current section 2 should be carefully revised by explaining some key steps. For example, the way images are pre-processed is given as algorithms in their own right and is not cited in the description. In addition, there are some repetitions; for example, “MatlabImage Laber” is described both at the end of section 2.2.1 (Dataset), see lines 179-181, and at the beginning of subsection 2.2.1 (Data annotation technique), see lines 183-186. Finally, note that the wrong numbering of the subsections continues.

The introduction is too long. Probably a new section titled related work should be added.

The first part of the abstract should be moved to the first part of the introduction and substituted with a more concise statement.

In the introduction, the utility of the paper should be highlighted, since the results of this study do not seem particularly useful for future development. Probably a critical discussion indicating what the critical issues are would be more helpful than the simple listing, albeit accompanied by tables and graphs, given in section 3.2.

Much of the text is missing sentence-ending periods.

Author Response

  • Despite the apparent efforts made by the authors, the paper turns out to be poorly organized, very confusing, and hard to read. Therefore, the necessary reorganization of the paper must be done both at the level of content and the level of presentation. Some main critical points are as follows.

Thank you for your valuable comments. We have made some modifications to the paper, and you will notice that some missing points have been further clarified. Moreover, the chapter numbering has also been changed.

  • I suggest describing the architecture used first by moving subsection 2.2.1 (Deeplabv3+), starting at line 217, and the next subsection 2.2.2 (Loss function) before the current section 2, which starts at line 123. Section 2.2.1 should also be expanded. Note the wrong numbering of the subsections.

Thanks for the suggestion; the subsection numbering has been corrected.

The subsection concerning Deeplabv3+ has been further expanded. Namely, the reason why DeeplabV3+ was adopted and an explanation of atrous convolution are provided in the revised manuscript, as shown below (Section 3.2.1).

DeepLabv3+ is an extension of the DeepLabv3 architecture that includes an encoder and decoder structure that helps the model work more efficiently. Dilated convolution is used by the encoder module to deal with multiscale contextual information, whereas the decoder structure improves the segmentation performance by focusing on object boundaries. Thus, the encoder-decoder architecture is adopted by Deeplabv3+ [35] in the present study. The model is divided into three blocks: encoder, effective decoder, and a spatial pyramid pooling block, as shown in Fig. 3. To create a rich feature map, the encoder uses dilated or atrous convolution at varied rates [35]. The present work applies an atrous convolution to each location on the output and filter, where the rate of convolution corresponds to how quickly the inputs are sampled. We can maintain a constant stride while increasing the field of view using atrous convolution, without increasing the number of parameters or the amount of computation. Finally, a larger feature map is obtained through this process as an output, which enhances the segmentation process.
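To make the constant-parameter claim concrete, below is a minimal PyTorch sketch (ours, purely illustrative; the paper does not prescribe this framework). A dilated 3x3 kernel covers a 5x5 field of view with exactly the same parameter count and output resolution as a standard 3x3 kernel:

```python
import torch
import torch.nn as nn

# Two 3x3 convolutions: standard (dilation=1) and atrous (dilation=2).
conv_std = nn.Conv2d(3, 8, kernel_size=3, padding=1, dilation=1)
conv_atr = nn.Conv2d(3, 8, kernel_size=3, padding=2, dilation=2)

x = torch.randn(1, 3, 64, 64)
# Same spatial size, so the stride of the feature map is unchanged.
print(conv_std(x).shape, conv_atr(x).shape)  # both (1, 8, 64, 64)

# Identical parameter counts (8*3*3*3 weights + 8 biases = 224 each),
# yet the dilated kernel samples a 5x5 neighbourhood instead of 3x3.
print(sum(p.numel() for p in conv_std.parameters()),
      sum(p.numel() for p in conv_atr.parameters()))
```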

Nonetheless, the current paper adopts the IMRaD format; moving the explanation of the methods to after the introduction and before the Materials and Methods section would completely change the format of the paper.

  • The current section 2 should be carefully revised by explaining some key steps. For example, the way images are pre-processed is given as algorithms in their own right and is not cited in the description.

Thanks for the comment. The algorithm used for pre-processing the images (also referred to as format adjustment in Figure 1) was not detailed in the Description part, since that part is a brief introduction to the initial data provided by the respective authors (acquisition format, camera types used, acquisition location, etc.). However, we noted there that further information would be provided in the next paragraph (lines 205-206):

More details on the data annotation procedure are explained briefly in the subsequent section.

Our contribution to the pre-processing part is explained in the data annotation part, lines 210-225:

Table 2 summarizes the preprocessing steps adopted in the present study. The mask variable refers to the ground truth pictures. Firstly, the indexed ground truth images were normalized and then transformed into grayscale pictures.

Furthermore, an inverse colormap algorithm [16] was applied to convert the pictures back to indexed values. The algorithm quantizes the colormap into 25 distinct nuance degrees per color component. Later, the closest nuance in the quantized colormap is localized for each pixel in the grayscale image.

The main objective of the transformations mentioned above (indexed to grayscale and grayscale to indexed) is to have images that deploy a fixed colormap, to make the processing easier later. Then the images are binarized based on a threshold value and stored correctly.

The same preprocessing steps were applied to the FLAME dataset, which was already labeled. However, before training the models, it was necessary to convert the infrared photos of the Corsican data to RGB storage format by simply duplicating the content of the red channel over the green and blue channels.
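As a rough illustration of the quoted pipeline, the Python sketch below is ours: the function names, the 0.5 threshold, and the use of PIL/NumPy are assumptions, and the uniform quantization is only a simplified stand-in for the inverse colormap algorithm cited as [16]. It mirrors the normalize, grayscale, quantize, and binarize sequence, plus the infrared-to-RGB channel duplication:

```python
import numpy as np
from PIL import Image

def binarize_mask(path, levels=25, thresh=0.5):
    # Normalize the indexed ground-truth image and convert it to grayscale.
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float32) / 255.0
    # Quantize into `levels` nuance degrees per component: a simplified
    # stand-in for the inverse colormap algorithm described above.
    quant = np.round(gray * (levels - 1)) / (levels - 1)
    # Binarize against a threshold to obtain the final fire mask.
    return (quant > thresh).astype(np.uint8)

def ir_to_rgb(ir):
    # Duplicate the single (red) channel over green and blue, as done
    # for the Corsican infrared pictures before training.
    return np.stack([ir, ir, ir], axis=-1)
```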

  • In addition, there are some repetitions; for example, “MatlabImage Laber” is described both at the end of section 2.2.1 (Dataset), see lines 179-181, and at the beginning of subsection 2.2.1 (Data annotation technique), see lines 183-186.

The mentioned repetition has been removed; thank you for your remark.

  • Finally, note that the wrong numbering of the subsections continues.

The wrong numbering of the subsections has been corrected.

 

  • The introduction is too long. Probably a new section titled related work should be added.

Thanks for the suggestion; the introduction has been split into two sections: Introduction and Related Work.

  • The first part of the abstract should be moved to the first part of the introduction and substituted with a more concise statement.

The first part of the abstract, ‘Wildfires account for most burnt land in Portugal, which has burnt regions of more than 500 thousand hectares in the last decade.’, was moved to the introduction and substituted by the following sentence:

Wildfire early detection and prevention had become a priority. The detection using Internet …. ’

  • In the introduction, the utility of the paper should be highlighted, since the results of this study do not seem particularly useful for future development. Probably a critical discussion indicating what the critical issues are would be more helpful than the simple listing, albeit accompanied by tables and graphs, given in section 3.2.

Thanks for the comment. In the introduction, the utility of the paper is highlighted; please check the following text added to the paper:

‘The main contributions of this paper include the following:

 

-    An in-depth analysis of the impact of different loss functions, combined with different encoder architectures, over different types of aerial images is performed. Firefront_Gestosa and FLAME are two datasets of aerial images covering different scenarios. The first set contains very few fire pixels, less than 1% of the final dataset. The second set contains a higher ratio of fire pixels than the first one, but it includes several different images of the same view. Usually, aerial datasets yield segmentation results that are very low in comparison with the performance attained in this paper.

-    Deeplabv3+ parameters are fine-tuned to train a model that efficiently segments aerial images with a limited flame area. Moreover, choosing an adequate encoder architecture combined with a proper loss function reduces the false negatives (FN) and boosts the intersection over union (IoU) and BF score.

-    A private labeled set of aerial fire pictures, named the Firefront_Gestosa dataset, has been used in the experiments. The labeling task for such aerial profiles is challenging, since the smoke sometimes fully covers the flames. Wrongly labeled data will induce a misleading trained classifier. Firefighters are most interested in localizing the exact GPS positions of flames, so as to promptly start an intervention that limits the propagation. With huge smoke clouds, it is impossible to visually localize the flame positions from the ground or the air.

 

 

We believe that this study is useful for future developments in the early detection of fire flames. This is further mentioned in the conclusion section.

Nevertheless, the trained model could be used for the segmentation of similar aerial pictures, for which manual segmentation is challenging and time-consuming, since it requires a fine level of precision.

Moreover, a critical discussion of the results is presented in section 4.2.

  • Much of the text is missing period endpoints.

Thank you for your remark; the text was carefully revised in order to correct the mentioned typos.

 
