Article
Peer-Review Record

Earth Observation and Biodiversity Big Data for Forest Habitat Types Classification and Mapping

Remote Sens. 2021, 13(7), 1231; https://doi.org/10.3390/rs13071231
by Emiliano Agrillo 1,†, Federico Filipponi 1,†, Alice Pezzarossa 1,*, Laura Casella 1, Daniela Smiraglia 1, Arianna Orasi 1, Fabio Attorre 2 and Andrea Taramelli 1,3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 26 January 2021 / Revised: 1 March 2021 / Accepted: 22 March 2021 / Published: 24 March 2021
(This article belongs to the Special Issue New Insights into Ecosystem Monitoring Using Geospatial Techniques)

Round 1

Reviewer 1 Report

In this study, the authors utilized big data to develop a national map of detailed forested habitats for Italy. This study included many predictor variables ranging from geographic information, geomorphic information, climatic, soil properties, leaf area indices, phenology, spectral information, and vegetation indices. Redundant predictor variables were removed, and random forest models were optimized for various response variable levels. This is an impressive study that will interest many readers. Big data is a new paradigm in remote sensing/geographic information science and this paper will add to the discussion on this topic. The work is particularly relevant as developing these types of datasets using big data will lead to broad and sweeping impacts for preserving, restoring, and conserving important habitats at large scales. While this is an impressive study, the writing could be polished some. I also found the method section could be improved by moving some text from the results section. Finally, I strongly suggest incorporating bias and standard error into the accuracy assessment. I have listed other specific comments below.

Comments:

  1. I think the title is a bit misleading. It is overly broad with regard to habitats, while the paper is focused just on a subset of habitats (i.e., forests).
  2. For Figure 1, I recommend showing just the regions important to this paper (the ones in Italy). The overview map could show just country boundaries.
  3. Lines 212-213- More information could be provided on the development of slope, “northerness,” and “easterness.” What was the neighborhood size for slope? A 9-by-9-pixel neighborhood? How were “northerness” and “easterness” calculated?
  4. Line 225-226- What was the source of the soil data? What parameters were used?
  5. I also think it may be helpful to move Figure 2 to the beginning of the methods section. There are inconsistencies in the use of capitalization throughout this figure (e.g., “Multispectral Data” compared to “Time series stacking”). I also recommend spelling out all acronyms in the caption or in the figure (if space allows).
  6. Lines 303-307- The authors state how random forests is popular and performs well for forest classifications. Random forest may also work well with the big data approach since it produces trees by sampling a subset of predictors with replacement. It may be worth adding a statement along those lines. However, I wonder why random forest was the only algorithm tested? Why were other popular algorithms like support vector machine not explored? I recommend expanding on this. Should other algorithms be explored in the future? This was mentioned on line 587-588. What are recommended algorithms to explore next?
  7. Line 325- Kappa is no longer recommended for accuracy assessments (Stehman and Foody, 2019; https://doi.org/10.1016/j.rse.2019.05.018). I recommend just using overall accuracy. I also recommend incorporating bias and standard error into the accuracy assessment. This will make this approach more valuable for tracking change in areal estimation for repeat assessments. The methods for this are also included in the Stehman and Foody paper.
  8. Table 3 or some form of Table 3 would be helpful in the methods section. These are the response variables and it would be good to expose readers to these prior to the results section. I recommend moving the entire section 3.1 to the methods section.
  9. Lines 360-365- These sentences seem like they belong in the methods section. The reference to Table A1 would certainly benefit readers in the methods section.
  10. Check errors on lines 434 and 436.
  11. Figure 5- I recommend referring readers back to the table with habitat names.
  12. Table A1- Formatting in the description could be enhanced (i.e., spacing)
  13. Table A2- This table is very informative. I suggest moving it into the result section of the paper.
  14. What predictors were excluded from analysis due to multi-collinearity? Can predictors that were removed be added to Table A1 using a footnote?

Author Response

Dear Reviewer,

Thank you for taking the time to read the manuscript and report your comments and considerations. Please find below the responses to your comments, point by point.

Furthermore, we have uploaded a version in which each change resulting from the reviewer’s comments is annotated with a comment giving the reference. We hope this helps in the review process. Revisions made in response to your comments are marked with the code REV 1 in the comments.

Finally, as you requested, we reviewed the English language and style. We therefore invite you to re-read the entire document.

 

  • Comment 1. I think the title is a bit misleading. It is overly broad with regard to habitats, while the paper is focused just on a subset of habitats (i.e., forests).

Reply: Thank you for this suggestion. The title now reads: “Earth Observation and Biodiversity Big Data for Forest Habitat Classification and Mapping”

 

  • Comment 2. For Figure 1, I recommend showing just the regions important to this paper (the ones in Italy). The overview map could show just country boundaries.

Reply: We modified Figure 1 as you suggested; you will find it in the uploaded version, although the figure has been renumbered in the new version.

 

  • Comment 3. Lines 212-213- More information could be provided on the development of slope, “northerness,” and “easterness.” What was the neighborhood size for slope? A 9-by-9-pixel neighborhood? How were “northerness” and “easterness” calculated?

Reply: Thank you for this comment. At line 227 you can find the following modified sentence: “Digital elevation model [72] at 20 m spatial resolution was used to calculate elevation, slope and aspect using a 3x3 pixel neighbourhood. Northerness and easterness components were later calculated from aspect, using sine or cosine transformation, in order to obtain continuous variables.”
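The sine and cosine transformation mentioned in this reply can be sketched as follows. The reply does not say which function maps to which component, so this sketch assumes the common convention: aspect is measured in degrees clockwise from north, northness is the cosine and eastness is the sine. The function name `aspect_to_components` is hypothetical and is not taken from the manuscript.

```python
import math

def aspect_to_components(aspect_deg):
    """Return (northness, eastness) for an aspect in degrees.

    Assumption: aspect is measured clockwise from north, so
    northness = cos(aspect) and eastness = sin(aspect).
    """
    a = math.radians(aspect_deg)
    return math.cos(a), math.sin(a)

# Aspect is circular (359 degrees is almost the same direction as 1 degree);
# the two components remove that discontinuity and are continuous in [-1, 1].
for aspect in (0.0, 90.0, 180.0, 270.0, 359.0):
    northness, eastness = aspect_to_components(aspect)
    print(f"aspect={aspect:5.1f}  northness={northness:+.3f}  eastness={eastness:+.3f}")
```

Splitting aspect into two components is what makes it usable as a continuous predictor: slopes facing 1 and 359 degrees receive nearly identical northness values instead of values at opposite ends of a 0 to 360 scale.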

 

  • Comment 4. Line 225-226- What was the source of the soil data? What parameters were used?

Reply: At line 242 you can find the following modified sentence: “Datasets representing soil properties at 250 m, specifically the soil organic carbon stock, pH and absolute depth to bedrock, were collected from the SoilGrids (see Table A1) repository and interpolated to reference grid resolution using a Bartlett filter with 750 m radius.”

 

  • Comment 5. I also think it may be helpful to move Figure 2 to the beginning of the methods section. There are inconsistencies in the use of capitalization throughout this figure (e.g., “Multispectral Data” compared to “Time series stacking”). I also recommend spelling out all acronyms in the caption or in the figure (if space allows).

Reply: We modified Figure 2 (now Figure 1 in the uploaded version), correcting both the capitalization and the spelling out of acronyms. We have also moved the paragraph to the beginning of Chapter 2, as you suggested. See line 138.

 

 

  • Comment 6. Lines 303-307- The authors state how random forests is popular and performs well for forest classifications. Random forest may also work well with the big data approach since it produces trees by sampling a subset of predictors with replacement. It may be worth adding a statement along those lines. However, I wonder why random forest was the only algorithm tested? Why were other popular algorithms like support vector machine not explored? I recommend expanding on this. Should other algorithms be explored in the future? This was mentioned on line 587-588. What are recommended algorithms to explore next?

Reply: Thank you for this comment. Regarding the application of RF in a big data context, we added the following sentence and the respective reference at line 330: “Since RF produced several independent trees by intensive resampling of different subsets of predictors, it is natural to consider this adapted bootstrapping scheme for big data context [Genuer, R.; Poggi, J. M.; Tuleau-Malot, C.; Villa-Vialaneix, N. Random forests for big data. Big Data Res. 2017, 9, 28-46. Doi: 10.1016/j.bdr.2017.07.003.]”.

Moreover, considering that our study was not a model comparison study but a demonstrative one, we analysed the relevant literature and chose the best-performing algorithm for our case study. Nevertheless, thanks to your comment, we added the following sentence and the respective references at line 332: “In addition, in some studies of comparison between different classification algorithms, RF was found to be the best performing [30,50,94,96] or at least comparable with others. Hence, considering that the proposed study was not a comparison of classification methods, but a demonstrative procedure, the final choice was to apply the RF classifier.”

We also thank you for your recommendation: we modified the text at line 625 to name algorithms recommended for future exploration and added a reference.

 

  • Comment 7. Line 325- Kappa is no longer recommended for accuracy assessments (Stehman and Foody, 2019; https://doi.org/10.1016/j.rse.2019.05.018). I recommend just using overall accuracy. I also recommend incorporating bias and standard error into the accuracy assessment. This will make this approach more valuable for tracking change in areal estimation for repeat assessments. The methods for this are also included in the Stehman and Foody paper.

Reply: Thank you for these suggestions, which definitely improved our accuracy assessment. Assuming a stratified random sampling of the reference data with equal allocation, we re-estimated the error matrix, including the percent of area and incorporating the standard error, starting from the formulas provided by Stehman and Foody, 2019.

Consequently, in section 2.3.2 of the Methods chapter we added the following sentence on line xxx: “Each RF model was trained using a stratified random sample of 70% of data, and model performance was tested using the remaining 30% (internal evaluation). The accuracy assessment of the procedure depended on a confusion matrix (error matrix) and the following accuracy measures derived from it: the Overall accuracy, the User’s and Producer’s accuracy and their Standard error. The formulas to obtain the accuracy metrics and the standard error were taken from Stehman & Foody, 2019 [94]”. In the Results chapter we replaced the error matrix with the standard error for each classification.

Moreover, regarding Cohen's kappa coefficient, we note that, in contrast to overall accuracy, kappa takes imbalance in class distribution into account and therefore supplies a more objective description of model performance. A comparison of overall accuracy and Cohen's kappa coefficient demonstrated that kappa is more suitable as an evaluation metric in classification applications with imbalanced datasets (see Fatourechi, M., Ward, RK, Mason, SG, Huggins, J., Schlögl, A., & Birch, GE, 2008. Comparison of evaluation metrics in classification applications with imbalanced datasets. In 2008 seventh international conference on machine learning and applications (pp. 777-782). IEEE.). Since the response variable dataset we used has an imbalanced class distribution (i.e. high ecological variability in a wide geographic area), we decided to keep both accuracy metrics for a more comprehensive description of model performance.
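The metrics discussed in this reply can be illustrated with a minimal sketch. It assumes that the sampling strata coincide with the map classes and uses the estimators commonly applied to stratified designs: estimated area proportions p_ij = W_i * n_ij / n_i, overall accuracy as the sum of the diagonal proportions, and standard errors derived from U_i(1 - U_i)/(n_i - 1). The function name, the 2x2 matrix and the area weights are made up for illustration; they are not the authors' data and this is not necessarily the exact computation used in the manuscript.

```python
import math

def accuracy_report(matrix, weights):
    """Accuracy metrics from an error matrix under stratified sampling.

    matrix[i][j]: number of samples mapped as class i with reference class j.
    weights[i]:   proportion of mapped area in class i (assumed to sum to 1).
    Assumes every map class has at least two samples and every reference
    class appears at least once.
    """
    k = len(matrix)
    row_n = [sum(row) for row in matrix]
    # Estimated area proportions: p_ij = W_i * n_ij / n_i
    p = [[weights[i] * matrix[i][j] / row_n[i] for j in range(k)] for i in range(k)]
    col_p = [sum(p[i][j] for i in range(k)) for j in range(k)]

    overall = sum(p[i][i] for i in range(k))
    users = [matrix[i][i] / row_n[i] for i in range(k)]
    producers = [p[j][j] / col_p[j] for j in range(k)]

    # Standard errors for user's accuracy and overall accuracy
    users_se = [math.sqrt(u * (1 - u) / (row_n[i] - 1)) for i, u in enumerate(users)]
    overall_se = math.sqrt(sum(
        weights[i] ** 2 * users[i] * (1 - users[i]) / (row_n[i] - 1) for i in range(k)
    ))

    # Cohen's kappa computed on the area-weighted proportions
    chance = sum(sum(p[i]) * col_p[i] for i in range(k))
    kappa = (overall - chance) / (1 - chance)

    return {"overall": overall, "overall_se": overall_se, "users": users,
            "users_se": users_se, "producers": producers, "kappa": kappa}

# Hypothetical two-class example with equal allocation (50 samples per stratum)
report = accuracy_report([[45, 5], [10, 40]], weights=[0.7, 0.3])
for name, value in report.items():
    print(name, value)
```

Reporting kappa alongside overall accuracy, as the authors chose to do, costs nothing computationally because both are derived from the same proportion matrix.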

  • Comment 8. Table 3 or some form of Table 3 would be helpful in the methods section. These are the response variables and it would be good to expose readers to these prior to the results section. I recommend moving the entire section 3.1 to the methods section.

Reply: We prefer not to move section 3.1 to the methods, because it represents a result of the methodology described in section 2.3.1. To go from the native vegetation archive to our response variable data, we performed specific actions (e.g. labelling) on all the vegetation plots considered. Instead, we added a reference to Table 3 in the methods section, so that readers see the response variables before the results. See line 194.

 

  • Comment 9. Lines 360-365- These sentences seem like they belong in the methods section. The reference to Table A1 would certainly benefit readers in the methods section.

Reply: We modified the sentences in both the Methods and Results sections of the uploaded version. The reference to Table A1 was already in the text. We also moved Table A2, now Table 4, into section 3.2.1 to better explain the results of the variable selection procedure.

 

  • Comment 10. Check errors on lines 434 and 436.

Reply: We reread the results and replaced “slope” with “latitude” at line 459.

 

  • Comment 11. Figure 5- I recommend referring readers back to the table with habitat names.

Reply: We added a reference to Table 3 in the captions of both Figure 4 and Figure 5.

 

  • Comment 12. Table A1- Formatting in the description could be enhanced (i.e., spacing)

Reply: Thank you for this suggestion, we did it.

 

  • Comment 13. Table A2- This table is very informative. I suggest moving it into the result section of the paper.

Reply: Thank you for this suggestion, see Reply to comment 9.

 

  • Comment 14. What predictors were excluded from analysis due to multi-collinearity? Can predictors that were removed be added to Table A1 using a footnote?

Reply: The variables excluded because of multi-collinearity are those not included in Table 4. They are listed in Table A1 but not in Table 4.

Author Response File: Author Response.docx

Reviewer 2 Report

The manuscript presents a habitat classification methodology based on a random forest classifier using features derived from remotely sensed data obtained by Sentinel-2. The document is concise and well organized.

Please find some minor comments and suggestions in the attached PDF.

Comments for author File: Comments.pdf

Author Response

Dear Reviewer,

Thank you for taking the time to read the manuscript and report your comments and considerations.

Please find below the responses to your comments, point by point. I extracted them from the pdf and added them here to facilitate our communication.

Furthermore, we have uploaded a version in which each change resulting from the reviewer’s comments is annotated with a comment giving the reference. We hope this helps in the review process.

Revisions made in response to your comments are marked with the code REV 2 in the comments.

Finally, as you requested, we made a minor revision of the English language and style. We therefore invite you to re-read the entire document.

 

  • Comment: Page 1/ Abstract: Indexes. Replace with “indices” throughout the document; there are more occurrences.

Reply: We changed “indexes” to “indices” as you suggested and accepted all the suggestions in the Abstract and in the whole paper. Please see the uploaded version.

 

  • Comment: Page 1/L 119: Replace paper with article.

Reply: Thank you for this recommendation. We replaced “paper” with “article”. See now line 120.

 

  • Comment: Page 4/L 165: Tried to open this URL and it presents a message with "This page does not seem to exist…". Please fix this. Moreover, try to avoid the usage of a URL, use a reference instead.

Reply: Thank you for your comment, but the URL works in our Word version. Nevertheless, we replaced the URL with a reference in the caption, as you suggested. See now line 177.

 

  • Comment: Page 5/L 173: This URL presents a website with no content; I could only visualize its menu. Please consider converting it into a reference with the access date. Proceed this way in similar cases throughout the manuscript.

Reply: Thank you for your recommendation. For this issue too, we replaced the URL with a reference in the caption, as you suggested. See now line 187.

 

  • Comment: Page 6/L 235-237: Were all bands converted from top of atmosphere reflectance to bottom of atmosphere reflectance?

Reply: Thank you for this comment. The input Sentinel-2 data used for the analysis are Level 2A (L2A) spectral bands, that represent Bottom Of Atmosphere reflectance. We have added “used for the analysis” in the text at line 252, in order to clarify it.

 

  • Comment: Page 6/L 255-256: Add a reference

Reply: The following reference has been added in the manuscript at line 276: Croft, H., & Chen, J. M. (2017). Leaf pigment content. Reference Module in Earth Systems and Environmental Sciences. Oxford: Elsevier Inc, 1-22.

 

  • Comment: Page 6/L 258: Add each band name as table footnote or directly to the legend.

Reply: Instead of modifying the caption or the whole table, we decided to add Table A2 to Appendix A, listing the band names and all the spectral characteristics of the Sentinel-2 MSI sensor for each band. See line 673.

 

  • Comment: Page 7/L 282-283: Please specify what was the role of each software used to build each part of the dataset.

Reply: The sentence has been modified with “All the procedures used to obtain the dataset of predictors were performed using GRASS GIS for the processing of environmental predictors, SNAP for the Sentinel-2 MSI data processing, and R software for the processing of temporal predictors.” See line 302.

 

  • Comment: Page 8/L 323-324: Address the usage of User's and Producer's accuracy. Include those also as accuracy metrics in Figure 2.

Reply: Thank you for this suggestion. The sentence has been modified to read: “Each RF model was trained using a stratified random sample of 70% of data, and model performance was tested using the remaining 30% (internal evaluation). The accuracy assessment of the procedure depended on a confusion matrix (error matrix) and the following accuracy measures derived from it: the Overall accuracy, the User’s and Producer’s accuracy and their Standard error. The formulas to obtain the accuracy metrics and the Standard error were taken from Stehman & Foody, 2019. Cohen’s kappa was also computed.”. See line 352. We also added those metrics in Figure 2, now Figure 1.
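The stratified 70%/30% split described in the quoted sentence can be sketched as below. This is an illustration only: the function name `stratified_split`, the placeholder class labels "A", "B" and "C", and the fixed seed are assumptions, and the authors' actual implementation (performed in R, according to their responses) may differ.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into train/test sets, class by class.

    Each class contributes roughly `train_fraction` of its samples to the
    training set, so rare classes stay represented in both sets.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)

# Hypothetical imbalanced dataset with three placeholder classes
labels = ["A"] * 10 + ["B"] * 20 + ["C"] * 30
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting within each class rather than over the pooled samples matters here because the authors describe their response variable as imbalanced; a plain random split could leave a rare habitat class with no test samples at all.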

Author Response File: Author Response.docx

Reviewer 3 Report

The study proposes a method for mapping land cover of the country of Italy, based on combining environmental variables, Earth Observing (satellite) data and high resolution maps. By using a supervised machine learning algorithm, the study classified 24 forest habitats, obtaining an overall accuracy of 76.1%.

The study claims to propose a novel method for classifying vegetation types, based on using a set of 163 variables (13 environmental, 124 spectral and 26 temporal) of which 69 variables were retained. Besides the high number of variables used, the approach applied seems a straightforward land cover classification exercise. There are minor details such as the methods mentioning four groups of variables when in fact there are only three groups. The percentage of cloud cover referred to seems wrong (less than 90%); we suppose the authors wanted to say less than 10% of cloud cover.

It would be interesting if the authors would address the conditions required to replicate the proposed approach, such as availability of satellite data and high resolution existing land cover maps. 

Author Response

Dear Reviewer,

Thank you for taking the time to read the manuscript and report your comments and considerations.

Please find below the responses to your comments, point by point.

Furthermore, we have uploaded a version in which each change resulting from the reviewer’s comments is annotated with a comment giving the reference. We hope this helps in the review process.

Revisions made in response to your comments are marked with the code REV 3 in the comments.

Finally, as you requested, we made a minor revision of the English language and style. We therefore invite you to re-read the entire document.

 

  • Comment: There are minor details such as the methods mentioning four groups of variables when in fact there are only three groups.

Reply: Thanks for your comments. However, without a specific reference to the line containing the error, we could not find any mention of four groups of variables, even though we carefully reread the methods chapter.

 

  • Comment: The percentage of cloud cover referred to seems wrong (less than 90%); we suppose the authors wanted to say less than 10% of cloud cover.

Reply: With respect to this comment, we confirm that the cloud cover threshold adopted to select the Sentinel-2 MSI acquisitions used for the analysis is 90%. Even though a high cloud cover percentage may affect a satellite acquisition, the acquisition can still contain cloud-free pixels with surface reflectance information that can be used to densify the time series, supplying additional information for the calculation of temporal statistics.
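The reasoning in this reply, that a mostly cloudy acquisition can still contribute clear pixels to a time series, can be sketched for a single pixel as follows. The function name `temporal_stats`, the index values and the cloud flags are made up for illustration; the sketch assumes a per-pixel cloud mask is available for each acquisition and does not reproduce the authors' processing chain.

```python
from statistics import median

def temporal_stats(values, cloud_flags):
    """Temporal statistics for one pixel, using only cloud-free observations.

    values:      one vegetation-index value per acquisition date.
    cloud_flags: True where the pixel is cloud-covered on that date.
    Returns None when no clear observation is available.
    """
    clear = [v for v, cloudy in zip(values, cloud_flags) if not cloudy]
    if not clear:
        return None
    return {"n_clear": len(clear), "min": min(clear),
            "median": median(clear), "max": max(clear)}

# Hypothetical pixel: five acquisitions, two of them cloudy at this location.
# A scene can be heavily clouded overall and still be clear over this pixel.
ndvi = [0.2, 0.8, 0.5, 0.9, 0.6]
cloudy = [False, True, False, True, False]
stats = temporal_stats(ndvi, cloudy)
print(stats)
```

Because masking happens per pixel rather than per scene, discarding every acquisition above a low scene-level cloud threshold would remove usable observations, which is the argument the authors make for keeping the 90% threshold.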

 

  • Comment: It would be interesting if the authors would address the conditions required to replicate the proposed approach, such as availability of satellite data and high resolution existing land cover maps.

Reply: Thanks for this suggestion. In order to address the conditions required to replicate the proposed approach, we added the following sentence and the respective reference at line 499: “Considering that Sentinel-2 satellite mission systematically acquires data worldwide over land, the proposed approach can be reproduced and extended to all the vegetated geographical areas on Earth, thanks to the existing high resolution layers (i.e. Copernicus product) and the availability of plant species archive (i.e. EVA). Moreover, in absence of high resolution land cover and thematic maps, they could be generated from the time series analysis of satellite derived vegetation indices.”

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Great job on the revisions. I am sure this will be a valuable paper.
