Peer-Review Record

The Accuracy of Land Use and Cover Mapping across Time in Environmental Disaster Zones: The Case of the B1 Tailings Dam Rupture in Brumadinho, Brazil

Sustainability 2023, 15(8), 6949; https://doi.org/10.3390/su15086949

by Carlos Roberto Mangussi Filho¹, Renato Farias do Valle Junior¹, Maytê Maria Abreu Pires de Melo Silva¹, Rafaella Gouveia Mendes¹, Glauco de Souza Rolim², Teresa Cristina Tarlé Pissarra², Marília Carvalho de Melo³, Carlos Alberto Valera⁴, Fernando António Leal Pacheco²,⁵,* and Luís Filipe Sanches Fernandes⁶

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Abdur Raziq

Reviewer 4: Anonymous

Submission received: 13 December 2022 / Revised: 24 March 2023 / Accepted: 17 April 2023 / Published: 20 April 2023

(This article belongs to the Special Issue Land and Water Degradation in Catchments: The Role of Remote Sensing for Assessment and Management)

Round 1

Reviewer 1 Report

The core of this work is evaluating the accuracy of satellite images in identifying the LULC changes caused by a tailings dam failure, using machine learning on the Google Earth Engine platform. The first and biggest concern is that, due to the limitations of GEE, this model is a very small-scale work. As for the coding, as a person with solid and deep programming experience in C++, Python, C#, Java and …, I could not follow you in L192-193. GEE primarily relies on closed-source platforms, i.e. a web-based editor with a JavaScript wrapper and its own frameworks, which means that using Java is an obligation and can NEVER be considered a contribution. You listed Python as a keyword, yet nothing concerning it can be found from end to end. What is the role of Python??? GEE offers open-source-style functionality through its Python API but, under the hood, it calls closed-source proprietary frameworks and their own objects. This is a major obscurity that must be clarified. Using the geemap Python package may solve the problem, but this obviously means that the coding effort stands at a very low level of proficiency.

Just as an example, give exact comparison with other works like https://www.mdpi.com/2072-4292/14/11/2654, …

There is long discussion on this paper, where according to the core of this paper, it cannot be recommended because:

- Authors in Abstract claimed for ML models while here they just have RF for three satellite images.

- According to L29-30, different resolution satellite images has been used, where the main prerequisite of applied RF is the unified pixels size. RF cannot simultaneously work with different resolutions. This obviously means that the input data must have and provide similar resolution. How the unified pixel size has been provided? With final resolution??? Any image fusion?? image enhances processing?? A novel approach to solve this problem can be found at https://www.sciencedirect.com/science/article/abs/pii/S0341816219303674.

- I was lost with the modeling process, you have used the satellite images for the interval of 2018-2021, so what was the output???? A very simple comparison using GIS can show the changes in LULC. I cannot find what the RF is going to predict or classify??? You must exactly clarify the used features and attributes.

- Another well-recognized concern, totally lacking here, relates to the time-dependent data used, whose analysis, due to subsurface heterogeneity, cannot provide any generalization from a single study. The work therefore struggles to identify the correct model to represent the data, and the past time-series data are not enough to predict or classify the future. Multiple additional features should be taken into account. How did you judge whether outliers were dealt with efficiently?

- There is no clear evidence to show how the RF has been adjusted, the authors gave a shape file to RF!!! but for classification you must express the slide and none slide points, pixel extractions, normalizing, the method that the authors became assure for the optimum topology (How, with what criteria, according to which comparison, the method to overcome on overfitting, early convergence, trapping into local minima, …), termination criteria (and what will happen if it is not achieved), …. I curiously would like to see how the error improvement during the training process has been monitored and evaluated. Something like carried out at https://www.sciencedirect.com/science/article/pii/S0341816222002752, …

- From L217-221 the authors claimed for some metrics in which definitely need to have the confusion matrix. Where and how it has been produced???? For binary or multi-class????

- It is incredible that the authors cited the F-score to 2018???????

- F-score due to formulation shows higher accuracy than usual. What was your interpretation??

- The whole Introduction and related work are more like lecturing with shallow technical description on the reviewed literatures. It doesn’t have distinguished outline to show the problem statement and presumed goal. What is the main gap and limitation of previous works? Which limitation and according to what method is going to be filled and progressed? What motives for? What is the main novelty of this work? What is the main advantage of this method rather than previous ones? Significant of contributions?

- This work absolutely needs to be analyzed for uncertainty which totally is missed. How the uncertainty of results and inputs as it is well recognized in geo-hazard engineering analyses. How the uncertainty involved in the datasets and predicted values have been evaluated? What about the reliability-based analysis? Have a looking at state-of the-art techniques like https://link.springer.com/article/10.1007/s11053-022-10051-w … are highly essential.

- Without any doubt very poor discussion in terms of accuracy performance, validation data, verification metrics, ROC curves, what about comparing with other models, … any discussion on limitations of presented method? Can you prove the convergence or stability of the proposed method???? Any explicit discussion to illustrate the limitations, pitfalls and practical difficulties of applied models under certainty???? The work lacks for a prior impact assessment and cost-benefit analysis and did not anticipate solutions for the possible consequences in advance.

- Definitely, the conclusion must be justified.

In addition of above mentioned the following concerns also must be addressed:

1. You are working with images why none of the used metrics for image processing like IoU, mIoU, … are used??? This obviously means that the semantic segmentation accuracy, computing the ratio between ground truth or predicted segmentations are unclear. Further error analysis is needed to gather actional insights that can be used to inform model improvements. This paper therefore suffers from lack of accuracy performance.

2. long discussion on the used datasets, 1) Whether you are building inference models or predictive models, you will achieve better results by first verifying that you have chosen the optimal set of features to train your model with. 2) any analyses in considering the features selection a) based on missing values, b)based on variance, c) based on correlation with other features, d) based on model performance and … 3) the most important part of data analysis is data visualization, which refers to the process of creating graphical representations of data and trends or patterns identification as well as outliers. 4) properties of dataset including a) center of data, b) skewness of data, c) spread among the data members, d) presence of outliers, e) correlation among the data, f) type of probability distribution that the data follows and … 5) feature augmentation and many other characteristics are obviously missed and lacked.

3. Concerning the given link, I think something doesn’t work properly as I tried for several times to have look at code and see the logics behind but failed to do so.

4. As mentioned in #3, I couldn’t access to the code, but based on the Fig 3 due to framework limitations, this work can be quite challenging, as the GEE choose the framework based on maps and reduced operations and thus enables massive parallelism but increases complex coding style. Your solution???

5. From coding point of view, GEE framework has significant limitation in the programing, where map-and-reduce programming is not suitable to solve all remote-sensing problems. This is very obvious concerns which I strongly would like to know your response to see how did you solve this problem.

6. Is Fig. 1 from the authors?? Is Figure 2 from the authors??? What does "authors' archive" mean???

7. You have used GEE, which has several problems, such as applicability only to noncommercial and non-production use and limits on processing and storage.

8. The English of the work due to several linguistic flaws must be proofread by native expert.

9. Introduction is dull with inappropriate citations. For example, L42-43, The effect of mine exploration on economy and environment is recognized in 2020???? Or L43-44 in 2019 and 2020???? The citation for the novel used approaches or applied method must be updated but you are not allowed to cite the pretty well-known concepts to recent researches. This obviously doesn’t show the depth of literature review.

10. I couldn't see any discussion for when objects are partially obscured by other objects or colored in eccentric ways. In such condition, how the vision system uses cues and other pieces of knowledge to fill in the missing information and reason about what we're seeing.

11. How can this model get the features??

12. The used keywords should be representative and available in both Abstract and context. Definitely none of the used keywords can show any specificity of this work. What can be depicted from JavaScript, python machine learning, … Which one of them is reflective to show what the readers can expect to find???? Without any doubt must be reformulated.

Author Response

Reply Letter (the responses are also provided as PDF file, where questions and answers are presented with different colors and references to text added to the revised version are highlighted in yellow; please consult for a better view of our replies).

First of all, we thank the Editor and the four reviewers for their decision to give us the opportunity to revise the manuscript. All reviewers were very competent and provided valuable comments and suggestions, which helped us to improve the manuscript, as we hope will be clear to them when they read the revised version.

In the following paragraphs, we present a point-by-point response to the editor and reviewers' comments. Our responses are written in blue and text added to the revised manuscript is shaded yellow, to help the editor and reviewers see the extent and where improvements have been made in the manuscript.

Due to numerous questions raised by the reviewers, we made a profound change in the introduction to answer the doubts raised.

We hope that the editor and reviewers are satisfied with the revised manuscript and that this version can be published in this prestigious journal.

Reviewer #1

Answer) Thank you very much for the comment. We agree with your observation that the selected keywords did not identify the main subject of the article, and therefore were replaced to favor the visibility of the text. The GEE worked as a tool to facilitate the collection and classification of images, and based on your valuable observations, we redid the introduction of the article, presenting the hypotheses and desired objectives more clearly.

Keywords: Remote sensing; Random Forest classifier; Google Earth Engine; socio-environmental impacts; soil cover change; environmental degradation

There is long discussion on this paper, where according to the core of this paper, it cannot be recommended because:

- Authors in Abstract claimed for ML models while here they just have RF for three satellite images.

Answer) Thanks for the feedback. However, Random Forest is a machine learning algorithm that belongs to the group of supervised learning algorithms. It is widely used for classification and regression tasks. The algorithm builds a forest of decision trees, where each tree is trained on a random sample of the training set and predictions are made by polling the tree results. This process allows the Random Forest model to capture uncertainty in its predictions, which makes it a popular choice in many machine learning problems. We added various applications to the revised Introduction section, as follows:

The method has been successfully applied in many studies, namely to predict time-temperature transformation diagrams of high-alloy steel [32, 33], estimate the phase transformation temperature and the hardness of low alloy steel [34], as well as in the construction of continuous cooling transformation diagrams in welding steels [35].
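
To make the bagging-and-voting description in the answer above concrete, the following is a minimal Python sketch. It is not the classifier used in GEE: it assumes a single hypothetical feature (a near-infrared reflectance value), replaces full decision trees with one-split "stumps", and uses invented sample values. The names `fit_stump`, `fit_forest` and `predict` are illustrative only.

```python
import random
from collections import Counter

# Hypothetical training samples: (near-infrared reflectance, class label).
SAMPLES = [
    (0.05, "water"), (0.08, "water"), (0.10, "water"),
    (0.40, "forest"), (0.45, "forest"), (0.50, "forest"),
]


def fit_stump(samples):
    """Fit a one-split tree: choose the threshold with the fewest errors."""
    best = None
    for threshold, _ in samples:
        left = [label for x, label in samples if x <= threshold]
        right = [label for x, label in samples if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        # If nothing falls to the right, reuse the left label.
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(label != left_label for label in left)
                  + sum(label != right_label for label in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best[1:]


def fit_forest(samples, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(samples) for _ in samples]
        forest.append(fit_stump(bootstrap))
    return forest


def predict(forest, x):
    """Poll every stump and return the majority vote."""
    votes = [left if x <= threshold else right
             for threshold, left, right in forest]
    return Counter(votes).most_common(1)[0][0]
```

Because every stump sees a different bootstrap sample, individual stumps may disagree near the class boundary; the share of votes for the winning class gives a rough indication of how confident the ensemble is.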

Answer) Thank you for your comment. Using image bands with the same pixel resolution is important to ensure accurate classification and analysis of images. This is because the machine learning algorithm used in image classification, such as Random Forest, requires that all input bands have the same spatial resolution. If different bands have different resolutions, each pixel represents a different ground area, which can affect the accuracy and consistency of the results. For this work, we should clarify that we did not resample the images to a common pixel size; instead, for each satellite, the script uses only bands that share the same spatial resolution.

Accordingly, we added this detail to the Introduction and the Methodology. In the Introduction, we added the following:

“Besides the numerous applications indicated in the previous paragraph, there were several studies that aimed to characterize LULC changes caused by dam failures, using high resolution satellite images. In general, the approach was to compare an image before with an image after the failure. For example, changes along watercourses were detected after the rupture of Bento Rodrigues dam in Mariana, based on that type of analysis made over Landsat 8 (OLI) [5] and Sentinel 2 [53] images. For the characterization of changes resulting from the rupture of B1 dam in Brumadinho, Landsat 8 and Sentinel 2 images were also used [54]. However, no study has evaluated LULC changes resulting from dam failures, based on an integrated analysis of a sequence of short-spaced images. This has been introduced in the present study, whereby LULC change characterizations based on the before-after images approach were replaced by a characterization based on a single image aggregated from a time-series of before-after images, using the Pixel Reduction to Median tool as did Noi Phan et al. [55]. According to these authors, this fusion technique proved to be more precise and accurate than the traditional assessment. Attention to spatial resolution and spectral domain was also taken into account in this work, because these features can interfere with the results of LULC classification. Remote sensing based on the processing of images in the visible as well as in the near infrared bands of the color spectrum has been used to assess surface sediment concentration [56,57,58], but those were relatively uncommon examples.”

In the methodology, we added the following:

“To improve the accuracy of LULC classifications, besides the use of clustered images as mentioned above, it is essential to use image bands with the same pixel resolution. This is due to the fact that machine learning algorithms such as Random Forest, which were used in the classification, require that all input bands have the same spatial resolution. In this work, images from different satellites were used, each with its own spatial resolution. However, no resampling of images was carried out, meaning that no changes were made to the original pixel sizes. Instead, only the bands with the same spatial resolution on each satellite were selected. Thus, to properly classify the images from the Landsat 8, Sentinel-2 and PlanetScope satellites, the bands (2,3,4,5,6,7), (2,3,4,8) and (R,G,B), respectively, were used (Table 1).”
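
The band selection quoted above can be checked with a small Python sketch. The band lists come from the quoted text; the nominal pixel sizes are assumptions taken from public sensor descriptions (Table 1 of the manuscript is not reproduced here), and the names `NOMINAL_RESOLUTION_M`, `SELECTED_BANDS` and `common_resolution` are hypothetical.

```python
# Nominal pixel sizes in metres. ASSUMED values from public sensor
# descriptions, not copied from Table 1 of the manuscript.
NOMINAL_RESOLUTION_M = {
    "Landsat 8": {"B2": 30, "B3": 30, "B4": 30, "B5": 30,
                  "B6": 30, "B7": 30, "B8": 15},
    "Sentinel-2": {"B2": 10, "B3": 10, "B4": 10, "B8": 10,
                   "B5": 20, "B11": 20},
    "PlanetScope": {"R": 3, "G": 3, "B": 3},
}

# Bands used for classification, as listed in the quoted methodology text.
SELECTED_BANDS = {
    "Landsat 8": ["B2", "B3", "B4", "B5", "B6", "B7"],
    "Sentinel-2": ["B2", "B3", "B4", "B8"],
    "PlanetScope": ["R", "G", "B"],
}


def common_resolution(sensor, bands):
    """Return the shared pixel size of the bands, or fail if they differ."""
    sizes = {NOMINAL_RESOLUTION_M[sensor][band] for band in bands}
    if len(sizes) != 1:
        raise ValueError(
            f"{sensor}: bands {bands} mix pixel sizes {sorted(sizes)}; "
            "resampling would be needed before classification"
        )
    return sizes.pop()


if __name__ == "__main__":
    for sensor, bands in SELECTED_BANDS.items():
        print(sensor, common_resolution(sensor, bands), "m")
```

A check of this kind fails early when a band with a different pixel size (for example a 20 m Sentinel-2 band) is added to a selection, which is the situation the authors say they avoided instead of resampling.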

Answer) To clarify the doubt, the fusion of several images over the course of a year is used to generate a single median image using the Google Earth Engine script. This procedure is efficient because it allows obtaining an image with less noise and better information quality. One of the main advantages of using this single median image for supervised image classification is the reduction of the impact of temporal variations in classification. When using multiple images over time, classification can be influenced by seasonal changes or temporal variations, which can lead to classification errors. The median image helps minimize these effects and produce a more accurate and consistent image for supervised classification. Furthermore, the median image can also help to reduce the amount of data required for supervised classification, which can make the process more efficient and cost-effective in terms of computational resources and time.

Such information was added in the introduction and methodology, as exemplified above.
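
The median compositing described in this answer can be illustrated outside GEE with a minimal Python sketch. It assumes that each image is a small grid of reflectance values for one band and that cloud-masked pixels are stored as `None`; the function name `median_composite` and the sample values are hypothetical.

```python
from statistics import median


def median_composite(images):
    """Reduce a stack of same-sized images to one image, pixel by pixel.

    Each image is a list of rows; masked (e.g. cloudy) pixels are None
    and are ignored when the median is taken.
    """
    rows, cols = len(images[0]), len(images[0][0])
    composite = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            valid = [img[r][c] for img in images if img[r][c] is not None]
            out_row.append(median(valid) if valid else None)
        composite.append(out_row)
    return composite


if __name__ == "__main__":
    # Three acquisitions of a 1 x 2 pixel area within the same year.
    stack = [
        [[0.10, 0.30]],
        [[0.90, 0.32]],  # first pixel is an outlier (e.g. haze)
        [[0.12, None]],  # second pixel is masked
    ]
    print(median_composite(stack))
```

In the example, the outlying value 0.90 in the first pixel does not reach the composite because the median of the three observations is 0.12, which is the noise-reduction effect the answer refers to.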

Answer) Failure to remove outliers can negatively affect the accuracy and precision of image classification results. However, in this work, we did not visually detect many discrepant values in the images that would invalidate the final result.

- From L217-221 the authors claimed for some metrics in which definitely need to have the confusion matrix. Where and how it has been produced???? For binary or multi-class????

Answer) Multi-class. All the confusion matrices are now presented in the Annex (Figure 1S).
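
For readers who want to see how per-class metrics follow from a multi-class confusion matrix, here is a minimal Python sketch. The matrix values and the three class labels are invented for illustration and are not the authors' results; rows are assumed to hold reference classes and columns predicted classes, and `per_class_scores` is a hypothetical helper name.

```python
def per_class_scores(matrix, labels):
    """Compute producer's accuracy, user's accuracy and F-score per class.

    matrix[i][j] counts samples of reference class i predicted as class j.
    """
    scores = {}
    for k, label in enumerate(labels):
        true_positive = matrix[k][k]
        reference_total = sum(matrix[k])                 # row sum
        predicted_total = sum(row[k] for row in matrix)  # column sum
        producer = true_positive / reference_total if reference_total else 0.0
        user = true_positive / predicted_total if predicted_total else 0.0
        # F-score is the harmonic mean of the two accuracies.
        if producer + user:
            f_score = 2 * producer * user / (producer + user)
        else:
            f_score = 0.0
        scores[label] = {"producer": producer, "user": user, "f_score": f_score}
    return scores


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix (not taken from Figure 1S).
    example = [
        [50, 2, 3],
        [4, 40, 6],
        [1, 3, 45],
    ]
    for name, values in per_class_scores(example, ["forest", "pasture", "water"]).items():
        print(name, values)
```

Producer's accuracy corresponds to recall (sensitivity) and user's accuracy to precision, so the F-score reported per class depends only on the row sum, the column sum and the diagonal cell of that class.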

- It is incredible that the authors cited the F-score to 2018???????

Answer) Maybe there are other, older (original) references, but this one is readily accessible to everyone. That was why we used it. Besides, the F-score was not a fundamental part of our work, just a performance indicator, which is why we found it unnecessary to go back to the original work.

- F-score due to formulation shows higher accuracy than usual. What was your interpretation??

Answer) The high F-scores indicate that the RF model was robust enough to estimate the classes; other works did not use RF.

Answer) We rewrote the introduction in order to solve the doubts raised. Now, the motivation and objective of the study are clearly presented, as follows:

“In an attempt to fill in the aforementioned gaps related to classification procedures of satellite images in landscapes affected by tailings dam ruptures, the aim of this study was to evaluate LULC changes in the sub-basin of the Ferro-Carvão Stream, after the breach of B1 dam in Brumadinho. To this end, we sought to assess the accuracy of LULC mapping using the pixel reduction technique to the median of several annual images, in order to generate a single temporal image. Satellite images with different spatial resolutions were used, together with the Random Forest (RF) model in Google Earth Engine (GEE) to classify the affected areas. In addition, the performance of the model was evaluated without the use of image resampling, using bands with the same spatial resolution and including information from the visible and near infrared spectrum bands in the identification of tailings along the river channel. The resulting maps provide a perspective on the changes in land use and occupation resulting from the dam failure and on their potential implications for the local economy and ecosystem integrity, thus creating a basis for the socio-environmental recovery of the region. Overall, this study is expected to contribute to simplify the application of remote sensing techniques in the monitoring of areas impacted by environmental disaster events.”

Answer) The objective of this work is not to analyze the uncertainty, but rather the performance of the models based on the F-score and the Matthews correlation. In the work presented above, the authors evaluated the uncertainty with Monte Carlo simulation, which is another approach. However, the suggested uncertainty assessment can be carried out in another work.

Answer) We believe that available published information has been discussed. Verification metrics were presented and discussed based on BP, UA, F-score and MCC values. However, the ROC curves were not presented as there were too many graphs and tables. Following the reviewer's guidance, the limitations of the RF model were included in the text.
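
Since the Matthews correlation coefficient (MCC) is named as a verification metric, the sketch below shows one common way to compute it for a multi-class confusion matrix. It uses the multi-class generalization usually attributed to Gorodkin, which is an assumption here because the reply does not state which formulation the authors applied; `multiclass_mcc` is a hypothetical helper and the example matrices are invented.

```python
from math import sqrt


def multiclass_mcc(matrix):
    """Multi-class MCC from a confusion matrix.

    matrix[i][j] counts samples of reference class i predicted as class j.
    Returns 0.0 when the coefficient is undefined (zero denominator).
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(n_classes))
    reference_totals = [sum(row) for row in matrix]
    predicted_totals = [sum(row[k] for row in matrix) for k in range(n_classes)]

    numerator = correct * total - sum(
        p * t for p, t in zip(predicted_totals, reference_totals)
    )
    denominator = sqrt(
        (total ** 2 - sum(p * p for p in predicted_totals))
        * (total ** 2 - sum(t * t for t in reference_totals))
    )
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    print(multiclass_mcc([[5, 0], [0, 5]]))  # perfect agreement
    print(multiclass_mcc([[4, 1], [2, 3]]))  # partial agreement
```

For two classes this expression reduces to the familiar binary MCC, so the same function can serve both cases.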

- Definitely, the conclusion must be justified.

Answer) We redid part of the conclusion based on the remodeled objectives presented in the introduction of the work. The new paragraphs added to the conclusion were:

“The evaluation of accuracy of LULC mapping on a time scale in areas affected by tailings dam failures showed that the PRM technique associated with satellite images with different spatial resolutions and the Random Forest (RF) model in the GEE can be successfully used for the classification of these areas. In addition, the performance of the model was accurate using visible and near-infrared spectrum bands and without the need for image resampling, using bands with the same spatial resolution. This action is shown to simplify the data pre-processing process specifically applied to the LULC diagnosis, enabling the identification of changes to the environment caused by the rupture of a mining tailings dam.

The use of the same training dataset in dam failure zones presented a viable alternative to classify temporal images in different spatial resolutions. This can reduce the time and cost involved in collecting new training data and favor faster diagnosis. However, it is important to emphasize that this use can affect the accuracy of the classification mainly in areas with subtle changes in land cover. Therefore, it is important to evaluate accuracy in new research. This involves adopting different time intervals and spatial resolutions, checking the results in different scenarios, before adopting this approach as common practice in disaster areas.”

You are working with images why none of the used metrics for image processing like IoU, mIoU, … are used??? This obviously means that the semantic segmentation accuracy, computing the ratio between ground truth or predicted segmentations are unclear. Further error analysis is needed to gather actional insights that can be used to inform model improvements. This paper therefore suffers from lack of accuracy performance.

Answer) Thank you for your comments. We believe that the confusion matrix is a useful alternative to the IoU and mIoU metrics for evaluating the performance of the image segmentation model, especially when the segmentation involves unbalanced classes. The confusion matrix makes it possible to identify false positives, false negatives, true positives and true negatives for each class, and therefore to calculate the evaluation metrics for each class individually. However, the confusion matrix does not completely replace the IoU-based evaluation metrics, IoU and mIoU, which are particularly useful when multiple classes are involved in the segmentation. In our next work, we intend to use the IoU and the mIoU as aggregated metrics to provide a general measure of segmentation performance, regardless of the number of classes involved.
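
As a side note to this exchange, per-class IoU and mIoU can be derived from the same counts that a confusion matrix holds. The Python sketch below shows this under the assumption that rows are reference classes and columns are predicted classes; the matrix values are invented and `iou_from_confusion` is a hypothetical helper name.

```python
def iou_from_confusion(matrix):
    """Return (per-class IoU list, mean IoU) from a confusion matrix.

    For class k: IoU = TP / (TP + FP + FN), where TP is the diagonal cell,
    FN the rest of row k and FP the rest of column k.
    """
    n_classes = len(matrix)
    ious = []
    for k in range(n_classes):
        true_positive = matrix[k][k]
        false_negative = sum(matrix[k]) - true_positive
        false_positive = sum(row[k] for row in matrix) - true_positive
        union = true_positive + false_positive + false_negative
        ious.append(true_positive / union if union else 0.0)
    return ious, sum(ious) / n_classes


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix (not the authors' data).
    per_class, mean_iou = iou_from_confusion([
        [50, 2, 3],
        [4, 40, 6],
        [1, 3, 45],
    ])
    print(per_class, mean_iou)
```

For any single class, IoU and the F-score are linked by IoU = F / (2 - F), so a class with an F-score of 0.8 has an IoU of about 0.67; the two metrics rank classes in the same order but IoU is always the lower value unless both equal 0 or 1.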

long discussion on the used datasets, 1) Whether you are building inference models or predictive models, you will achieve better results by first verifying that you have chosen the optimal set of features to train your model with. 2) any analyses in considering the features selection a) based on missing values, b)based on variance, c) based on correlation with other features, d) based on model performance and … 3) the most important part of data analysis is data visualization, which refers to the process of creating graphical representations of data and trends or patterns identification as well as outliers. 4) properties of dataset including a) center of data, b) skewness of data, c) spread among the data members, d) presence of outliers, e) correlation among the data, f) type of probability distribution that the data follows and … 5) feature augmentation and many other characteristics are obviously missed and lacked.

Answer) The discussion has been adjusted. To verify that a suitable set of features was chosen to train the Random Forest model, we based our analysis on the confusion matrix, which provides detailed information about the performance of the model in each class. From this matrix, it was possible to calculate the accuracy and the precision by class. In addition, visual analysis in the GIS was used to evaluate the consistency of the class boundaries, including a comparison with the LULC map generated by MapBiomas.

MapBiomas. Land Cover and Land Use Map of Brazil - Collection 6.0. 2021. Available at: https://mapbiomas.org/

Concerning the given link, I think something doesn’t work properly as I tried for several times to have look at code and see the logics behind but failed to do so.

Answer) The link has been tested and is working

As mentioned in #3, I couldn’t access to the code, but based on the Fig 3 due to framework limitations, this work can be quite challenging, as the GEE choose the framework based on maps and reduced operations and thus enables massive parallelism but increases complex coding style. Your solution???

Answer) Regarding the limitation of the GEE framework mentioned by the reviewer, a possible solution would be to adopt programming strategies that help to deal with parallelism and code complexity. Some of the strategies that could be adopted include: a) breaking the code into smaller tasks: this strategy involves breaking the code into smaller and more manageable tasks, which can be executed in parallel and integrated later. This can help reduce code complexity and facilitate error detection; b) using the ee.Reducer library: this GEE library allows large-scale reduction operations to be performed, such as sum, average and standard deviation, among others. The use of this library can facilitate the implementation of more complex calculations on large-scale images; c) adopting the "filter-map-reduce" strategy: this strategy involves applying a filter function to a dataset, followed by applying a mapping function to each element that meets the filter criteria, and finally applying a reduction function to the mapping results. This approach can help reduce the code complexity and allow the use of GEE parallelism; d) using other programming languages: if the complexity of the GEE JavaScript code becomes too high, it is possible to use other programming languages, such as Python or R, to perform data processing and later integrate the results into the GEE.
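
The "filter-map-reduce" strategy named in item c) can be sketched in plain Python, independently of GEE. The scene records, the cloud-cover threshold and the function names `ndvi` and `mean_ndvi` are all hypothetical; the example only illustrates the order of the three steps, not the authors' script.

```python
from functools import reduce

# Hypothetical scene summaries: cloud cover (%) and mean band reflectances.
SCENES = [
    {"id": "scene_a", "cloud_pct": 5, "red": 0.10, "nir": 0.50},
    {"id": "scene_b", "cloud_pct": 60, "red": 0.30, "nir": 0.35},
    {"id": "scene_c", "cloud_pct": 10, "red": 0.20, "nir": 0.60},
]


def ndvi(scene):
    """Map step: derive a vegetation index from one scene."""
    return (scene["nir"] - scene["red"]) / (scene["nir"] + scene["red"])


def mean_ndvi(scenes, max_cloud_pct=20):
    """Filter clear scenes, map each to NDVI, reduce to a single mean."""
    clear = filter(lambda s: s["cloud_pct"] <= max_cloud_pct, scenes)
    values = map(ndvi, clear)
    # Reduce step: accumulate (running sum, count) in a single pass.
    total, count = reduce(
        lambda acc, value: (acc[0] + value, acc[1] + 1), values, (0.0, 0)
    )
    return total / count if count else None


if __name__ == "__main__":
    print(mean_ndvi(SCENES))
```

Because the map step treats each scene independently and the reduce step only combines partial results, the same structure can be distributed over many workers, which is the property the answer attributes to GEE.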

From coding point of view, GEE framework has significant limitation in the programing, where map-and-reduce programming is not suitable to solve all remote-sensing problems. This is very obvious concerns which I strongly would like to know your response to see how did you solve this problem.

Answer) Although GEE offers a series of tools and libraries that facilitate the processing of remote sensing data, it is not always possible to solve all problems efficiently using only these tools. Several strategies can help in such cases. One of them is the use of external programming libraries that offer additional resources, such as NumPy or SciPy, which can be integrated into GEE to perform more complex mathematical operations. In addition, it is possible to use the JavaScript programming language, which is the basis of GEE, in a more flexible way, combining it with other languages, such as Python, to create more personalized solutions adapted to the specific needs of the analysis in question. Another strategy that can be adopted is the development of one's own algorithms for processing remote sensing data, using the libraries and tools available in GEE as a basis. This approach requires more advanced programming knowledge, but can lead to more accurate and efficient results for solving specific problems.

Is Fig. 1 from the authors?? Is Figure 2 from the authors??? What does "authors' archive" mean???

Answer) Note already removed from the text.

You have used GEE, which has several problems, such as applicability only to noncommercial and non-production use and limits on processing and storage.

Answer) To answer this question, it is important to first clarify some points about Google Earth Engine (GEE). GEE is a remote sensing data processing and analysis platform that offers a wide range of tools and resources for satellite image analysis. Although GEE has several advantages and is widely used in research and academic studies, like any platform, it also has some limitations.

One of the limitations of GEE is that its commercial and productive use is restricted, which can be an obstacle for some applications. In addition, GEE has processing and storage limits, which can limit the amount of data that can be processed or stored in a single analysis.

However, despite these limitations, GEE can be a very useful tool for analyzing and processing remote sensing data in several areas, including environmental studies, monitoring of natural resources, among others. In the specific case in which GEE was used to classify images to identify land use and occupation in an ore dam rupture zone, it was chosen because it is an adequate and efficient tool for this purpose.

The English of the work due to several linguistic flaws must be proofread by native expert.

Answer) The article was reviewed by a native expert. A certificate was added to the cover letter

Introduction is dull with inappropriate citations. For example, L42-43, The effect of mine exploration on economy and environment is recognized in 2020???? Or L43-44 in 2019 and 2020???? The citation for the novel used approaches or applied method must be updated but you are not allowed to cite the pretty well-known concepts to recent researches. This obviously doesn’t show the depth of literature review.

Answer) I hope that with the new approach presented in the introduction we have clearly presented the scope of the work.

I couldn't see any discussion for when objects are partially obscured by other objects or colored in eccentric ways. In such condition, how the vision system uses cues and other pieces of knowledge to fill in the missing information and reason about what we're seeing.

Answer) Thank you for your comment. After the respective adjustments made in the introduction to clarify the relevance of the work, we made improvements in the discussion of the results. In this way, we hope to have adjusted the discussion making it clearer. Specifically, we added some text on the limitations of our approach, as follows:

“In spite of these good results, there are factors that can negatively affect the performance of the RF model in the LULC classification task. Among these factors, dependence on sample data stands out, which is considered a critical factor in the performance of RF in the LULC classification, since it is influenced by the selection of training data. A recent study applied to Sentinel-2 satellite images with the classification steps performed in the GEE, carried out by Avci et al. [94], showed that increasing the number of training datasets can significantly improve classification accuracy. According to Story and Congalton [95], at least 30 samples are necessary to properly fill in the error matrix, which is exceeded in all classes evaluated in this work, contributing to the high accuracy measured. Another factor that can negatively affect the performance of the RF model is the sensitivity to spatial resolution, which is common in areas with lower spatial resolution, especially in urban areas where heterogeneity is high. Finally, another factor to be considered is the underestimation of rare classes in which the RF model may present a bias towards the most frequent classes, not being suitable for sparse features and biased when dealing with categorical features [96]. Thus, higher F-Score values, which are calculated as the harmonic mean of precision and sensitivity, indicate that the classification model has a good ability to correctly identify the class samples. In our study, all F-Score values calculated for the evaluated classes were greater than 0.8, which indicates that the model has good precision and sensitivity in all evaluated classes, as shown in Table 1S of Supplementary Materials.”

How can this model get the features??

Answer) Thank you for your comment. We added to the work methodology the necessary resources for the execution of Random Forest. The added text was as follows:

“In order to use the Random Forest classification model in the GEE platform, some resources were prepared and embedded in a JavaScript script specifically written to produce the classification. Initially, the vector file (shapefile) of Ferro-Carvão Stream watershed was delineated in the QGIS software and added to the script. Within this area and bearing on the time frames defined in Table 2, a record of images from Sentinel-2, Landsat 8 and PlanetScope was assembled and also embedded in the script. Next, the pixel was reduced to the median (PRM method), which generated a single temporal image for each analyzed year. The training dataset containing samples of each land use and land cover class was elaborated for the aggregated images. The selection of sampling points was carried out manually based on MapBiomas, which is the official map of land use and occupation in Brazil (www.mapbiomas.org.br), and also images from Google Earth in the periods under analysis. In total, 2100 training points were randomly collected in order to represent the classes of land use and land cover to be classified. The classes were forest formations, agriculture, urban area, pasture, mining/waste and water, with 900, 180, 200, 355, 400 and 65 sampling points, respectively. These points were imported into the script in the GEE and 70% were randomly allocated to the training samples, while 30% were allocated to the validation samples. Finally, the parameter corresponding to the bands to be used and other specific parameters from the classification algorithm were defined.
The image classification was generated from a routine prepared in JavaScript in the GEE code editor menu, comprising three specific scripts for Landsat 8, Sentinel-2 and PlanetScope images, respectively, freely available at https://code.earthengine.google.com/6655d49a2aaeb7a2c6320b81f93e6668; https://code.earthengine.google.com/a2bc8ba3851d22c07a87cedb71bd56d7; https://code.earthengine.google.com/e22aa23d3f9836bc660138aaa3365266.”
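
The 70/30 allocation of sampling points described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' GEE script: the class counts are those reported in the response, while the function name `split_points`, the seed, and the point representation are assumptions.

```python
import random

# Class sizes as reported in the response (they sum to 2100).
CLASS_COUNTS = {
    "forest formations": 900,
    "agriculture": 180,
    "urban area": 200,
    "pasture": 355,
    "mining/waste": 400,
    "water": 65,
}


def split_points(points, train_fraction=0.7, seed=42):
    """Shuffle the points and split them into training and validation lists."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    shuffled = points[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Each point is represented here only by (class name, index within class).
points = [(cls, i) for cls, n in CLASS_COUNTS.items() for i in range(n)]
train, valid = split_points(points)
print(len(points), len(train), len(valid))
```

A plain random split like this does not guarantee that each class keeps a 70/30 proportion. With only 65 water points, a split drawn separately within each class may be worth considering, although the response does not say which variant was used.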

The used keywords should be representative and available in both Abstract and context. Definitely none of the used keywords can show any specificity of this work. What can be depicted from JavaScript, python machine learning, … Which one of them is reflective to show what the readers can expect to find???? Without any doubt must be reformulated.

Answer) Thank you for your comment. The keywords were adapted to the abstract and context of the text. The new keywords were:

Keywords: Remote sensing; Random Forest classifier; Google Earth Engine; socio-environmental impacts; soil cover change; environmental degradation

Author Response File: Author Response.pdf

Reviewer 2 Report

Review of the manuscript ID sustainability-2127752 entitled ‘Spatial and spectral resolution in the remote sensing assess to land use and cover changes in areas of tailings dam failure’

The authors address a relevant community issue related to the environmental degradation effects of mining damage, specifically the results of the damage to the tailings dam. The release of contaminated post-mining waters as a result of it causes negative consequences for both communities and the environment. This is accompanied by changes in land use and land cover. The authors analyzed changes in these elements and related them to two dam failures in the state of Minas Gerais, Brazil, which occurred over a 5-year period (2015 - 2019) in the Ferro-Carvão Creek catchment. The study used Random Forest, a machine learning algorithm, to construct temporal and spatial scenarios, classify their relevance and identify morphological changes. The algorithm analyzed satellite imagery from different sources and with different resolutions: Landsat (30m), Sentinel-2 (10m) and PlanetScope Dove (4.77m). The 2018-2021 imagery was processed on the Google Earth Engine (GEE) platform. This minimized the range of uncertainty associated with the difference in resolution between the imagery. The authors examined the effectiveness of applying machine learning models to different remote sensing data and their suitability for identifying changes in land cover. They showed that although the spatial resolution of the imagery affects the accuracy of the classification, all of the satellite data sources evaluated allowed correct classification of land use and land cover.

This is a useful, good conceptualized and described study. However, I find some editorial shortcomings. In my opinion, the structure of the text should be improved. The results and discussion chapter should be separated into two different sections. It should be clear to the reader what is the achievement of this work and the main authors' finding. This richly illustrated material, in turn, should be the basis for a discussion in that the authors should relate their own results to the published achievements of other authors. Currently, we have a mix of results and discussion in the paper, and this makes the authors' findings blurry.

I believe that after minor editorial changes, this study will be an interesting contribution to Sustainability Journal.

Author Response

Reviewer #2 (the responses are also provided as PDF file, where questions and answers are presented with different colors and references to text added to the revised version are highlighted in yellow; please consult for a better view of our replies).

Review of the manuscript ID sustainability-2127752 entitled ‘Spatial and spectral resolution in the remote sensing assess to land use and cover changes in areas of tailings dam failure’

Answer) Thank you for your comment. As suggested, we improved the structure of the text, including new information in the introduction, to favor the understanding of the scope, hypotheses and objectives of this work. Modifications were made in the discussion and conclusion of the article. However, we chose to keep the results and discussion in a single section of the article, to reduce redundancy and repetition of information, as the separation could compromise the structure and interfere with the understanding of the article as a whole.

The text added to the Introduction section that helps understanding the scope of the problem, the objectives, etc, was as follows:

I believe that after minor editorial changes, this study will be an interesting contribution to Sustainability Journal.

Answer) We passed the revised manuscript through a native English speaker (certificate attached)

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments for author File: Comments.pdf

Author Response

Reviewer #3

Figure 1 – L. 122

The contents of this figure are not legible. In order to convert a clear PDF document, whilst retaining its high quality, we kindly request the provision of figures and schemes at a sufficiently high resolution (min. 1000 pixels width/height, or a resolution of 300 dpi or higher).

Answer) Thank you for your comment. The figures were readjusted and saved with a resolution of 600 dpi.

172 30N and 30 S

Space and degree symbol is missing.

Answer) Thank you for your comment. It has been corrected

L.349

Figure 5 a 9

Answer) Thank you for your comment. The figures were readjusted and saved with a resolution of 600 dpi.

Author Response File: Author Response.pdf

Reviewer 4 Report

Data-driven machine learning has attracted significant attention for its great advantage to solve multivariate nonlinear problems, which is realized by building mathematical models to describe relationships between input (influence factors) and output. Machine learning can be applied both for classification and regression and it has been successfully introduced in several fields. In the manuscript ‘Spatial and spectral resolution in the remote sensing assess to land use and cover changes in areas of tailings dam failure', the authors investigated remote sensing accuracy associated with machine learning models to identify changes in land cover caused by tailings dam failure under different spatial and spectral resolutions.

Given the expediency of providing a report, my comments are somewhat limited, though I hope they are still useful to the editors and authors:
This manuscript can be considered after some revisions suggested below:
1. Please improve your title. It is confusing, too long, and quite complicated to follow. Try to be as simple as possible.
2. The overall quality of the paper, and the implementation aspect can further be improved.
3. The novelty of this work given the published research on the topic must be clearly highlighted. What is real novel on this research? I noticed several studies used similar techniques for such purposes in the literature, authors need to clearly distinguish the merits and de-merits of the current study with respect to the existing studies.
4. Introduction section can be further enhanced. I noticed that the authors have tried well, however, to show its wide applications and importance, it is suggested to add some general lines about the importance of machine learning in other areas of science and engineering as well with some examples/references. Some examples in other areas may be, https://doi.org/10.1016/j.commatsci.2019.109282, https://doi.org/10.1016/j.jallcom.2020.153694, https://doi.org/10.1016/j.jmst.2021.07.038, https://doi.org/10.1007/s11837-020-04057-z. Hence these and some others may be briefly discussed in the introduction section.
5. The significance of the design carried out in this paper can further be well explained relative to other important studies in this area. It will further enhance the quality of this work if the current results are compared with some more latest relevant studies. It is suggested to review, comment, and compare more recent published works.
6. Authors need to clarify, what makes the proposed method most suitable for this unique task? What new development to the proposed method have the authors added compared to existing studies?

7. In machine learning, specifically in classification, the accuracy could also be high due to class imbalance in the dataset. Could the authors provide more information on the distribution of classes by providing detailed information about confusion matrix. In case there is significant class imbalance, it might also aid their analysis if they perform under/over sampling or selectively penalize misclassification of the minority class.
9. The article suffers from some language problems. E.g. even in the abstract line 33 it has been written ‘learning machine models’, should it not be ‘machine learning models’?? Similarly, I noticed several language problems/ grammatical/ sentence structuring mistakes throughout the paper. It is suggested to thoroughly check the manuscript and correct such type of mistakes.

Author Response

Reviewer #4

Given the expediency of providing a report, my comments are somewhat limited, though I hope they are still useful to the editors and authors:

This manuscript can be considered after some revisions suggested below:

Please improve your title. It is confusing, too long, and quite complicated to follow. Try to be as simple as possible.

Answer) Thank you for your comment. Following your suggestion, we created a new title.

Accuracy of land use and cover mapping across time in environmental disaster zones: the case of B1 tailings dam rupture in Brumadinho, Brazil

The overall quality of the paper, and the implementation aspect can further be improved.

Answer) Thank you for your comment. We made improvements in the introduction, methodology, discussion and conclusion. A number of paragraphs are highlighted in yellow in the revised text, where the reviewer can see the improvements made.

The novelty of this work given the published research on the topic must be clearly highlighted. What is real novel on this research? I noticed several studies used similar techniques for such purposes in the literature, authors need to clearly distinguish the merits and de-merits of the current study with respect to the existing studies.

Answer) Thank you for your comment. In the introduction we report as suggested the merits and novelties of the current study in relation to existing studies. The specific motivation of the study was now clearly stated, as follows:

Introduction section can be further enhanced. I noticed that the authors have tried well, however, to show its wide applications and importance, it is suggested to add some general lines about the importance of machine learning in other areas of science and engineering as well with some examples/references.

Some examples in other areas may be, https://doi.org/10.1016/j.commatsci.2019.109282, https://doi.org/10.1016/j.jallcom.2020.153694, https://doi.org/10.1016/j.jmst.2021.07.038, https://doi.org/10.1007/s11837-020-04057-z. Hence these and some others may be briefly discussed in the introduction section.

Answer) Thanks for the feedback. We made adjustments to the introduction as suggested (see answer to question 3) and added the suggested bibliographies in the context of the text, as follows:

“The method has been successfully applied in many studies, namely to predict time-temperature transformation diagrams of high-alloy steel [32, 33], estimate the phase transformation temperature and the hardness of low alloy steel [34], as well as in the construction of continuous cooling transformation diagrams in welding steels [35].”

The significance of the design carried out in this paper can further be well explained relative to other important studies in this area. It will further enhance the quality of this work if the current results are compared with some more latest relevant studies. It is suggested to review, comment, and compare more recent published works.

Answer) Thanks for the comment. We tried to present, as suggested, the meaning of the project more clearly in the changes made in the introduction and discussion of the results.

Authors need to clarify, what makes the proposed method most suitable for this unique task? What new development to the proposed method have the authors added compared to existing studies?

Answer) The proposed method uses the median of annual time series as input for the Random Forest classification model in the GEE, making it possible to obtain a more accurate representation of temporal changes in land use and land cover. This is because the median of the time series reduces the influence of extreme values and noise, allowing temporal patterns of change to be more clearly identified. By obtaining a more accurate representation of temporal changes, the Random Forest classification model is able to use this information to classify different land use and land cover classes more accurately. This is because the model is able to capture the subtle changes in the spectral characteristics of the images that can indicate the presence or absence of a certain class. Furthermore, by using the median of annual time series as input to the classification model, it is possible to obtain a more consistent classification over time, as the model takes into account temporal changes in the images. This is particularly important for classifying areas that show significant changes in land use and land cover over time, such as areas where mining dams have failed. However, no study has yet been carried out to evaluate the changes resulting from the dam rupture using as a tool the reduction of the pixel to the median of several annual images to generate a single temporal image that allows the diagnosis of changes with greater temporal precision.
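
The claim that a per-pixel median reduces the influence of extreme values can be illustrated with a small sketch. It is written in plain Python with hypothetical reflectance values and a hypothetical helper name (`median_composite`), and it is not the authors' script. In GEE the analogous reduction is the `median()` method of an image collection.

```python
from statistics import median


def median_composite(stack):
    """Reduce a time series of equally sized single-band images to one image
    by taking the median of each pixel across acquisition dates."""
    rows = len(stack[0])
    cols = len(stack[0][0])
    return [
        [median(image[r][c] for image in stack) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical 2x2 reflectance values from three acquisition dates.
stack = [
    [[0.10, 0.20], [0.30, 0.40]],
    [[0.12, 0.22], [0.95, 0.41]],  # 0.95 simulates a cloud-contaminated pixel
    [[0.11, 0.21], [0.31, 0.39]],
]
composite = median_composite(stack)
print(composite)
```

For the contaminated pixel the median stays at 0.31, whereas the mean of the same three values would be 0.52. This is the robustness to outliers that the response appeals to.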

Such explanations were added in the introduction and methodology. The amendments to the Introduction we already reproduced above (answer to question 3). In the Methodology section, we added the following text to improve the revised manuscript.

In machine learning, specifically in classification, the accuracy could also be high due to class imbalance in the dataset. Could the authors provide more information on the distribution of classes by providing detailed information about confusion matrix. In case there is significant class imbalance, it might also aid their analysis if they perform under/over sampling or selectively penalize misclassification of the minority class.

Answer) Thank you for the comment. A confusion matrix was made available in the supplementary material (Figure S1) to show the values.

The article suffers from some language problems. E.g. even in the abstract line 33 it has been written ‘learning machine models’, should it not be ‘machine learning models’?? Similarly, I noticed several language problems/ grammatical/ sentence structuring mistakes throughout the paper. It is suggested to thoroughly check the manuscript and correct such type of mistakes.

Answer) Article has been reviewed by native expert (a certificate was provided with the cover letter).

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I read the responses and appreciated authors for their attempts. Tried to be positive and thus ignored to open new discussion on the responses to respect the authors as my colleagues. However, in some cases actually couldn’t be convinced because of obvious deviation of your responses from the core of comments. It is recommended that use the line numbers of the template of Sustainability to assign the places that you have modified. As above mentioned, I just leave some of the points which the authors needs to process them in another round of revision:

About the ‘The core of this code… with other works like https://www.mdpi.com/2072-4292/14/11/2654, …’, I didn’t get appropriate feedback. Please read it again and response point by point.

‘Authors in Abstract claimed for ML models while here they just have RF for three satellite images.’, Dear author I know RF is a type of ML, the main problem is that you talked about ‘ML models’ (Plural form) but just RF (single form) is presented. The applicability of RF has long been understood but also its limitations. Therefore, your response obviously is REJECTED.

Concerning your response to comment ‘According to L29-30, …’, you are strongly emphasized to reanalyze the comment. The core is assigned to pixel resolution which is an obvious problem in image processing and thus it is not related to bands. Look at the mentioned reference or similar ones to get more insights. Therefore, your response definitely must be polished. You at the next comment exactly wrote ‘To clarify the doubt, the fusion of several images over the course of a year is used to generate a single median image’ which I emphasized on.

Concerning the ‘I was lost with the modeling process… and attributes’, using term ‘doubt’ in response is not recommended. As a colleague I don’t doubt your work, the problem is assigned to clarification which further can be used by other scholars who may interested to your work. Your response therefore technically doesn’t satisfy my comment.

In the comment ‘Another well recognized concern… dealing with outliers?’, you should discuss on subsurface heterogeneity. Something for example has been discussed at https://doi.org/10.3390/ijgi10050341, …

‘From L217-221… or multi-class????’, as previously above mentioned, there isn’t doubt on your work, the problem is clarification.

‘It is incredible that … F-score to 2018???????’, definitely agree with you as it is an indicator, but if it is not necessary why you cited it and if you want cite it ethically the original work beside available ones can be used.

Obvious misunderstanding ‘F-score due to formulation … What was your interpretation?? ‘. I asked due to formulation F-score give higher accuracy than other metrics. This is a very obvious mathematical concept. I asked how you interpreted it because F-score doesn’t reveal mutual information among features. It is very obvious, the higher accuracy metric the better model.

Concerning the comment ‘This work absolutely needs… are highly essential’, I couldn’t handle your response. You mentioned that uncertainty was not the objective but just performance and then you have done uncertainty with MC!!! One of the main reasons for using the accuracy performance is to show the precision and reliability of the model and prove how it can decrease the uncertainty in prediction. The core of this comment strongly is assigned to use of AI-based uncertainty approaches for AI models like you have here while MC can be carried out for any other models. If, we consider your response then the problem of computation expense and time then will be bolded. You must open windows for the readers to know about the possibility of new approaches in their field not just repetition of know materials. Hope that you get my point.

Dear colleagues, when I asked, ‘Without any doubt … presented method’, it is expected to see the model performance for ground truth data in compare with the prediction. It is expected to see error analysis, sensitivity, evidential analysis, comparing with other scholars, … limitations of presented model??? are you able to prove the convergence or stability of that???? Any explicit discussion to illustrate the limitations, pitfalls and practical difficulties of applied models under certainty???? The work lacks for a prior impact assessment and cost-benefit analysis and did not anticipate solutions for the possible consequences in advance. Model calibration must be carried out through the assigned weight database.

‘4. As mentioned in #3,…’, provide one or two descriptive statement in the context.

In the case of ‘5. From coding point of view…’, something like #4 is required.

Concerning the keywords, for example, tailing dam, Brazil, Ferro-Carvão Stream watershed, …

OBS1: Despite of revised English, it is recommended to use third passive voice and avoid using ‘we’, our’, …

OBS2: You are recommended to condense and truncate the conclusion to show the bolded summary of your work.

OBS3: concerning the data availability and supplementary material,

OBS4: Still have problem to access the code, where the link doesn’t work with any of the used accounts.

Author Response

Reply Letter Reviewer #1

Thank you again for your review. We have made an effort to change all the points suggested. The responses were provided and we hope to have improved the article. In the following paragraphs, we provide point-by-point responses to the reviewer's comments. Our responses are written in blue and text added to the revised manuscript is shaded in yellow to help the editor and reviewer see the extent and where improvements have been made to the manuscript. Best regards

I read the responses and appreciated authors for their attempts. Tried to be positive and thus ignored to open new discussion on the responses to respect the authors as my colleagues. However, in some cases actually couldn’t be convinced because of obvious deviation of your responses from the core of comments. It is recommended that use the line numbers of the template of Sustainability to assign the places that you have modified. As above mentioned, I just leave some of the points which the authors needs to process them in another round of revision:

1- About the ‘The core of this code… with other works like https://www.mdpi.com/2072-4292/14/11/2654, …’, I didn’t get appropriate feedback. Please read it again and response point by point.

Answer) Thank you very much for the comment. In fact, Google Earth Engine (GEE) is a cloud platform that mainly focuses on closed source platforms, using its own framework and JavaScript wrapper. While the platform requires the use of the JavaScript programming language to interact with its API in the web-based editor, it also has open source functionality with a Python API. It's true that under the hood, GEE calls proprietary closed-source frameworks and its own objects. However, the use of the Geemap Python package can facilitate the development of scripts in GEE, allowing users to access and manipulate geospatial data efficiently using the Python language.
Although the Geemap Python package is a powerful and useful tool, it is important to remember that its use is not limited to beginners or users with low proficiency. The Geemap Python package can be used by users of all skill levels to simplify and automate complex tasks in GEE, allowing them to focus on data analysis and decision making. With it, users can access GEE resources, perform data pre-processing tasks, select training samples, adjust model parameters, evaluate model performance and visualize results in an interactive map. This can save time and effort in the script development process and allow users to focus on analyzing the results. We agree with your observation that the selected keywords did not identify the main subject of the article and, therefore, they were replaced to favor the visibility of the text. The GEE worked as a tool to facilitate the collection and classification of the images and, based on your valuable observations, we redid the introduction of the article, presenting the hypotheses and the desired objectives with more clarity.

2- ‘Authors in Abstract claimed for ML models while here they just have RF for three satellite images.’, Dear author I know RF is a type of ML, the main problem is that you talked about ‘ML models’ (Plural form) but just RF (single form) is presented. The applicability of RF has long been understood but also its limitations. Therefore, your response obviously is REJECTED.

Answer) Accordingly, we have made the respective corrections to our Abstract.

3- Concerning your response to comment ‘According to L29-30, …’, you are strongly emphasized to reanalyze the comment. The core is assigned to pixel resolution which is an obvious problem in image processing and thus it is not related to bands. Look at the mentioned reference or similar ones to get more insights. Therefore, your response definitely must be polished.
You at the next comment exactly wrote ‘To clarify the doubt, the fusion of several images over the course of a year is used to generate a single median image’ which I emphasized on.

Answer) In this study, we chose to generate LULC maps for each satellite without performing pixel resampling, resulting in maps with different spatial resolutions. We avoided combining maps with different spatial resolutions, as this could lead to inconsistencies and difficulties in interpreting the results. Thus, in the research objectives, we sought to evaluate the LULC and its differences in each satellite, without involving direct comparisons or map combinations. By maintaining the original resolutions and appropriate scales, we seek to preserve the original data, since pixel resampling can involve interpolation of values. In this way, each satellite can capture information at specific scales, allowing data analysis at the most appropriate scale for each sensor. For example, high-resolution Planet data may be more appropriate for local-scale studies, while Landsat 8 and Sentinel-2 data may be more suitable for regional-scale analysis. By maintaining the original resolutions, we ensure greater transparency and reproducibility in our research.

To address possible inconsistencies and difficulties in interpreting the results, we adopted strategies such as the use of quality metrics, including overall accuracy, user accuracy and producer accuracy for each LULC map generated from the different satellites. These metrics help in evaluating classification performance and quantitatively comparing results across satellites. In addition, we use qualitative analyses, such as visual interpretation of LULC maps and comparison with satellite images, to identify areas where the classification presents inconsistencies or interpretation difficulties. This approach favors the understanding of the limitations and challenges associated with each dataset and satellite.
To promote the replicability of the work, we report details of the classification process, including band selection, parameters and validation procedures. This documentation contributes to the reliability of the results obtained. In this way, we add this detail to the wording of the article in the introduction and methodology.

In the Introduction, we added the following:

In an attempt to fill in the gaps above related to satellite image classification procedures in landscapes affected by tailings dam failures, the objective of this study was to evaluate the LULC changes in the sub-basin of Córrego Ferro-Carvão, after the failure of dam B1 in Brumadinho. Therefore, we sought to evaluate the accuracy of the LULC mapping using the technique of reducing pixels to the median of several annual images, in order to generate a single temporal image. Satellite images with different spatial resolutions were used, together with the Random Forest (RF) model in Google Earth Engine (GEE) to classify the affected areas. Furthermore, the performance of the model was evaluated without using pixel resampling, resulting in maps with different spatial resolutions. We avoided combining maps with different spatial resolutions, as this could lead to inconsistencies and difficulties in interpreting the results. Thus, we seek to evaluate the LULC and its differences in each satellite, without involving direct comparisons or map combinations. By maintaining the original resolutions and appropriate scales, we seek to preserve the original data, since pixel resampling can involve interpolation of values. In this way, each satellite can capture information at specific scales, allowing data analysis at the most appropriate scale for each sensor. By maintaining the original resolutions, we ensure greater transparency and reproducibility in our research.
To address possible inconsistencies and difficulties in interpreting the results, we adopted strategies such as the use of quality metrics, including overall accuracy, user accuracy and producer accuracy for each LULC map generated from the different satellites. These metrics help in evaluating classification performance and quantitatively comparing results across satellites. In addition, we use qualitative analyses, such as visual interpretation of LULC maps and comparison with satellite images, to identify areas where the classification presents inconsistencies or interpretation difficulties. This approach favors the understanding of the limitations and challenges associated with each dataset and satellite. To promote the replicability of the work, we report details of the classification process, including band selection, parameters and validation procedures. This documentation contributes to the reliability of the results obtained. The resulting maps provide a perspective on changes in land use and occupation resulting from the dam failure and their possible implications on the local economy and ecosystem integrity, thus creating the foundations for socio-environmental recovery in the region. Overall, it is expected that this study will contribute to simplify the application of remote sensing techniques in monitoring areas impacted by environmental disaster events.

4- Concerning the ‘I was lost with the modeling process… and attributes’, using term ‘doubt’ in response is not recommended. As a colleague I don’t doubt your work, the problem is assigned to clarification which further can be used by other scholars who may interested to your work. Your response therefore technically doesn’t satisfy my comment.

Answer) Therefore, our objective is to evaluate land use and land cover (LULC) on each satellite, excluding direct comparisons or map combinations. By preserving inherent resolutions and proper scaling, we seek to maintain raw data integrity.
Thus, we adopted the merging of several images over a year for each satellite, generating a single median image through the Google Earth Engine script. This method is efficient, as it allows obtaining an image with less noise and higher informational quality. The main advantage of using this single median image for supervised classification is the attenuation of the impact of temporal variations on classification. When using multiple images over time, classification can be impacted by seasonal changes or temporal variations, resulting in classification errors. The median image helps minimize these effects, providing a more accurate and consistent image for supervised classification. Furthermore, the median image can contribute to reducing the amount of data needed for supervised classification, making the process more efficient and economical in terms of computational resources and time.

The Random Forest (RF) algorithm, a supervised learning method based on decision trees, can be used to predict or classify the LULC from the generated median images. The algorithm uses a set of decision trees to perform the classification, improving the accuracy and robustness of the model. During data pre-processing, the median image obtained from Google Earth Engine must be processed and organized into an appropriate dataset for training the classification model, including information on the different types of land use and land cover present in the image. Subsequently, the data set must be divided into training and validation subsets for performing the LULC classification. When evaluating the model, appropriate metrics should be used, such as the confusion matrix and overall accuracy, among others, to assess the accuracy and quality of the classification performed by the RF model.

5- In the comment ‘Another well recognized concern… dealing with outliers?’, you should discuss on subsurface heterogeneity.
Something similar, for example, has been discussed at https://doi.org/10.3390/ijgi10050341, …

Answer) In geospatial analyses, the identification of outliers can be performed using univariate and multivariate statistical methods. Additionally, spatial analysis, including spatial autocorrelation, can help detect anomalous patterns in the data. Outliers can be caused by measurement errors and/or inconsistencies in the data. Understanding the source of outliers will help determine the most appropriate approach to dealing with them. Depending on the cause and characteristics of the outliers, different strategies can be adopted, such as removing, correcting or transforming the data, in order to improve the accuracy of the results in image classification. In this study, we did not visually observe a significant presence of outliers in the image that would compromise the final result. However, in future research in the area under analysis, we will seek to implement outlier analysis in order to carry out a comparative evaluation of the precision of the LULC classification.

6- ‘It is incredible that … F-score to 2018???????’, I definitely agree with you that it is an indicator, but if it is not necessary, why did you cite it? And if you want to cite it, ethically the original work should be used alongside the available ones.

Answer) The F-score is a metric derived from the work of van Rijsbergen (1979) in the field of information retrieval. The metric was developed to evaluate the effectiveness of information retrieval systems, considering both precision and recall. C. J. van Rijsbergen, in his book "Information Retrieval", highlighted the need for a measure that could balance precision and recall to provide a more comprehensive evaluation of the performance of an information retrieval system. Precision and recall are complementary metrics, and emphasis on one can degrade the other. Therefore, it was necessary to find a balance between the two.
From this need, the F-score was introduced as a weighted harmonic mean between precision and recall. The harmonic mean is better suited than the arithmetic or geometric mean for combining the two metrics, as it is less sensitive to an extremely high value of one metric and therefore helps to balance the importance of both. The F-score is popular for evaluating the performance of classification models, especially on problems with imbalanced classes such as fraud detection, spam detection, and medical diagnosis. It is also widely used in land use and land cover (LULC) classification tasks, where some classes may be much more frequent than others. Because of this, we chose to keep it in the article to allow comparison with the literature.

Van Rijsbergen, C. J. (1979). Information Retrieval (2nd ed.). London: Butterworths.

7- Obvious misunderstanding ‘F-score due to formulation … What was your interpretation?? ‘. I asked because, due to its formulation, the F-score gives higher accuracy than other metrics. This is a very obvious mathematical concept. I asked how you interpreted it because the F-score doesn’t reveal mutual information among features. It is very obvious: the higher the accuracy metric, the better the model.

Answer) The F-score can provide a measure of how well a classification model, such as Random Forest, performs in the context of LULC classification compared to other metrics, because it takes into account both precision and recall. This is especially important in scenarios where classes are unbalanced, which is common in LULC classification problems. Metrics such as accuracy can be misleading in situations of class imbalance. A model can have high accuracy simply by correctly predicting the majority class, even if it fails to correctly identify the minority class. This is not desirable in LULC problems, where it is important to identify and differentiate between multiple classes, including less frequently occurring ones.
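The imbalance effect described above can be illustrated with a small worked example. This is only a sketch under stated assumptions: the two-class confusion matrix, the class names ("forest", "tailings") and the function names are hypothetical and are not taken from the study. Rows are assumed to hold reference classes and columns predicted classes, so precision corresponds to user's accuracy and recall to producer's accuracy.

```python
from typing import Dict, List


def overall_accuracy(matrix: List[List[int]]) -> float:
    """Share of all samples that lie on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_scores(matrix: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Precision (user's accuracy), recall (producer's accuracy) and F1 per class.

    Assumes rows are reference (true) classes and columns are predicted classes.
    """
    n = len(labels)
    scores: Dict[str, Dict[str, float]] = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted_total = sum(matrix[r][i] for r in range(n))  # column total
        reference_total = sum(matrix[i])                       # row total
        precision = tp / predicted_total if predicted_total else 0.0
        recall = tp / reference_total if reference_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Hypothetical, strongly imbalanced validation sample (not data from the study).
LABELS = ["forest", "tailings"]
MATRIX = [
    [950, 10],  # reference forest: 950 mapped as forest, 10 as tailings
    [30, 10],   # reference tailings: 30 mapped as forest, 10 as tailings
]

OA = overall_accuracy(MATRIX)
SCORES = per_class_scores(MATRIX, LABELS)
```

In this hypothetical sample the overall accuracy is 0.96, while the rare class reaches a precision of 0.50, a recall of 0.25 and an F1 of about 0.33, which is the kind of gap that overall accuracy alone would hide.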
The F-score, on the other hand, provides an overview of the model's performance, taking into account the model's ability to correctly identify classes (precision) and the ability to identify all instances of a class (recall). By balancing these two metrics, the F-score provides a fairer assessment of model performance in LULC classification problems. Higher F-score values indicate better model performance in the classification, determined by the model's performance in correctly predicting land use and land cover classes. These classes can include urban areas, agricultural areas, forests, water bodies and other types of land cover. F-Score values greater than 0.77 can be considered high for the different classes analyzed, indicating that the discrimination between classes was carried out satisfactorily by the optical sensors used in the study [90]. Such information was added to the Results and Discussion. Thus, higher F-Score values, which are calculated as the harmonic mean of precision and sensitivity, indicate that the classification model has a good ability to correctly identify the class samples. In our study, all F-Score values calculated for the evaluated classes were greater than 0.8, which indicates that the model has good precision and sensitivity in all evaluated classes, as shown in Table 1S of Supplementary Materials. F-Score values greater than 0.77 can be considered high for the different classes analyzed, indicating that the discrimination between classes was carried out satisfactorily by the optical sensors used in the study [90].

8- Concerning the comment ‘This work absolutely needs… are highly essential’, I couldn’t follow your response. You mentioned that uncertainty was not the objective but just performance, and then you have done uncertainty with MC!!! One of the main reasons for using the accuracy performance is to show the precision and reliability of the model and prove how it can decrease the uncertainty in prediction.
The core of this comment concerns the use of AI-based uncertainty approaches for AI models like the one you have here, whereas MC can be carried out for any other model. If we consider your response, then the problem of computational expense and time becomes prominent. You must open windows for the readers to know about the possibility of new approaches in their field, not just repetition of known material. Hope that you get my point.

Answer) Thank you for your valuable comment. I understand the importance of considering uncertainty when evaluating AI model prediction. Although our work originally focused on the F-score and the Matthews correlation, which are two performance metrics commonly used to assess how well a model performs classification, we recognized that incorporating an uncertainty measure can provide a more complete picture of the confidence of the forecast. Thus, in a future work we intend to carry out an additional analysis to evaluate the uncertainty in the datasets and predicted values to provide an indication of the reliability associated with a forecast.

9- Dear colleagues, when I asked, ‘Without any doubt … presented method’, it is expected to see the model performance for ground truth data in comparison with the prediction. It is expected to see error analysis, sensitivity, evidential analysis, comparing with other scholars, … limitations of presented model??? Are you able to prove the convergence or stability of that???? Any explicit discussion to illustrate the limitations, pitfalls and practical difficulties of applied models under certainty???? The work lacks a prior impact assessment and cost-benefit analysis and did not anticipate solutions for the possible consequences in advance. Model calibration must be carried out through the assigned weight database.

Answer) Thank you for the reviewer's comment.
To evaluate the performance of the Random Forest model applied to the LULC classification, the data were divided into two sets: 70% of the samples were randomly allocated to training, while 30% were allocated to validation. From the trained model, predictions are made on the validation set and the performance is evaluated by comparing the model's predictions with the true values of that set using evaluation metrics such as accuracy, precision, recall and F-score. For LULC classification problems with unbalanced classes, the F-score is an especially useful metric as it balances precision and recall. Subsequently, a confusion matrix was created to visualize the performance of the model in each class. The confusion matrix shows the number of true positives, false positives, true negatives, and false negatives for each class. This helps to identify areas where the model is performing well and areas where there may be room for improvement (found in the methodology in the supplementary material). With regard to model convergence and stability, we understand that when the validation period metrics show good results, these points are met. The limitations of the model are related to the type and amount of input data. If the input data span a wide range, the model is more ‘universal’.

10- ‘4. As mentioned in #3,…’, provide one or two descriptive statements in the context.

Answer) We appreciate the feedback and the opportunity to provide a descriptive statement in the context of our work. In this study, we faced limitations in using the Google Earth Engine (GEE) framework to deal with code complexity and parallelism.
To address these limitations and improve the efficiency and effectiveness of our work, we propose the following strategies:
a) Decomposing the code into smaller, parallel tasks, allowing the simultaneous execution of smaller, more manageable tasks, simplifying error detection and improving general processing efficiency;
b) Use of the ee.Reducer class to support large-scale reduction operations, such as sum, mean and standard deviation, facilitating the implementation of complex calculations on large-scale images;
c) Implementation of the "filter-map-reduce" pattern: by applying filter, map and reduce functions sequentially to the data, we can reduce the code complexity and take advantage of the parallelism offered by GEE, resulting in more efficient data processing.
With these strategies, we believe we are able to overcome the limitations encountered when using the GEE framework, thus improving the quality and applicability of our study results.

11- In the case of ‘5. From coding point of view…’, something like #4 is required.

Answer) Thank you for raising this concern regarding the limitations of the GEE framework for solving remote sensing problems. While it is true that the map and reduce approach may not be suitable for all cases, there are several strategies that can be adopted to overcome these limitations. These include combining GEE with other libraries and tools like GDAL, using GEE's Python API to address limitations of the JavaScript implementation, and implementing custom algorithms outside of GEE to address specific remote sensing issues and then integrating them into the framework as needed, among others. In our work, we employed a combination of these strategies to deal with the limitations of the GEE framework, as also presented in the item above.

12- OBS1: Despite the revised English, it is recommended to use the third-person passive voice and avoid using ‘we’, ‘our’, …

Answer) Thanks for the feedback and the recommendation.
I understand that, when writing technical and academic texts, it is recommended to use formal and impartial language, avoiding the use of the first person plural, such as "we" and "our", and using the third-person passive voice. We made the suggested adaptation in the text.

13- OBS2: You are recommended to condense and truncate the conclusion to show the bolded summary of your work.

Answer) Thank you for the feedback and for the suggestion to condense and truncate the conclusion of the work. However, the large volume of work done and the corresponding outcomes make it difficult to shorten the conclusions as requested. We hope you understand that. Thanks again for the feedback and the opportunity to improve the quality of the work.

14- OBS3: concerning the data availability and supplementary material,

Answer) Thank you for your observation about the availability of data and supplementary material in relation to the work. It is important to ensure that the data used in the work are available to other researchers so that they can reproduce and expand the results. In our case, we are committed to making the data and complementary materials available and accessible to all interested parties. The data used in our work will be made available through a public repository, and the link will be included in the work for reference.

15- OBS4: I still have a problem accessing the code; the link doesn’t work with any of the accounts used.

Answer) Thank you for your inquiry. It is important to note that the successful execution of a script written in Google Earth Engine (GEE) depends on the satisfaction of several technical criteria. Among the main ones, authentication and authorization on the platform stand out, which require a valid Google account and the necessary permissions to access the data and resources available on GEE. After carrying out tests to verify the functioning of the links provided, we found that they are operating correctly.
This way, we believe it is possible to access and execute the script properly.
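
For readers who cannot open the script, the yearly median composite described in the answer to comment 4 and the "filter-map-reduce" pattern mentioned in the answer to comment 10 can be sketched in plain Python, independently of GEE. Everything in this sketch is an illustrative assumption: the `Scene` record, the cloud-cover threshold, the scaling factor and the toy 2x2 rasters are hypothetical, and the code does not reproduce the authors' script or the Earth Engine API.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Scene:
    """Hypothetical single-band scene: acquisition year, cloud cover (%), pixel grid."""
    year: int
    cloud_pct: float
    pixels: List[List[float]]


def median_composite(scenes: List[Scene], year: int, max_cloud: float,
                     scale: float = 1.0) -> List[List[float]]:
    """Build one per-pixel median image from all usable scenes of a given year."""
    # Filter: keep scenes from the target year below the cloud threshold.
    kept = [s for s in scenes if s.year == year and s.cloud_pct <= max_cloud]
    if not kept:
        raise ValueError("no scenes left after filtering")
    # Map: apply the same transformation to every scene (here a simple rescaling).
    scaled = [[[v * scale for v in row] for row in s.pixels] for s in kept]
    # Reduce: take the median of each pixel across the stack of scenes.
    rows, cols = len(scaled[0]), len(scaled[0][0])
    return [[median(img[r][c] for img in scaled) for c in range(cols)]
            for r in range(rows)]


# Toy input (hypothetical values, not data from the study).
SCENES = [
    Scene(2019, 5.0, [[0.20, 0.30], [0.40, 0.50]]),
    Scene(2019, 8.0, [[0.22, 0.28], [0.90, 0.52]]),   # 0.90 mimics a noisy pixel
    Scene(2019, 3.0, [[0.18, 0.32], [0.42, 0.48]]),
    Scene(2019, 80.0, [[0.95, 0.95], [0.95, 0.95]]),  # too cloudy, filtered out
    Scene(2018, 2.0, [[0.10, 0.10], [0.10, 0.10]]),   # wrong year, filtered out
]

COMPOSITE = median_composite(SCENES, year=2019, max_cloud=20.0)
```

With these toy values the noisy 0.90 pixel does not reach the composite, whose lower-left value is 0.42. This is the noise-attenuation argument made for the median image. In GEE the three steps would correspond to filtering an image collection, mapping a function over it and applying a median reduction.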

Author Response File: Author Response.pdf

Reviewer 4 Report

The authors have well addressed the comments and incorporated the suggested changes. The article has been much improved, and is now acceptable for publication after a thorough and careful grammatical check.

Author Response

Many thanks for the positive appreciation of our study. We thoroughly revised the grammar with the help of a native speaker, and the text is now likely free of grammar or syntax errors.

Round 3

Reviewer 1 Report

Thanks for your responses.
