Article
Peer-Review Record

Improved Inference and Prediction for Imbalanced Binary Big Data Using Case-Control Sampling: A Case Study on Deforestation in the Amazon Region

Remote Sens. 2020, 12(8), 1268; https://doi.org/10.3390/rs12081268
by Denis Valle 1,*, Jacy Hyde 1, Matthew Marsik 2 and Stephen Perz 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 10 March 2020 / Revised: 5 April 2020 / Accepted: 14 April 2020 / Published: 17 April 2020
(This article belongs to the Section Forest Remote Sensing)

Round 1

Reviewer 1 Report

The authors have presented a novel approach to infer and predict deforestation using big data in the Amazon forest. I am happy to read this article, which is written with brevity and clarity. The better performance of the case-control approach is understandable, and this will help to reduce the uncertainty induced by the random approach. I have one minor comment:

 - It would be interesting to see a risk prediction map of the study area, if possible.

Best wishes.

Author Response

We appreciate the comments from reviewer 1.

In relation to the minor comment of displaying a prediction map of the study area, we have included an additional figure with the predictions of the random forest model. We believe that this figure has helped to improve our article and we thank the reviewer for this suggestion.

Reviewer 2 Report

In the manuscript “Improved inference and prediction for imbalanced binary big data using case-control sampling: a case study on deforestation in the Amazon region”, the authors outline an accurate big data analysis method(s) as a case study in the Amazon region using time-series Landsat data. It seems that the authors consider the data to be big because of the large number of pixels in a scene. However, in urban growth modeling the term refers to large regional data. Therefore, the term “big data” seems a bit out of place for the manuscript.

Below I have provided comments by sections.

Abstract

Keywords: “satellite imagery” seems a broad term; “Landsat data” may be more relevant.

  1. Introduction

It would be good to give brief background information on the state of deforestation in the Amazon region.

L 38: CO2

L 41-42: “and satellite imagery…”. This is a very general statement; we all know that satellite imagery consists of millions of pixels, depending on spatial resolution. All other items listed above have applications supported by a reference, except this one. For consistency, include facts with a reference.

L 42-43: diluted sentence; delete.

L 45: urban growth simulation

Pijanowski, B. C., Tayyebi, A., Doucette, J., Pekin, B. K., Braun, D., & Plourde, J. (2014). A big data urban growth simulation at a national scale: configuring the GIS and neural network based land transformation model to run in a high performance computing (HPC) environment. Environmental Modelling & Software, 51, 250-268.

L72: Space[31,but

  2. Materials and Methods

L86-97: It seems that the case-control sampling method, which is widely used in epidemiological studies, is new to remote sensing applications in forestry. How, then, do the model assumptions affect the current study? How do outlier pixels affect the model?

L116: indicates

Fig. 2. You do not need to write the word "Legend" in the legend box.

Fig. 2. Provide an inset locator map for the first image (1986).

Some areas seem to have cloud cover. Does that affect your model?

Make the legend and scale bar areas transparent so that the ground is visible.

What is the extent of the study area?

Table 1. Is this the correct table format?

L 262: Subheading too long.

L 298: Is 2000 a year? If so, “2000 predictor variables” reads confusingly. I suggest revising the sentence. Make similar changes for L 296.

Fig. 3. Add the red line as a legend item?

Author Response

Please see the attached document

Author Response File: Author Response.pdf

Reviewer 3 Report

The paper shows that a case-control (CC) sampling approach, in which all pixels with the outcome of interest and a subset of the pixels without this outcome are selected, can yield much better inference and prediction than random sampling, provided that the estimated parameters and probabilities are calibrated in a suitable way. A case study focused on deforestation in the Amazon region is used to illustrate the higher level of performance that the case-control sampling approach can achieve.
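
As an illustration of the summary above, the following is a minimal Python sketch of case-control sampling with a calibration step. It is not the authors' implementation: the function names, the `control_fraction` parameter and the toy data are assumptions made for this example. The calibration shown is the standard prior correction for a logistic model in which all cases and a known fraction of controls are kept; the calibration actually used in the manuscript is not reproduced here.

```python
import math
import random


def case_control_sample(labels, control_fraction, seed=0):
    """Return indices of all cases (label 1) plus a random subset of controls (label 0).

    `control_fraction` is the (assumed known) share of controls that is kept.
    """
    rng = random.Random(seed)
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(controls), round(control_fraction * len(controls)))
    return cases + rng.sample(controls, n_keep)


def calibrate_intercept(b0_sample, control_fraction):
    """Shift a logistic intercept fitted on the case-control sample to the population scale.

    Keeping all cases and a fraction f of controls inflates the odds by 1/f,
    so the population intercept is b0_sample + log(f).
    """
    return b0_sample + math.log(control_fraction)


def calibrate_probability(p_sample, control_fraction):
    """Map a probability estimated on the case-control sample to the population scale."""
    f = control_fraction
    return f * p_sample / (f * p_sample + (1.0 - p_sample))


# Toy data (hypothetical): 10 deforested pixels among 1000.
labels = [1] * 10 + [0] * 990
idx = case_control_sample(labels, control_fraction=0.1)
p_population = calibrate_probability(0.5, control_fraction=0.1)
```

In this toy example the sample keeps every deforested pixel and one tenth of the remaining pixels, and a sample-scale probability of 0.5 is mapped to a much smaller population-scale probability, which reflects how rare the outcome is in the full data.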

 

Major comments

  1. In the paper the following sentence is reported (lines 248-249):

“The forest/non-forest land cover classification analyzed here is based on Landsat imagery and is described in detail in [19].”

I suggest explaining in detail in this paper the dataset and the methodological aspects of the classification. The reference is not enough, and the authors must help the reader understand what has been done.

  2. The formulas are images and their resolution is poor, at least in the PDF version that I analyzed.
  3. In Table 1 you report the proportion of deforested pixels for each year and road segment, while in the text you report the percentage. I suggest using the same unit in both cases, i.e. percentage in the table as well.
  4. Two maps are shown in Figure 2, but no information is supplied about their geographic/cartographic coordinates: please specify in the text the datum (WGS84?), cartographic projection (UTM?) and coordinate system (plane or geographic?), and add a geographic or cartographic grid to these images.
  5. I suggest better explaining in the conclusions the suitability of the proposed approach for other applications.
  6. Moreover, you should improve the manuscript by widening the bibliography. Other references may be included, e.g. on topics that are probably not sufficiently known to the regular reader of a remote sensing journal.

Author Response

Please see the attached document

Author Response File: Author Response.pdf

Reviewer 4 Report

1. The case-control (CC) approach used in this manuscript is more effective for imbalanced data than random sampling. However, CC is a representative sampling technique already proposed in Ref. [33]. If so, it is not clear what the originality of this paper is. No advantage is shown other than that the technique is applied to a remote sensing dataset.

2. In Fig. 2, the legend does not specify which class the white area represents. In addition, Landsat images were used to classify the forested and deforested areas. The results and accuracy of the image classification should be described.

3. Random sampling is the most traditional sampling technique. Therefore, a comparative evaluation with other sampling methods should be included in order to verify the superiority of this technique. In particular, in Fig. 4 the CC method and the RS technique appear to have similar trends. Therefore, a comparison with other methods is necessary.

4. As a result, it can be agreed that the technique using CC shows higher accuracy than the RS technique. However, the originality is not clear in that the CC technique is an existing technique. In addition, a comparative evaluation with other sampling techniques should be included.

Author Response

Please see the attached document

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The revised manuscript addresses most of the comments I raised, but I agree with Reviewer 4's concerns about the sampling technique.

Reviewer 4 Report

I recommend this manuscript for publication.
