Technical Note
Peer-Review Record

The Multi-Satellite Environmental and Socioeconomic Predictors of Vector-Borne Diseases in African Cities: Malaria as an Example

Remote Sens. 2022, 14(21), 5381; https://doi.org/10.3390/rs14215381
by Camille Morlighem 1,2,*, Celia Chaiban 1,2, Stefanos Georganos 3,4, Oscar Brousse 5,6, Jonas Van de Walle 5, Nicole P. M. van Lipzig 5, Eléonore Wolff 4, Sébastien Dujardin 1,2 and Catherine Linard 1,2,7
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4:
Submission received: 19 September 2022 / Revised: 13 October 2022 / Accepted: 21 October 2022 / Published: 27 October 2022
(This article belongs to the Special Issue Innovative Belgian Earth Observation Research for the Environment)

Round 1

Reviewer 1 Report (Previous Reviewer 1)

High spatial resolution is great, but low prediction accuracy is problematic in real applications. The authors may need to discuss how their data products, which carry high uncertainty, could be used to make reliable decisions in disease control.

Author Response

Reviewer comment: High spatial resolution is great, but low prediction accuracy is problematic in real applications. The authors may need to discuss how their data products, which carry high uncertainty, could be used to make reliable decisions in disease control.

Author response: We thank you for raising this good point. As you suggested, we discuss it further in the revised version of the manuscript by adding the following sentence to the discussion: “At the moment, predictive maps such as the ones created in this study cannot directly be used to target malaria control interventions given their low accuracy, but they could be used in combination with other decision-making tools and local expert knowledge on the field as they already provide insights into where high-risk areas tend to be located.” We thank you again for your thorough review of our manuscript.

Reviewer 2 Report (Previous Reviewer 2)

I appreciate the thorough response to reviewers. The only suggestion I have for the authors is to include the information: "We thank you for this comment. Each set of covariates (LULC, LCZ, CCLM) was produced in the frame of the REACT project (Remote Sensing for Epidemiology in African Cities) with the goal of improved accuracy and improved spatial resolution to serve for intra-urban epidemiological applications, in comparison to existing products already available. As an example, using Google Earth Engine and further processing, the LULC covariates in Kampala have a spatial resolution of 0.5 m (LC) and 20 m (LU) with an accuracy of 86% (LC) and 81% (LU) (Grippa et al., 2017 [37], Grippa et al., 2018 [38]). In comparison, the Copernicus Global Land Cover maps have a spatial resolution of 100 m with 80% accuracy on average (Buchhorn et al., 2021), which suits better large-scale mapping applications. As the production of each set of covariates was in itself a different topic, leading each to their own paper, we don’t describe in details here how they were produced."

The information included in this reviewer response was very interesting to me, and served as a clear justification for their choice of data sources and covariates. 

Author Response

Reviewer comment: I appreciate the thorough response to reviewers. The only suggestion I have for the authors is to include the information: "We thank you for this comment. Each set of covariates (LULC, LCZ, CCLM) was produced in the frame of the REACT project (Remote Sensing for Epidemiology in African Cities) with the goal of improved accuracy and improved spatial resolution to serve for intra-urban epidemiological applications, in comparison to existing products already available. As an example, using Google Earth Engine and further processing, the LULC covariates in Kampala have a spatial resolution of 0.5 m (LC) and 20 m (LU) with an accuracy of 86% (LC) and 81% (LU) (Grippa et al., 2017 [37], Grippa et al., 2018 [38]). In comparison, the Copernicus Global Land Cover maps have a spatial resolution of 100 m with 80% accuracy on average (Buchhorn et al., 2021), which suits better large-scale mapping applications. As the production of each set of covariates was in itself a different topic, leading each to their own paper, we don’t describe in details here how they were produced."

The information included in this reviewer response was very interesting to me, and served as a clear justification for their choice of data sources and covariates.

Author response: We thank you for this suggestion, and we are pleased to read that you have appreciated our response to the reviewers. As you suggested, we have added this information about the choice of data sources and covariates in the revised version of the manuscript in section “2.1.2 Predictor data”. We thank you again for your thorough review of our manuscript.

Reviewer 3 Report (Previous Reviewer 3)

The manuscript has been much improved during revision and my original concerns have been sufficiently addressed.

Author Response

Reviewer comment: The manuscript has been much improved during revision and my original concerns have been sufficiently addressed.

Author response: We are pleased to read that we have addressed your concerns and that you find the manuscript much improved. Thank you again for your thorough review of our manuscript.

Reviewer 4 Report (Previous Reviewer 4)

I feel that the authors have adequately addressed my comments.

Author Response

Reviewer comment: I feel that the authors have adequately addressed my comments.

Author response: We are pleased to read that we have adequately addressed your concerns, and we thank you again for your thorough review of our manuscript.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

In this research, the authors used high-resolution satellite products to predict malaria prevalence at 1 km resolution in two African cities. From conceptual and methodological perspectives, I did not see enough novelty to warrant publication. The following are my major concerns:

1. Using satellite images to predict malaria is not new, so how does this study add value to the literature? A sizable discussion identifying the knowledge gap is needed in the introduction to justify this study. For example, are there new datasets that have not been used before but may significantly help model prediction?

2. The authors attempted to use multiple high resolution satellite sensor datasets, but I wonder how much improvement can be achieved as compared to existing studies, such as the Malaria Atlas Project, the work by Kabaria 2016, etc. 

3. The authors mentioned socio-economic predictors in the title, but in their models, most predictors are predominantly natural and built environment variables. I did not see true socio-economic factors being included in the model, such as housing quality, income, education, and mobility. 

4. The authors' conclusion is somewhat self-contradictory. It seems that refining the resolution of satellite products might not be the right way to improve modeling accuracy, which devalues their research. For intra-urban scale modeling, I think more detailed census data and GPS tracking data can greatly improve predictive power, as several studies are already doing this.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Introduction:

- The introduction is very well written and very clear. 

- Line 67-69: I would like a little more information on why these variables have not yet been included in analyses, especially as the introduction outlines that they are readily available. Is it because of advancing analytical approaches that allow for wide datasets and collinear variables (i.e., random forest)? Is it because the remote sensing data are now being produced on temporal and spatial scales useful for these questions / aims?

Methods: 

Line 83-85: What is the spatial scale of the malaria prevalence data? Is the prevalence data collected within a 1 km grid cell?

- Line 104-105: Please describe what it means for the urban scheme to be activated.

- Line 107-110: What is the temperature suitability index for? Malaria transmission?

- Line 167: The recursive feature selection is not clear to me. Was feature selection conducted within each group of variables (e.g., climate)? Was this to choose the variables that go into the final five models?

Results:

- Lines 191-201: It would help to remind the reader in the text that you are presenting the average OOB error across 50 iterations. It would also help to make the full comparison between models by presenting the confidence intervals in the text and Table 2.
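
As an illustration of the kind of summary requested here, the following is a minimal sketch of reporting the mean out-of-bag (OOB) error across 50 iterations together with a 95% confidence interval. The error values are simulated placeholders and the helper name `summarize_oob` is hypothetical; neither comes from the manuscript, and the interval uses a normal approximation for the mean, which is an assumption rather than the authors' method.

```python
import random
import statistics


def summarize_oob(errors, level=0.95):
    """Return (mean, lower, upper) for a list of OOB errors.

    The interval is a normal-approximation confidence interval for the mean.
    """
    n = len(errors)
    mean = statistics.fmean(errors)
    sd = statistics.stdev(errors)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd / n ** 0.5
    return mean, mean - half_width, mean + half_width


rng = random.Random(0)
# Hypothetical OOB errors from 50 repeated model fits (not the authors' values).
oob_errors = [rng.gauss(0.20, 0.02) for _ in range(50)]

mean, lower, upper = summarize_oob(oob_errors)
print(f"mean OOB error = {mean:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Reporting the interval next to each model's mean, in the text and in Table 2, would let readers judge whether differences between models exceed the iteration-to-iteration variability.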

Discussion

Line 257: I am having a hard time understanding what the state-of-the-art techniques are, which relates to the depth at which everything was explained in the results. Why was LULC data pulled from Google Earth Engine and then modified in certain ways, as opposed to directly aggregating and downloading the Copernicus Global Land Cover data (or a similar LULC map that has already been processed)?

Line 298: Elevation is static, so it shouldn't need to be temporally aggregated, should it?

 

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

Reviewer Comments

Journal: Remote Sensing

Manuscript ID: remotesensing-1807833

Manuscript Title: “Multi-satellite environmental and socio-economic predictors of vector-borne diseases in African cities: malaria as an example” 

 

General Comments: The authors leveraged remotely sensed climate, landcover, and other environmental data to model malaria risk in Uganda and Tanzania.  The study incorporated parasite prevalence estimates to train RandomForest models.  Overall, the manuscript is well written, included a detailed description of the analyses, and will be of interest to readers. 

 

The manuscript is exceptional on two fronts. First, the combination of a detailed methods section and example GitHub code ensured that the technical aspects of the analyses were clear and easily interpretable. Second, the authors were careful in articulating the differences between causal analysis and prediction, as well as how these two research objectives potentially affect the interpretation of results. Well done!

 

Although the manuscript is of high quality, I do have a couple of concerns and suggestions for the authors' consideration:

 

Concerns:

1.     The title suggests that socioeconomic predictors are used in the analysis, but it is unclear which model inputs the authors consider to be related to social factors. Line 64 of the manuscript describes socioeconomic as relating to housing quality, mobility, education, and human behavior; however, these terms are not used again in the manuscript and do not appear in Table 1, which lists predictor variables. Please clarify which socioeconomic variables are used in the analysis.

 

2.     Related to item 1 above, I encourage the authors to include some measure of population density in the analysis (human factor). Although the observation data representing parasite prevalence are standardized to account for per capita rates and landcover captures some of these trends, it is also likely that population density influences testing and reporting of parasite detections and therefore potentially biases prevalence estimates. It is also the case that human populations are distributed differently in Dar es Salaam than in Kampala. On a related note, I was somewhat surprised that the Fig. 3 maps show the highest risk near the center (most populated) portion of Dar es Salaam, whereas risk in Kampala showed hotspot areas of elevated prevalence around peripheral portions of the city (least populated). The pattern in Kampala is what I’d expect: elevated risk near the junction of residential and natural areas (vector habitat). Inclusion of human population density may change the pattern in Dar es Salaam to be more dispersed from the city’s geographic center. If analyses are not rerun to include additional variables, I suggest the authors discuss the issue somewhere in the manuscript.

 

3.     As a technical note, several of the citations in the manuscript are showing as “Error! Reference source not found”.  These will need to be corrected.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 4 Report

This is an interesting and overall well-executed paper. However, there are two methodological points which need to be addressed, as well as a number of smaller changes as detailed below.

 

The authors chose to use random forest, which is a versatile method able to produce many interesting insights into the phenomenon studied. However, it is arguably less interpretable than linear regression, where one can deduce something along the lines of "each additional unit of input is associated with an average increase of y units in output". I am not saying the authors need to spend too much time defending their choice (I believe RF is as valid as any other method here), but they do need to mention these issues.

 

I find the models well executed and the results well presented. However, I am not convinced by their use of the Wilcoxon test.

 

It is unclear how the pairwise Wilcoxon test was used to "compare models". The authors need to be more explicit as to what quantities were actually compared. My understanding of the situation (which may not be correct since it is not described well) is that the authors compare the model metrics corresponding to the cross-validation folds. In general, the power of the test depends on the sample size, i.e., the larger the sample, the smaller the p-value will be. Since you can artificially make your sample as large as you want (going for 10-fold, 20-fold, 50-fold, etc., up to leave-one-out cross-validation), I do not believe significance testing is appropriate for this setup. In any case, I do not think using the Wilcoxon test to compare RFs is a standard procedure; this therefore warrants a more careful explanation and a reference to validate the approach. I am also not sure whether significance testing in this case tells me anything in terms of practical interpretation. I do not believe in testing just for the sake of obtaining a few (often spurious) p-values.
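
The sample-size concern can be illustrated with a minimal sketch, under stated assumptions: the per-fold error values are simulated as independent draws (real cross-validation folds are correlated), model B is given a fixed average advantage of 0.01, and the paired Wilcoxon signed-rank test is hand-rolled with a normal approximation and no tie or continuity correction. The function name `wilcoxon_signed_rank_p` and all numbers are hypothetical and are not taken from the manuscript.

```python
import math
import random


def wilcoxon_signed_rank_p(x, y):
    """Two-sided p-value of the paired Wilcoxon signed-rank test.

    Normal approximation, no tie or continuity correction; adequate for
    this illustration because the simulated data are continuous.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    # Rank absolute differences from smallest (rank 1) to largest (rank n).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided p-value.
    return math.erfc(abs(z) / math.sqrt(2))


rng = random.Random(42)
p_values = {}
for n_scores in (10, 100, 2000):
    # Hypothetical per-fold errors: model B is better by 0.01 on average.
    model_a = [rng.gauss(0.30, 0.05) for _ in range(n_scores)]
    model_b = [a - rng.gauss(0.01, 0.05) for a in model_a]
    p_values[n_scores] = wilcoxon_signed_rank_p(model_a, model_b)
    print(f"n = {n_scores:4d}  p = {p_values[n_scores]:.3g}")
```

The underlying difference between the two simulated models is identical in every run; only the number of paired scores changes. With enough pairs the p-value becomes very small, which is why the number of compared values and how they were generated need to be stated explicitly.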

 

I am not sure how "normalisation" facilitates visual interpretation. The results are all on the same scale (original units of the response), so why would they not be comparable? Moreover, since the Wilcoxon test was done on the unnormalised values, the letters on the plot are somewhat confusing since they contradict what can be seen in the normalised values.

 

In the discussion, the authors mention that due to high correlation between covariates, the main predictors may vary between the two cities. This may also mean that the variable importance in RF is spread out over several correlated variables. A preliminary PCA might be one way to address it. Another is to use the cheapest or most easily available covariate from each set of highly correlated ones. This is something worth expanding upon in the discussion.
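
The second option can be sketched as a simple greedy filter: covariates are visited in order of preference (for example, cheapest first), and a covariate is kept only if its absolute Pearson correlation with every covariate already kept stays below a threshold. This is a minimal sketch under stated assumptions; the covariate names, the values, the 0.8 threshold, and the helper `drop_correlated` are all hypothetical and are not drawn from the manuscript.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(covariates, priority, threshold=0.8):
    """Keep covariates in priority order, skipping any that correlate
    too strongly (|r| >= threshold) with one already kept."""
    kept = []
    for name in priority:
        if all(abs(pearson_r(covariates[name], covariates[k])) < threshold
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical covariate values at eight survey locations.
covariates = {
    "cheap_index": [1, 2, 3, 4, 5, 6, 7, 8],
    "costly_index": [2.1, 3.9, 6.2, 8.1, 9.8, 12.1, 14.2, 15.9],
    "other_index": [5, 1, 4, 2, 8, 3, 7, 6],
}

# Priority order encodes preference, e.g. cheapest or most available first.
selected = drop_correlated(
    covariates, ["cheap_index", "costly_index", "other_index"]
)
print(selected)
```

Here `costly_index` is nearly a linear function of `cheap_index`, so it is dropped, while `other_index` is only moderately correlated with `cheap_index` and is retained.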

 

Some smaller suggestions are listed below:

 

Line 35: "...help identify... and predict".

Line 57: consider using "in addition to" instead of "besides".

Lines 74-76: ... aim at "comparing... and evaluating".

Line 82: "national surveys AND health surveys".

Line 86: "This metric is age-standardised over a children age range..." is unclear.

 

Lines 88-91: it is unclear why data points were subselected. It is also unclear why only surveys for the 2005-2016 period were selected.

 

Lines 101 and 110: there are compilation errors in the reference sources.

 

The readability of Figure 1 must be improved. Firstly, I do not believe that ordering the violin plots by the median is helpful. It is better to keep them ordered by model, which makes it easier to read across the graphs. Also, adding some fill to both the boxplot and the violin plot (perhaps darker and lighter gray) will probably make the graphs easier on the eye.

 

Figure 3. It is good practice to have the north arrow and scale on geographic maps. Also, please superimpose administrative boundaries on the raster map and make them more visible (thicker?). The coverage of map (a) appears to be quite different from that of map (b), while maps (c) and (d) appear to cover the same territory. Why is that?

Author Response

Please see the attachment. 

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Using only satellite images to predict intra-urban malaria risk is a great idea. However, the R² it achieves (around 0.3) is lower than that of other studies at the same scale, which means the proposed method has some major limitations in handling complex urban environments. I doubt it will be widely applied by researchers.
