Article
Peer-Review Record

A Prediction Model for the Outbreak Date of Spring Pollen Allergy in Beijing Based on Satellite-Derived Phenological Characteristics of Vegetation Greenness

Remote Sens. 2022, 14(22), 5891; https://doi.org/10.3390/rs14225891
by Xinyi Yang 1,2, Wenquan Zhu 1,2,* and Cenliang Zhao 1,2
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 7 September 2022 / Revised: 27 October 2022 / Accepted: 16 November 2022 / Published: 21 November 2022

Round 1

Reviewer 1 Report

 

 

The paper entitled "A prediction model for the outbreak date of spring pollen allergy in Beijing based on satellite-derived phenological characteristics of vegetation greenness" aims to develop a method to predict in advance the period of pollen allergy in Beijing, based on remote sensing of vegetation indices over the forests around the city.

The idea of the paper is interesting (in particular the use of social networks to determine the date of the beginning of the allergenic period), and the results, if we look at the RMSE, seem surprisingly good for detecting the outbreak of pollen allergy in Beijing, considering that the method is based only on the monitoring of vegetation indices.

However, my main concern is that the description of the method is very hard to understand, so it is in fact very difficult to judge whether it can really work. For this reason the paper should be revised before considering possible publication.

 

From figure 2 we understand that the method aims to determine a threshold vegetation index (VI) value that corresponds to the VI at the mean day of the observed outbreak, read on a linear or cumulative linear function fitted to the multiyear average daily VI.

Equations (6) and (7) then give the functions fitted to the multiyear average, from which the threshold VI value corresponding to the outbreak is estimated. What is totally unclear is how the method is applied each year.

First of all, in the explanation of equations (6) and (7), it is stated that Y1/Y2 is the daily value of EVI2. Unfortunately, we do not know whether "daily value" means the raw EVI2, the EVI2 of the smoothed curve, or the EVI2 estimated from the linear fit.

Then, looking at figure 2d, we understand that each year a linear curve (i.e. f(t) of eq. 3) is fitted to the actual smoothed VI time series of that year. The estimated outbreak date for a given year is then the date when the linear fit of that year reaches the VI threshold.

In this case there is a major issue with the method: fitting the curve requires the smoothed VI curve over the whole period, and therefore it cannot be used to forecast the outbreak date.
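The per-year reading described above can be sketched as follows. This is only a minimal illustration, assuming a straight-line fit VI = n·t + m to the smoothed series and a fixed VI threshold; the function name, the fitting window, and the synthetic series are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def threshold_crossing_doy(doy, vi, vi_threshold):
    """Fit vi = n*doy + m by least squares and return the day of year
    at which the fitted line reaches vi_threshold (hypothetical helper)."""
    n, m = np.polyfit(doy, vi, 1)  # slope n, intercept m
    if n <= 0:
        raise ValueError("fitted VI does not increase over the window")
    return (vi_threshold - m) / n

# Synthetic, noise-free series (assumption): VI rises linearly from DOY 50.
doy = np.arange(50, 91)
vi = 0.10 + 0.0012 * (doy - 50)
crossing = threshold_crossing_doy(doy, vi, 0.136)
```

Note that the fit in this sketch uses VI values up to DOY 90, which is exactly the concern raised: those values are not yet available when a forecast has to be issued around DOY 50.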

On lines 287 to 291 it is explained that “to test the performance of each prediction model, the vegetation index on the 50th day of each year during 2011-2021 (the DOY of the 20 days before the earliest outbreak date of spring pollen allergy in these 11 years) was put into every prediction model to calculate the countdown to the outbreak date of spring pollen allergy in Beijing, and then the predicting outbreak date of pollen allergy for each year in Beijing was determined.” So which models are meant: those defined in equations (6) and (7), or models fitted for each year? Does it mean that only the VI value at day 50 is used to calculate W from eq. (5) and then determine the outbreak date as 50-W? In this case, even if Y is the value from the smoothed curve, it should be very sensitive to clouds and, more generally, to conditions just before day 50. So all the methodological aspects should be clarified.

 

I also have some more specific remarks:

For the calculation of correlation coefficients and RMSE, the parameters of eqs. 6 and 7 are based on the multiyear average, which is influenced by every year. So, to better evaluate the real performance of the model, it would be important, for instance, to fit the model 11 times, each time excluding one year, and then estimate the outbreak date of the excluded year. You could then evaluate the sensitivity of the fitted parameters, which would show the robustness of the fit, and calculate the RMSE from the comparison between observed and simulated outbreak dates of each excluded year, which would give a better estimate of the predictive capacity of the model.

Table 3 presents the correlation coefficient for each vegetation type but not the RMSE. The RMSE is presented in figure 4. We can guess from the rest of the text that it is for evergreen forest, but this is not fully clear. Even if evergreen forest gives the best result, it would be important to present the results for deciduous forest.
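The leave-one-year-out evaluation suggested above could look roughly like the sketch below. It is a generic illustration, not the authors' procedure: the straight-line stand-in model, the function names, and the synthetic 11-year data are all assumptions, and the manuscript's own models (eqs. 6 and 7) would take the place of `fit_line` and `predict_line`.

```python
import numpy as np

def leave_one_year_out(x, y, fit, predict):
    """Calibrate on all years but one, predict the excluded year, repeat.

    x: one predictor value per year; y: observed outbreak DOY per year.
    Returns held-out predictions, their RMSE, and the fitted parameters
    of each calibration (to inspect how stable the fit is)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    preds = np.empty_like(y)
    params = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i      # mask excluding year i
        p = fit(x[keep], y[keep])          # calibrate without year i
        params.append(p)
        preds[i] = predict(p, x[i])        # predict the excluded year
    rmse = float(np.sqrt(np.mean((preds - y) ** 2)))
    return preds, rmse, np.array(params)

# Stand-in model (assumption): outbreak DOY as a linear function of x.
def fit_line(x, y):
    return np.polyfit(x, y, 1)

def predict_line(p, x):
    return np.polyval(p, x)

# Synthetic, exactly linear data for 11 years (made up for illustration).
x = np.linspace(0.10, 0.14, 11)
y = 100.0 - 200.0 * x
preds, rmse, params = leave_one_year_out(x, y, fit_line, predict_line)
```

The spread of the rows in `params` gives a view of the sensitivity of the fitted parameters, and `rmse` is computed only from years that were not used in the corresponding calibration.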

I don't fully understand the objective of calculating the sensitivity coefficient in (2). It reflects how much the vegetation index increases during the period from 20 days before the earliest date of allergy to the mean day of allergy. It is just related to the dynamics of the chosen VI and does not mean that a more dynamic VI can better predict the allergy date. What really matters is the correlation coefficient and the RMSE. Also, looking at table 3, it is the NDVI that gives the highest coefficient, so why was the EVI2 chosen in 3.2?

 

In the discussion it is indeed interesting to notice that the threshold of 0.136 for the outbreak also corresponds to the beginning of the growing season. As allergy is obviously related to flowering, this should mean that flowering arrives very soon after new leaf flush. Is this really the case?

Looking at figure 5, we see a relatively large amplitude of the EVI2, whereas this is evergreen vegetation, which is supposed to have a relatively small amplitude. Obviously this depends on the vegetation, but it would then be interesting to also have the EVI2 of deciduous forest for comparison, as we can expect a larger variation.

 

In the discussion (lines 380 to 384) it is stated that previous models only predict vegetation characteristics related to pollen allergies and do not directly predict the prevalence of pollen allergies. But in fact this is also the case for the model presented, as NDVI only gives information about vegetation characteristics and not about the prevalence of allergy. Also, the model allows one to determine the beginning of the pollen allergy season, not the prevalence of the allergy.

 

The method used to define the beginning of allergy outbreaks based on Weibo data is very interesting! Indeed, it is very difficult to obtain direct medical data to follow allergy, so using social networks seems to be a very good alternative. It is probably difficult to use it to define the severity of the allergy season, but defining the timing of the season is probably more robust. As shown in figure A1, there was a large change in the number of data after 2017. As what is extracted is the date of the records, it is probably, as mentioned, not too sensitive to the number of records. Unfortunately, the scale of the figure does not allow one to determine the effective number of valid microblog data before 2017, but to have statistical significance it should not be too small. So what is the number of valid data before 2017?

Also, as it is a new method, to be sure that it is really a good indicator of the beginning of the allergy season it would be important to compare the estimated outbreaks with other sources of medical data. I acknowledge that this is probably not an easy task, but there are probably some papers about allergy in the medical literature that would allow you to check whether the estimated outbreak is coherent with the actual period of allergy.

 

 

As for minor comments, there are a lot of typos and errors in the text that should be corrected. I cannot list all of them, but here are a few examples:

l. 275: Y is not the multiyear average daily fit curve (which is a function); it is the value of the fit curve at instant t.

Figure 3 (a): the red curve is named "logistic fitted curve", but it is clearly not a logistic function, so I guess it is a linear fitted curve?

 

In figure 5 the threshold of 0.136 is not placed at the right level on the vertical axis (it is below the 0.13 mark).

 

 

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

The manuscript “A prediction model for the outbreak date of spring pollen allergy in Beijing based on satellite-derived phenological characteristics of vegetation greenness” presents an interesting approach to predicting the time of spring allergy outbreaks based on satellite-derived vegetation indices. The authors justify this approach by noting that the development of plants, described in this study by five different vegetation index data series, determines the time of pollen release, which is reflected in microblog data when inhabitants report their symptoms and when they started. In that way, the authors skip the most important part of pollen allergy forecasts, which is also the most difficult to measure: pollen concentrations. The proposed approach reduces the error associated with pollen measurements, as measurements are frequently performed at only a few sites while the real pollen concentrations differ markedly between sites, even within the same city. This is an interesting and clear approach, therefore I will give the authors the chance to improve their manuscript. As I have many different doubts, which I list below, I recommend major revisions. The list of comments follows.

My first doubt is why pollen allergy events in general should be predicted, as many people suffer from the pollen of different tree species. There is no information in the model on which pollen type is considered. People can be sensitized to different tree pollen types and not necessarily to all of them at once. Also, the level of symptom severity differs between pollen types. These issues should be elaborated on much more.

Also, only forest areas are used, although the many urban trees surely also contribute to spring pollen allergy. Moreover, urban trees may have a larger influence on allergy symptoms than those in forests, because I suspect that it is mainly city inhabitants who post information on pollen allergy symptoms. In that case the independent variables (EVI) are very poorly connected with the dependent variable (allergy symptoms). How do the authors account for that? This must be clarified.

Moreover, I do not understand Figure 2, especially parts c and d. Why show the average date of pollen allergy outbreak against the background of the average multi-year satellite-derived VI curve? The authors' aim was to predict the pollen allergy outbreak for each year, not the average. So I think that showing one year as an example in this figure would give the reader a much more reliable picture of what the procedure really looked like. An average pattern is always clear and smooth, but it would be more informative to show how the procedure was performed on the data of individual years.

Furthermore, too little information is given on how the fitting algorithms were applied. There is no information on the pattern of the VI curves in individual years, nor on what type of smoothing was applied and how. What was the quality of the time series, and where were data lacking due to cloud cover, causing inhomogeneity of the data series from year to year? How comparable are the VI patterns between different years? All these issues must definitely be explained in detail.

Another issue is whether all the years can be put into the same analysis. The years 2011-2017 have a very low number of observations in the dependent variable, and the years 2018-2021 a very high number. I suspect that the older data are not representative of the whole city, so the allergy outbreak dates derived from the older dataset will not be comparable with the newer data starting in 2018, and I have doubts about whether they can be included in the same outcome. A detailed explanation should be included in the corrected version of the manuscript.

The land cover map has a low resolution and is based only on the newest data. Ten years ago it may have looked different; for example, there could have been more areas covered by forests. How do you account for changing land use during the study period?

Also, regarding the selection of vegetation indices: please justify the selection. Why were these VIs selected and not others, such as SAVI or NDRE?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The response given by the authors removed the doubts I had about the method, so I am now convinced by the approach. The cross-evaluation now performed, which evaluates results on years not used in calibration, clearly gives more robustness and demonstrates that the method gives surprisingly good results considering its simplicity! In particular, it is interesting to note that, as could be expected, the approach based on direct EVI2, which relies only on the value at DOY 50, does not give good results, whereas the method with cumulative EVI2 works well! So I am now convinced that the paper should be published. However, the methodological description is still unnecessarily complicated, making it very hard to understand. There are also still a lot of errors and typos in the text and figures (see details hereafter). So even if some effort has been made to improve the text, it is not sufficient to let the paper be published in its current form. This is the reason why I still recommend major revision, even if I think the effort needed is mainly in the form of the text and not in its scientific content.

For instance, in 2.3.2 there is a very complex description of the forecast model, which is very hard to understand. It is only necessary to explain that the prediction model aims to determine the outbreak date at the 50th day (i.e. 20 days before the earliest date of allergy outbreak). If m and n are the coefficients of the linear fit of equation (1), then the estimated date of allergy outbreak for year y, W, is defined as W=Y/n-m-50, Y being the EVI2 value on the smoothed or cumulative smoothed curve at DOY 50! This is just an example, but the same is true for most of the methodological description, which should be simplified.

 

Other more specific comments:

 

- l. 63: it is important to indicate here that what is said before is only true for species where budburst and flowering are concomitant. It would not be the case for species where flowering can appear several weeks after the leaves.

- In the new table 5, why do you compare the NDVI of evergreen forest with the EVI2 of deciduous forest? You already explained before that EVI2 gives the best results, so you should compare EVI2 for both.

 

For specific points related to the response to my review:

 

- Point 6: Obviously I understood that the estimation of sensitivity and correlation was used to screen the vegetation features. But here again I think that the sensitivity coefficient, which represents the dynamics of the signal, is not a good criterion for choosing the best VI. Having higher dynamics does not mean that the index will better estimate the outbreak date. The only criterion that should be considered is which index gives the lowest RMSE when comparing estimated and actual outbreak dates.
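The RMSE-based selection argued for in this point can be sketched as below. The numbers are made up purely for illustration and do not come from the manuscript; the function names are hypothetical.

```python
import numpy as np

def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed outbreak DOYs."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

def best_index_by_rmse(predictions_by_index, observed):
    """Return the name of the vegetation index with the lowest RMSE,
    together with the RMSE of every candidate index."""
    scores = {name: rmse(p, observed) for name, p in predictions_by_index.items()}
    return min(scores, key=scores.get), scores

# Made-up outbreak dates (DOY) for four years, illustration only.
observed = [75, 80, 78, 82]
predictions = {
    "NDVI": [70, 86, 75, 88],
    "EVI2": [74, 81, 79, 80],
}
best, scores = best_index_by_rmse(predictions, observed)
```

With this criterion an index is preferred only because its predicted dates are closer to the observed ones, independently of how strongly the index itself varies over the season.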

 

- Point 9: allergy prevalence means the fraction of the population that is affected by allergy. So I do not agree that the method allows one to estimate the allergy prevalence. It allows one to determine the date when people will be affected (which is already very important for sensitized people), but not the number of people nor the intensity of the allergy, which cannot be deduced just from the determination of phenological parameters!

What is true, however, is that, compared to methods estimating only the flowering date, it can take into account hidden effects (mean climate or other factors) that can delay the symptoms relative to the beginning of the flowering period.

 

- Point 11: OK, the data from table A2 show that the period of allergy determined from Weibo data is coherent with medical data. But the medical data are only given at a monthly time step and on average! So they do not help to see whether the interannual variation of the outbreak date estimated from Weibo is coherent with medical data. For instance, when Weibo data indicate an early (resp. late) date, do we find the same information in the medical data?

 

Here are some minor comments on typos and errors in the text or figures:

 

 

- Figure 2 legend: the DOY corresponding to VI is named tn in the legend and ta in the figure.

- Figure 3: the reported value of y (about 8.5) in figure (d) is not coherent with y estimated from figure (c) (around 6.8) ?

- Figure 3 (c): do not show here the linear fitted curve based on the multiyear average (the same as in fig. (a), I guess). It creates confusion, since what is considered here is only the cumulative curve from which the y value in fig. 3 (d) is read.

- Figure 4 (a): it is strange, the linear fitted curve does not seem to be the fit of the smoothed curve?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Dear Authors,

Although you changed the manuscript very little, I can accept your explanations of why you have not changed several issues, because the explanations are clear. Apart from that, I think that Figures 3b and 3d do not help the reader, so it is probably better not to include these charts with only a straight line, as they do not contribute any new information. Also, the zeros in the coordinates on the map (Figure 1) should be removed.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
