Article
Peer-Review Record

Data-Driven PM2.5 Exposure Prediction in Wildfire-Prone Regions and Respiratory Disease Mortality Risk Assessment

by Sadegh Khanmohammadi 1,*, Mehrdad Arashpour 1, Milad Bazli 2 and Parisa Farzanehfar 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 19 June 2024 / Revised: 31 July 2024 / Accepted: 5 August 2024 / Published: 7 August 2024
(This article belongs to the Special Issue Forest Fuel Treatment and Fire Risk Assessment)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Overview of the manuscript:

This is an important and interesting work that highlights the potential effects of wildfires on human health. Given that wildfires are likely to increase in frequency and intensity, and possibly to affect other geographical areas as the global climate changes, the findings from this study will be very helpful in making predictions and in supporting health system planning and readiness to handle the effects of wildfires. One key strength is the use of data from well-established surveillance and data capture systems. The use of various approaches to analyse the data is also a key strength.

 

General comments

Definition of terms: It is not clear what pre- and post-exposure mean in terms of time period. Air pollution has both short- and long-term impacts on health, and it is not clear how these were distinguished in this study. For example, if the wildfire occurred on 1 June, what periods before and after 1 June are considered pre-exposure and post-exposure, respectively? It is also not clear how this is related to the health data/diagnoses/healthcare visits.

Whereas COPD exacerbations are clear/well known, it is not clear what a lung cancer exacerbation means and how this was diagnosed and captured/recorded in the health data. It is recommended that the authors provide a summary on these exacerbations and mechanisms through which PM2.5 is involved.

The actual number of diagnoses/ patients (sample size) that were used in the models was not stated. It would be great to provide a summary of the basic demographics of the patients that contributed to the sample size and if all of them were from Wagga Wagga area. This would be helpful in associating the effects of the wildfires to people who were potentially exposed to the resultant air pollution.

Asthma exacerbations are missing among the health conditions that were considered. Air pollution is one of the major factors for asthma exacerbations and it is not clear why this key diagnosis is missing in this study. In the same way, there is literature linking air pollution exposure to incidence of respiratory tract infections including among children. It is not clear what guided the choice of health conditions that were considered, leaving out asthma and respiratory infections that are more common and affect all age groups.

 

Specific comments

Lines 68-72: The authors state that fuel moisture, air temperature, wind speed, rainfall, solar radiation, and relative humidity were considered, and that the modelling results highlight the contribution of each input variable to the PM2.5 levels and provide actionable recommendations for considering air pollution resulting from future wildfires by effectively managing pre-disaster parameters. My understanding is that the pre-disaster parameters referred to are the weather conditions/variables. It is not clear how these can be managed, considering that they are naturally occurring ambient conditions that cannot be manipulated. Perhaps the authors could provide more clarity on this statement.

 

Lines 102-103: Please clarify what you mean by ‘extended period’. In the methods section, the period stated is one year, and it is not clear what extended period means, how long it was, and why this was done for lung cancer.

 

Lines 144-145: This statement indicates that household air pollution (HAP) was also considered. Throughout the previous sections, the authors have focused on wildfires, which are essentially a source of ambient air pollution. It is not clear how HAP was introduced and measured. How were the HAP measurements obtained and linked to the health outcomes of the patients who had adverse health outcomes during the study period?

 

Study limitations

There are some potential limitations of the study that the authors could consider.

·       One common limitation with the air pollution studies is assuming that the participants/patients were indeed exposed, without personal air quality monitoring data. The use of data from routine ambient monitoring does not necessarily indicate the patients were exposed to the PM2.5 levels as stated.

·       There could be causes of exacerbations or deaths in the study population other than air pollution that could not be controlled for, given the retrospective nature of the study.

 

 

 

 

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This study utilizes machine learning algorithms to forecast PM2.5 concentrations in regions susceptible to wildfires by considering pre-wildfire variables such as weather conditions and fuel conditions. The study assesses the influence of wildfire smoke on the mortality rate of respiratory diseases, specifically chronic obstructive pulmonary disease (COPD) and lung cancer. The study demonstrates that ensemble models, particularly NGBoost, accurately forecast PM2.5 levels and their corresponding health consequences. The key predictors are solar radiation, temperature, and fuel moisture. The results provide practical and effective information for the management of wildfires and the implementation of interventions to protect public health. Before publication, it is crucial to address several issues throughout the manuscript.

 

Introduction

-               The introduction provides a concise overview of the machine learning models employed. Readers would benefit from a more elaborate explanation regarding selecting these particular models, such as Support Vector Regression, Multi-layer Perceptron, and tree-based ensemble algorithms, to better understand their suitability for this study.

-               The introduction should provide a more detailed explanation of the interconnections between the pre-fire and post-fire conditions and the reasons why this thorough method is essential.

Methodology

-               This section would be enhanced by the inclusion of more details regarding the specific steps involved in data preprocessing. Provide specific details on how missing data were addressed or how variables were normalized.

-               The section mentions hyperparameter tuning but lacks specifics. Including details on the hyperparameter optimization process (e.g., grid search, random search) and the ranges tested would provide a clearer picture of how model parameters were fine-tuned.

-               While the models used are described, a brief justification for choosing these specific models over others (e.g., why ensemble methods were preferred) would strengthen the rationale behind the methodology.

-               The section on removing outliers is clear, but consider providing more details on how Cook’s distance was calculated and applied.

-               Incorporate a section on sensitivity analysis to assess the robustness of the models to changes in key input parameters. This can provide insights into which variables most significantly impact model performance.

-               Several specific points should be addressed, as follows:

·      Lines 83-95: The dataset description for Wagga Wagga is detailed and relevant. Ensure consistency in referring to data sources and metrics throughout the section.

·      Lines 148-152: The explanation of SVR could be enhanced by adding a brief description of its advantages and limitations in the context of this study.

·      Lines 169-174: The explanation of ensemble methods is clear, but consider providing more context on why these methods were chosen over other potential models.

·      Lines 248-254: The use of SHAP values for model interpretation is commendable. Ensure that the explanation of SHAP is clear for readers who may not be familiar with this concept.
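
For readers unfamiliar with the concept, the attribution idea behind SHAP values mentioned in the last comment can be illustrated with a minimal sketch. It computes exact Shapley values for a hypothetical three-feature toy function relative to one assumed baseline point. The function `toy_pm25`, its coefficients, and the feature values are illustrative assumptions, not the manuscript's NGBoost model or data, and the SHAP library itself averages over background data and uses approximations rather than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features are switched from their baseline value to their observed value
    one at a time, and each feature's marginal change in the prediction is
    averaged over every possible ordering of the features.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]
            updated = predict(current)
            phi[i] += updated - previous
            previous = updated
    return [total / len(orderings) for total in phi]


def toy_pm25(features):
    """Hypothetical toy model with a temperature-radiation interaction."""
    temperature, solar, fuel_moisture = features
    return (5.0 + 0.4 * temperature + 0.01 * solar
            - 0.3 * fuel_moisture + 0.002 * temperature * solar)


baseline = [15.0, 200.0, 20.0]  # assumed "typical day" reference point
x = [35.0, 300.0, 8.0]          # assumed hot, bright, dry day

phi = shapley_values(toy_pm25, x, baseline)
for name, value in zip(["temperature", "solar", "fuel_moisture"], phi):
    print(f"{name}: {value:+.2f}")
print("sum of attributions:", round(sum(phi), 6))
print("prediction minus baseline:", round(toy_pm25(x) - toy_pm25(baseline), 6))
```

The attributions add up to the difference between the prediction and the baseline prediction, which is the property that makes such values readable as per-feature contributions to a single forecast.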

 

Results and discussion

-               While the impact of outliers on model performance is mentioned, it would be helpful to provide more context or examples of how specific outliers were identified and removed using Cook's distance.

-               The explanation of feature importance, particularly the use of SHAP values, is clear but could be expanded. For instance, providing specific examples of how feature importance impacts predictions in different scenarios would enhance understanding.

-               The results section could benefit from a more integrated discussion on how the predictions of PM2.5 levels directly relate to the health outcomes.

-               While the discussion touches on public health interventions, it could be expanded to provide more specific recommendations or examples of how the findings could inform public health policies and practices, particularly in wildfire-prone regions.

-               The limitations are briefly mentioned but could be elaborated. Discussing potential biases, data limitations, and the generalizability of the findings would provide a more balanced view.

-               In Figure 1, the correlation matrix is effective, but consider adding a brief interpretation of the key relationships observed in the figure.

-               In Table 1, the comparison of model performances is clear, but adding a row or section summarizing the key takeaways from this comparison would be beneficial.

-               In Figure 2, the prediction performance of NGBoost across seasons is well-presented. Ensure that the legends and labels are clear and consistent to avoid any confusion.
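
The Cook's distance requested in the comments above can be sketched as follows. For an ordinary least squares fit with p parameters, the usual definition is D_i = e_i^2 / (p s^2) * h_ii / (1 - h_ii)^2, where e_i is the residual, h_ii is the leverage, and s^2 is the residual variance. The data below are synthetic, and the 4/n cutoff is a common rule of thumb assumed here for illustration; the manuscript's own threshold and implementation are not stated in this record.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance for each observation of an OLS fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])   # design matrix with intercept
    n, p = A.shape
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    hat = A @ np.linalg.pinv(A.T @ A) @ A.T     # hat (projection) matrix
    leverage = np.diag(hat)
    s2 = residuals @ residuals / (n - p)        # residual variance estimate
    return residuals**2 / (p * s2) * leverage / (1.0 - leverage) ** 2


# Synthetic example: a clean linear trend with one corrupted record.
x = np.arange(10.0)
y = 2.0 + 3.0 * x
y[9] += 30.0                                    # injected outlier

distances = cooks_distance(x, y)
threshold = 4.0 / len(y)                        # assumed rule-of-thumb cutoff
flagged = np.where(distances > threshold)[0]

print("Cook's distances:", np.round(distances, 3))
print("flagged indices:", flagged.tolist())
```

Refitting the model with and without the flagged records, and reporting the error metrics for both fits, would give the before-and-after comparison the reviewer asks for.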

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Data-Driven PM2.5 Exposure Prediction in Wildfire-Prone Regions and Respiratory Disease Mortality Risk Assessment

 

This manuscript presents the results of a study where machine learning models combining pre-wildfire parameters [such as weather and fuel] and post-wildfire health effects were used to primarily present the relative importance of these parameters to the predictions of the PM2.5 levels in Wagga Wagga, Australia.

The work contributes to existing literature and is interesting.

In summary, it is recommended that the authors:

·        Better demonstrate the “novelty” of the work presented,

·        Correct some minor spelling/grammar issues,

·        Tidy up some equations and figures,

·        Quantify some of the qualitative assertions made in the discussions, and

·        Justify the selection of the NGBoost model.

In detail, these are some of the recommendations:

1.      The authors should consider correcting some of the spelling and grammar issues in the manuscript, for example in lines 29, 111, 203, 228, 240, etc.

2.      The authors need to better demonstrate the “novelty” of this work. See lines 49, 375 to 377, 403. What exactly is novel about this work? How does it differ from previous work? How are the results materially different from results from previous work?

3.      Line 11: Equation (1) – please define the parameters/constants used in this equation.

4.      Line 154: Equation (3) should be Equation (2)? The numbering appears to be wrong here. Equation (2) is on page 5 [line 223] after Equation (8).

5.      Line 223: Equation numbering. Please see item [4.] above.

Results:

6.      Line 269: “In Fig. 1, darker points are related to fall and spring.” Should this not read “In Fig. 1, darker points are related to fall and winter.”?

7.      Lines 272 – 273: “…which fall below the daily mean threshold of 12.5 μg/m3.” This is not clear from Figure 1.

a.      Would the authors consider providing finer scales on the figures in Figure 1? For example, for PM2.5, the scale is for 0 and 500 μg/m3, can’t this be shown in gradients of 50 or 100 μg/m3?

8.      Lines 275 – 276: “However, there are a few exceptions observed during days with significant wildfires, where PM2.5 levels exceed the normal range.” Where are these few exceptions?

9.      Lines 277 – 281: “In addition, winter points have a narrow distribution with high fuel moisture (Mdl) low temperature (T) and solar radiation (S). The Mdl / solar radiation plot clusters data records based on the season, where summer and winter data records are on the two opposite sides. Summer data records demonstrate lower fuel moisture and higher solar radiation than winter, while fall and spring records exhibit intermediate values.”

a.      Please quantify these observations. Use quantitative data to qualify your qualitative observations.

10.   Figure 1: Please consider using smaller gradients/divisions on the scales on the axes of the figures (see point [7.] above).

11.   Line 307: “a few data records” Please be specific, what is a few? Can you quantify this? From figure 2 (b), it appears that there are more than a few data records with high PM2.5 associated with summer.

12.   Pages 8 and 9. Figure 2: It would be useful for the reader if Figures 2 (a) to (d) also include Spring (for (a)) to Winter (for (d)) in their captions on the graphs. As it is presented, the reader must read the details on the Figure titles.

13.   Lines 329 – 330: “Table 1 illustrates that the NGboost model has reasonable prediction performance for COPD as well as TB&L, so it is selected for the next step (Fig. 3).” Can the authors justify this selection? From Table 1, it appears that the Random Forest (RF) model is more suitable for predicting the performance of COPD and TB&L. Have the authors used the NGboost model for convenience?

14.   Page 10. Figures 3 (a) to (d): the authors should consider including “COPD” and “TB&L” in the titles of the relevant figures to avoid making the reader read the details on the figure titles when they are visually comparing the graphs.

Discussion

15.   Lines 383 to 384: “…significantly higher…”, “…a relatively similar pattern.” Please be less vague with this. Can you quantify these statements?

16.   Lines 386 to 394: Would you consider quantifying your arguments? You have the data to do this, and this avoids the vagueness in using qualitative terms in your assertions.

Conclusions

17.   Line 403: “ a novel framework” how different/novel? See point [2.] above.

18.   Lines 412 to 420: Can the authors consider better and more specific examples in this paragraph?

Comments on the Quality of English Language


Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Thank you very much for revising the manuscript. The authors have addressed several points; however, most of the review comments need more detailed explanations. Please revise carefully and provide detailed responses for each comment.


·      In Methodology, this section would be enhanced by the inclusion of more details regarding the specific steps involved in data preprocessing. Provide specific details on how missing data were addressed or how variables were normalized. The authors have now included a statement about interpolating missing data and using actual values without normalization. This addresses the initial comment, but please provide more detail on the interpolation method used and any potential impact this might have on the results.

 

·      The section mentions hyperparameter tuning but lacks specifics. Including details on the hyperparameter optimization process (e.g., grid search, random search) and the ranges tested would provide a clearer picture of how model parameters were fine-tuned. The added statement regarding grid search for hyperparameter tuning is a good start. However, please include details on the specific hyperparameters tested and their ranges.

 

·      While the models used are described, a brief justification for choosing these specific models over others (e.g., why ensemble methods were preferred) would strengthen the rationale behind the methodology. The added rationale for choosing ensemble methods is good. However, please add a brief description of the superior performance of ensemble methods, with a reference.

 

·      The section on removing outliers is clear, but consider providing more details on how Cook’s distance was calculated and applied. The added explanation of the Cook's distance calculation using the NumPy library is inadequate. Please include a brief explanation or formula for Cook's distance to make it more comprehensible.

 

·      Incorporate a section on sensitivity analysis to assess the robustness of the models to changes in key input parameters. This can provide insights into which variables most significantly impact model performance. Please add a more detailed discussion of how these methods compare with traditional sensitivity analysis; this would strengthen the section.

 

·      The use of SHAP values for model interpretation is commendable. Ensure that the explanation of SHAP is clear for readers who may not be familiar with this concept. The revised explanation of SHAP values is good. However, including an example or a visual representation of how SHAP values impact model predictions would enhance reader understanding.

 

·      Results and discussion - While the impact of outliers on model performance is mentioned, it would be helpful to provide more context or examples of how specific outliers were identified and removed using Cook's distance. The additional details provided on the use of Cook's distance to identify and remove outliers are not enough. Please include an example of an outlier identified and its impact on the model before and after removal.

 

 

·      The explanation of feature importance, particularly the use of SHAP values, is clear but could be expanded. For instance, providing specific examples of how feature importance impacts predictions in different scenarios would enhance understanding.

 

·      The results section could benefit from a more integrated discussion on how the predictions of PM2.5 levels directly relate to the health outcomes. The added context linking PM2.5 predictions to health outcomes, particularly respiratory issues, is valuable. Further elaboration on specific health outcomes and their relation to different levels of PM2.5 exposure, with a brief explanation of the mechanism, would provide a more comprehensive discussion.

 

·      The limitations are briefly mentioned but could be elaborated. Discussing potential biases, data limitations, and the generalizability of the findings would provide a more balanced view. The expanded discussion on data limitations and generalizability of findings is very short. Including potential biases and discussing how they were mitigated (or not) would add depth to this section.
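
The level of detail requested above for hyperparameter tuning can be illustrated with a minimal grid search sketch that states every hyperparameter and every value tested. The k-nearest-neighbour regressor, the grid values, the hold-out split, and the synthetic data are all assumptions chosen to keep the example self-contained; they are not the models or ranges used in the manuscript, which are not given in this record.

```python
from itertools import product


def knn_predict(train, x, k, weighting):
    """Predict y at x from the k nearest one-dimensional training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    if weighting == "uniform":
        return sum(y for _, y in nearest) / k
    # Inverse-distance weighting; the small constant avoids division by zero.
    weights = [1.0 / (abs(xi - x) + 1e-9) for xi, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def validation_mae(train, valid, k, weighting):
    """Mean absolute error on the hold-out set for one parameter setting."""
    errors = [abs(y - knn_predict(train, x, k, weighting)) for x, y in valid]
    return sum(errors) / len(errors)


# Synthetic data (assumption): a linear trend with a repeating offset.
data = [(float(i), 0.5 * i + (3.0 if i % 4 == 0 else -1.0)) for i in range(40)]
train = [p for i, p in enumerate(data) if i % 5 != 0]
valid = [p for i, p in enumerate(data) if i % 5 == 0]

# The grid is written out in full so that it can be reported as-is.
param_grid = {"k": [1, 3, 5, 7], "weighting": ["uniform", "distance"]}

results = []
for k, weighting in product(param_grid["k"], param_grid["weighting"]):
    score = validation_mae(train, valid, k, weighting)
    results.append({"k": k, "weighting": weighting, "mae": score})

best = min(results, key=lambda row: row["mae"])
for row in results:
    print(row)
print("best:", best)
```

Reporting the full `param_grid` together with the score of each combination, rather than only the selected setting, is what allows a reader to judge how the parameters were fine-tuned.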

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Data-Driven PM2.5 Exposure Prediction in Wildfire-Prone Regions and Respiratory Disease Mortality Risk Assessment

SECOND ASSESSMENT

 

This manuscript presents the results of a study where machine learning models combining pre-wildfire parameters [such as weather and fuel] and post-wildfire health effects were used to primarily present the relative importance of these parameters to the predictions of the PM2.5 levels in Wagga Wagga, Australia.

The work contributes to existing literature and is interesting.

Thanks to the authors for considering the comments in the first review. The manuscript is now better; however, a few comments were not adequately addressed in the revisions.

In detail, these are the revisions inadequately addressed [the lines refer to the original manuscript]:

1.      A few spelling and grammar issues remain but I guess these will be picked up by the editors.

2.      The authors need to better demonstrate the “novelty” of this work. See lines 49, 375 to 377, 403. What exactly is novel about this work? How does it differ from previous work? How are the results materially different from results from previous work? THIS HAS NOT BEEN ADDRESSED BY THE AUTHORS, AT ALL. IT WAS EXPLAINED IN THEIR RESPONSE BUT NOT IN THE MANUSCRIPT.

3.      THERE ARE TWO EQUATION 5s ON PAGE 5 OF THE REVISED MANUSCRIPT.

4.      Lines 272 – 273: “…which fall below the daily mean threshold of 12.5 μg/m3.” This is not clear from Figure 1.

a.      Would the authors consider providing finer scales on the figures in Figure 1? For example, for PM2.5, the scale is for 0 and 500 μg/m3, can’t this be shown in gradients of 50 or 100 μg/m3? THIS WAS NOT ADDRESSED. THE AUTHORS SHOULD CONSIDER USING GRADUATED SCALES [FOR EXAMPLE, STEPS OF 50 OR 100 μg/m3 FOR THE PM2.5 SCALE].

5.      Lines 275 – 276: “However, there are a few exceptions observed during days with significant wildfires, where PM2.5 levels exceed the normal range.” Where are these few exceptions? NOT ADDRESSED.

6.      Lines 277 – 281: “In addition, winter points have a narrow distribution with high fuel moisture (Mdl) low temperature (T) and solar radiation (S). The Mdl / solar radiation plot clusters data records based on the season, where summer and winter data records are on the two opposite sides. Summer data records demonstrate lower fuel moisture and higher solar radiation than winter, while fall and spring records exhibit intermediate values.”

a.      Please quantify these observations. Use quantitative data to qualify your qualitative observations.

NOT ADDRESSED.

7.      Figure 1: Please consider using smaller gradients/divisions on the scales on the axes of the figures (see point [4.] above).

8.      Lines 329 – 330: “Table 1 illustrates that the NGboost model has reasonable prediction performance for COPD as well as TB&L, so it is selected for the next step (Fig. 3).” Can the authors justify this selection? From Table 1, it appears that the Random Forest (RF) model is more suitable for predicting the performance of COPD and TB&L. Have the authors used the NGboost model for convenience? I AM NOT CONVINCED BY THE AUTHORS’ RESPONSE. IS THERE A REFERENCE IN LITERATURE FOR THIS COMMENT “The selection of the NGBoost model was based on its performance metrics, particularly Mean Absolute Error (MAE), which is more suitable for models with outliers compared to Mean Squared Error (MSE).”? PLEASE PROVIDE IT/THEM.
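
The claim quoted above, that MAE is less sensitive to outliers than MSE, can be checked with a minimal numerical sketch. The values are invented for illustration and are not taken from the manuscript's Table 1; the example shows the arithmetic behind the claim and does not stand in for the literature reference the reviewer requests.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical daily PM2.5 values: every prediction is off by exactly 1.
y_true = [10.0, 12.0, 11.0, 9.0, 10.0]
y_pred = [11.0, 11.0, 12.0, 10.0, 9.0]

# The same series plus one hypothetical extreme smoke day missed by 180.
y_true_outlier = y_true + [200.0]
y_pred_outlier = y_pred + [20.0]

mae_clean, mse_clean = mae(y_true, y_pred), mse(y_true, y_pred)
mae_out = mae(y_true_outlier, y_pred_outlier)
mse_out = mse(y_true_outlier, y_pred_outlier)

print(f"MAE: {mae_clean:.2f} -> {mae_out:.2f}")
print(f"MSE: {mse_clean:.2f} -> {mse_out:.2f}")
```

In this toy case a single extreme record raises the MAE from 1 to about 31 but raises the MSE from 1 to about 5401, because squaring lets the one large error dominate the average. A model ranking based on MSE can therefore be driven by a handful of wildfire days, while a ranking based on MAE reflects typical-day accuracy more closely.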

Discussion

9.      Lines 386 to 394: Would you consider quantifying your arguments? You have the data to do this, and this avoids the vagueness in using qualitative terms in your assertions. COULD THE AUTHORS IMPROVE THEIR RESPONSE?

Conclusions

10.   Line 403: “ a novel framework” how different/novel? See point [2.] above.

Comments on the Quality of English Language

Minor editing of English language required.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
