Article
Peer-Review Record

Development of Air Pollution Forecasting Models Applying Artificial Neural Networks in the Greater Area of Beijing City, China

Sustainability 2024, 16(19), 8721; https://doi.org/10.3390/su16198721
by Panagiotis Fazakis 1, Konstantinos Moustris 1,* and Georgios Spyropoulos 1,2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 24 August 2024 / Revised: 17 September 2024 / Accepted: 8 October 2024 / Published: 9 October 2024
(This article belongs to the Section Pollution Prevention, Mitigation and Sustainability)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1. Originality and Relevance in the Field
This paper is not entirely original. Research into the application of this technology has been done many times, in some cases on the same material or similar. References:
Feng, X.; Li, Q.; Zhu, Y.; Hou, J.; Jin, L.; Wang, J. Artificial Neural Networks Forecasting of PM2.5 Pollution Using Air Mass Trajectory Based Geographic Model and Wavelet Transformation. Atmos. Environ. 2015
Ghazali, S.; Ismail, L.H. Air Quality Prediction Using Artificial Neural Network. J. Adv. Res. 2012, 3, 309-314. doi:10.1016/j.jare.2011.05.001.
Cordova, C.H., Portocarrero, M.N.L., Salas, R. et al. Air quality assessment and pollution forecasting using artificial neural networks in Metropolitan Lima-Peru. Sci Rep 11,
2. Contribution Compared to Published Material
The work does not provide any fundamentally new solutions compared to already published material. It is based on an open dataset, and works with it using previously known methods. However, there is novelty in the choice of approach to the material, allowing the result to be considered more broadly.
3. Methodological Improvements and Further Controls
The research methodology is relevant to the task set. Nevertheless, it is recommended to explain the reasons for choosing this particular model architecture (neural network).
4. Consistency of Conclusions
The conclusions correspond to the objectives of the research and describe the obtained result accurately and clearly enough.
Suggestion for improvement: expand the conclusions with an additional paragraph, which will address the issue of future application of the research result.
5. Appropriateness of References
The references are relevant and appropriate for the purpose of the study.
6. Additional Comments on Tables and Figures
Figures and tables are relevant and appropriate. It is suggested to improve the formatting: reduce the size of some figures (1, 3, 5, 6) so that they take up less space. Also, figure 4 is at the end of the section, which worsens the perception and does not allow it to unfold fully.
7. Specific Comments
Type of the Paper - not filled

 

Comments on the Quality of English Language

Minor language revision recommended.

Author Response

Reviewer 1

First, we would like to thank the reviewer for his constructive comments and the time he took to study our research paper.

  1. Originality and Relevance in the Field
    This paper is not entirely original. Research into the application of this technology has been done many times, in some cases on the same material or similar. References:
    Feng, X.; Li, Q.; Zhu, Y.; Hou, J.; Jin, L.; Wang, J. Artificial Neural Networks Forecasting of PM2.5 Pollution Using Air Mass Trajectory Based Geographic Model and Wavelet Transformation. Atmos. Environ. 2015
    Ghazali, S.; Ismail, L.H. Air Quality Prediction Using Artificial Neural Network. J. Adv. Res. 2012, 3, 309-314. doi:10.1016/j.jare.2011.05.001.
    Cordova, C.H., Portocarrero, M.N.L., Salas, R. et al. Air quality assessment and pollution forecasting using artificial neural networks in Metropolitan Lima-Peru. Sci Rep 11

Response

We would like to thank the Reviewer for his valuable comment. Although there are studies with similar techniques and technological background, this research goes beyond the prediction of pollutants in a specific region/city by creating a universal prediction model that coordinates multiple ANN forecasting models. The final "product" is an integrated system that trains, controls and evaluates a multitude of ANNs, extracting the best result. The research also aims to further evaluate both the architectural structures of the ANNs and secondary factors such as the creation of scenarios and the zoning of the wider area. Furthermore, the final product predicts the concentrations of six (6) pollutants, in contrast to most studies, which focus on one to four pollutants at once.

  2. Contribution Compared to Published Material
    The work does not provide any fundamentally new solutions compared to already published material. It is based on an open dataset, and works with it using previously known methods. However, there is novelty in the choice of approach to the material, allowing the result to be considered more broadly.

Response

We would like to thank the Reviewer for his valuable comment. The results of this work may not contribute "new" knowledge and methods. However, the work is a characteristic example of how the methods used so far can be simplified, leading to models that are immediately usable by the general public without compromising the final results. We agree with the Reviewer that the novelty of this study is apparent in database management applications; in particular, it proposes new combinatorial methods for dealing with problems in large-volume databases (time series).

  3. Methodological Improvements and Further Controls
    The research methodology is relevant to the task set. Nevertheless, it is recommended to explain the reasons for choosing this particular model architecture (neural network).

Response

We would like to thank the Reviewer for his valuable comment. The reasons for choosing the particular solution are described in lines 280-296.

  4. Consistency of Conclusions
    The conclusions correspond to the objectives of the research and describe the obtained result accurately and clearly enough.
    Suggestion for improvement: expand the conclusions with an additional paragraph, which will address the issue of future application of the research result.

Response

We would like to thank the Reviewer for his valuable comment. The requested additional paragraph has been added in lines 405-409.

  5. Appropriateness of References
    The references are relevant and appropriate for the purpose of the study.

Response

We would like to thank the Reviewer for his valuable comment.

  6. Additional Comments on Tables and Figures
    Figures and tables are relevant and appropriate. It is suggested to improve the formatting: reduce the size of some figures (1, 3, 5, 6) so that they take up less space. Also, figure 4 is at the end of the section, which worsens the perception and does not allow it to unfold fully.

Response

We would like to thank the Reviewer for his valuable comment. The appropriate changes have been made as requested.

  7. Specific Comments
    Type of the Paper - not filled

Response

We would like to thank the Reviewer for his valuable comment. The specific paper is a Research Paper.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

1. Language check is needed for this manuscript.

2. Lines 54-57: Please provide reference.

3. Lines 57-58: “The percentages reach up to 15%, while it is predicted that they will reach the height of 23% in the coming years [11].”

What percentages? Is it reduction of the amount of grain yielded? The research in the reference was done before 2015, do the results still apply in the coming years?

4. Line 68: Provide full name of MLP, as this is the first appearance of it in the manuscript.

5. Line 149: Change “height of rain” to “precipitation”. Please provide the units for all data as well.

6. Table 2 & 3: Capitalize wd.

7. Lines 314-317: show equation numbers as eq. N, instead of (n), as you use the total number of the equations (8) same as the equation number (8), this is confusing.

8. For equations 2-9, what are the standard criteria for evaluating these indices?

9. For equations 2-5, they all describe the performance of the models; what is the difference between each of them? Why use all of them to describe the performance? Similarly, why use equations 6-9 to show the predictive ability of each developed ANN model? What are the differences?

10. Lines 333-335: This may lead to the conclusion that location 8 could be the base for air pollution forecasting of the other locations.

What do the authors mean by saying that location 8 could be the base for air pollution forecasting of the other locations? Have you done the forecasting of the other locations from the data of location 8? The authors should provide the results for all locations, and then the readers will be able to tell which location shows the best ANN performance.

11. Line 478: Reference 24 lacks information, e.g. the web address.

 

Comments on the Quality of English Language

Language check is needed for this manuscript.

Author Response

Reviewer 2

First, we would like to thank the reviewer for his constructive comments and the time he took to study our research paper.

  1. Language check is needed for this manuscript.

Response

We would like to thank the Reviewer for his valuable comment. The document has been read, and the necessary corrections made, by a person with a good knowledge of the English language. We did our best.

  2. Lines 54-57: Please provide reference.

Response

We would like to thank the Reviewer for his valuable comment. Appropriate references have been added.

  3. Lines 57-58: “The percentages reach up to 15%, while it is predicted that they will reach the height of 23% in the coming years [11].” What percentages? Is it reduction of the amount of grain yielded? The research in the reference was done before 2015, do the results still apply in the coming years?

Response

We would like to thank the Reviewer for his valuable comment. In order to clarify the word “percentages”, the phrase “The percentages of the amount of grain yielded” was added. Furthermore, additional references were added that verify the accuracy of said predictions regarding the grain yield loss. Specifically, the papers added are the following:

[12] Du, C.; Pei, J.; Feng, Z. Unraveling the Complex Interactions between Ozone Pollution and Agricultural Productivity in China’s Main Winter Wheat Region Using an Interpretable Machine Learning Framework. Science of The Total Environment 2024, 176293, doi:10.1016/j.scitotenv.2024.176293.

[13] Wang, Y.; Wang, Y.; Feng, Z.; Yuan, X.; Zhao, Y. The Impacts of Ambient Ozone Pollution on China’s Wheat Yield and Forest Production from 2010 to 2021. Environmental Pollution 2023, 330, 121726, doi:10.1016/j.envpol.2023.121726.

  4. Line 68: Provide full name of MLP, as this is the first appearance of it in the manuscript.

Response

We would like to thank the Reviewer for his valuable comment. The appropriate changes have been made as requested.

  5. Line 149: Change “height of rain” to “precipitation”. Please provide the units for all data as well.

Response

We would like to thank the Reviewer for his valuable comment. The appropriate changes have been made as requested.

  6. Table 2 & 3: Capitalize wd.

Response

We would like to thank the Reviewer for his valuable comment. The appropriate changes have been made as requested.

  7. Lines 314-317: show equation numbers as ‘eq. N’, instead of ‘(n)’, as you use the total number of the equations ‘(8)’ same as the equation number (8), this is confusing.

Response

We would like to thank the Reviewer for his valuable comment. The appropriate changes have been made as requested. Furthermore, typographical errors in the formulas of eq. 3, eq. 4 and eq. 5 have been corrected.

  8. For equations 2-9, what are the standard criteria for evaluating these indices?

Response

We would like to thank the Reviewer for his valuable comment. The standard criteria for evaluating these indices were as follows:

  • MAE, RMSE: The acceptable value is the minimum value. The closer the values are to zero, the better the predictive model.
  • R, IA: Both indices take values between 0 (0%) and 1 (100%). The closer the values are to 1 (100%), the better the predictive model.
  • TPR: Takes values between 0% and 100%. The closer the values are to 100%, the better the predictive model.
  • FPR, FAR: Both indices take values between 0% and 100%. The closer the values are to 0%, the better the predictive model.
  • SI: Takes values between 0% and 100%. The closer the values are to 100%, the better the predictive model.

It should be mentioned that, in order to limit the length of the manuscript, the above criteria are not stated in the text of the paper. These statistical indicators are widely used and well known in the international literature, and in addition we provide relevant references for the readers.
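The criteria listed above for MAE, RMSE, R and IA can be illustrated with a short sketch. The formulas below are the standard textbook definitions (with Willmott's index of agreement assumed for IA); this record does not reproduce the paper's own equations, so the function names and the sample numbers are hypothetical. Note also that Pearson's R can in general be negative, although a useful forecasting model would be expected to fall between 0 and 1.

```python
import math

def mae(obs, pred):
    """Mean absolute error: 0 is a perfect forecast."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean square error: 0 is a perfect forecast."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def pearson_r(obs, pred):
    """Pearson correlation coefficient between observations and predictions."""
    mo = sum(obs) / len(obs)
    mp = sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)

def index_of_agreement(obs, pred):
    """Willmott's index of agreement (assumed definition): 1 is perfect."""
    mo = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1 - num / den

# Hypothetical observed and predicted concentrations (illustrative only).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(mae(observed, predicted), rmse(observed, predicted))
print(pearson_r(observed, predicted), index_of_agreement(observed, predicted))
```

MAE and RMSE are expressed in the units of the pollutant concentration, while R and IA are dimensionless, which is one reason the two groups are usually reported together.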

  9. For equations 2-5, they all describe the performance of the models, what’s the difference between each of them? Why use all of them to describe the performance? Similarly, why use equations 6-9 to show predictive ability of each developed ANN model? What are the differences?

Response

We would like to thank the Reviewer for his valuable comment. Equations 2-5 are used to evaluate the predictive capability of the models. They are all used together in order to evaluate all the forecasting possibilities. The model that combines the best values for most of these statistical indices is also the one that has the best predictive ability. Equations 6-9 describe the ability of the developed forecasting models to correctly predict the cases of exceedances, in other words the cases where the concentration of the pollutant is greater than or equal to a specific value (threshold value) according to the EU and WHO. In order to prevent the present study from being too extensive, we have chosen to cite relevant literature that describes in detail each statistical evaluation index and its importance.
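The exceedance-based indices mentioned in this response can be sketched from a 2x2 contingency table of observed versus predicted exceedances. The definitions below are the commonly used ones and are an assumption here: TPR = TP/(TP+FN), FPR = FP/(FP+TN), and FAR = FP/(TP+FP). The paper's exact formulas, including the one for SI, are not reproduced in this record, so SI is left out. The function name, the sample values and the threshold of 50 are hypothetical and do not correspond to any EU or WHO limit.

```python
def exceedance_scores(obs, pred, threshold):
    """Return (TPR, FPR, FAR) in percent for exceedances of `threshold`.

    An exceedance is a concentration greater than or equal to the threshold,
    as described in the response above. Undefined ratios are returned as NaN.
    """
    tp = fp = fn = tn = 0
    for o, p in zip(obs, pred):
        observed_exc = o >= threshold
        predicted_exc = p >= threshold
        if observed_exc and predicted_exc:
            tp += 1  # exceedance correctly predicted
        elif observed_exc:
            fn += 1  # exceedance missed
        elif predicted_exc:
            fp += 1  # false alarm
        else:
            tn += 1  # non-exceedance correctly predicted

    def pct(numerator, denominator):
        return 100.0 * numerator / denominator if denominator else float("nan")

    tpr = pct(tp, tp + fn)  # share of observed exceedances that were predicted
    fpr = pct(fp, fp + tn)  # share of non-exceedances flagged as exceedances
    far = pct(fp, tp + fp)  # share of predicted exceedances that did not occur
    return tpr, fpr, far

# Hypothetical daily values and an illustrative threshold.
observed = [30, 60, 55, 20, 70, 10]
predicted = [35, 58, 40, 52, 75, 5]
tpr, fpr, far = exceedance_scores(observed, predicted, threshold=50)
print(tpr, fpr, far)
```

Unlike MAE or RMSE, these scores ignore the size of the error and only ask whether the forecast falls on the correct side of the threshold, which is why they complement the indices of equations 2-5.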

  10. Lines 333-335: ‘This may lead to the conclusion that location 8 could be the base for air pollution forecasting of the other locations.’

What do the authors mean by saying that location 8 could be the base for air pollution forecasting of the other locations? Have you done the forecasting of the other locations from the data of location 8? The authors should provide the results for all locations, and then the readers will be able to tell which location shows the best ANN performance.

Response

We would like to thank the Reviewer for his valuable comment. A section of the answer was omitted by mistake. The complete phrase is “This may lead to the conclusion that sites with similar zoning to location 8 could provide more reliable data, thus assisting the better training of ANN forecasting models be the base for air pollution forecasting of the other locations”.

  11. Line 478: Reference 24 lacks information, e.g. the web address.

Response

We would like to thank the Reviewer for his valuable comment. The appropriate changes have been made as requested.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript entitled “Development of air pollution forecasting models applying artificial neural networks in the greater area of Beijing city, China”, by Panagiotis Fazakis, Konstantinos Moustris and Georgios Spyropoulos, developed a universal forecasting model of air pollutant concentrations, specifically PM2.5 and PM10, as well as SO2, NO2, CO and O3. For that, the authors used ANN models with several architectures.

 

This manuscript is very important in the area of pollution control in nature, but several issues must be clarified before the review can be published.

 

1) the authors collected the data to be studied from 3/1/2013 to 2/28/2017. With the best ANN configuration obtained, didn't the authors test the model's performance for the following years (2018-2023)?

This prediction would be important to check whether the model is really good or not.

2) In line 138 the authors detail what the input variables were. What were the output variables?

3) The authors mention that they used 70% of the data for training, 15% of the data for testing and 15% for the cross-validation. What were the criteria for using these percentages? What is meant by cross-validation?

4) What is the total data volume?

5) A list of symbols, abbreviations, etc. would be important to add in the manuscript.

 

Author Response

Reviewer 3

First, we would like to thank the reviewer for his constructive comments and the time he took to study our research paper.

1) the authors collected the data to be studied from 3/1/2013 to 2/28/2017. With the best ANN configuration obtained, didn't the authors test the model's performance for the following years (2018-2023)?

This prediction would be important to check whether the model is really good or not.

Response

We would like to thank the Reviewer for his valuable comment. As mentioned in the paper, the available data relate exclusively to the period from 3/1/2013 to 2/28/2017, so we were not able to use data from the specific locations for another time period, e.g. 2018-2023. At this time no such data are available, but if they become available in the future, it would be a nice opportunity to test the forecasting ability of the developed ANN models on a new data set.

2) In line 138 the authors detail what the input variables were. What were the output variables?

Response

We would like to thank the Reviewer for his valuable comment. As described in lines 262-264, the outputs are the 24 hourly concentration values of each pollutant one day ahead.

3) The authors mention that they used 70% of the data for training, 15% of the data for testing and 15% for the cross-validation. What were the criteria for using these percentages? What is meant by cross-validation?

Response

We would like to thank the Reviewer for his valuable comment. This split is the usual methodology in the international literature on the training of ANNs. Cross-validation is an internal control process during training. In particular, during training the model applies the knowledge acquired up to that point to the 15% of the data reserved for this process. Next, it calculates the error and modifies the training coefficients and variables accordingly, aiming to reach the best level of predictive performance. Nothing more is mentioned in the text about cross-validation, since it is a common process known to everyone involved in ANN training, and in order to limit the length of the paper.
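A 70/15/15 partition of the kind described in this response can be sketched as follows. This is a minimal illustration, not the authors' procedure: the record does not say whether the split was chronological or random, so both options are shown, and the function name and parameters are hypothetical. The 15% subset monitored during training corresponds to what is usually called a validation set; k-fold cross-validation, in which the data are repeatedly re-partitioned, is a different technique.

```python
import random

def split_indices(n, train_pct=70, val_pct=15, seed=None):
    """Partition range(n) into train, validation and test index lists.

    With seed=None the original (e.g. chronological) order is kept;
    otherwise the indices are shuffled reproducibly before splitting.
    The test set receives whatever remains after the first two parts.
    """
    indices = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(indices)
    n_train = n * train_pct // 100  # integer arithmetic avoids rounding surprises
    n_val = n * val_pct // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

# Example with a hypothetical series of 100 samples.
train, val, test = split_indices(100)
print(len(train), len(val), len(test))
```

For time series such as hourly pollutant concentrations, keeping the chronological order avoids evaluating the model on samples that precede its training data, although a random split is also common in the ANN literature.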

4) What is the total data volume?

Response

We would like to thank the Reviewer for his valuable comment. The data consist of 35065 rows and 15 columns per spreadsheet, with a total of 9 spreadsheets. The total data volume is 525975 values.
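The figures quoted in this response can be checked with simple arithmetic. The stated total of 525975 equals 35065 rows times 15 columns, that is, the number of values in a single spreadsheet. The combined figure computed below assumes that all nine spreadsheets have the same shape, which the response does not state explicitly; the variable names are illustrative.

```python
rows_per_sheet = 35065
columns_per_sheet = 15
sheet_count = 9

# Values in one spreadsheet: matches the total quoted in the response.
values_per_sheet = rows_per_sheet * columns_per_sheet

# Values across all spreadsheets, assuming identical shapes (an assumption).
values_all_sheets = values_per_sheet * sheet_count

print(values_per_sheet)
print(values_all_sheets)
```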

5) A list of symbols, abbreviations, etc. would be important to add in the manuscript.

Response

We would like to thank the Reviewer for his valuable comment. Each abbreviation has been placed after the first appearance in the paper of the full phrase that it abbreviates.

Author Response File: Author Response.pdf
