Article
Peer-Review Record

The Relationship between PM2.5 and PM10 in Central Italy: Application of Machine Learning Model to Segregate Anthropogenic from Natural Sources

Atmosphere 2022, 13(3), 484; https://doi.org/10.3390/atmos13030484
by Carlo Colangeli 1,2, Sergio Palermi 1, Sebastiano Bianco 1, Eleonora Aruffo 3,4, Piero Chiacchiaretta 2,4,* and Piero Di Carlo 3,4
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 21 January 2022 / Revised: 24 February 2022 / Accepted: 7 March 2022 / Published: 16 March 2022

Round 1

Reviewer 1 Report

Particulate matter monitoring is very important, especially for human health. In this manuscript, PM2.5 and PM10 data collected by air quality monitoring stations were used together with meteorological data, and a machine learning algorithm was applied. Many sections need reorganizing and rewriting. The authors should provide new tables, graphs, and figures. For these reasons my recommendation is REJECT.

1) Extensive editing of English language and style required 

2) The authors should provide a flowchart for the study. The flowchart will be helpful to understand and follow the study steps. 

3) For Fig. 1, a new base map should be produced with high quality. Therefore, all of the used air quality monitoring stations can be added on the new map. 

4) Please check line 98. There is a deleted word in red.

5) Data analysis section should be in the Methodology section not in the Data section. 

6) The authors should provide new tables for Table 1, Table 2, and Table 3. 

The authors should provide a table that includes the name of the station, city, coordinates, min, max., mean, std, and data availability as a percentage. 

Combining Table 2 and Table 3 would be very good: one table giving not only the average value but also the min, max, mean and std for PM2.5 and PM10.

7) Is the box plot created using daily average PM2.5 and PM10 for the 8 ground-based stations in the selected study area? The plots should provide information about the median and mean values of PM. A line for the EU limits would help to interpret the data.

8) Why do the authors prefer to use summer semester and winter semester rather than the seasons?

9) There is no information about PM2.5/PM10 in Fig. 3. Please check this. Most probably it should be Fig. 4 (that one is missing).

10) Please check the names of the months in Fig. 4. The authors should provide information about wind in the data section.

11) There are no Table 7 and Table 8 in the manuscript, although they are referenced.

12) Why are Table 7, Table 8 and Table 9 ... in the Discussion section? These should be in the RESULTS section.

The authors should check the sections of the manuscript. 

11) Are the average values calculated from daily PMs in Table 4? Information is missing. 

12) In the model analysis section, the authors mention inputs such as RH, temperature, rainfall and CO. The authors should provide these parameters and the related information in the data section, not in the model analysis section (at the end of the methodology).

13) The authors should provide detailed information about ANN and selected FNN. 

14) There are serious flaws in the manuscript. 

 

 

Author Response

Responses to the Editor and Reviews of Colangeli et al., "The relationship between PM2.5 and PM10 in central Italy: application of machine learning model to segregate anthropogenic from natural sources", (Atmosfere-1586695) submitted to Atmosphere.

 

The manuscript has been revised in consideration of the Reviewers’ comments.

We thank the Reviewer for his/her additional comments on our revised manuscript. We have included his/her comments in italics, followed by our responses.

 

Reviewer #2:

 

Particulate matter monitoring is very important, especially for human health. In this manuscript, PM2.5 and PM10 data collected by air quality monitoring stations were used together with meteorological data, and a machine learning algorithm was applied. Many sections need reorganizing and rewriting. The authors should provide new tables, graphs, and figures. For these reasons my recommendation is REJECT.

 

1) Extensive editing of English language and style required

In accordance with the requirements, all the sections of the manuscript have been revised and corrected.

 

2) The authors should provide a flowchart for the study. The flowchart will be helpful to understand and follow the study steps.

Following the advice indicated, a flowchart of the study phases (Fig 1a) was created and inserted in the introduction section (new lines 86, 462-463 and 514-515).

3) For Fig. 1, a new base map should be produced with high quality. Therefore, all of the used air quality monitoring stations can be added on the new map.

As requested, a higher resolution map was produced in Fig 1 with the indication of all the monitoring stations of the regional network (new lines 459-460 and 512-513).

 

4) Please check line 98. There is a deleted word in red.

As has rightly been pointed out, the word crossed out in red has been removed (new line 104).

 

5) Data analysis section should be in the Methodology section not in the Data section.

In accordance with the requirements, the data analysis section has been included in the Methodology section (new line 148).

6) The authors should provide new tables for Table 1, Table 2, and Table 3. The authors should provide a table that includes the name of the station, city, coordinates, min, max, mean, std, and data availability as a percentage. Combining Table 2 and Table 3 would be very good: one table giving not only the average value but also the min, max, mean and std for PM2.5 and PM10.

As required, new tables have been prepared: Table 1 now gives the station name, city and coordinates, and Tables 2 and 3 give the min, max, median, mean, standard deviation and data availability as a percentage for PM2.5 and PM10 (new lines 423-437).

The idea of joining tables 2 and 3 was excellent, but due to the large amount of data we preferred to leave them separate.
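
The per-station summary described in this response (min, max, median, mean, standard deviation and data availability) could be computed along the following lines. This is a minimal sketch, not the authors' procedure: the function name `summarize`, the use of `None` for missing days, the choice of the sample (n-1) standard deviation and the example values are all assumptions made for illustration.

```python
import statistics

def summarize(daily_values):
    """Summary statistics for one station's daily series.

    Missing days are represented by None (an assumption of this sketch).
    """
    valid = [v for v in daily_values if v is not None]
    return {
        "min": min(valid),
        "max": max(valid),
        "median": statistics.median(valid),
        "mean": statistics.mean(valid),
        "std": statistics.stdev(valid),  # sample standard deviation (n - 1)
        "availability_pct": 100.0 * len(valid) / len(daily_values),
    }

# Hypothetical daily PM2.5 values (ug/m3) with two missing days.
pm25 = [12.0, None, 18.0, 30.0, None, 20.0]
summary = summarize(pm25)
```

Reporting availability next to the statistics makes it clear how many days each mean and median are actually based on.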

 

7) Is the box plot created using daily average PM2.5 and PM10 for the 8 ground-based stations in the selected study area? The plots should provide information about the median and mean values of PM. A line for the EU limits would help to interpret the data.

Yes, the box plot was created using daily average PM2.5 and PM10 for the 8 ground-based stations in the selected study area; the numerical data (including the median) are reported in Tables 2 and 3 (new lines 423-437). As required, a line for the EU limits has been added to the box plots (Fig. 2 and 3) (new lines 467-468, 472, 516-519).

8) Why do the authors prefer to use summer semester and winter semester rather than the seasons?

We prefer to use summer and winter semesters rather than seasons because the observation periods in our study that show homogeneous behavior in terms of atmospheric particulate concentration are six-monthly, not seasonal. Accordingly, we call the period from October to March (inclusive) the winter semester and the period from April to September (inclusive) the summer semester.
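
The semester definition given in this response can be written as a small helper. This is only an illustrative sketch; the function name `semester` and the string labels are assumptions, not taken from the manuscript.

```python
def semester(month):
    """Map a calendar month (1-12) to the semester used in the response.

    April to September (inclusive) is the summer semester;
    October to March (inclusive) is the winter semester.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return "summer" if 4 <= month <= 9 else "winter"

# Label every month of the year.
labels = {m: semester(m) for m in range(1, 13)}
```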

 

9) There is no information about PM2.5/PM10 in Fig.3. Please check this. Most probably it should be Fig.4 (that one is missing).

The observation is correct: the comment referred to Figure 4, and the text of the manuscript has been corrected (new line 179).

 

10) Please check the name of the month in Fig.4.  The authors should provide information about wind in the data section.

Following your observation, the names of the months and the data series in Fig. 4 have been corrected in the graph, and the axis legends have been added (new lines 520-521). As required, the information about the winds has been added to the data section (new lines 187-207).

11) There are no Table 7 and Table 8 in the manuscript as reference.

We thank the Reviewer for this comment. This was our mistake, carried over from a previous version of the manuscript. We removed Tables 7 and 8; Table 9 is now Table 7 in the revised manuscript (new lines 240, 243, 295 and 304).

 

12) Why are Table 7, Table 8 and Table 9 ... in the Discussion section? These should be in the RESULTS section. The authors should check the sections of the manuscript.

In accordance with the requirements, all the sections of the manuscript have been revised and corrected; in particular, Table 7 has been placed in the new section called Results and discussion (new lines 295 and 304). Tables 8 and 9 referred to a previous version of the manuscript and have been deleted.

 

11) Are the average values calculated from daily PMs in Table 4? Information is missing.

Yes, the missing information has been added (new line 439).

 

12) In the model analysis section, the authors mention inputs such as RH, temperature, rainfall and CO. The authors should provide these parameters and the related information in the data section, not in the model analysis section (at the end of the methodology).

As requested, inputs such as RH, temperature, rainfall and CO, together with the related information, have been moved to the data section: the new section 2.3, named Sampling and data analysis (new lines 232-246).

 

13) The authors should provide detailed information about ANN and selected FNN.

New information has been inserted (new lines 113-146).

 

 

14) There are serious flaws in the manuscript.

Following your observation, the manuscript has been completely reorganized and reworked in several parts in order to address all the flaws observed.

Reviewer 2 Report

There are a few points:

  1. Look at Table 1: it is an image. It must be drawn in the paper itself.
  2. All figures and graphs must be drawn at high resolution; the images must be clear, and all text inside them must be legible. Fig. 2 to 9 are all low resolution.
  3. Improve the state-of-the-art review and identify a few of the latest challenges. Papers in the related work must come only from highly indexed journals (ISI, Web of Science) with a good impact factor. A few papers that could also be cited in the revised version of the manuscript: (a) Sustainability | Free Full-Text | Indoor Air Quality Monitoring Systems for Enhanced Living Environments: A Review toward Sustainable Smart Cities (mdpi.com) (b) Sustainability | Free Full-Text | A Systematic Study on the Analysis of the Emission of CO, CO2 and HC for Four-Wheelers and Its Impact on the Sustainable Ecosystem (mdpi.com)
  4. At the end of the discussion of the results, you need to place more emphasis on the application of the work.

Author Response

Responses to the Editor and Reviews of Colangeli et al., "The relationship between PM2.5 and PM10 in central Italy: application of machine learning model to segregate anthropogenic from natural sources", (Atmosfere-1586695) submitted to Atmosphere.

 

The manuscript has been revised in consideration of the Reviewers’ comments.

We thank the Reviewer for his/her additional comments on our revised manuscript. We have included his/her comments in italics, followed by our responses.

 

Reviewer #1:

 

There are a few points:

 

  • Look at Table 1: it is an image. It must be drawn in the paper itself.

As rightly requested, Table 1 has been converted from an image into a table (new lines 423-426).

 

  • All figures and graphs must be drawn with high resolution and quality of images must be clear and all text inside it must be visible with high resolution. Look at the Fig. 2 to 9 all figures are with low resolution.

As required, all figures from number 2 to number 9 are reproduced in high resolution (new lines 512-539).

 

  • Improve the state-of-the-art review and identify a few of the latest challenges. Papers in the related work must come only from highly indexed journals (ISI, Web of Science) with a good impact factor. A few papers that could also be cited in the revised version of the manuscript: (a) Sustainability | Free Full-Text | Indoor Air Quality Monitoring Systems for Enhanced Living Environments: A Review toward Sustainable Smart Cities (mdpi.com) (b) Sustainability | Free Full-Text | A Systematic Study on the Analysis of the Emission of CO, CO2 and HC for Four-Wheelers and Its Impact on the Sustainable Ecosystem (mdpi.com)

Following the advice, the state of the art has been improved by also referring to the above-mentioned articles which are the subject of more recent studies (new lines 22-25 and 345-350).

 

  • At the end of the discussion of the results, you need to place more emphasis on the application of the work.

As requested, this aspect has been further clarified especially in the conclusions section (new lines 320-328 and 331-334).

Reviewer 3 Report

In my opinion, this manuscript does not show any novelty, is written in poor English and is of poor scientific quality. There are a lot of theoretical mistakes. Therefore, I need to reject the paper.

1 - Introduction: Line 3 – “Such us” should be changed to “such as”.

2 - Table 1 shows data for pollutants and chemical compounds other than PM10 and PM2.5. The goal of your study is to predict PM10 and PM2.5, so the other information is unnecessary and may lead to confusion.

3 - line 258: ANN "are" based on...;

4 – The main goal of the paper is the application of a machine learning model, as the title and the whole text point out. However, eight pages of Material and methods are dedicated to comparing and discussing the air pollutants and meteorological data, and only one page to the ANN description.

5 - Section 2.2: the authors present much information about ANN but they did not provide references. They only mention one work in the whole section.

6 - line 259: "identification and classification of models" -> the ANN can identify or classify process or datasets. They are the models;

7 - line 260 - ANN network? Artificial Neural Network network?

8 - line 261: about the structure, it is not necessarily true. It depends on the architecture selected. There are many propositions in the literature;

9 - line 262: i-th input -> all variables must be in italics. Correct all;

10 - The bias is considered an input receiving the value of +1. Also it presents a weight which must be adjusted together with the other weights of the network;

11 - There is only one paragraph in section 2.2. It must be split;

12 - line 268: "hidden single-layer feedforward neural network" -> this is not an architecture, but a class of ANN; including MLP, RBF, ELM, and others. I believe the authors are using an MLP, but it is not clear in the text;

13 - line 274: there is no transfer function, but activation function;

14 - line 275: a great mistake: the authors cannot use the same random seed because the cost function is unknown. The correct procedure is to initialize the network 30 independent times with 30 distinct random seeds and then perform a dispersion analysis of the results;

15 - The authors should consider other error metrics beyond MSE, such as MAPE and MAE;

16 - Did the authors use holdout cross validation?

17 - The authors claim they used MSE and NMSE, but in tables 7 and 8 they present the RMSE;

18 - Are the results in Tables 7 and 8 regarding the test set?

19 - In figure 10, the authors are presenting W as the weight matrix for the hidden layers and output layer;

20 - The authors did not present how many neurons the ANN presented;

21 - The authors did not mention the training algorithm;

22 - There is no comparative analysis among distinct models;

23 - In summary, the application of ANN did not present some of the most relevant information. Also, many definitions are wrong. The authors must read Haykin's book to learn the correct terms used in the area.

Author Response

Responses to the Editor and Reviews of Colangeli et al., "The relationship between PM2.5 and PM10 in central Italy: application of machine learning model to segregate anthropogenic from natural sources", (Atmosfere-1586695) submitted to Atmosphere.

 

The manuscript has been revised in consideration of the Reviewers’ comments.

We thank the Reviewer for his/her additional comments on our revised manuscript. We have included his/her comments in italics, followed by our responses in red.

 

Reviewer #3:

In my opinion, this manuscript does not show any novelty, is written in poor English and is of poor scientific quality. There are a lot of theoretical mistakes. Therefore, I need to reject the paper.

 

1 - Introduction: Line 3 – “Such us” should be changed for “such as”.

The mistake was remedied by changing "us" to "as" (new line 47)

 

 

2 - Table 1 shows data for pollutants and chemical compounds other than PM10 and PM2.5. The goal of your study is to predict PM10 and PM2.5, so the other information is unnecessary and may lead to confusion.

As rightly observed, all pollutants and chemical compounds have been removed from table 1 except PM10 and PM2.5 object of our study (new lines 423-426).

 

3 - line 258: ANN "are" based on...;

The mistake was remedied by changing "is" to "are" (new line 133)

 

4 – The main goal of the paper is the application of a machine learning model, as the title and the whole text point out. However, eight pages of Material and methods are dedicated to comparing and discussing the air pollutants and meteorological data, and only one page to the ANN description.

(new lines 113-146)

In recent years ANNs that use multiple stages of nonlinear computation (aka “deep learning”) have been able to obtain outstanding performance on an array of complex tasks ranging from visual object recognition to natural language processing. However, I’ve found that most of the available tutorials on ANNs are either dense with formal details and contain little information about implementation or any examples, while others skip a lot of the mathematical detail and provide implementations that seem to come from thin air. This post aims at giving a more complete overview of ANNs, including (varying degrees of) the math behind ANNs, how ANNs are implemented in code, and finally some toy examples that point out the strengths and weaknesses of ANNs.

The simplest ANN takes a set of observed inputs, multiplies each of them by their own associated weight, and sums the weighted values to form a pre-activation. Oftentimes there is also a bias that is tied to an input that is always +1 included in the preactivation calculation. The network then transforms the pre-activation using a nonlinear activation function to output a final activation.

There are many options available for the form of the activation function, and the choice generally depends on the task we would like the network to perform.

For instance, if the activation function is the identity function, $g(z) = z$ (where $z$ denotes the pre-activation), which outputs continuous values in $(-\infty, \infty)$, then the network implements a linear model akin to that used in standard linear regression. Another choice for the activation function is the logistic sigmoid:

$g(z) = \frac{1}{1 + e^{-z}}$

which outputs values in $(0, 1)$. When the network outputs use the logistic sigmoid activation function, the network implements linear binary classification. Binary classification can also be implemented using the hyperbolic tangent function, $\tanh(z)$, which outputs values in $(-1, 1)$ (note that the classes must also be coded as either -1 or 1 when using $\tanh$). Single-layered neural networks used for classification are often referred to as “perceptrons,” a name given to them when they were first developed in the late 1950s.
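
The single-unit computation described in this response (weighted sum of the inputs, a bias tied to a constant +1 input, then an activation function) can be sketched as follows. The function names and the example numbers are assumptions chosen for illustration; they do not come from the manuscript.

```python
import math

def pre_activation(inputs, weights, bias_weight):
    """Weighted sum of the inputs plus the weight on the constant +1 bias input."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias_weight * 1.0

def identity(z):
    """Identity activation: the unit behaves like a linear regression model."""
    return z

def logistic(z):
    """Logistic sigmoid activation, with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def unit_output(inputs, weights, bias_weight, activation):
    """Apply the chosen activation to the pre-activation of a single unit."""
    return activation(pre_activation(inputs, weights, bias_weight))

# Hypothetical inputs and weights; the pre-activation is 0.5*2.0 + (-1.0)*1.0 + 0.0 = 0.0.
x = [0.5, -1.0]
w = [2.0, 1.0]
b = 0.0

linear_out = unit_output(x, w, b, identity)
sigmoid_out = unit_output(x, w, b, logistic)
tanh_out = unit_output(x, w, b, math.tanh)
```

Only the activation changes between the three calls, which is why the same unit can act as a regressor or as a binary classifier.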

 

5 - Section 2.2: the authors present much information about ANN but they did not provide references. They only mention one work in the whole section.

Artificial neural network (ANN) approaches are commonly used in many applications of atmospheric science (Gardner and Dorling, 1999; Grimes et al., 2003; Kassomenos et al., 2010; Cabaneros et al., 2019; Aruffo et al., 2020) (new lines 64-65 and 380-387)

  1. Gardner, M.W.; Dorling, S.R. Neural network modeling and prediction of hourly NOx and NO2 concentrations in urban air in London. Atmos. Environ. 1999, 33, 709–719.
  2. Cabaneros, S.M.; Calautit, J.K.; Hughes, B.R. A review of artificial neural network models for ambient air pollution prediction. Environ. Model. Softw. 2019, 119, 285–304.
  3. Grimes, D.I.F.; Coppola, E.; Verdecchia, M.; Visconti, G. A neural network approach to real-time rainfall estimation for Africa using satellite data. J. Hydrometeorol. 2003, 4, 1119–1133.
  4. Aruffo, E.; Di Carlo, P.; Cristofanelli, P.; Bonasoni, P. Neural Network Model Analysis for Investigation of NO Origin in a High Mountain Site. Atmosphere 2020, 173, 1-11.

 

6 - line 259: "identification and classification of models" -> the ANN can identify or classify process or datasets. They are the models;

“Models” has been changed to “processes” (new line 134)

 

7 - line 260 - ANN network? Artificial Neural Network network?

The mistake was remedied by deleting the word network (new line 135)

8 - line 261: about the structure, it is not necessarily true. It depends on the architecture selected. There are many propositions in the literature; (new lines 135 and 393-400)

The basic architecture of an ANN includes three parts (May Tzuc, O., et al, 2019, Sairamya et al., 2019): the input layer (containing neurons or nodes), one or more hidden layers (where other neurons are present) and the output layer (with the respective output neurons).

  1. May Tzuc, O.; Bassam, A.; Ricalde, L.J.; Cruz May, E. Sensitivity Analysis With Artificial Neural Networks for Operation of Photovoltaic Systems. In A. Y. Alanis, N. Arana-Daniel; C. López-Franco (Eds.). Artificial Neural Networks for Engineering Applications Academic Press. 2019, 10, 127–138.
  2. Sairamya, N.J.; Susmitha, L.; Thomas George, S.; Subathra, M.S.P. Hybrid Approach for Classification of Electroencephalographic Signals Using Time–Frequency Images With Wavelets and Texture Features. In D. J. Hemanth, D. Gupta, V. Emilia Balas (Eds.), Intelligent Data Analysis for Biomedical Applications. Academic Press. 2019, 12, 253–273.

 

 

10 - The bias is considered an input receiving the value of +1. Also it presents a weight which must be adjusted together with the other weights of the network;

All the biases of the FNN were initialized to a small constant, i.e., 0.1, whereas the weights were initialized in a pseudo-random manner employing a truncated normal distribution (standard deviation = 0.1).
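
The initialization described here (constant biases of 0.1, weights drawn from a truncated normal distribution with standard deviation 0.1) might look like the sketch below. The truncation rule is an assumption: the response does not state the cut-off, so this sketch redraws any value beyond two standard deviations, a common convention. The names `truncated_normal` and `init_layer` and the layer sizes are also hypothetical.

```python
import random

def truncated_normal(rng, std=0.1, bound=2.0):
    """Draw from N(0, std^2), redrawing until the value lies within bound * std.

    The two-standard-deviation cut-off is an assumption of this sketch.
    """
    while True:
        value = rng.gauss(0.0, std)
        if abs(value) <= bound * std:
            return value

def init_layer(n_in, n_out, rng, std=0.1, bias_value=0.1):
    """Return (weights, biases) for one fully connected layer."""
    weights = [[truncated_normal(rng, std) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [bias_value] * n_out  # every bias starts at the same small constant
    return weights, biases

rng = random.Random(0)  # fixed seed so the initial weights are reproducible
W, b = init_layer(n_in=4, n_out=3, rng=rng)
```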

 

11 - There is only one paragraph in section 2.2. It must be split;

As required (and as also required by Rev # 2 – 12), paragraph 2.2 has been divided into two parts, one of which is included in the section called “2.3 Sampling and data analysis” (new lines 226-246).

 

12 - line 268: "hidden single-layer feedforward neural network" -> this is not an architecture, but a class of ANN; including MLP, RBF, ELM, and others. I believe the authors are using an MLP, but it is not clear in the text; (new lines 142-143 and 401-405)

The most common architecture used for a wide range of applications is the feed-forward Multi-layer perceptron (FF-MLP). The MLP has a relatively simple architecture in which each node receives output only from nodes in the preceding layer and provides input only to nodes in the subsequent layer. In the new lines 144-145 the use of MLP (Guo Z. et al. 2006, Castro W. et al. 2017) architecture is highlighted.

  1. Castro, W.; Oblitas, J.; Santa-Cruz, R.; Avila-George H. Multilayer perceptron architecture optimization using parallel computing techniques. PLoS One. 2017, 12, 0189369.
  2. Guo, Z.; Chai, Q.; Maskell, D.L. FCMAC-AARS: A Novel FNN Architecture for Stock Market Prediction and Trading. IEEE International Conference on Evolutionary Computation 2006, 2375-2381.

 

13 - line 274: there is no transfer function, but activation function;

Changed to “activation function” (new line 228)

 

14 - line 275: a great mistake: the authors cannot use the same random seed because the cost function is unknown. The correct procedure is to initialize the network 30 independent times with 30 distinct random seeds and then perform a dispersion analysis of the results;

(new lines 229-232)

We thank the Reviewer. We were probably not clear in explaining our procedure. We ran 30 different simulations with 30 different initial weights and biases, in order to have 30 randomly initialized ANNs. We were then able to choose the ANN that showed the best agreement with the measurements, using statistical parameters. We set the seed to make our results repeatable when the machine is restarted or the model is run on a different machine. We included the following sentence in the manuscript to make this easier for the reader to understand.

“The FNN was run by varying the number of neurons in the hidden layer (from 1 to 35 neurons) to find the best simulation performance; 30 tests of the model were performed: the FNN was run 30 times, each with different weights and biases. To make the results reproducible after restarting the machine or on a different machine, we fixed the seed.”
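
The procedure discussed in this exchange (30 independent initializations, each reproducible from its own seed, followed by a look at the dispersion of the results) can be sketched as below. To keep the example short and runnable, a one-weight linear model trained by gradient descent stands in for the FNN; that stand-in, the function name `train_once` and the toy data are assumptions and not the authors' model. The sweep over 1 to 35 hidden neurons is omitted.

```python
import random
import statistics

def train_once(seed, data, epochs=50, lr=0.05):
    """Fit y = w*x + b from a seed-dependent start and return the final MSE.

    A stand-in for training one network: only the initial weight depends on the seed.
    """
    rng = random.Random(seed)
    w, b = rng.gauss(0.0, 0.1), 0.1
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return sum((w * x + b - y) ** 2 for x, y in data) / n

# Toy data following y = 2x + 1 (hypothetical).
data = [(i / 10, 2 * (i / 10) + 1) for i in range(10)]

seeds = list(range(30))                      # 30 distinct seeds, one per run
scores = [train_once(s, data) for s in seeds]

best_score, best_seed = min(zip(scores, seeds))
mean_score = statistics.mean(scores)
spread = statistics.stdev(scores)            # dispersion across the 30 runs
```

Because each run is fully determined by its seed, rerunning `train_once(best_seed, data)` on another machine reproduces the selected run, while the spread across seeds shows how sensitive the result is to initialization.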

 

15 - The authors should consider other error metrics beyond MSE, such as MAPE and MAE;

 

MAE and MAPE measure the mean dispersion between predicted and observed values, each on a linear scale (absolute difference). MSE is a measure of model error and is more complete.
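
For reference, the error metrics named in this exchange can be computed as in the following sketch, which uses their standard definitions. The function names and the example values are assumptions for illustration only; MAPE as written here requires non-zero observed values.

```python
import math

def mse(obs, pred):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(obs, pred))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def mape(obs, pred):
    """Mean absolute percentage error; observed values must be non-zero."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)

# Hypothetical observed and predicted concentrations (ug/m3).
observed = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 44.0]

metrics = {
    "MSE": mse(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
    "MAPE": mape(observed, predicted),
}
```

MSE weights large errors more heavily because of the squaring, whereas MAE and MAPE weight all errors linearly, so reporting them together gives complementary views of the same residuals.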

 

16 - Did the authors use holdout cross validation?

(new lines 237-240)

We used the holdout method to verify the model accuracy on a new dataset (i.e. the validation dataset).

The data were split into training (70%), validation (15%) and testing (15%) subsets, selected using indices that were initially generated at random and then kept fixed for all simulations. In this way, the dataset selection was fixed for all simulations, leaving only the weights and biases variable.
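
A fixed 70/15/15 holdout split of the kind described here can be sketched as follows. The function name `holdout_indices`, the seed value and the sample count are assumptions; the point is that the shuffled indices are generated once and then reused unchanged for every simulation.

```python
import random

def holdout_indices(n_samples, seed, frac_train=0.70, frac_val=0.15):
    """Shuffle the sample indices once and cut them into train/validation/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # seeded, so the split is repeatable
    n_train = int(round(n_samples * frac_train))
    n_val = int(round(n_samples * frac_val))
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]       # the remainder forms the test set
    return train, val, test

# Generate the split once; every simulation then reuses these same index lists.
train_idx, val_idx, test_idx = holdout_indices(n_samples=100, seed=42)
```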

 

17 - The authors claim they used MSE and NMSE, but in tables 7 and 8 they present the RMSE;

We thank the Reviewer. This is a mistake from a previous version: Table 7 and 8 have been removed from the manuscript. See also our answer to Comment 11).

 

18 - Are the results in Tables 7 and 8 regarding the test set?

Yes

 

19 - In figure 10, the authors are presenting W as the weight matrix for the hidden layers and output layer;

Figure 10 has been changed and WH (matrix for hidden layers) and WO (matrix for output layers) have been inserted (new lines 541-542)

 

20 - The authors did not present how many neurons the ANN presented;

See new lines 232-234 and Figure 10 (new lines 540-541)

 

21 - The authors did not mention the training algorithm;

We used the backpropagation algorithm, which is widely used for training feedforward neural networks.
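
As an illustration of backpropagation for a feedforward network with one hidden layer, the sketch below trains a tiny MLP (tanh hidden units, linear output) by plain stochastic gradient descent on a toy target. Everything specific here is an assumption: the response does not state the optimizer variant, learning rate or layer sizes, and the names `WH` and `WO` are borrowed from the weight-matrix labels mentioned for Figure 10 purely for readability.

```python
import math
import random

def forward(x, WH, bH, WO, bO):
    """Return the hidden activations and the scalar output of the MLP."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(WH, bH)]
    y = sum(w * hj for w, hj in zip(WO, h)) + bO  # linear output unit
    return h, y

def backprop_step(x, t, WH, bH, WO, bO, lr):
    """One gradient step on the squared error (y - t)^2 for a single sample."""
    h, y = forward(x, WH, bH, WO, bO)
    dy = 2.0 * (y - t)  # derivative of the loss with respect to the output
    # Propagate the error back through the output weights and tanh'(z) = 1 - h^2.
    dh = [dy * WO[j] * (1.0 - h[j] ** 2) for j in range(len(h))]
    for j in range(len(h)):
        WO[j] -= lr * dy * h[j]
        bH[j] -= lr * dh[j]
        for i in range(len(x)):
            WH[j][i] -= lr * dh[j] * x[i]
    return bO - lr * dy  # lists are updated in place; the scalar bias is returned

def mean_loss(data, WH, bH, WO, bO):
    """Mean squared error of the network over a dataset."""
    return sum((forward(x, WH, bH, WO, bO)[1] - t) ** 2 for x, t in data) / len(data)

rng = random.Random(0)
n_in, n_hidden = 1, 3
WH = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
bH = [0.1] * n_hidden
WO = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]
bO = 0.1

# Toy target t = 0.8 * x on nine points in [-1, 1] (hypothetical data).
data = [([k / 4.0], 0.8 * k / 4.0) for k in range(-4, 5)]

loss_before = mean_loss(data, WH, bH, WO, bO)
for _ in range(300):
    for x, t in data:
        bO = backprop_step(x, t, WH, bH, WO, bO, lr=0.05)
loss_after = mean_loss(data, WH, bH, WO, bO)
```

The hidden-layer deltas are computed before the output weights are updated, so every gradient in a step refers to the same set of parameters.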

 

22 - There is no comparative analysis among distinct models; (new lines 368-370)

We thank the Reviewer for this comment. This analysis can be found in the literature (Biancofiore et al., 2017). Moreover, the purpose of our study is to define a methodology to classify the air masses through ANN simulations of PM2.5 and PM10.

Biancofiore, F.; Busilacchio, M.; Verdecchia, M.; Tomassetti, B.; Aruffo, E.; Bianco, S.; Di Tommaso, S.; Colangeli, C.; Rosatelli, G.; Di Carlo, P. Recursive neural network model for analysis and forecast of PM10 and PM2.5. Atmospheric Pollution Research 2017, 8, 652-659.

 

23 - In summary, the application of ANN did not present some of the most relevant information. Also, many definitions are wrong. The authors must read Haykin's book to learn the correct terms used in the area. Overall, the manuscript is improved but some minor points should be still fixed before publication.

Thank you for the suggestion. We read Haykin's book and, in the revised version, changed the definitions according to that book. In any case, further information about the ANN, as reported in the manuscript, can be found in our previous papers: Biancofiore et al., 2015, Biancofiore et al., 2017 and Aruffo et al., 2021.

 

 

 

Round 2

Reviewer 1 Report

The authors made all the corrections that I mentioned in my first review. For this reason, my recommendation is to accept the manuscript in its present form.

Reviewer 2 Report

Good work.
