Big-Data-Driven Machine Learning for Enhancing Spatiotemporal Air Pollution Pattern Analysis
Abstract
1. Introduction
2. Materials and Methods
2.1. Localization
2.2. Big Data Machine-Learning Workflow
2.3. Clustering Methods
2.3.1. Unsupervised Machine Learning Technique Using K-Means
2.3.2. Spatially Constrained Clustering Using SKATER
2.3.3. Spatial Clustering in ArcGIS
2.3.4. Evaluation Metrics for the Optimal Number of Clusters
3. Results
3.1. Evaluation of the Number of Clusters
3.2. K-Means with DTW and SKATER Clustering
3.3. Spatial Clustering
4. Discussion
Limitations of the Study
5. Conclusions
- The K-means and SKATER clustering algorithms revealed distinct differences between average and maximum values of pollutant concentrations.
- The SKATER algorithm was found to be suboptimal for analyzing data that vary rapidly in time and space, which highlights the importance of matching the clustering algorithm to the data type. This does not mean that SKATER is generally unsuitable for such studies; its effectiveness may depend on the specific nature of the data being analyzed.
- The K-means algorithm with DTW identified yearly patterns more accurately and appears to be the superior method for clustering the fast-changing spatiotemporal data in this particular case.
- ML techniques, combined with Moran and Getis-Ord hot-spot and cold-spot analysis, provided a holistic overview of the problem. Furthermore, clustering the data after kriging greatly facilitated the interpretation of the results, suggesting that this approach can, in some cases, be preferable to clustering on the real sensor positions.
- The use of machine learning and big data analysis can provide valuable insights into the spatial and temporal distribution of air pollution.
- The identification of hot-spots and cold-spots can inform policy decisions regarding urban planning, traffic management, and public health interventions.
- A holistic data approach is needed to fully understand the complex spatiotemporal nature of air pollution in urban environments.
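The difference between a time-warping distance and a point-by-point distance, which underlies the K-means with DTW result above, can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names (`dtw_distance`, `euclidean_distance`, `assign_to_nearest`) and the toy series are hypothetical, and only the cluster-assignment step of K-means is shown. A full K-means with DTW also needs a DTW-aware centroid update, which is omitted here.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Uses the classic O(len(a) * len(b)) dynamic program with squared
    point-wise costs and returns the square root of the accumulated cost.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] is matched to b[j-1] again
                cost[i][j - 1],      # b[j-1] is matched to a[i-1] again
                cost[i - 1][j - 1],  # both sequences advance
            )
    return math.sqrt(cost[n][m])


def euclidean_distance(a, b):
    """Point-by-point distance; assumes both sequences have equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_nearest(series, centroids, distance):
    """Assignment step of K-means: index of the closest centroid per series."""
    return [
        min(range(len(centroids)), key=lambda k: distance(s, centroids[k]))
        for s in series
    ]


# Hypothetical toy data: two centroid shapes and one series whose
# pollution peak occurs two time steps later than in the "peak" centroid.
peak = [0, 0, 5, 0, 0, 0]
flat = [1, 1, 1, 1, 1, 1]
shifted_peak = [0, 0, 0, 0, 5, 0]

labels_dtw = assign_to_nearest([shifted_peak], [peak, flat], dtw_distance)
labels_euclid = assign_to_nearest([shifted_peak], [peak, flat], euclidean_distance)
print(labels_dtw, labels_euclid)
```

In this toy case DTW aligns the delayed peak with the centroid peak and assigns the series to the "peak" cluster, whereas the Euclidean distance penalizes the time shift and assigns it to the "flat" cluster. This is the kind of behavior that makes a warping distance attractive for fast-changing pollution time series.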
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
LCS | Low-Cost Sensors
PM | Particulate Matter
WMO | World Meteorological Organization
EU | European Union
ML | Machine Learning
AI | Artificial Intelligence
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zareba, M.; Dlugosz, H.; Danek, T.; Weglinska, E. Big-Data-Driven Machine Learning for Enhancing Spatiotemporal Air Pollution Pattern Analysis. Atmosphere 2023, 14, 760. https://doi.org/10.3390/atmos14040760