Comparative Analysis of Traditional and Advanced Clustering Techniques in Bioaerosol Data: Evaluating the Efficacy of K-Means, HCA, and GenieClust with and without Autoencoder Integration
Abstract
1. Introduction
2. Materials and Method
2.1. Sampling
2.2. Preprocessing
2.3. Analytical Techniques
2.3.1. Cluster Analysis
2.3.2. Optimal Cluster Selection
2.4. Methods Evaluated
2.4.1. Preprocessing Strategies
2.4.2. K-Means
2.4.3. HCA
2.4.4. GenieClust
2.4.5. ABC
3. Results
3.1. K-Means and HCA
3.2. Genie Performance
Calinski–Harabasz Score
3.3. Ambient Airborne Concentration Analysis
4. Discussion
Limitations and Further Work
- Review and optimize the architecture and hyperparameters of the AE.
- Apply dimensionality reduction techniques.
- Explore different AE variations or regularization techniques to enhance the quality of the encoding.
- Evaluate the clustering performance using various evaluation metrics beyond similarity, such as the silhouette score or cluster purity.
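The last item above names the silhouette score as a candidate evaluation metric. As a minimal sketch of what that metric measures, the snippet below computes the mean silhouette coefficient by hand on toy data. The function name `mean_silhouette`, the Euclidean distance, and the toy points are assumptions made for illustration; this is not the authors' evaluation code.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient of a labelling, using Euclidean distance."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    def mean_distance(i, members):
        # Average distance from point i to the listed members, excluding i itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        a = mean_distance(i, own)  # cohesion: mean distance within own cluster
        # Separation: mean distance to the nearest other cluster.
        b = min(mean_distance(i, m) for other, m in clusters.items() if other != label)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data (hypothetical): two well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(toy_points, [0, 0, 1, 1])  # matches the true grouping
bad = mean_silhouette(toy_points, [0, 1, 0, 1])   # splits each pair
print(f"good labelling: {good:.3f}, bad labelling: {bad:.3f}")
```

Values near 1 indicate compact, well-separated clusters, while negative values suggest points assigned to the wrong cluster. In practice a library routine such as `sklearn.metrics.silhouette_score` would normally be used instead of a hand-written loop.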
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Instrument | FL1 280 Excitation | FL1 280 Detection | FL2 280 Excitation | FL2 280 Detection | FL2 370 Excitation | FL2 370 Detection
---|---|---|---|---|---|---
WIBS-NEO/5 | 280 nm | 310–400 nm | 280 nm | 420–650 nm | 370 nm | 420–650 nm
Level 1. Data Selection | ||||||||
---|---|---|---|---|---|---|---|---|
| x | x | x | x | x | x | ||
| x | x | ||||||
Level 2. Data Standardisation | ||||||||
| x | x | x | x | ||||
| x | x | x | x | ||||
Level 3. Cluster Technique | ||||||||
| x | x | ||||||
| x | x | ||||||
| x | x | x | x |

Metric | K-Means (Raw) | K-Means (AE) | HCA (Raw) | HCA (AE) | GenieClust (Raw) | GenieClust (AE)
---|---|---|---|---|---|---
Optimal No. Clusters | 6 | 6 | 8 | 6 | 8 | 11
Distinct Clusters | 6 | 6 | 7 | 6 | 8 | 8
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Moss, M.A.N.; Hughes, D.D.; Crawford, I.; Gallagher, M.W.; Flynn, M.J.; Topping, D.O. Comparative Analysis of Traditional and Advanced Clustering Techniques in Bioaerosol Data: Evaluating the Efficacy of K-Means, HCA, and GenieClust with and without Autoencoder Integration. Atmosphere 2023, 14, 1416. https://doi.org/10.3390/atmos14091416