Clustering with Uncertainty: A Literature Review to Address a Cross-Domain Perspective †
Abstract
:1. Introduction
1.1. Related Work
1.2. Previous Work
- Explanations have been enhanced and a conceptualization (Section 2 has been added, and most parts of the paper have been broadened accordingly).
- Generic improvement, as the paper has been holistically revised in all its parts.
1.3. Structure of the Paper
2. Clustering with Uncertainty
3. Methodology and Approach
3.1. Selection Criteria and Saturation
3.2. Analysis Framework and Limitations
4. A Cross-Domain Analysis
4.1. Approach Analysis
4.2. Generic-Purpose Solutions
4.3. Domain-Specific Solutions
4.4. Domain Analysis
5. Discussion
5.1. Overview
5.2. Major Gaps and Challenges
5.3. Consolidation and Operationalization Through Open Software
5.4. Clustering in a Data- and Computationally Intensive Society: Uncertainty Propagation
6. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Cormode, G.; McGregor, A. Approximation algorithms for clustering uncertain data. In Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Seattle, WA, USA, 18–23 June 2008; pp. 191–200. [Google Scholar]
- Weng, C.H.; Chen, Y.L. Mining fuzzy association rules from uncertain data. Knowl. Inf. Syst. 2010, 23, 129–152. [Google Scholar] [CrossRef]
- D’Urso, P. Informational Paradigm, management of uncertainty and theoretical formalisms in the clustering framework: A review. Inf. Sci. 2017, 400, 30–62. [Google Scholar] [CrossRef]
- Kuczenski, B. False confidence: Are we ignoring significant sources of uncertainty? Int. J. Life Cycle Assess. 2019, 24, 1760–1764. [Google Scholar] [CrossRef]
- Griffin, S.C.; Claxton, K.P.; Palmer, S.J.; Sculpher, M.J. Dangerous omissions: The consequences of ignoring decision uncertainty. Health Econ. 2011, 20, 212–224. [Google Scholar] [CrossRef] [PubMed]
- Brodlie, K.; Allendes Osorio, R.; Lopes, A. A review of uncertainty in data visualization. In Expanding the Frontiers of Visual Analytics and Visualization; Springer: Berlin/Heidelberg, Germany, 2012; pp. 81–109. [Google Scholar]
- Nuijten, M.; Mittendorf, T.; Persson, U. Practical issues in handling data input and uncertainty in a budget impact analysis. Eur. J. Health Econ. 2011, 12, 231–241. [Google Scholar] [CrossRef]
- Karimi, J.; Somers, T.M.; Gupta, Y.P. Impact of environmental uncertainty and task characteristics on user satisfaction with data. Inf. Syst. Res. 2004, 15, 175–193. [Google Scholar] [CrossRef]
- McMillan, H.K.; Westerberg, I.K.; Krueger, T. Hydrological data uncertainty and its implications. Wiley Interdiscip. Rev. Water 2018, 5, e1319. [Google Scholar] [CrossRef]
- Hariri, R.H.; Fredericks, E.M.; Bowers, K.M. Uncertainty in big data analytics: Survey, opportunities, and challenges. J. Big Data 2019, 6, 44. [Google Scholar] [CrossRef]
- Wang, X.; He, Y. Learning from uncertainty for big data: Future analytical challenges and strategies. IEEE Syst. Man Cybern. Mag. 2016, 2, 26–31. [Google Scholar] [CrossRef]
- Kamal, A.; Dhakal, P.; Javaid, A.Y.; Devabhaktuni, V.K.; Kaur, D.; Zaientz, J.; Marinier, R. Recent advances and challenges in uncertainty visualization: A survey. J. Vis. 2021, 24, 861–890. [Google Scholar] [CrossRef]
- Abdar, M.; Pourpanah, F.; Hussain, S.; Rezazadegan, D.; Liu, L.; Ghavamzadeh, M.; Fieguth, P.; Cao, X.; Khosravi, A.; Acharya, U.R.; et al. A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Inf. Fusion 2021, 76, 243–297. [Google Scholar] [CrossRef]
- Xu, R.; Wunsch, D. Survey of clustering algorithms. IEEE Trans. Neural Netw. 2005, 16, 645–678. [Google Scholar] [CrossRef] [PubMed]
- Wierzchoń, S.T.; Kłopotek, M.A. Modern Algorithms of Cluster Analysis; Springer: Berlin/Heidelberg, Germany, 2018; Volume 34. [Google Scholar]
- Sinaga, K.P.; Yang, M.S. Unsupervised K-means clustering algorithm. IEEE Access 2020, 8, 80716–80727. [Google Scholar] [CrossRef]
- Caron, M.; Bojanowski, P.; Joulin, A.; Douze, M. Deep clustering for unsupervised learning of visual features. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 132–149. [Google Scholar]
- Castellanos, A.; Cigarrán, J.; García-Serrano, A. Formal concept analysis for topic detection: A clustering quality experimental analysis. Inf. Syst. 2017, 66, 24–42. [Google Scholar] [CrossRef]
- Lee, C.S.; Kao, Y.F.; Kuo, Y.H.; Wang, M.H. Automated ontology construction for unstructured text documents. Data Knowl. Eng. 2007, 60, 547–566. [Google Scholar] [CrossRef]
- Pileggi, S.F. Ontological Modelling and Social Networks: From Expert Validation to Consolidated Domains. In Lectures Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2023; pp. 672–687. [Google Scholar]
- Tew, C.; Giraud-Carrier, C.; Tanner, K.; Burton, S. Behavior-based clustering and analysis of interestingness measures for association rule mining. Data Min. Knowl. Discov. 2014, 28, 1004–1045. [Google Scholar] [CrossRef]
- Jiang, B.; Pei, J.; Tao, Y.; Lin, X. Clustering uncertain data based on probability distribution similarity. IEEE Trans. Knowl. Data Eng. 2011, 25, 751–763. [Google Scholar] [CrossRef]
- Hüllermeier, E. Uncertainty in clustering and classification. In Proceedings of the Scalable Uncertainty Management: 4th International Conference, SUM 2010, Toulouse, France, 27–29 September 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 16–19. [Google Scholar]
- de Kok, J.W.T.M.; van Rosmalen, F.; Koeze, J.; Keus, F.; van Kuijk, S.M.J.; Forte, J.C.; Schnabel, R.M.; Driessen, R.G.H.; van Herpt, T.T.W.; Sels, J.-W.E.M.; et al. Deep embedded clustering generalisability and adaptation for integrating mixed datatypes: Two critical care cohorts. Sci. Rep. 2024, 14, 1045. [Google Scholar] [CrossRef]
- Scarselli, F.; Gori, M.; Chung Tsoi, A.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. IEEE Trans. Neural Netw. Learn. Syst. 2009, 20, 61–80. [Google Scholar] [CrossRef]
- Aggarwal, C.C.; Philip, S.Y. A survey of uncertain data algorithms and applications. IEEE Trans. Knowl. Data Eng. 2008, 21, 609–623. [Google Scholar] [CrossRef]
- Pileggi, S.F. A Cross-Domain Perspective to Clustering with Uncertainty. In International Conference on Computational Science; Springer: Cham, Switzerland, 2024; pp. 295–308. [Google Scholar]
- Cheng, Y.; Fu, K.S. Conceptual clustering in knowledge organization. IEEE Trans. Pattern Anal. Mach. Intell. 1985, 5, 592–598. [Google Scholar] [CrossRef]
- Lee, R.C. Clustering analysis and its applications. In Advances in Information Systems Science: Volume 8; Springer: Berlin/Heidelberg, Germany, 1981; pp. 169–292. [Google Scholar]
- Thorndike, R.L. Who belongs in the family? Psychometrika 1953, 18, 267–276. [Google Scholar] [CrossRef]
- Saxena, A.; Prasad, M.; Gupta, A.; Bharill, N.; Patel, O.P.; Tiwari, A.; Er, M.J.; Ding, W.; Lin, C.T. A review of clustering techniques and developments. Neurocomputing 2017, 267, 664–681. [Google Scholar] [CrossRef]
- Yang, M.S. A survey of fuzzy clustering. Math. Comput. Model. 1993, 18, 1–16. [Google Scholar] [CrossRef]
- Zhang, C.; Butepage, J.; Kjellstrom, H.; Mandt, S. Advances in Variational Inference. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 2008–2026. [Google Scholar] [CrossRef]
- Liu, Y.; Liu, Z.; Li, S.; Guo, Y.; Liu, Q.; Wang, G. Cloud-Cluster: An uncertainty clustering algorithm based on cloud model. Knowl.-Based Syst. 2023, 263, 110261. [Google Scholar] [CrossRef]
- Sharma, K.K.; Seal, A. Outlier-robust multi-view clustering for uncertain data. Knowl.-Based Syst. 2021, 211, 106567. [Google Scholar] [CrossRef]
- Sharma, K.K.; Seal, A. Multi-view spectral clustering for uncertain objects. Inf. Sci. 2021, 547, 723–745. [Google Scholar] [CrossRef]
- Sharma, K.K.; Seal, A. Modeling uncertain data using Monte Carlo integration method for clustering. Expert Syst. Appl. 2019, 137, 100–116. [Google Scholar] [CrossRef]
- Dalton, L.A.; Benalcázar, M.E.; Dougherty, E.R. Optimal clustering under uncertainty. PLoS ONE 2018, 13, e0204627. [Google Scholar] [CrossRef]
- Huang, D.; Wang, C.D.; Lai, J.H. Locally weighted ensemble clustering. IEEE Trans. Cybern. 2017, 48, 1460–1473. [Google Scholar] [CrossRef] [PubMed]
- Liu, H.; Zhang, X.; Zhang, X.; Cui, Y. Self-adapted mixture distance measure for clustering uncertain data. Knowl.-Based Syst. 2017, 126, 33–47. [Google Scholar] [CrossRef]
- Zhang, X.; Liu, H.; Zhang, X. Novel density-based and hierarchical density-based clustering algorithms for uncertain data. Neural Netw. 2017, 93, 240–255. [Google Scholar] [CrossRef]
- Gullo, F.; Ponti, G.; Tagarelli, A.; Greco, S. An information-theoretic approach to hierarchical clustering of uncertain data. Inf. Sci. 2017, 402, 199–215. [Google Scholar] [CrossRef]
- Xiong, C.; Johnson, D.M.; Corso, J.J. Active clustering with model-based uncertainty reduction. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 5–17. [Google Scholar] [CrossRef]
- Huang, D.; Lai, J.H.; Wang, C.D. Robust ensemble clustering using probability trajectories. IEEE Trans. Knowl. Data Eng. 2015, 28, 1312–1326. [Google Scholar] [CrossRef]
- Xu, L.; Hu, Q.; Hung, E.; Chen, B.; Tan, X.; Liao, C. Large margin clustering on uncertain data by considering probability distribution similarity. Neurocomputing 2015, 158, 81–89. [Google Scholar] [CrossRef]
- Züfle, A.; Emrich, T.; Schmid, K.A.; Mamoulis, N.; Zimek, A.; Renz, M. Representative clustering of uncertain data. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2014; pp. 243–252. [Google Scholar]
- Wauthier, F.L.; Jojic, N.; Jordan, M.I. Active spectral clustering via iterative uncertainty reduction. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 1339–1347. [Google Scholar]
- Gullo, F.; Ponti, G.; Tagarelli, A. Minimizing the variance of cluster mixture models for clustering uncertain objects. Stat. Anal. Data Mining ASA Data Sci. J. 2013, 6, 116–135. [Google Scholar] [CrossRef]
- Kao, B.; Lee, S.D.; Lee, F.K.; Cheung, D.W.; Ho, W.S. Clustering uncertain data using voronoi diagrams and r-tree index. IEEE Trans. Knowl. Data Eng. 2010, 22, 1219–1233. [Google Scholar]
- Günnemann, S.; Kremer, H.; Seidl, T. Subspace clustering for uncertain data. In Proceedings of the 2010 SIAM International Conference on Data Mining, Columbus, OH, USA, 29 April–1 May 2010; pp. 385–396. [Google Scholar]
- Guha, S.; Munagala, K. Exceeding expectations and clustering uncertain data. In Proceedings of the 28th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Seattle, WA, USA, 18–23 June 2009; pp. 269–278. [Google Scholar]
- Volk, P.B.; Rosenthal, F.; Hahmann, M.; Habich, D.; Lehner, W. Clustering uncertain data with possible worlds. In Proceedings of the 2009 IEEE 25th International Conference on Data Engineering, Shanghai, China, 29 March–2 April 2009; pp. 1625–1632. [Google Scholar]
- Gullo, F.; Ponti, G.; Tagarelli, A. Clustering uncertain data via k-medoids. In Lecture Notes on Computer Science; Springer: Berlin/Heidelberg, Germany, 2008; pp. 229–242. [Google Scholar]
- Gullo, F.; Ponti, G.; Tagarelli, A.; Greco, S. A hierarchical algorithm for clustering uncertain data via an information-theoretic approach. In Proceedings of the 2008 28th IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 821–826. [Google Scholar]
- Kao, B.; Lee, S.D.; Cheung, D.W.; Ho, W.S.; Chan, K. Clustering uncertain data using voronoi diagrams. In Proceedings of the 2008 28th IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 333–342. [Google Scholar]
- Rhee, F.C.H. Uncertain fuzzy clustering: Insights and recommendations. IEEE Comput. Intell. Mag. 2007, 1, 44–56. [Google Scholar]
- Hwang, C.; Rhee, F.C.H. Uncertain fuzzy clustering: Interval type-2 fuzzy approach to c-means. IEEE Trans. Fuzzy Syst. 2007, 15, 107–120. [Google Scholar] [CrossRef]
- Kriegel, H.P.; Pfeifle, M. Density-based clustering of uncertain data. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, Chicago, IL, USA, 21–24 August 2005; pp. 672–677. [Google Scholar]
- Kriegel, H.P.; Pfeifle, M. Hierarchical density-based clustering of uncertain data. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM’05), Houston, TX, USA, 27–30 November 2005; p. 4. [Google Scholar]
- Bhavsar, S.; Pitchumani, R.; Maack, J.; Satkauskas, I.; Reynolds, M.; Jones, W. Stochastic economic dispatch of wind power under uncertainty using clustering-based extreme scenarios. Electr. Power Syst. Res. 2024, 229, 110158. [Google Scholar] [CrossRef]
- Rendon, N.; Giraldo, J.H.; Bouwmans, T.; Rodríguez-Buritica, S.; Ramirez, E.; Isaza, C. Uncertainty clustering internal validity assessment using Fréchet distance for unsupervised learning. Eng. Appl. Artif. Intell. 2023, 124, 106635. [Google Scholar] [CrossRef]
- He, Y.; Yang, J.P.; Li, Y.F. A three-stage automated modal identification framework for bridge parameters based on frequency uncertainty and density clustering. Eng. Struct. 2022, 255, 113891. [Google Scholar] [CrossRef]
- Hussain, S.F.; Butt, I.A.; Hanif, M.; Anwar, S. Clustering uncertain graphs using ant colony optimization (ACO). Neural Comput. Appl. 2022, 34, 11721–11738. [Google Scholar] [CrossRef]
- Wang, P.; Ding, C.; Tan, W.; Gong, M.; Jia, K.; Tao, D. Uncertainty-aware clustering for unsupervised domain adaptive object re-identification. IEEE Trans. Multimed. 2022, 25, 2624–2635. [Google Scholar] [CrossRef]
- Hewitt, M.; Ortmann, J.; Rei, W. Decision-based scenario clustering for decision-making under uncertainty. Ann. Oper. Res. 2022, 315, 747–771. [Google Scholar] [CrossRef]
- Prabhu, V.; Chandrasekaran, A.; Saenko, K.; Hoffman, J. Active domain adaptation via clustering uncertainty-weighted embeddings. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 8505–8514. [Google Scholar]
- Debnath, B.; Coviello, G.; Yang, Y.; Chakradhar, S. UAC: An uncertainty-aware face clustering algorithm. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 3487–3495. [Google Scholar]
- Haddadpour, H.; Niri, M.E. Uncertainty assessment in reservoir performance prediction using a two-stage clustering approach: Proof of concept and field application. J. Pet. Sci. Eng. 2021, 204, 108765. [Google Scholar] [CrossRef]
- Shi, W.; Chen, W.N.; Gu, T.; Jin, H.; Zhang, J. Handling uncertainty in financial decision making: A clustering estimation of distribution algorithm with simplified simulation. IEEE Trans. Emerg. Top. Comput. Intell. 2020, 5, 42–56. [Google Scholar] [CrossRef]
- Li, Y.; Chung, S.H. Ride-sharing under travel time uncertainty: Robust optimization and clustering approaches. Comput. Ind. Eng. 2020, 149, 106601. [Google Scholar] [CrossRef]
- Huang, J.; Gong, S.; Zhu, X. Deep semantic clustering by partition confidence maximisation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8849–8858. [Google Scholar]
- Naouali, S.; Salem, S.B.; Chtourou, Z. Uncertainty mode selection in categorical clustering using the rough set theory. Expert Syst. Appl. 2020, 158, 113555. [Google Scholar] [CrossRef]
- Charwand, M.; Gitizadeh, M.; Siano, P.; Chicco, G.; Moshavash, Z. Clustering of electrical load patterns and time periods using uncertainty-based multi-level amplitude thresholding. Int. J. Electr. Power Energy Syst. 2020, 117, 105624. [Google Scholar] [CrossRef]
- Kang, B.; Kim, S.; Jung, H.; Choe, J.; Lee, K. Efficient assessment of reservoir uncertainty using distance-based clustering: A review. Energies 2019, 12, 1859. [Google Scholar] [CrossRef]
- Shukla, A.K.; Muhuri, P.K. Big-data clustering with interval type-2 fuzzy uncertainty modeling in gene expression datasets. Eng. Appl. Artif. Intell. 2019, 77, 268–282. [Google Scholar] [CrossRef]
- Tabesh, M.; Askari-Nasab, H. Clustering mining blocks in presence of geological uncertainty. Min. Technol. 2019, 128, 162–176. [Google Scholar] [CrossRef]
- Han, K.; Gui, F.; Xiao, X.; Tang, J.; He, Y.; Cao, Z.; Huang, H. Efficient and effective algorithms for clustering uncertain graphs. Proc. VLDB Endow. 2019, 12, 667–680. [Google Scholar] [CrossRef]
- Afridi, M.K.; Azam, N.; Yao, J.; Alanazi, E. A three-way clustering approach for handling missing data using GTRS. Int. J. Approx. Reason. 2018, 98, 11–24. [Google Scholar] [CrossRef]
- Ceccarello, M.; Fantozzi, C.; Pietracaprina, A.; Pucci, G.; Vandin, F. Clustering uncertain graphs. Proc. VLDB Endow. 2017, 11, 472–484. [Google Scholar] [CrossRef]
- Yao, C.; Chen, M.; Hong, Y.Y. Novel adaptive multi-clustering algorithm-based optimal ESS sizing in ship power system considering uncertainty. IEEE Trans. Power Syst. 2017, 33, 307–316. [Google Scholar] [CrossRef]
- Chang, Y.; Chen, J.; Cho, M.H.; Castaldi, P.J.; Silverman, E.K.; Dy, J.G. Multiple clustering views from multiple uncertain experts. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 674–683. [Google Scholar]
- Zhou, J.; Chen, L.; Chen, C.P.; Wang, Y.; Li, H.X. Uncertain data clustering in distributed peer-to-peer networks. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 2392–2406. [Google Scholar] [CrossRef]
- Halim, Z.; Waqas, M.; Baig, A.R.; Rashid, A. Efficient clustering of large uncertain graphs using neighborhood information. Int. J. Approx. Reason. 2017, 90, 274–291. [Google Scholar] [CrossRef]
- Shukla, A.; Singh, S. Clustering based unit commitment with wind power uncertainty. Energy Convers. Manag. 2016, 111, 89–102. [Google Scholar] [CrossRef]
- Schubert, E.; Koos, A.; Emrich, T.; Züfle, A.; Schmid, K.A.; Zimek, A. A framework for clustering uncertain data. Proc. VLDB Endow. 2015, 8, 1976–1979. [Google Scholar] [CrossRef]
- Jin, C.; Yu, J.X.; Zhou, A.; Cao, F. Efficient clustering of uncertain data streams. Knowl. Inf. Syst. 2014, 40, 509–539. [Google Scholar] [CrossRef]
- Luo, Q.; Peng, Y.; Peng, X.; Saddik, A.E. Uncertain data clustering-based distance estimation in wireless sensor networks. Sensors 2014, 14, 6584–6605. [Google Scholar] [CrossRef]
- Chen, Y.; Lim, S.H.; Xu, H. Weighted graph clustering with non-uniform uncertainties. In Proceedings of the International Conference on Machine Learning. PMLR, Beijing, China, 21–26 June 2014; pp. 1566–1574. [Google Scholar]
- Ghosh, S.; Mitra, S. Clustering large data with uncertainty. Appl. Soft Comput. 2013, 13, 1639–1645. [Google Scholar] [CrossRef]
- Liu, L.; Jin, R.; Aggarwal, C.; Shen, Y. Reliable clustering on uncertain graphs. In Proceedings of the 2012 IEEE 12th International Conference on Data Mining, Brussels, Belgium, 10 December 2012; pp. 459–468. [Google Scholar]
- Pelekis, N.; Kopanakis, I.; Kotsifakos, E.E.; Frentzos, E.; Theodoridis, Y. Clustering uncertain trajectories. Knowl. Inf. Syst. 2011, 28, 117–147. [Google Scholar] [CrossRef]
- Meesuksabai, W.; Kangkachit, T.; Waiyamai, K. Hue-stream: Evolution-based clustering technique for heterogeneous data streams with uncertainty. In Proceedings of the Advanced Data Mining and Applications: 7th International Conference, ADMA 2011, Beijing, China, 17–19 December 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 27–40. [Google Scholar]
- Huang, G.Y.; Liang, D.P.; Hu, C.Z.; Ren, J.D. An algorithm for clustering heterogeneous data streams with uncertainty. In Proceedings of the 2010 International Conference on Machine Learning and Cybernetics, Qingdao, China, 11–14 July 2010; Volume 4, pp. 2059–2064. [Google Scholar]
- Aggarwal, C.C. On high dimensional projected clustering of uncertain data streams. In Proceedings of the 2009 IEEE 25th International Conference on Data Engineering, Shanghai, China, 29 March–2 April 2009; pp. 1152–1154. [Google Scholar]
- Pelekis, N.; Kopanakis, I.; Kotsifakos, E.; Frentzos, E.; Theodoridis, Y. Clustering trajectories of moving objects in an uncertain world. In Proceedings of the 2009 9th IEEE international Conference on Data Mining, Miami Beach, FL, USA, 6–9 December 2009; pp. 417–427. [Google Scholar]
- Aggarwal, C.C.; Philip, S.Y. A framework for clustering uncertain data streams. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, Cancun, Mexico, 7–12 April 2008; pp. 150–159. [Google Scholar]
- Xia, Y.; Xi, B. Conceptual clustering categorical data with uncertainty. In Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007), Patras, Greece, 29–31 October 2007; Volume 1, pp. 329–336. [Google Scholar]
- Liu, X.; Lin, K.K.; Andersen, B.; Rattray, M. Including probe-level uncertainty in model-based gene expression clustering. BMC Bioinform. 2007, 8, 98. [Google Scholar] [CrossRef]
- Suzuki, R.; Shimodaira, H. Pvclust: An R package for assessing the uncertainty in hierarchical clustering. Bioinformatics 2006, 22, 1540–1542. [Google Scholar] [CrossRef]
- Chau, M.; Cheng, R.; Kao, B.; Ng, J. Uncertain data mining: An example in clustering location data. In Proceedings of the Advances in Knowledge Discovery and Data Mining: 10th Pacific-Asia Conference, PAKDD 2006, Singapore, 9–12 April 2006; pp. 199–204. [Google Scholar]
- Ran, X.; Xi, Y.; Lu, Y.; Wang, X.; Lu, Z. Comprehensive survey on hierarchical clustering algorithms and the recent developments. Artif. Intell. Rev. 2023, 56, 8219–8264. [Google Scholar] [CrossRef]
- Boongoen, T.; Iam-On, N. Cluster ensembles: A survey of approaches with recent extensions and applications. Comput. Sci. Rev. 2018, 28, 1–25. [Google Scholar] [CrossRef]
- Fu, L.; Lin, P.; Vasilakos, A.V.; Wang, S. An overview of recent multi-view clustering. Neurocomputing 2020, 402, 148–161. [Google Scholar] [CrossRef]
- Ibrahim, D. An Overview of Soft Computing. Procedia Comput. Sci. 2016, 102, 34–38. [Google Scholar] [CrossRef]
- Vicente-Saez, R.; Martinez-Fuentes, C. Open Science now: A systematic literature review for an integrated definition. J. Bus. Res. 2018, 88, 428–436. [Google Scholar] [CrossRef]
- Bonaccorsi, A.; Rossi, C. Why open source software can succeed. Res. Policy 2003, 32, 1243–1258. [Google Scholar] [CrossRef]
- Pfenninger, S.; DeCarolis, J.; Hirth, L.; Quoilin, S.; Staffell, I. The importance of open data and software: Is energy research lagging behind? Energy Policy 2017, 101, 211–215. [Google Scholar] [CrossRef]
- Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260. [Google Scholar] [CrossRef]
- Fan, C.Y.; Fan, P.S.; Chan, T.Y.; Chang, S.H. Using hybrid data mining and machine learning clustering analysis to predict the turnover rate for technology professionals. Expert Syst. Appl. 2012, 39, 8844–8851. [Google Scholar] [CrossRef]
- Pileggi, S.F. A hybrid approach to analysing large scale surveys: Individual values, opinions and perceptions. SN Soc. Sci. 2024, 4, 144. [Google Scholar] [CrossRef]
- Oyewole, G.J.; Thopil, G.A. Data clustering: Application and trends. Artif. Intell. Rev. 2023, 56, 6439–6475. [Google Scholar] [CrossRef]
- Mardani, A.; Liao, H.; Nilashi, M.; Alrasheedi, M.; Cavallaro, F. A multi-stage method to predict carbon dioxide emissions using dimensionality reduction, clustering, and machine learning techniques. J. Clean. Prod. 2020, 275, 122942. [Google Scholar] [CrossRef]
Title/Ref. | Year | Approach |
---|---|---|
Cloud-Cluster: An uncertainty clustering algorithm based on cloud model [34] | 2023 | Method |
Outlier-robust multi-view clustering for uncertain data [35] | 2021 | Multi-view Clustering |
Multi-view spectral clustering for uncertain objects [36] | 2021 | Multi-view Clustering |
Modeling uncertain data using Monte Carlo integration method for clustering [37] | 2019 | Monte-Carlo |
Optimal clustering under uncertainty [38] | 2018 | Method |
Locally weighted ensemble clustering [39] | 2017 | Ensemble Clustering |
Self-adapted mixture distance measure for clustering uncertain data [40] | 2017 | Method |
Novel density-based and hierarchical density-based clustering algorithms for uncertain data [41] | 2017 | Hierarchical Clustering |
An information-theoretic approach to hierarchical clustering of uncertain data [42] | 2017 | Hierarchical Clustering |
Active Clustering with Model-Based Uncertainty Reduction [43] | 2016 | Active Clustering |
Robust ensemble clustering using probability trajectories [44] | 2015 | Ensemble Clustering |
Large margin clustering on uncertain data by considering probability distribution similarity [45] | 2015 | PD Similarity |
Representative clustering of uncertain data [46] | 2014 | Framework |
Active spectral clustering via iterative uncertainty reduction [47] | 2012 | Active Clustering |
Minimizing the variance of cluster mixture models for clustering uncertain objects [48] | 2012 | Method |
Clustering uncertain data based on probability distribution similarity [22] | 2011 | PD Similarity |
Clustering uncertain data using voronoi diagrams and r-tree index [49] | 2010 | Method |
Subspace clustering for uncertain data [50] | 2010 | Sub-space clustering |
Exceeding expectations and clustering uncertain data [51] | 2009 | Optimization |
Clustering Uncertain Data with Possible Worlds [52] | 2009 | Method |
Clustering Uncertain Data Via K-Medoids [53] | 2008 | Method |
A hierarchical algorithm for clustering uncertain data via an information-theoretic approach [54] | 2008 | Hierarchical Clustering |
Clustering Uncertain Data Using Voronoi Diagrams [55] | 2008 | Voronoi diagrams |
Uncertain fuzzy clustering: Insights and recommendations [56] | 2007 | Fuzzy Logic |
Uncertain fuzzy clustering: Interval type-2 fuzzy approach to c-means [57] | 2007 | Fuzzy Logic |
Density-based clustering of uncertain data [58] | 2005 | Fuzzy Logic |
Hierarchical density-based clustering of uncertain data [59] | 2005 | Fuzzy Logic |
Title/Ref. | Year | Approach | Domain |
---|---|---|---|
Stochastic economic dispatch of wind power under uncertainty using clustering-based extreme scenarios [60] | 2024 | Stochastic Model | Energy |
Uncertainty clustering internal validity assessment using Fréchet distance for unsupervised learning [61] | 2022 | Fuzzy Logic | Machine Learning |
A three-stage automated modal identification framework for bridge parameters based on frequency uncertainty and density clustering [62] | 2022 | Method | Engineering |
Clustering uncertain graphs using ant colony optimization (ACO) [63] | 2022 | Optimization | Graphs |
Uncertainty-Aware Clustering for Unsupervised Domain Adaptive Object Re-Identification [64] | 2022 | Hierarchical Clustering | Machine Learning |
Decision-based scenario clustering for decision-making under uncertainty [65] | 2022 | Method | Decision Making |
Active domain adaptation via clustering uncertainty-weighted embeddings [66] | 2021 | Active Learning | Machine Learning |
UAC: An Uncertainty-Aware Face Clustering Algorithm [67] | 2021 | Method | Face Recognition |
Uncertainty assessment in reservoir performance prediction using a two-stage clustering approach: Proof of concept and field application [68] | 2021 | Method | Petroleum Science |
Handling uncertainty in financial decision making: a clustering estimation of distribution algorithm with simplified simulation [69] | 2020 | Method | Decision Making |
Ride-sharing under travel time uncertainty: Robust optimization and clustering approaches [70] | 2020 | Method | Transportation |
Deep semantic clustering by partition confidence maximisation [71] | 2020 | Method | Machine Learning |
Uncertainty mode selection in categorical clustering using the rough set theory [72] | 2020 | Rough Set | Categorical Data |
Clustering of electrical load patterns and time periods using uncertainty-based multi-level amplitude thresholding [73] | 2020 | Fuzzy Logic | Energy |
Efficient Assessment of Reservoir Uncertainty Using Distance-Based Clustering: A Review [74] | 2019 | Review | Petroleum Science |
Big-data clustering with interval type-2 fuzzy uncertainty modeling in gene expression datasets [75] | 2019 | Fuzzy Logic | Genetics |
Clustering mining blocks in presence of geological uncertainty [76] | 2019 | Method | Geology |
Efficient and effective algorithms for clustering uncertain graphs [77] | 2019 | Optimization | Graphs |
A three-way clustering approach for handling missing data using GTRS [78] | 2018 | Three-way Clustering | Missing Data |
Clustering uncertain graphs [79] | 2017 | Method | Graphs |
Novel adaptive multi-clustering algorithm-based optimal ESS sizing in ship power system considering uncertainty [80] | 2017 | Optimization | Energy |
Multiple clustering views from multiple uncertain experts [81] | 2017 | Bayesian Model | Collaborative Environments |
Uncertain data clustering in distributed peer-to-peer networks [82] | 2017 | Distributed Clustering | P2P Network |
Efficient clustering of large uncertain graphs using neighborhood information [83] | 2017 | Framework | Graphs |
Clustering based unit commitment with wind power uncertainty [84] | 2016 | Method | Energy |
A framework for clustering uncertain data [85] | 2015 | Framework | Visualization |
Efficient clustering of uncertain data streams [86] | 2014 | Method | Data Stream |
Uncertain data clustering-based distance estimation in wireless sensor networks [87] | 2014 | Method | Wireless Sensor Network |
Weighted graph clustering with non-uniform uncertainties [88] | 2014 | Optimization | Graphs |
Clustering large data with uncertainty [89] | 2013 | Fuzzy Logic | Large Data |
Reliable clustering on uncertain graphs [90] | 2012 | Possible Worlds | Graphs |
Clustering uncertain trajectories [91] | 2011 | Method | Location Data |
Hue-stream: Evolution-based clustering technique for heterogeneous data streams with uncertainty [92] | 2011 | Method Data Stream | |
An algorithm for clustering heterogeneous data streams with uncertainty [93] | 2010 | Method | Data Stream |
On high dimensional projected clustering of uncertain data streams [94] | 2009 | Method | Data Stream |
Clustering trajectories of moving objects in an uncertain world [95] | 2009 | Method | Location Data |
A Framework for Clustering Uncertain Data Streams [96] | 2008 | Method | Data Stream |
Conceptual clustering categorical data with uncertainty [97] | 2007 | Quality Assessment | Categorical Data |
Including probe-level uncertainty in model-based gene expression clustering [98] | 2007 | Method | Genetics |
Pvclust: an r package for assessing the uncertainty in hierarchical clustering [99] | 2006 | Hierarchical Clustering | Genetics |
Uncertain Data Mining: An Example in Clustering Location Data [100] | 2006 | UK-Means | Location Data |
Gap | |
---|---|
G1 | Lack of freely available implementations to provide ready-to-use computational resources. |
G2 | The relationship between generic-purpose and domain-specific solutions os not always clear, namely, ad-hoc solutions do not always place an emphasis on their characteristics and peculiarities. |
G3 | A fine-grained application-specific approach that does not facilitate re-use in a different context. |
G4 | There is a tangible lack of abstraction in domain-specific approaches, which often focus on a specific problem without formally defining it. This does not allow for reasoning in terms of the classes of problems. |
G5 | Despite the existence of a well-identified class of methods and techniques, the solutions proposed in the different works are not always discussed in context looking at the existing body of knowledge, namely the already existing approaches. |
G6 | The propagation of uncertainty across the different steps in the current technological context characterised by data and computational intensive solutions is not fully analysed and critically discussed. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Pileggi, S.F. Clustering with Uncertainty: A Literature Review to Address a Cross-Domain Perspective. Informatics 2025, 12, 38. https://doi.org/10.3390/informatics12020038
Pileggi SF. Clustering with Uncertainty: A Literature Review to Address a Cross-Domain Perspective. Informatics. 2025; 12(2):38. https://doi.org/10.3390/informatics12020038
Chicago/Turabian StylePileggi, Salvatore Flavio. 2025. "Clustering with Uncertainty: A Literature Review to Address a Cross-Domain Perspective" Informatics 12, no. 2: 38. https://doi.org/10.3390/informatics12020038
APA StylePileggi, S. F. (2025). Clustering with Uncertainty: A Literature Review to Address a Cross-Domain Perspective. Informatics, 12(2), 38. https://doi.org/10.3390/informatics12020038