A Unified Formulation of k-Means, Fuzzy c-Means and Gaussian Mixture Model by the Kolmogorov–Nagumo Average
Abstract
1. Introduction
2. Materials and Methods
2.1. Generalized Energy Function
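As a point of orientation, the Kolmogorov–Nagumo average named in the title is the quasi-arithmetic mean φ⁻¹(Σᵢ wᵢ φ(xᵢ)) for a strictly monotone generator φ. The sketch below is illustrative only: the function names (`kn_average`, `soft_min`, `power_mean`), the generators and the toy distances are assumptions introduced here and are not the paper's definitions of the generalized energy function. It shows how the choice of generator moves the average between the arithmetic mean, a harmonic-type mean and a value close to the minimum.

```python
import math


def kn_average(values, phi, phi_inv, weights=None):
    """Quasi-arithmetic (Kolmogorov-Nagumo) average: phi_inv(sum_i w_i * phi(x_i))."""
    n = len(values)
    if weights is None:
        weights = [1.0 / n] * n  # uniform weights by default
    return phi_inv(sum(w * phi(x) for w, x in zip(weights, values)))


def soft_min(values, beta):
    """Generator phi(x) = exp(-beta * x); approaches min(values) as beta grows."""
    return kn_average(
        values,
        lambda x: math.exp(-beta * x),
        lambda y: -math.log(y) / beta,
    )


def power_mean(values, p):
    """Generator phi(x) = x**p for positive values; p = -1 gives the harmonic mean."""
    return kn_average(values, lambda x: x ** p, lambda y: y ** (1.0 / p))


# Hypothetical squared distances from one point to three cluster centres.
distances = [1.0, 4.0, 9.0]

arithmetic = kn_average(distances, lambda x: x, lambda y: y)
harmonic = power_mean(distances, -1.0)
nearly_min = soft_min(distances, beta=50.0)

print(arithmetic, harmonic, nearly_min)
```

With an exponential generator and a large `beta`, the average is dominated by the smallest distance, which mirrors a hard nearest-centre assignment. Smaller `beta`, or a power generator, spreads the weight over all centres, as soft assignments do.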
2.1.1. Pareto Distribution
2.1.2. Fréchet Distribution
2.2. Estimation of Variances and Mixing Proportions in Clusters
2.3. Estimating Algorithm
Algorithm 1: Pareto clustering
1. Set initial values for .
2. Repeat the following steps for and until convergence.
3.
2.4. Evaluation of Clustering Methods
2.4.1. Metrics
2.4.2. Simulation Studies
2.4.3. Benchmark Data Analysis
3. Results
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Derivation of the Volume Constant $v_d$
Appendix B. Monotone Decrease of the Generalized Energy Function
Appendix C. Perspective Plots and Contour Plots for $p_{\tau,\beta}(\theta^*)$
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Komori, O.; Eguchi, S. A Unified Formulation of k-Means, Fuzzy c-Means and Gaussian Mixture Model by the Kolmogorov–Nagumo Average. Entropy 2021, 23, 518. https://doi.org/10.3390/e23050518