Ensemble Clustering in GPS Velocities: A Case Study of Turkey
Abstract
1. Introduction
2. Data and Methodology
2.1. Gap Statistic Algorithm
2.2. Clustering Ensemble Approach
- Robustness: better average performance compared to individual clustering algorithms.
- Novelty: finding a new consolidated solution unattainable by any single clustering algorithm.
- Stability: final solutions with lower sensitivity to noise and outliers.
2.3. Generation Mechanisms
2.3.1. Birch Clustering
2.3.2. K-Means Clustering
- k points are chosen randomly as initial cluster centroids.
- The squared Euclidean distance from each point to each of the current centroids is calculated, and each point is allocated to its nearest centroid.
- The centroids are updated by taking the mean of the points in each cluster.
- The previous two steps are repeated until the cluster assignments no longer change, that is, until convergence. Note that although k-means can be carried out with other distance metrics, such as Manhattan or Chebyshev, this is not recommended because they may prevent convergence.
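The four steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the study: the function name `kmeans`, the synthetic `velocities` array (two well-separated groups of east/north components), and the choice to keep an old centroid when a cluster becomes empty are all assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Plain (Lloyd) k-means with squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points at random as initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: squared Euclidean distance of every point to every
        # centroid, then assign each point to its nearest centroid.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        # (assumption: an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j)
            else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: two groups of 2-D velocity vectors (mm/yr).
rng = np.random.default_rng(42)
velocities = np.vstack([
    rng.normal(loc=(-20.0, 0.0), scale=1.0, size=(30, 2)),
    rng.normal(loc=(5.0, 10.0), scale=1.0, size=(30, 2)),
])
labels, centroids = kmeans(velocities, k=2)
```

Because the update step takes the arithmetic mean, which minimizes the sum of squared Euclidean distances, each iteration cannot increase the objective; with other metrics the mean is no longer the minimizer, which is why convergence may be lost.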
2.3.3. Mini Batch K-Means Clustering
2.3.4. Hierarchical Agglomerative Clustering
2.3.5. Spectral Clustering
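- Compute the affinity matrix $A \in \mathbb{R}^{n \times n}$ defined by $A_{ij} = \exp\left(-\lVert s_i - s_j \rVert^2 / 2\sigma^2\right)$ if $i \neq j$ and $A_{ii} = 0$, where $\sigma$ is the scale parameter, whose value is chosen manually.
- Construct the degree matrix $D$ as the diagonal matrix whose $(i,i)$ element is the sum of the $i$-th row of $A$.
- Calculate the normalized Laplacian matrix [42], defined by $L = D^{-1/2} A D^{-1/2}$.
- Find $u_1, u_2, \ldots, u_k$, the eigenvectors of $L$ corresponding to its $k$ largest eigenvalues, and construct the matrix $U = [u_1 \, u_2 \cdots u_k] \in \mathbb{R}^{n \times k}$ by stacking the eigenvectors in columns.
- Form the matrix Y from U by renormalizing each of U’s rows to have unit length, $Y_{ij} = U_{ij} / \left(\sum_j U_{ij}^2\right)^{1/2}$.
- Treat each row of Y as a point in $\mathbb{R}^k$ and cluster the rows into c clusters using k-means or any other algorithm.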
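The procedure can be illustrated with a short NumPy sketch. It assumes the formulation of Ng, Jordan and Weiss listed in the references: a Gaussian affinity with a zeroed diagonal and the matrix D^{-1/2} A D^{-1/2}. The function names `spectral_embedding` and `cluster_rows`, the value `sigma=2.0`, the synthetic data, and the farthest-point seeding of the final k-means step are assumptions made for the example, not details taken from the study.

```python
import numpy as np

def spectral_embedding(points, k, sigma):
    """Return Y, the row-normalized matrix of the k leading eigenvectors."""
    # Affinity: Gaussian kernel on squared distances, zero on the diagonal.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Degree of each point: the row sums of A (diagonal of D).
    deg = A.sum(axis=1)
    # Normalized matrix L = D^{-1/2} A D^{-1/2}.
    inv_sqrt = 1.0 / np.sqrt(deg)
    L = A * inv_sqrt[:, None] * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so the eigenvectors of
    # the k largest eigenvalues are the last k columns.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, -k:]
    # Renormalize every row of U to unit length.
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def cluster_rows(Y, k, n_iter=50):
    """k-means on the rows of Y, seeded by farthest-point selection."""
    centers = [Y[0]]
    for _ in range(1, k):
        d2 = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels

# Hypothetical data: two groups of 2-D velocity vectors (mm/yr).
rng = np.random.default_rng(1)
velocities = np.vstack([
    rng.normal(loc=(-20.0, 0.0), scale=1.0, size=(30, 2)),
    rng.normal(loc=(5.0, 10.0), scale=1.0, size=(30, 2)),
])
Y = spectral_embedding(velocities, k=2, sigma=2.0)
labels = cluster_rows(Y, k=2)
```

The scale parameter controls how quickly affinity decays with distance. If it is too small, points become effectively isolated and the degrees approach zero; if it is too large, all points look alike, which is one reason the value is usually tuned by hand.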
2.4. Consensus Functions
2.4.1. Hybrid Bipartite Graph Formulation
2.4.2. Meta-Clustering Algorithm
2.4.3. Non-Negative Matrix Factorization
3. Results and Discussion
3.1. How Many Clusters?
3.2. Identification of Blocks
3.3. Clustering Results
3.4. Ensemble Results
3.5. Comparison of Clustering Results with Ensemble Clustering
4. Conclusions
- Before clustering, the gap statistic algorithm was used to obtain a priori information about the distribution of the GPS velocities; it indicated that the data should be divided into five clusters.
- Five individual clustering methods (BIRCH, k-means, mini batch k-means, HAC, and spectral clustering) were used to classify the published horizontal GPS velocities into five clusters. In general, the individual methods separated the NAF and the EAF immediately. Furthermore, some sites in the western part of Turkey were assigned to the Aegean block; however, the number of GPS sites in this block and its area vary with the method used. Moreover, in complex regions such as the eastern and southern parts of Turkey, GPS sites were assigned to distinct, generally neighboring, clusters.
- To reconcile the differences among the clustering methods, three ensemble clustering methods (HBGF, MCLA, and NMF-based consensus clustering) were applied for the first time to a GPS velocity field.
- Among the three ensemble clustering methods, HBGF and NMF did not give satisfactory results.
- On the other hand, the MCLA consensus results were successful, showing that MCLA can be used with GPS-derived velocities.
- As a result, the block boundaries created with the MCLA ensemble clustering algorithm are compatible with the literature.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
BIRCH | Balanced Iterative Reducing and Clustering using Hierarchies |
CORS-TR | Continuously Operating Reference Stations Turkey |
EAF | East Anatolian Fault |
EMSRI | Earth and Marine Sciences Research Institute |
GPS | Global Positioning System |
GMM | Gaussian Mixture Model |
HAC | Hierarchical Agglomerative Clustering |
HBGF | Hybrid Bipartite Graph Formulation |
MAGNET | Marmara Region Continuous Network |
MCLA | Meta-CLustering Algorithm |
NAF | North Anatolian Fault |
NMF | Non-negative Matrix Factorization |
TNPGN | Turkish National Permanent GNSS Network |
TUBITAK | Scientific and Technical Research Council of Turkey |
References
- McClusky, S.; Balassanian, S.; Barka, A.; Demir, C.; Ergintav, S.; Georgiev, I.; Gurkan, O.; Hamburger, M.; Hurst, K.; Kahle, H.; et al. Global Positioning System constraints on plate kinematics and dynamics in the eastern Mediterranean and Caucasus. J. Geophys. Res. Solid Earth 2000, 105, 5695–5719.
- Lazos, I.; Papanikolaou, I.; Sboras, S.; Foumelis, M.; Pikridas, C. Geodetic Upper Crust Deformation Based on Primary GNSS and INSAR Data in the Strymon Basin, Northern Greece—Correlation with Active Faults. Appl. Sci. 2022, 12, 9391.
- Reilinger, R.; McClusky, S.; Paradissis, D.; Ergintav, S.; Vernant, P. Geodetic constraints on the tectonic evolution of the Aegean region and strain accumulation along the Hellenic subduction zone. Tectonophysics 2010, 488, 22–30.
- Thatcher, W. How the continents deform: The evidence from tectonic geodesy. Annu. Rev. Earth Planet. Sci. 2009, 37, 237–262.
- Reilinger, R.; McClusky, S.; Vernant, P.; Lawrence, S.; Ergintav, S.; Cakmak, R.; Ozener, H.; Kadirov, F.; Guliev, I.; Stepanyan, R.; et al. GPS constraints on continental deformation in the Africa-Arabia-Eurasia continental collision zone and implications for the dynamics of plate interactions. J. Geophys. Res. Solid Earth 2006, 111, B5.
- Vernant, P. What can we learn from 20 years of interseismic GPS measurements across strike-slip faults? Tectonophysics 2015, 644, 22–39.
- Özarpacı, S.; Kılıç, B.; Bayrak, O.C.; Özdemir, A.; Yılmaz, Y.; Floyd, M. Comparative analysis of the optimum cluster number determination algorithms in clustering GPS velocities. Geophys. J. Int. 2022, 232, 70–80.
- Vega-Pons, S.; Ruiz-Shulcloper, J. A survey of clustering ensemble algorithms. Int. J. Pattern Recognit. Artif. Intell. 2011, 25, 337–372.
- Golalipour, K.; Akbari, E.; Hamidi, S.S.; Lee, M.; Enayatifar, R. From clustering to clustering ensemble selection: A review. Eng. Appl. Artif. Intell. 2021, 104, 104388.
- Simpson, R.W.; Thatcher, W.; Savage, J.C. Using cluster analysis to organize and explore regional GPS velocities. Geophys. Res. Lett. 2012, 39, 18.
- Savage, J.C.; Simpson, R.W. Clustering of GPS velocities in the Mojave Block, southeastern California. J. Geophys. Res. Solid Earth 2013, 118, 1747–1759.
- Savage, J.C.; Simpson, R.W. Clustering of velocities in a GPS network spanning the Sierra Nevada Block, the northern Walker Lane Belt, and the central Nevada Seismic Belt, California-Nevada. J. Geophys. Res. Solid Earth 2013, 118, 4937–4947.
- Savage, J.C.; Wells, R.E. Identifying block structure in the Pacific Northwest, USA. J. Geophys. Res. Solid Earth 2015, 120, 7905–7916.
- Savage, J.C. Euler-vector clustering of GPS velocities defines microplate geometry in southwest Japan. J. Geophys. Res. Solid Earth 2018, 123, 1954–1968.
- Özdemir, S.; Karslıoğlu, M.O. Soft clustering of GPS velocities from a homogeneous permanent network in Turkey. J. Geod. 2019, 93, 1171–1195.
- Takahashi, A.; Hashimoto, M.; Hu, J.C.; Takeuchi, K.; Tsai, M.C.; Fukahata, Y. Hierarchical cluster analysis of dense GPS data and examination of the nature of the clusters associated with regional tectonics in Taiwan. J. Geophys. Res. Solid Earth 2019, 124, 5174–5191.
- Granat, R.; Donnellan, A.; Heflin, M.; Lyzenga, G.; Glasscoe, M.; Parker, J.; Pierce, M.; Wang, J.; Rundle, J.; Ludwig, L.G. Clustering Analysis Methods for GNSS Observations: A Data-Driven Approach to Identifying California’s Major Faults. Earth Space Sci. 2021, 11, e2021EA001680.
- Kleinberg, J. An impossibility theorem for clustering. Adv. Neural Inf. Process. Syst. 2002, 15, 463–470.
- Ghaemi, R.; Sulaiman, M.N.; Ibrahim, H.; Mustapha, N. A survey: Clustering ensembles techniques. World Acad. Sci. Eng. Technol. 2009, 50, 644–653.
- Strehl, A.; Ghosh, J. Cluster ensembles—A knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 2002, 3, 583–617.
- Li, F.; Qian, Y.; Wang, J.; Dang, C.; Jing, L. Clustering ensemble based on sample’s stability. Artif. Intell. 2019, 273, 37–55.
- Zhou, P.; Du, L.; Liu, X.; Shen, Y.D.; Fan, M.; Li, X. Self-paced clustering ensemble. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 1497–1511.
- Emre, Ö.; Duman, T.Y.; Özalp, S.; Elmacı, H.; Olgun, Ş.; Şaroğlu, F. Açıklamalı Türkiye Diri Fay Haritası. Ölçek 1:1.250.000; Maden Tetkik ve Arama Genel Müdürlüğü, Özel Yayın Serisi-30: Ankara, Turkey, 2013.
- Tibshirani, R.; Walther, G.; Hastie, T. Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. Ser. B Methodol. 2001, 63, 411–423.
- Alqurashi, T.; Wang, W. Clustering ensemble method. Int. J. Mach. Learn. Cybern. 2019, 10, 1227–1246.
- Wu, X.; Ma, T.; Cao, J.; Tian, Y.; Alabdulkarim, A. A comparative study of clustering ensemble algorithms. Comput. Electr. Eng. 2018, 68, 603–615.
- Topchy, A.P.; Jain, A.K.; Punch, W. Clustering ensembles: Models of consensus and weak partitions. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1866–1881.
- Ghosh, J.; Acharya, A. Cluster ensembles. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2011, 1, 305–315.
- Hamidi, S.S.; Akbari, E.; Motameni, H. Consensus clustering algorithm based on the automatic partitioning similarity graph. Data Knowl. Eng. 2019, 124, 101754.
- Gionis, A.; Mannila, H.; Tsaparas, P. Clustering aggregation. ACM Trans. Knowl. Discov. Data 2007, 10, 341–352.
- Tsai, C.F.; Hung, C. Cluster ensembles in collaborative filtering recommendation. Appl. Soft Comput. 2012, 12, 1417–1425.
- Yi, J.; Yang, T.; Jin, R.; Jain, A.K.; Mahdavi, M. Robust ensemble clustering by matrix completion. In Proceedings of the IEEE 12th International Conference on Data Mining, Brussels, Belgium, 10–13 December 2013; pp. 1176–1181.
- Zhang, T.; Ramakrishnan, R.; Livny, M. BIRCH: An efficient data clustering method for very large databases. ACM Sigmod Rec. 1996, 25, 103–114.
- Zhang, T.; Ramakrishnan, R.; Livny, M. BIRCH: A new data clustering algorithm and its applications. Data Min. Knowl. Discov. 1997, 1, 141–182.
- MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Los Angeles, CA, USA, 1967; Volume 1, pp. 281–297.
- Sculley, D. Web-scale k-means clustering. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; pp. 1177–1178.
- Peng, K.; Leung, V.C.; Huang, Q. Clustering approach based on mini batch kmeans for intrusion detection system over big data. IEEE Access 2018, 6, 11897–11906.
- Kaufman, L.; Rousseeuw, P.J. Finding Groups in Data: An Introduction to Cluster Analysis; Wiley: New York, NY, USA, 1990.
- Ward, J.H., Jr. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 1963, 58, 236–244.
- Yan, D.; Huang, L.; Jordan, M.I. Fast approximate spectral clustering. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; pp. 907–916.
- Von Luxburg, U. A tutorial on spectral clustering. Stat. Comput. 2007, 17, 395–416.
- Chung, F.R.K. Spectral Graph Theory; American Mathematical Society: Providence, RI, USA, 1997; Volume 92.
- Ng, A.; Jordan, M.; Weiss, Y. On spectral clustering: Analysis and an algorithm. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 3–8 December 2001; Volume 14.
- Zhou, Z.H.; Tang, W. Clusterer ensemble. Knowl. Based Syst. 2006, 19, 77–83.
- Ayad, H.G.; Kamel, M.S. On voting-based consensus of cluster ensembles. Pattern Recognit. 2010, 43, 1943–1953.
- Fred, A.L.; Jain, A.K. Combining multiple clusterings using evidence accumulation. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 835–850.
- Fern, X.Z.; Brodley, C.E. Solving cluster ensemble problems by bipartite graph partitioning. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004; p. 36.
- Li, T.; Ding, C.; Jordan, M.I. Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization. In Proceedings of the 7th IEEE International Conference on Data Mining, Omaha, NE, USA, 28–31 October 2007; pp. 577–582.
- Cichocki, A.; Zdunek, R.; Phan, A.H.; Amari, S.I. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation; John Wiley & Sons: Hoboken, NJ, USA, 2009.
- Vega-Pons, S.; Correa-Morris, J.; Ruiz-Shulcloper, J. Weighted partition consensus via kernels. Pattern Recognit. 2010, 43, 2712–2724.
- Luo, H.; Jing, F.; Xie, X. Combining multiple clusterings using information theory based genetic algorithm. Int. Conf. Comput. Intell. Secur. 2006, 1, 84–89.
- Topchy, A.P.; Law, M.H.; Jain, A.K.; Fred, A.L. Analysis of consensus partition in cluster ensemble. In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM’04), Brighton, UK, 1–4 November 2004; pp. 225–232.
- Liang, W.; Zhang, Y.; Xu, J.; Lin, D. Optimization of basic clustering for ensemble clustering: An information-theoretic perspective. IEEE Access 2019, 7, 179048–179062.
- Karypis, G.; Kumar, V. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 1998, 20, 359–392.
- Strehl, A.; Ghosh, J. Value-based customer grouping from large retail data sets. Data Min Knowl Discov. Theory Tools Technol. 2000, 4057, 33–42.
- Paatero, P.; Tapper, U. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 1994, 5, 111–126.
- Lee, D.; Seung, H.S. Algorithms for non-negative matrix factorization. In Proceedings of the Advances in Neural Information Processing Systems 13, Denver, CO, USA, 1 January 2000; Volume 13.
- Xu, D.; Tian, Y. A comprehensive survey of clustering algorithms. Ann. Data Sci. 2015, 2, 165–193.
- Li, X.; Chen, M.; Wang, Q. Discrimination-aware projected matrix factorization. IEEE Trans. Knowl. Data Eng. 2019, 32, 809–814.
- Wessel, P.; Luis, J.F.; Uieda, L.; Scharroo, R.; Wobbe, F.; Smith, W.H.F.; Tian, D. The Generic Mapping Tools version 6. Geochem. Geophys. Geosystems 2019, 20, 5556–5564.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kılıç, B.; Özarpacı, S. Ensemble Clustering in GPS Velocities: A Case Study of Turkey. Appl. Sci. 2022, 12, 12636. https://doi.org/10.3390/app122412636