An Ensemble of Locally Reliable Cluster Solutions
Abstract
1. Introduction
2. Related Works
3. Proposed Ensemble Clustering
3.1. Notations and Definitions
3.1.1. Clustering
3.1.2. A Valid Sub-Cluster from a Cluster
3.1.3. Ensemble of Clustering Results
3.1.4. Similarity Between a Pair of Clusters
3.1.5. An Undirected Weighting Graph Corresponding to an Ensemble Clustering
3.2. Problem Definition
3.2.1. Production of Multiple Base Clustering Results
Algorithm 1. The Diverse Ensemble Generation algorithm
Input: , ,
Output: ,
01. ;
02. For = 1 to
03.   = a positive random integer number in ;
04.   = FindValidCluster( , , );
05. EndFor
06.
07. Return ,
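As a rough illustration of the generation loop in Algorithm 1, the sketch below builds an ensemble of base clusterings, each with a randomly drawn number of clusters. It is only a sketch under stated assumptions: scikit-learn's KMeans stands in for the FindValidCluster routine (Algorithm 2), and the identifiers generate_diverse_ensemble, ensemble_size, and k_max are illustrative rather than the paper's own notation.

```python
import numpy as np
from sklearn.cluster import KMeans

def generate_diverse_ensemble(X, ensemble_size, k_max, seed=None):
    """Sketch of Algorithm 1: produce `ensemble_size` base clusterings of X,
    each with a randomly drawn number of clusters in [2, k_max]."""
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(ensemble_size):
        k = int(rng.integers(2, k_max + 1))  # a positive random cluster count
        # KMeans is only a stand-in for the FindValidCluster routine (Algorithm 2)
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=int(rng.integers(1_000_000))).fit_predict(X)
        ensemble.append(labels)
    return ensemble

# Example usage on random data:
# X = np.random.rand(300, 2)
# ensemble = generate_diverse_ensemble(X, ensemble_size=20, k_max=10, seed=0)
```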
Algorithm 2. The FindValidCluster algorithm
Input: , ,
Output: ,
01. ; ;
02. ;
03. ;
04. While
05.   = KMedoids( , );
06.   For = 1 to
07.     If ( )
08.       ;
09.       ;
10.       ;
11.       ;
12.     EndIf
13.   EndFor
14. EndWhile
15. ;
16. Return , ;
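Algorithm 2 extracts locally reliable ("valid") sub-clusters by clustering with k-medoids and retaining only the points that lie close to their medoid. The sketch below is a simplified, single-pass reading of that idea, not the authors' implementation: a minimal k-medoids routine stands in for the paper's KMedoids step, points farther than the neighboring radius from their medoid are marked as unassigned (label -1), and the iterative refinement of the original While loop is omitted. All function and parameter names are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist

def _kmedoids(X, k, rng, n_iter=50):
    """Minimal alternating k-medoids (a stand-in for the paper's KMedoids step)."""
    D = cdist(X, X)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members) == 0:
                continue
            # the medoid is the member minimising total distance to the other members
            new_medoids[j] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(D[:, medoids], axis=1), medoids

def find_valid_subclusters(X, k, radius, seed=None):
    """Sketch of Algorithm 2: cluster X with k-medoids, then keep only points
    within `radius` of their medoid; the rest are marked unassigned (-1)."""
    rng = np.random.default_rng(seed)
    labels, medoids = _kmedoids(X, k, rng)
    dist_to_medoid = np.linalg.norm(X - X[medoids[labels]], axis=1)
    return np.where(dist_to_medoid <= radius, labels, -1)
```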
3.2.2. Time Complexity of Production of Multiple Base Clustering Results
3.3. Construction of Clusters’ Relations
3.4. Extraction of Consensus Clustering Result
3.5. Overall Implementation Complexity
4. Experimental Analysis
4.1. Benchmark Datasets
4.2. Evaluation Criteria
4.3. Compared Methods
4.4. Experimental Settings
4.5. Experimental Results
4.5.1. Comparison with State-of-the-Art Ensemble Methods
4.5.2. Comparison with Strong Clustering Algorithms
4.5.3. Final Decisive Experimental Results
5. Conclusions and Future Work
Author Contributions
Funding
Conflicts of Interest
References
- Han, J.; Kamber, M. Data Mining: Concepts and Techniques; Morgan Kaufmann: San Francisco, CA, USA, 2001. [Google Scholar]
- Shojafar, M.; Canali, C.; Lancellotti, R.; Abawajy, J.H. Adaptive Computing-Plus-Communication Optimization Framework for Multimedia Processing in Cloud Systems. IEEE Trans. Cloud Comput. (TCC) 2016, 99, 1–14. [Google Scholar] [CrossRef]
- Shamshirband, S.; Amini, A.; Anuar, N.B.; Kiah, M.L.M.; Teh, Y.W.; Furnell, S. D-FICCA: A density-based fuzzy imperialist competitive clustering algorithm for intrusion detection in wireless sensor networks. Measurement 2014, 55, 212–226. [Google Scholar] [CrossRef]
- Agaian, S.; Madhukar, M.; Chronopoulos, A.T. A new acute leukaemia-automated classification system. Comput. Methods Biomech. Biomed. Eng. Imaging Vis. 2016, 6, 303–314. [Google Scholar] [CrossRef]
- Khoshnevisan, B.; Rafiee, S.; Omid, M.; Mousazadeh, H.; Shamshirband, S.; Hamid, S.H.A. Developing a fuzzy clustering model for better energy use in farm management systems. Renew. Sustain. Energy Rev. 2015, 48, 27–34. [Google Scholar] [CrossRef]
- Jain, A.K.; Dubes, R.C. Algorithms for Clustering Data; Prentice Hall: Englewood Cliffs, NJ, USA, 1988. [Google Scholar]
- Jain, A.K. Data clustering: 50 years beyond K-means. Pattern Recognit. Lett. 2010, 31, 651–666. [Google Scholar] [CrossRef]
- Zhou, Z. Ensemble Methods: Foundations and Algorithms; CRC Press: Boca Raton, FL, USA, 2012. [Google Scholar]
- Fred, A.; Jain, A. Combining multiple clusterings using evidence accumulation. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 835–850. [Google Scholar] [CrossRef]
- Kuncheva, L.; Vetrov, D. Evaluation of stability of k-means cluster ensembles with respect to random initialization. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1798–1808. [Google Scholar] [CrossRef]
- Zhang, X.; Jiao, L.; Liu, F.; Bo, L.; Gong, M. Spectral clustering ensemble applied to SAR image segmentation. IEEE Trans. Geosci. Remote Sens. 2008, 46, 2126–2136. [Google Scholar] [CrossRef] [Green Version]
- Gionis, A.; Mannila, H.; Tsaparas, P. Clustering aggregation. ACM Trans. Knowl. Discov. Data 2007, 1, 1–30. [Google Scholar] [CrossRef] [Green Version]
- Law, M.; Topchy, A.; Jain, A. Multi-objective data clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 27 June–2 July 2004. [Google Scholar]
- Yu, Z.; Chen, H.; You, J.; Han, G.; Li, L. Hybrid fuzzy cluster ensemble framework for tumor clustering from bio-molecular data. IEEE/ACM Trans. Comput. Biol. Bioinform. 2013, 10, 657–670. [Google Scholar] [CrossRef]
- Fischer, B.; Buhmann, J. Bagging for path-based clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 1411–1415. [Google Scholar] [CrossRef]
- Topchy, A.; Minaei-Bidgoli, B.; Jain, A. Adaptive clustering ensembles. In Proceedings of the 17th International Conference on Pattern Recognition, Cambridge, UK, 26 August 2004. [Google Scholar]
- Zhou, Z.; Tang, W. Clusterer ensemble. Knowl.-Based Syst. 2006, 19, 77–83. [Google Scholar] [CrossRef]
- Hong, Y.; Kwong, S.; Wang, H.; Ren, Q. Resampling-based selective clustering ensembles. Pattern Recognit. Lett. 2009, 30, 298–305. [Google Scholar] [CrossRef]
- Fern, X.; Brodley, C. Random projection for high dimensional data clustering: A cluster ensemble approach. In Proceedings of the International Conference on Machine Learning, Washington, DC, USA, 21–24 August 2003. [Google Scholar]
- Zhou, P.; Du, L.; Shi, L.; Wang, H.; Shi, L.; Shen, Y.D. Learning a robust consensus matrix for clustering ensemble via Kullback-Leibler divergence minimization. In 25th International Joint Conference on Artificial Intelligence; AAAI Publications: Palm Springs, CA, USA, 2015. [Google Scholar]
- Yu, Z.; Li, L.; Liu, J.; Zhang, J.; Han, G. Adaptive noise immune cluster ensemble using affinity propagation. IEEE Trans. Knowl. Data Eng. 2015, 27, 3176–3189. [Google Scholar] [CrossRef]
- Gullo, F.; Domeniconi, C. Metacluster-based projective clustering ensembles. Mach. Learn. 2013, 98, 1–36. [Google Scholar] [CrossRef] [Green Version]
- Yang, Y.; Jiang, J. Hybrid Sampling-Based Clustering Ensemble with Global and Local Constitutions. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 952–965. [Google Scholar] [CrossRef]
- Minaei-Bidgoli, B.; Parvin, H.; Alinejad-Rokny, H.; Alizadeh, H.; Punch, W.F. Effects of resampling method and adaptation on clustering ensemble efficacy. Artif. Intell. Rev. 2014, 41, 27–48. [Google Scholar] [CrossRef]
- Fred, A.; Jain, A.K. Data clustering using evidence accumulation. In Proceedings of the 16th International Conference on Pattern Recognition, Quebec City, QC, Canada, 11–15 August 2002; pp. 276–280. [Google Scholar]
- Yang, Y.; Chen, K. Temporal data clustering via weighted clustering ensemble with different representations. IEEE Trans. Knowl. Data Eng. 2011, 23, 307–320. [Google Scholar] [CrossRef] [Green Version]
- Iam-On, N.; Boongoen, T.; Garrett, S.; Price, C. A link-based approach to the cluster ensemble problem. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 2396–2409. [Google Scholar] [CrossRef]
- Iam-On, N.; Boongoen, T.; Garrett, S.; Price, C. A link-based cluster ensemble approach for categorical data clustering. IEEE Trans. Knowl. Data Eng. 2012, 24, 413–425. [Google Scholar] [CrossRef]
- Strehl, A.; Ghosh, J. Cluster ensembles: A knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 2002, 3, 583–617. [Google Scholar]
- Fern, X.; Brodley, C. Solving cluster ensemble problems by bipartite graph partitioning. In Proceedings of the 21st International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004. [Google Scholar]
- Huang, D.; Lai, J.; Wang, C.D. Ensemble clustering using factor graph. Pattern Recognit. 2016, 50, 131–142. [Google Scholar] [CrossRef]
- Mimaroglu, S.; Erdil, E. Combining multiple clusterings using similarity graph. Pattern Recognit. 2011, 44, 694–703. [Google Scholar]
- Boulis, C.; Ostendorf, M. Combining multiple clustering systems. In European Conference on Principles and Practice of Knowledge Discovery in Databases; Springer: Berlin/Heidelberg, Germany, 2004. [Google Scholar]
- Hore, P.; Hall, L.O.; Goldgof, D.B. A scalable framework for cluster ensembles. Pattern Recognit. 2009, 42, 676–688. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Long, B.; Zhang, Z.; Yu, P.S. Combining multiple clusterings by soft correspondence. In Proceedings of the 4th IEEE International Conference on Data Mining, Houston, TX, USA, 27–30 November 2005. [Google Scholar]
- Cristofor, D.; Simovici, D. Finding median partitions using information theoretical based genetic algorithms. J. Univers. Comput. Sci. 2002, 8, 153–172. [Google Scholar]
- Topchy, A.; Jain, A.; Punch, W. Clustering ensembles: Models of consensus and weak partitions. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1866–1881. [Google Scholar] [CrossRef]
- Wang, H.; Shan, H.; Banerjee, A. Bayesian cluster ensembles. Stat. Anal. Data Min. 2011, 4, 54–70. [Google Scholar] [CrossRef]
- He, Z.; Xu, X.; Deng, S. A cluster ensemble method for clustering categorical data. Inf. Fusion 2005, 6, 143–151. [Google Scholar] [CrossRef]
- Nguyen, N.; Caruana, R. Consensus Clusterings. In Proceedings of the Seventh IEEE International Conference on Data Mining, Omaha, NE, USA, 28–31 October 2007; pp. 607–612. [Google Scholar]
- Huang, Z. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min. Knowl. Discov. 1998, 2, 283–304. [Google Scholar] [CrossRef]
- Nazari, A.; Dehghan, A.; Nejatian, S.; Rezaie, V.; Parvin, H. A Comprehensive Study of Clustering Ensemble Weighting Based on Cluster Quality and Diversity. Pattern Anal. Appl. 2019, 22, 133–145. [Google Scholar] [CrossRef]
- Bagherinia, A.; Minaei-Bidgoli, B.; Hossinzadeh, M.; Parvin, H. Elite fuzzy clustering ensemble based on clustering diversity and quality measures. Appl. Intell. 2019, 49, 1724–1747. [Google Scholar] [CrossRef]
- Alizadeh, H.; Minaeibidgoli, B.; Parvin, H. Cluster ensemble selection based on a new cluster stability measure. Intell. Data Anal. 2014, 18, 389–408. [Google Scholar] [CrossRef] [Green Version]
- Alizadeh, H.; Minaei-Bidgoli, B.; Parvin, H. A New Criterion for Clusters Validation. In Artificial Intelligence Applications and Innovations (AIAI 2011); IFIP, Part I; Springer: Heidelberg, Germany, 2011; pp. 240–246. [Google Scholar]
- Abbasi, S.; Nejatian, S.; Parvin, H.; Rezaie, V.; Bagherifard, K. Clustering ensemble selection considering quality and diversity. Artif. Intell. Rev. 2019, 52, 1311–1340. [Google Scholar] [CrossRef]
- Rashidi, F.; Nejatian, S.; Parvin, H.; Rezaie, V. Diversity Based Cluster Weighting in Cluster Ensemble: An Information Theory Approach. Artif. Intell. Rev. 2019, 52, 1341–1368. [Google Scholar] [CrossRef]
- Zhou, S.; Xu, Z.; Liu, F. Method for Determining the Optimal Number of Clusters Based on Agglomerative Hierarchical Clustering. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 3007–3017. [Google Scholar] [CrossRef]
- Karypis, G.; Han, E.-H.S.; Kumar, V. Chameleon: A hierarchical clustering algorithm using dynamic modeling. IEEE Comput. 1999, 32, 68–75. [Google Scholar] [CrossRef] [Green Version]
- Ji, Y.; Xia, L. Improved Chameleon: A Lightweight Method for Identity Verification in Near Field Communication. In Proceedings of the 2016 International Symposium on Computer, Consumer and Control (IS3C), Xi’an, China, 4–6 July 2016; pp. 387–392. [Google Scholar]
- MacQueen, J.B. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Berkeley, CA, USA, 1967; Volume 1, pp. 281–297. [Google Scholar]
- Kaufman, L.; Rousseeuw, P.J. Clustering by Means of Medoids. In Statistical Data Analysis Based on the L1-Norm and Related Methods; Dodge, Y., Ed.; North-Holland: Amsterdam, The Netherlands, 1987; pp. 405–416. [Google Scholar]
- Bezdek, J.C.; Pal, N.R. Some new indexes of cluster validity. IEEE Trans. Syst. Man Cybern. Part B 1998, 28, 301–315. [Google Scholar] [CrossRef] [Green Version]
- Pal, N.R.; Bezdek, J.C. On cluster validity for the fuzzy c-means model. IEEE Trans. Fuzzy Syst. 1995, 3, 370–379. [Google Scholar] [CrossRef]
- Guha, S.; Rastogi, R.; Shim, K. Cure: An efficient clustering algorithm for large databases. In Proceedings of the Conference on Management of Data (ACM SIGMOD), Seattle, WA, USA, 1–4 June 1998; pp. 73–84. [Google Scholar]
- Sneath, P.H.A.; Sokal, R.R. Numerical Taxonomy; Freeman: San Francisco, CA, USA; London, UK, 1973. [Google Scholar]
- King, B. Step-wise clustering procedures. J. Am. Stat. Assoc. 1967, 69, 86–101. [Google Scholar] [CrossRef]
- Shi, J.; Malik, J. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 888–905. [Google Scholar]
- Ng, A.Y.; Jordan, M.I.; Weiss, Y. On Spectral Clustering: Analysis and an Algorithm. In Advances in Neural Information Processing Systems; Dietterich, T.G., Becker, S., Ghahramani, Z., Eds.; MIT Press: Cambridge, MA, USA, 2002; Volume 14. [Google Scholar]
- UCI Machine Learning Repository. 2016. Available online: http://www.ics.uci.edu/mlearn/ML-Repository.html (accessed on 19 February 2016).
- Press, W.; Teukolsky, S.A.; Vetterling, W.T.; Flannery, B.P. Conditional Entropy and Mutual Information. In Numerical Recipes: The Art of Scientific Computing, 3rd ed.; Cambridge University Press: New York, NY, USA, 2007. [Google Scholar]
- Ester, M.; Kriegel, H.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD-96: Proceedings: Second International Conference on Knowledge Discovery and Data Mining; Simoudis, E., Han, J., Fayyad, U.M., Eds.; AAAI Press: Menlo Park, CA, USA, 1996; pp. 226–231. [Google Scholar]
- Rodriguez, A.; Laio, A. Clustering by fast search and find of density peaks. Science 2014, 344, 1492–1496. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Parvin, H.; Minaei-Bidgoli, B. A clustering ensemble framework based on elite selection of weighted clusters. Adv. Data Anal. Classif. 2013, 7, 181–208. [Google Scholar] [CrossRef]
- Parvin, H.; Minaei-Bidgoli, B. A clustering ensemble framework based on selection of fuzzy weighted clusters in a locally adaptive clustering algorithm. Pattern Anal. Appl. 2015, 18, 87–112. [Google Scholar] [CrossRef]
- Dietterich, T.G. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput. 1998, 10, 1895–1924. [Google Scholar] [CrossRef] [Green Version]
Symbol | Description
---|---
 | A dataset
 | The -th data object in the dataset
 | The real label of the -th data object in the dataset
 | The -th feature of the -th data object
 | The size of the dataset
 | The number of features in the dataset
 | A set of initial (base) clustering results
 | The -th clustering result in the ensemble clustering
 | The -th cluster in the -th clustering result of the ensemble clustering
 | A Boolean indicating whether the -th data point of the given dataset belongs to the -th cluster in the -th clustering result of the ensemble clustering
 | The number of consensus clusters in the given dataset
 | A valid sub-cluster extracted from a cluster
 | A Boolean indicating whether the -th data point of the given dataset belongs to the valid sub-cluster of the -th cluster in the -th clustering result of the ensemble clustering
 | The neighboring radius parameter of a valid cluster in the proposed algorithm
 | The center point of a cluster
 | The -th feature of the center point of a cluster
 | The consensus clustering result
 | The similarity between two clusters
 | The -th hypothetical cluster between the center points of two clusters
 | The center of the -th hypothetical cluster between two clusters
 | The size of the ensemble clustering (the number of base clustering results)
 | The number of clusters in the -th clustering result
 | The graph defined on the ensemble clustering
 | The nodes of the graph defined on the ensemble clustering
 | The edges of the graph defined on the ensemble clustering
 | A clustering result corresponding to the real labels
Source | Dataset | #Objects | #Features | #Classes
---|---|---|---|---
Artificial dataset | Ring3 (R3) | 1500 | 2 | 3 |
Artificial dataset | Banana2 (B2) | 2000 | 2 | 2 |
Artificial dataset | Aggregation7 (A7) | 788 | 2 | 7 |
Artificial dataset | Imbalance2 (I2) | 2250 | 2 | 2 |
UCI dataset | Iris (I) | 150 | 4 | 3 |
UCI dataset | Wine (W) | 178 | 13 | 3 |
UCI dataset | Breast (B) | 569 | 30 | 2 |
UCI dataset | Digits (D) | 5620 | 63 | 10 |
UCI dataset | KDD-CUP99 | 1,048,576 | 39 | 2 |
Method | ARI (Average ± STD) | ARI (L-D-W) | NMI (Average ± STD) | NMI (L-D-W)
---|---|---|---|---
EAC+SL | 66.87 ± 3.39 | 0-2-6 | 64.43 ± 2.43 | 1-0-7 |
EAC+AL | 68.19 ± 2.65 | 0-2-6 | 60.97 ± 2.75 | 0-2-6 |
WCT+SL | 60.11 ± 3.23 | 0-1-7 | 55.78 ± 2.69 | 1-1-6 |
WCT+AL | 67.58 ± 3.13 | 0-1-7 | 60.72 ± 2.32 | 0-1-7 |
WTQ+SL | 65.87 ± 2.89 | 0-1-7 | 62.28 ± 3.03 | 0-1-7 |
WTQ+AL | 67.88 ± 2.58 | 1-1-6 | 61.02 ± 2.94 | 1-2-5 |
CSM+AL | 58.67 ± 3.68 | 1-0-7 | 48.90 ± 2.72 | 1-0-7 |
CSM+SL | 68.21 ± 2.48 | 1-0-7 | 60.99 ± 2.71 | 1-1-6 |
CSPA | 59.97 ± 2.55 | 1-0-7 | 54.55 ± 2.43 | 0-0-8 |
HGPA | 24.24 ± 2.36 | 0-0-8 | 20.29 ± 2.26 | 0-0-8 |
MCLA | 66.21 ± 3.29 | 0-2-6 | 58.26 ± 2.54 | 0-1-7 |
SUV | 48.82 ± 3.08 | 1-0-7 | 40.93 ± 2.31 | 1-1-6 |
SWV | 52.76 ± 2.83 | 1-0-7 | 47.43 ± 3.25 | 1-1-6 |
EM | 57.42 ± 2.89 | 0-0-8 | 52.48 ± 2.97 | 0-0-8 |
IVC | 58.48 ± 3.02 | 0-0-8 | 53.52 ± 2.37 | 0-0-8 |
PC+EEAC+SL | 85.01 ± 3.31 | 1-1-6 | 80.82 ± 2.40 | 1-2-5 |
PC+EEAC+AL | 89.51 ± 2.07 | 1-0-7 | 83.29 ± 1.42 | 1-2-5 |
PC+CSPA | 84.63 ± 2.75 | 0-1-7 | 78.02 ± 2.76 | 0-0-8 |
PC+HGPA | 55.77 ± 2.98 | 0-0-8 | 55.26 ± 3.52 | 0-0-8 |
PC+MCLA | 91.42 ± 2.09 | 1-0-7 | 83.03 ± 2.25 | 1-2-5 |
PC+EM | 86.37 ± 2.44 | 1-1-6 | 79.80 ± 2.35 | 0-0-8 |
Proposed | 93.86 ± 1.37 | | 89.58 ± 2.15 |
Time (in Sec.) | ||
---|---|---|
10K | 91 | 11.23 |
20K | 213 | 51.29 |
30K | 225 | 80.11 |
40K | 232 | 114.06 |
50K | 233 | 138.91 |
60K | 242 | 178.71 |
70K | 245 | 197.62 |
80K | 353 | 331.02 |
90K | 461 | 516.96 |
100K | 472 | 576.58 |
Source | Dataset | #Objects | #Features | #Classes
---|---|---|---|---
UCI dataset | Glass (Gl) | 214 | 9 | 6 |
UCI dataset | Galaxy (Ga) | 323 | 4 | 7 |
UCI dataset | Yeast (Y) | 1484 | 8 | 10 |
Evaluation Measure | B | I | Gl | Ga | Y | W | T-Test Results (Wins Against-Draws With-Loses To)
---|---|---|---|---|---|---|---
NMI [9] + ItoU | 95.73− | 82.89− | 41.38− | 21.71− | 34.45− | 91.83± | 5-1-0 |
MAX [44] + ItoU | 96.39± | 83.21− | 42.63− | 20.57− | 33.89− | 91.29− | 5-1-0 |
APMM [45] + ItoU | 95.16− | 82.10− | 41.98− | 24.01− | 34.12− | 91.78± | 5-1-0 |
ENMI [46] + ItoU | 96.51± | 84.66− | 42.65− | 24.84− | 35.58− | 92.27+ | 4-1-1 |
Proposed | 97.28 | 86.05 | 44.79 | 29.44 | 38.20 | 92.13 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).