Minimum Distribution Support Vector Clustering
Abstract
1. Introduction
- We characterize the envelope radius of the minimum hypersphere by its first- and second-order statistics, i.e., the mean and variance. Minimizing these two statistics mitigates, to some extent, the problem of too many or too few support vectors caused by an inappropriate kernel width coefficient q, forms a better cluster contour, and thus improves accuracy.
- We enhance the generalization ability and robustness of the algorithm by introducing these statistics, while the distribution of the data in feature space remains fixed for a given q.
- Inspired by the expectation of the probability of test error proposed for SVDD, we further prove that our method achieves better performance.
- We customize a dual coordinate descent (DCD) algorithm to optimize the objective function of MDSVC for our experiments.
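To make the first contribution concrete, the following is a minimal sketch of how a mean and a variance of hypersphere radii could be computed under a Gaussian kernel. It assumes that the statistics are taken over the squared feature-space distances from each mapped point to a centre a = Σ_j β_j Φ(x_j); this reading, and the names `gaussian_kernel` and `distance_stats`, are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def gaussian_kernel(X, q):
    # K[i, j] = exp(-q * ||x_i - x_j||^2); q is the kernel width coefficient
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-q * np.maximum(d2, 0.0))

def distance_stats(K, beta):
    # Squared distance of each Phi(x_i) to the assumed centre
    # a = sum_j beta_j Phi(x_j), written with kernel values only:
    # r2_i = K_ii - 2 * sum_j beta_j K_ij + sum_jk beta_j beta_k K_jk
    r2 = np.diag(K) - 2.0 * K @ beta + beta @ K @ beta
    return r2.mean(), r2.var()

# Two well-separated points with equal weights (toy data, not from the paper)
X = np.array([[0.0, 0.0], [10.0, 0.0]])
K = gaussian_kernel(X, q=1.0)
mean_r2, var_r2 = distance_stats(K, np.array([0.5, 0.5]))
```

In this toy case both points lie at the same distance from the centre, so the variance is essentially zero; a larger variance would indicate that the points are spread unevenly around the sphere surface.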
2. Background
Recent Progress in Margin Theory
3. Minimum Distribution Support Vector Clustering
3.1. Formula of MDSVC
3.1.1. Preliminary
3.1.2. Minimizing the Mean and Variance
3.2. The MDSVC Algorithm
Algorithm 1: MDSVC. The DCD algorithm for our method MDSVC
- Step 1. Input: data set X, parameters, maxIter, m.
- Step 2. Initialization:
- Step 3. Iteration (1 to maxIter): the iteration stops when β converges.
  - Step 3.1. Randomly shuffle β to obtain the random index i.
  - Step 3.2. Loop (i = 1, 2, …, m): update the gradient and update β, α alternately.
- Step 4. Output: α, β.
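The loop structure of Algorithm 1 (random coordinate order, per-coordinate update, gradient refresh, stop on convergence of β) can be illustrated with a generic dual coordinate descent sketch. The code below is not the MDSVC update: it solves a stand-in box-constrained quadratic problem, min 0.5 βᵀQβ + pᵀβ subject to 0 ≤ β_i ≤ C, with a symmetric Q, and it omits both the α update and any equality constraint. The function name `dcd_box_qp` and all of its parameters are assumptions made for illustration.

```python
import numpy as np

def dcd_box_qp(Q, p, C, max_iter=100, tol=1e-8, seed=0):
    """Generic DCD for min 0.5*b^T Q b + p^T b with 0 <= b_i <= C (Q symmetric)."""
    Q = np.asarray(Q, dtype=float)
    p = np.asarray(p, dtype=float)
    rng = np.random.default_rng(seed)
    m = len(p)
    beta = np.zeros(m)
    grad = p.copy()                      # gradient Q @ beta + p at beta = 0
    for _ in range(max_iter):
        max_change = 0.0
        for i in rng.permutation(m):     # random coordinate order, as in Step 3.1
            if Q[i, i] <= 0.0:
                continue                 # skip coordinates without curvature
            # Exact one-dimensional minimiser, clipped to the box [0, C]
            new = min(max(beta[i] - grad[i] / Q[i, i], 0.0), C)
            delta = new - beta[i]
            if delta != 0.0:
                grad += delta * Q[:, i]  # keep the gradient consistent with beta
                beta[i] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:             # stop once beta no longer changes
            break
    return beta

beta = dcd_box_qp([[2.0, 1.0], [1.0, 2.0]], [-3.0, -3.0], C=10.0)
```

Maintaining the gradient incrementally keeps each coordinate update at O(m) cost, which is the usual reason for preferring DCD over recomputing the full gradient at every step.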
3.3. The Properties of MDSVC
- (1)
- the data point is a support vector according to the SVC and KKT conditions, so we have
- (2)
- , xi is the bounded SV (SVs) and must be misclassified in the leave-one-out procedure. Hence we have
4. Experimental Study
4.1. Evaluation Criteria
4.2. Experimental Results and Analysis
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Support Vector Clustering
The Formula of MDSVC | Time Complexity of the Formula |
---|---|
 | m × n × m |
 | m³ |
 | m³ |
 | m³ |
 | m² |
Source | Datasets | Samples | Features |
---|---|---|---|
artificial | convex | 150 | 3 |
artificial | dbmoon | 200 | 2 |
artificial | ring | 900 | 2 |
real | iris | 150 | 3 |
real | glass | 214 | 9 |
real | breast | 277 | 9 |
real | heart | 303 | 13 |
real | liver | 345 | 6 |
real | ionosphere | 351 | 34 |
real | vote | 435 | 16 |
real | balance | 625 | 4 |
Metrics | Definition |
---|---|
Acc | |
ARI |
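For orientation, the adjusted Rand index (ARI) is commonly computed from the contingency table of the true and predicted labelings. The sketch below follows that standard definition and is not taken from the paper's table; the function name `adjusted_rand_index` is an assumption, and the code expects at least two samples.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    # Standard ARI from pair counts; assumes len(labels_true) >= 2
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))   # contingency table entries
    rows = Counter(labels_true)                      # true cluster sizes
    cols = Counter(labels_pred)                      # predicted cluster sizes
    index = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)      # value expected by chance
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:                        # degenerate labelings
        return 1.0
    return (index - expected) / (max_index - expected)

# Same partition under different label names scores 1.0
score = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

ARI is invariant to renaming the clusters, so it can be used directly on the labels produced by a clustering algorithm without first matching them to the ground-truth classes.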
Datasets | Metric | KM | SC | HC | ODMC | SVC | MDSVC |
---|---|---|---|---|---|---|---|
convex | ARI | 0.970 | 0.748 | 1.000 | 0.329 | 1.000 | 1.000 |
convex | Acc | 0.820 | 0.013 | 0.333 | 0.333 | 1.000 | 1.000 |
convex | PERCENTAGE | / | / | / | / | 64.2% | 35.0% |
dbmoon | ARI | 0.638 | 0.324 | 0.516 | 0.498 | 0.928 | 1.000 |
dbmoon | Acc | 0.900 | 0.185 | 0.140 | 0.500 | 0.990 | 1.000 |
dbmoon | PERCENTAGE | / | / | / | / | 79.7% | 55.3% |
ring | ARI | 0.113 | 0.171 | 1.000 | 0.420 | 1.000 | 1.000 |
ring | Acc | 0.322 | 0.338 | 0.500 | 0.511 | 1.000 | 1.000 |
ring | PERCENTAGE | / | / | / | / | 95.8% | 53.1% |
MDSVC: w/t/l | ARI | (3,0,0) | (3,0,0) | (3,0,0) | (3,0,0) | (1,2,0) | |
MDSVC: w/t/l | Acc | (3,0,0) | (3,0,0) | (3,0,0) | (3,0,0) | (1,2,0) | |
MDSVC: w/t/l | PERCENTAGE | / | / | / | / | (3,0,0) | |
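The w/t/l row summarizes, for each baseline, on how many data sets MDSVC wins, ties, or loses. A minimal sketch of such a tally follows, assuming that a higher score is better and that scores equal at the reported precision count as ties; the helper name `win_tie_loss` is hypothetical. The example uses the ARI values of MDSVC and SVC on the three artificial data sets from the table above.

```python
def win_tie_loss(ours, baseline, higher_is_better=True, tol=1e-9):
    # Count data sets where `ours` beats, equals, or trails `baseline`
    wins = ties = losses = 0
    for a, b in zip(ours, baseline):
        diff = (a - b) if higher_is_better else (b - a)
        if abs(diff) <= tol:
            ties += 1
        elif diff > 0:
            wins += 1
        else:
            losses += 1
    return wins, ties, losses

mdsvc_ari = [1.000, 1.000, 1.000]   # convex, dbmoon, ring
svc_ari = [1.000, 0.928, 1.000]
print(win_tie_loss(mdsvc_ari, svc_ari))  # (1, 2, 0)
```

For a metric where a smaller value is preferable, the same helper can be called with `higher_is_better=False`.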
Datasets | Metric | KM | SC | HC | ODMC | SVC | MDSVC |
---|---|---|---|---|---|---|---|
iris | ARI | 0.730 | 0.474 | 0.558 | 0.329 | 0.848 | 0.828 |
iris | Acc | 0.347 | 0.193 | 0.333 | 0.333 | 0.667 | 0.753 |
iris | PERCENTAGE | / | / | / | / | 96.1% | 51.8% |
glass | ARI | 0.230 | 0.067 | 0.259 | 0.260 | 0.750 | 0.751 |
glass | Acc | 0.327 | 0.014 | 0.028 | 0.327 | 0.289 | 0.351 |
glass | PERCENTAGE | / | / | / | / | 89.8% | 12.5% |
breast | ARI | 0.171 | 0.177 | 0.062 | 0.585 | 0.542 | 0.612 |
breast | Acc | 0.376 | 0.087 | 0.025 | 0.707 | 0.484 | 0.711 |
breast | PERCENTAGE | / | / | / | / | 98.7% | 71.5% |
heart | ARI | 0.564 | 0.074 | 0.058 | 0.637 | 0.571 | 0.580 |
heart | Acc | 0.551 | 0.172 | 0.195 | 0.772 | 0.990 | 0.990 |
heart | PERCENTAGE | / | / | / | / | 61.3% | 55.1% |
liver | ARI | 0.001 | 0.002 | 0.009 | 0.511 | 0.489 | 0.512 |
liver | Acc | 0.154 | 0.033 | 0.067 | 0.420 | 0.476 | 0.493 |
liver | PERCENTAGE | / | / | / | / | 89.7% | 50.4% |
ionosphere | ARI | 0.178 | 0.191 | 0.189 | 0.612 | 0.747 | 0.756 |
ionosphere | Acc | 0.477 | 0.393 | 0.171 | 0.738 | 0.687 | 0.734 |
ionosphere | PERCENTAGE | / | / | / | / | 90.6% | 26.2% |
vote | ARI | 0.296 | 0.009 | 0.512 | 0.525 | 0.512 | 0.525 |
vote | Acc | 0.540 | 0.112 | 0.356 | 0.386 | 0.361 | 0.387 |
vote | PERCENTAGE | / | / | / | / | 95.2% | 88.7% |
balance | ARI | 0.114 | 0.184 | 0.695 | 0.112 | 0.570 | 0.653 |
balance | Acc | 0.294 | 0.075 | 0.016 | 0.147 | 0.278 | 0.356 |
balance | PERCENTAGE | / | / | / | / | 61.4% | 58.1% |
MDSVC: w/t/l | ARI | (7,0,0) | (7,0,0) | (6,0,1) | (4,2,1) | (5,1,1) | |
MDSVC: w/t/l | Acc | (6,0,1) | (7,0,0) | (7,0,0) | (5,1,1) | (6,1,0) | |
MDSVC: w/t/l | PERCENTAGE | / | / | / | / | (7,0,0) | |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, Y.; Chen, J.; Xie, X.; Yang, S.; Pang, W.; Huang, L.; Zhang, S.; Zhao, S. Minimum Distribution Support Vector Clustering. Entropy 2021, 23, 1473. https://doi.org/10.3390/e23111473