**1. Introduction**

Cluster analysis groups a dataset into clusters according to the similarities among the data. To date, many clustering algorithms have emerged, such as plane-based clustering, spectral clustering, the density-based DBSCAN [1] and OPTICS [2], the density peak (DP) algorithm, which characterizes cluster centers [3], and the partition-based k-means algorithm [4]. In particular, the support vector machine (SVM) has become an important tool for data mining. As a classical machine learning algorithm, SVM handles local extrema and high-dimensional data well during model optimization, and it makes data separable in the feature space through a nonlinear transformation [5].

In particular, Tax and Duin proposed a novel method, the support vector domain description (SVDD), in which the decision boundary is constructed from a set of support vectors [6]. Building on kernel theory and SVDD, support vector clustering (SVC) was proposed as a contour-based clustering method, which has many advantages over other clustering algorithms [7]. SVC is robust to noise and does not require the number of clusters to be specified in advance. For SVC, the parameter C can be adjusted to obtain better performance, but this comes at the cost of more outliers, and it only introduces a soft boundary into the optimization. Several insights into the features of SVC have been offered in [8,9]; after studying the relevant literature, we found that these insights mainly cover two aspects. The first aspect is the selection of the parameters *q* and *C*. Lee and Daniels used a secant-like method to generate monotonically increasing sequences of
*q* and to establish a monotone function between *q* and the radius R, which can be applied in high dimensions. The second aspect is optimizing the cluster assignments. Considering the high cost of the second stage of SVC, several methods have been proposed to improve the cluster partition of SVC. First, Ben-Hur et al. improved the original complete graph (CG) partition by using an adjacency matrix built on the support vector (SV) points, which simplified the original computation, but this method fails to avoid random sampling. Yang et al. elaborated the proximity graph (PG) to model the proximity structure of the m samples with a time complexity of O(m) or O(m log m); however, the complexity of this algorithm increases with the dimensionality [10]. Lee et al. studied a cone cluster labeling (CCL) method that uses the geometry of the feature space to assign clusters in the data space: if two cones intersect, the samples in these cones belong to the same cluster [9]. However, the performance of CCL is sensitive to the kernel parameter q, since the cones are determined by q. More recently, Peng et al. designed a partition method that combines the clustering algorithm of similarity segmentation-based point sorting (CASS-PS) with the geometrical properties of support vectors in the feature space to avoid the downsides of SVC and CASS-PS [11]. However, CASS-PS is sensitive to the number and distribution of the recognized support vector points. Jennath and Asharaf proposed an efficient cluster assignment algorithm for SVC that uses the similarity of feature sets for data points and an efficient minimum enclosing ball (MEB) approximation algorithm [12].
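
For concreteness, the SVDD model underlying SVC can be stated as follows; this is the standard primal from [6], written here in our own notation:

$$
\min_{R,\,a,\,\xi}\; R^{2} + C\sum_{i=1}^{m}\xi_{i} \quad \text{s.t.} \quad \left\| \phi(x_{i}) - a \right\|^{2} \le R^{2} + \xi_{i},\;\; \xi_{i} \ge 0,\;\; i = 1, \dots, m,
$$

where $\phi$ is the feature map induced by the kernel (a Gaussian kernel with width coefficient *q* in SVC), $a$ and $R$ are the center and radius of the minimum enclosing hypersphere, and the slack variables $\xi_{i}$ with trade-off parameter C realize the soft boundary discussed above.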

It is well known from margin theory that maximizing the minimum margin is often not the best way to further improve learning performance. In this regard, introducing the margin mean and margin variance of the distribution can lead to better generalization performance, as revealed by Gao and Zhou [13,14]. In classification and regression analysis, there are many methods that improve learning performance by considering the statistical information of the data. Zhang and Zhou proposed the large margin distribution machine (LDM) and the optimal margin distribution machine (ODM) for data classification, which adjust the margin mean and variance to improve model performance [15,16]. In regression analysis, MDR, ε-SVR, LDMR, and v-MADR consider the margin distribution to achieve better performance. MDR, proposed by Liu et al., minimizes the regression deviation mean and the regression deviation variance, thereby introducing the statistics of the regression deviation into ε-SVR [17]. However, this is not very appropriate when samples with both positive and negative labels are present. To deal with this issue, Wang et al. characterized the absolute regression deviation mean and the absolute regression deviation variance and proposed the v-minimum absolute deviation distribution regression (v-MADR) machine [18]. Inspired by LDM, Rastogi et al. also proposed a large margin distribution machine-based regression model (LDMR) [19].
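
To make the role of these statistics concrete, the LDM of Zhang and Zhou [15] can be sketched as follows (our notation; see [15] for the exact formulation). Writing the margin of sample $i$ as $\gamma_{i} = y_{i} w^{\top} \phi(x_{i})$, with margin mean $\bar{\gamma} = \frac{1}{m}\sum_{i}\gamma_{i}$ and margin variance $\hat{\gamma} = \frac{1}{m}\sum_{i}(\gamma_{i} - \bar{\gamma})^{2}$, LDM solves

$$
\min_{w,\,\xi}\; \frac{1}{2}\|w\|^{2} + \lambda_{1}\hat{\gamma} - \lambda_{2}\bar{\gamma} + C\sum_{i=1}^{m}\xi_{i} \quad \text{s.t.} \quad \gamma_{i} \ge 1 - \xi_{i},\;\; \xi_{i} \ge 0,
$$

so that, beyond the minimum margin of the standard SVM, the margin variance is explicitly minimized and the margin mean maximized.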

In cluster analysis, for a good clustering, SVM can obtain a larger minimum margin when the labels are consistent with the clustering results. Inspired by this, maximum margin clustering (MMC) borrows the large margin heuristic from SVM and maximizes the margin over all possible label assignments [20], as sketched below. Improved versions of MMC have also been proposed [21]. The optimal margin distribution clustering (ODMC) proposed by Zhang et al. forms the optimal margin distribution during the clustering process, characterizing the margin distribution by its first- and second-order statistics; it also has the same convergence rate as state-of-the-art cutting-plane-based algorithms [22].
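
A minimal sketch of the MMC idea (standard soft-margin form; see [20] for details, including the class-balance constraint that rules out trivial label assignments):

$$
\min_{y \in \{-1,+1\}^{m}}\; \min_{w,\,b,\,\xi}\; \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{m}\xi_{i} \quad \text{s.t.} \quad y_{i}\left(w^{\top}\phi(x_{i}) + b\right) \ge 1 - \xi_{i},\;\; \xi_{i} \ge 0,
$$

that is, the labels themselves become optimization variables, and the assignment admitting the largest soft margin is taken as the clustering.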

The success of the aforementioned models suggests that there may still be room for further improving SVC. These models do not address improving the generalization performance of SVC itself, that is, reconstructing the hyperplane when the distribution of the data in the feature space is fixed. In this research, we propose a novel approach called minimum distribution support vector clustering (MDSVC), and our novel contributions are as follows:

• We characterize the envelope radius of the minimum hypersphere by its first- and second-order statistics, i.e., the mean and variance. Minimizing these two statistics can, to some extent, avoid the problem of too many or too few support vector points caused by an inappropriate kernel width coefficient q, form a better cluster contour, and thus improve the accuracy; a schematic formulation is sketched below.
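
As a schematic preview only (the precise objective and its optimization are derived in Section 3), let $d_{i} = \|\phi(x_{i}) - a\|^{2}$ denote the squared distance of sample $i$ from the sphere center; the idea is to augment the SVDD objective with the mean and variance of these distances:

$$
\min_{R,\,a,\,\xi}\; R^{2} + \lambda_{1}\bar{d} + \lambda_{2}\hat{d} + C\sum_{i=1}^{m}\xi_{i} \quad \text{s.t.} \quad d_{i} \le R^{2} + \xi_{i},\;\; \xi_{i} \ge 0,
$$

where $\bar{d} = \frac{1}{m}\sum_{i} d_{i}$ and $\hat{d} = \frac{1}{m}\sum_{i}(d_{i} - \bar{d})^{2}$; the exact weighting and constraints used by MDSVC are those given in Section 3.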


The remainder of this paper is organized as follows. Section 2 introduces the notations, the recent progress in margin theory, and the SVC algorithm. In Section 3, we present the MDSVC algorithm, which minimizes the mean and the variance, and propose a dual coordinate descent (DCD) algorithm to solve the objective function of MDSVC. Section 4 reports our experimental results on both artificial and real datasets. We discuss our method in Section 5 and draw conclusions in Section 6.
