*4.2. Experimental Results and Analysis*

In the process of SVC tuning, it is often observed that there are too many or too few SVs, which fail to form good cluster contours. Irrational SVs may fail to divide the clusters well and/or to achieve high precision. Based on this observation, we design experiments on the number of SVs with varying values of *λ*<sub>1</sub> and *q*. As mentioned before, the Gaussian kernel *k*(*x*, *y*) = exp(−*q*‖*x* − *y*‖<sup>2</sup>) is employed for nonlinear clustering, from which we can derive *k*(*x*, *x*) = 1. We apply the commonly used dichotomy (bisection) method to select the kernel width coefficient *q*.
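As a rough illustration, the kernel and a dichotomy-style search for *q* can be sketched as follows. The scoring callable, the interval endpoints, and the iteration count are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def gaussian_kernel(x, y, q):
    # k(x, y) = exp(-q * ||x - y||^2); note k(x, x) = 1 for any q.
    return np.exp(-q * np.sum((x - y) ** 2))

def select_q_dichotomy(score, lo=2.0 ** -7, hi=2.0 ** 7, iters=10):
    # Dichotomy (bisection) over [lo, hi]: compare candidates in the left
    # and right halves and keep the better half. `score` stands in for any
    # clustering-quality measure of q (an assumption here, e.g. ARI).
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        left, right = (lo + mid) / 2.0, (mid + hi) / 2.0
        if score(left) >= score(right):
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0
```

With a unimodal score, the interval shrinks by half per iteration, so the search touches only ~2·`iters` candidate values instead of a full grid.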

Before evaluating MDSVC against the other clustering methods, we analyze the relationship between *λ*<sub>1</sub> and *λ*<sub>2</sub> with respect to the SVs on two artificial datasets and two real datasets in Figure 2. Within the appropriate range of these two parameters, we observe that the number of SVs increases as *λ*<sub>1</sub> increases, since *a* moves closer to the denser data in the feature space. Conversely, increasing *λ*<sub>2</sub> decreases the number of SVs by reducing the volatility of the distances from *a*, because the sphere then settles in the right place with fewer SVs. It is therefore instructive to adjust *λ*<sub>1</sub> and *λ*<sub>2</sub> to resolve the problem of too many or too few SVs when *q* and *C* are given.

We report the results with respect to the corresponding performance metrics in Tables 4 and 5, where PERCENTAGE denotes the percentage of the average number of SVs relative to the total number of data points. We use "/" to indicate that a method has no need to compute the PERCENTAGE. The last row summarizes the win/tie/loss counts of MDSVC against the other methods. For a clearer comparison between MDSVC and SVC, *q* is selected from the same range [2<sup>−7</sup>, 2<sup>7</sup>] to compute the PERCENTAGE.
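Given the solved dual coefficients of a kernel-based clustering model, the PERCENTAGE metric as defined above is straightforward to compute; the coefficient vector name `beta` and the numerical tolerance are assumptions for this sketch:

```python
import numpy as np

def sv_percentage(beta, tol=1e-8):
    # PERCENTAGE: share (in %) of data points that are support vectors,
    # i.e. whose dual coefficient is numerically nonzero. `beta` is the
    # solved dual vector (a hypothetical name for illustration).
    n_sv = int(np.sum(np.abs(beta) > tol))
    return 100.0 * n_sv / beta.size
```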

In particular, the evaluations on the datasets are shown in Tables 4 and 5. Table 4 shows that MDSVC is almost on par with SVC on the artificial datasets, yet it is worth noting that MDSVC reduces the number of SVs significantly under the same conditions as SVC, i.e., the same *q* and *C*. In Table 5, although both SVC and MDSVC obtain worse Acc or ARI on some datasets, MDSVC still achieves better results than SVC and the other methods on most real datasets. From this analysis, we conclude that, once *q* and *C* are selected for MDSVC, we can adjust the SVs through the other parameters, *λ*<sub>1</sub> and *λ*<sub>2</sub>, to achieve better performance. In addition, in terms of CPU time, MDSVC outperforms SVC on the datasets with higher dimensions and larger size (ring, vehicle), as shown in Figure 3. This comparison of CPU time indicates that MDSVC has two advantages: better performance and less running time.
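Since cluster labels are arbitrary permutations of the ground-truth labels, the Acc metric used in such comparisons is conventionally computed after an optimal label matching. A minimal sketch of that convention, assuming the standard Hungarian-matching definition of clustering accuracy (the paper does not spell out its exact Acc formula):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    # Acc: match predicted cluster ids to ground-truth labels by solving
    # the optimal assignment (Hungarian algorithm) on the contingency
    # matrix, then report the fraction of correctly assigned points.
    true_ids, y_true = np.unique(y_true, return_inverse=True)
    pred_ids, y_pred = np.unique(y_pred, return_inverse=True)
    w = np.zeros((pred_ids.size, true_ids.size), dtype=np.int64)
    for p, t in zip(y_pred, y_true):
        w[p, t] += 1
    row, col = linear_sum_assignment(-w)  # negate to maximize matches
    return w[row, col].sum() / y_true.size
```

ARI can be computed alongside this with `sklearn.metrics.adjusted_rand_score`, which is permutation-invariant and needs no explicit matching step.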

The estimated clustering assignments on the artificial datasets convex and ring are shown in Figure 4. To show the clusters divided by the SVs more intuitively and accurately, we draw the contour lines determined by *R*. We note that SVC almost always overfits on the artificial datasets when the boundary is optimal; that is, all data points are identified as SVs, and thus Figure 4 only shows the best non-overfitting result of SVC. Clearly, MDSVC is superior to SVC in forming better boundaries on the artificial datasets.

As Figure 4a–d shows, the boundaries of convex and dbmoon formed by MDSVC are more rational than those of SVC in terms of separating the clusters. For the ring set, the challenge for SVC is to form rational boundaries with an appropriate number of SVs. MDSVC forms four more rational boundaries and thus separates the ring set into two clusters, as shown in Figure 4e, while SVC recognizes only two boundaries in Figure 4f. Moreover, the introduction of the (non-negative) statistical terms, which pulls the hyperplane closer to the denser part of the feature space, makes the value of *R* larger than in SVC. Hence, we obtain a larger boundary without increasing the number of outliers. In summary, MDSVC obtains better boundaries and a better representation of the statistical information on the above datasets.

**Figure 2.** The relationship between *λ*<sub>1</sub> and *λ*<sub>2</sub> with respect to the SVs on (**a**) dbmoon, (**b**) convex, (**c**) glass, and (**d**) iris.


**Table 4.** The result comparisons on artificial datasets.

For further evaluation, we assess the impact of the parameters on ARI, Acc, and PERCENTAGE, since changes in the parameter values may significantly influence the clustering results; PERCENTAGE characterizes the proportion of SVs. MDSVC has three trade-off parameters, *λ*<sub>1</sub>, *λ*<sub>2</sub>, and *C*, plus the kernel parameter *q*. We show the impact of *λ*<sub>1</sub> on ARI, Acc, and PERCENTAGE by varying it from 2<sup>−5</sup> to 2<sup>5</sup> while fixing the other parameters at their optimal values. As can be seen from Figure 5e–h, the number of SVs is more sensitive to the kernel parameter *q* and to *C* than to *λ*<sub>1</sub> and *λ*<sub>2</sub>. In Figure 5b,d,f,h,j,l, we can see that the results are not sensitive to *λ*<sub>1</sub> after the optimal results are reached on most datasets. To sum up, the mean and the variance are both main factors affecting the performance of the algorithm.
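The sensitivity study just described amounts to a one-dimensional sweep over *λ*<sub>1</sub> with the remaining parameters held fixed. A minimal scaffold, where `evaluate` is a placeholder for training and scoring one MDSVC configuration (not the paper's actual code):

```python
def sweep_lambda1(evaluate, lam2, C, q, exponents=range(-5, 6)):
    # Vary lambda_1 over 2^-5 .. 2^5 with lambda_2, C, and q fixed at
    # their chosen (e.g. optimal) values, as in the sensitivity study.
    # `evaluate` returns (acc, ari, percentage) for one setting.
    results = []
    for e in exponents:
        lam1 = 2.0 ** e
        acc, ari, pct = evaluate(lam1, lam2, C, q)
        results.append((lam1, acc, ari, pct))
    return results
```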


**Table 5.** The result comparisons on real datasets.

**Figure 3.** The CPU time of MDSVC and SVC.

**Figure 4.** The results of MDSVC and SVC on three artificial datasets: convex, dbmoon, and ring. The parameters are set as follows: (**a**) *q* = 0.1, *λ*<sub>1</sub> = 8, *λ*<sub>2</sub> = 32, *C* = 0.1; (**b**) *q* = 1, *C* = 0.1; (**c**) *q* = 0.1, *λ*<sub>1</sub> = 1, *λ*<sub>2</sub> = 4, *C* = 0.1; (**d**) *q* = 0.5, *C* = 0.1; (**e**) *q* = 2, *λ*<sub>1</sub> = 200, *λ*<sub>2</sub> = 300, *C* = 0.1; (**f**) *q* = 1, *C* = 0.5.

**Figure 5.** The impact of *λ*<sub>1</sub>, *λ*<sub>2</sub>, *C*, and the kernel parameter *q* on ARI, Acc, and PERCENTAGE for different datasets. (**a**–**d**): the impact on Acc. (**e**–**h**): the impact on PERCENTAGE. (**i**–**l**): the impact on ARI.
