*3.3. The Properties of MDSVC*

We briefly introduce the properties of MDSVC in this subsection. Hereinafter, the points with $0 < \beta_i < C$ will be referred to as support vectors (SVs), and the points with $\beta_i = C$ will be called bounded support vectors (BSVs), as in SVC. Additionally, SVDD [5] used leave-one-out cross-validation as the criterion to characterize the expectation of the probability of test error, which is expressed as follows

$$E(P(\text{error})) = \frac{\text{num}(SV)}{m}, \tag{26}$$

The above expectation serves more as a practical criterion for tuning parameters in SVDD experiments than as a theoretically grounded bound, and it estimates only the error of the first kind, i.e., errors on the target class. By analyzing the above equation, we further infer that our algorithm can reduce the number of SVs to some extent compared with SVC, and thus can theoretically achieve better generalization performance than SVC. Inspired by SVDD and LDM, we derive the expectation in a manner similar to the approach used in LDM.
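To make Equation (26) concrete, the estimate can be read directly off the optimal multipliers: any point with $0 < \beta_i < C$ counts as an SV. A minimal sketch in Python (the function name, inputs, and tolerance are our own illustration, not part of MDSVC):

```python
def loo_error_estimate(beta, C, tol=1e-8):
    """Estimate E(P(error)) as num(SV)/m, following Equation (26).

    beta : sequence of optimal multipliers, one per training point.
    C    : upper bound on the multipliers.
    Points with 0 < beta_i < C (up to a numerical tolerance) are SVs.
    """
    num_sv = sum(1 for b in beta if tol < b < C - tol)
    return num_sv / len(beta)

# Example: 2 of 5 points lie strictly between 0 and C = 1.
print(loo_error_estimate([0.0, 0.3, 1.0, 0.7, 0.0], C=1.0))  # -> 0.4
```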

**Theorem 3.** *Let $\beta^{*}$ be the optimal solution of Equation (19) and $E[R(\beta)]$ be the expectation of the probability of error; then we obtain*

$$E[R(\beta)] \le \frac{E\left[d\sum_{i \in I_1} \frac{\beta_i^{*}}{2(1 - R^2)} + |I_2|\right]}{m},\tag{27}$$

*where $I_1 \equiv \{i \mid 0 < \beta_i^{*} < C\}$, $I_2 \equiv \{i \mid \beta_i^{*} = C\}$, and $d = \max\{\operatorname{diag}\{D\}\}$.*

**Proof of Theorem 3.** Suppose

$$\begin{aligned} \beta^{*} &= \underset{0 \le \beta \le C}{\operatorname{argmin}}\, f(\beta),\\ \beta^{i} &= \underset{0 \le \beta \le C,\ \beta_i = 0}{\operatorname{argmin}}\, f(\beta), \quad i = 1, \dots, m, \end{aligned}$$

and let $R$ and $\mathbf{a}$ be the radius and center of the sphere, respectively. As in [16], the expectation is calculated as below

$$E[R(\beta)] = \frac{E[\gamma((\mathbf{x}_1, y_1), \dots, (\mathbf{x}_m, y_m))]}{m},\tag{28}$$

where $\gamma((\mathbf{x}_1, y_1), \dots, (\mathbf{x}_m, y_m))$ is the number of errors produced during the leave-one-out procedure. The data points fall into three categories. Note that if $\beta_i^{*} = 0$, the point is an interior point of the data space. Based on the analysis of SVDD, the cluster of an interior point is determined entirely by the SVs, regardless of the cluster assignment in the second stage of the MDSVC procedure, so interior points produce no leave-one-out errors. Hence, we consider the remaining two cases:

(1) $0 < \beta_i^{*} < C$: the point $\mathbf{x}_i$ is a support vector. By the SVC formulation and the KKT conditions, we have

$$f(\beta^i) - \min_t f(\beta^i + t\mathbf{e}_i) \le f(\beta^i) - f(\beta^{*}) \le f(\beta^{*} - \beta_i^{*} \mathbf{e}_i) - f(\beta^{*}),\tag{29}$$

where $\mathbf{e}_i$ is the vector with 1 in the $i$-th coordinate and 0 elsewhere. Incorporating Equation (16) into the formula above, we have $\varphi(\mathbf{x}_i, \mathbf{a}) \le \frac{\beta_i^{*} d_{ii}}{2}$, where $\mathbf{x}_i$ is an SV. Further, note that if $\mathbf{x}_i$ is an SV, then $\varphi(\mathbf{x}_i, \mathbf{a}) = \mathbf{a}^2 = 1 - R^2$, a lemma proposed in CCL [9]. Thus, rearranging $\varphi(\mathbf{x}_i, \mathbf{a}) \le \frac{\beta_i^{*} d_{ii}}{2}$ yields $1 \le \frac{\beta_i^{*} d_{ii}}{2(1 - R^2)}$.
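Spelling out the rearrangement in case (1) step by step (assuming $R^2 < 1$, so that division by $1 - R^2$ preserves the inequality):

```latex
% For an SV x_i: phi(x_i, a) = 1 - R^2 (the CCL lemma) and, combining
% Equations (16) and (29), phi(x_i, a) <= beta_i^* d_ii / 2. Hence:
\begin{aligned}
1 - R^2 \;=\; \varphi(\mathbf{x}_i, \mathbf{a})
\;\le\; \frac{\beta_i^{*} d_{ii}}{2}
\quad\Longrightarrow\quad
1 \;\le\; \frac{\beta_i^{*} d_{ii}}{2(1 - R^2)}
\;\le\; \frac{\beta_i^{*}\, d}{2(1 - R^2)},
\end{aligned}
```

where the last step uses $d = \max\{\operatorname{diag}\{D\}\}$; each SV therefore contributes at most $\beta_i^{*} d / (2(1 - R^2))$ to the leave-one-out error count, which is exactly the summand over $I_1$ in Equation (30).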

(2) $\beta_i^{*} = C$: the point $\mathbf{x}_i$ is a bounded support vector (BSV) and must be misclassified in the leave-one-out procedure. Combining the two cases, we have

$$\gamma((\mathbf{x}_1, y_1), \dots, (\mathbf{x}_m, y_m)) \le d \sum_{i \in I_1} \frac{\beta_i^{*}}{2(1 - R^2)} + |I_2|, \tag{30}$$

where $I_1 \equiv \{i \mid 0 < \beta_i^{*} < C\}$, $I_2 \equiv \{i \mid \beta_i^{*} = C\}$, and $d = \max\{\operatorname{diag}\{D\}\}$. Taking the expectation of both sides of Equation (30) and combining it with Equation (28), we obtain Equation (27). □
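As a rough illustration of how the right-hand side of Equation (30) could be evaluated for a trained model, the sketch below computes the bound from the optimal multipliers, the diagonal of $D$, $C$, and $R$ (the function and variable names are our own, and we assume $R^2 < 1$, as in the proof):

```python
def loo_error_bound(beta, diag_D, C, R, tol=1e-8):
    """Upper bound on the leave-one-out error count, per Equation (30).

    beta   : optimal multipliers beta_i^*, one per training point.
    diag_D : diagonal entries of D; d = max(diag_D).
    C, R   : multiplier upper bound and sphere radius (requires R**2 < 1).
    Returns d * sum_{i in I1} beta_i / (2 * (1 - R**2)) + |I2|.
    """
    d = max(diag_D)
    sv_sum = sum(b for b in beta if tol < b < C - tol)   # I1: SVs
    num_bsv = sum(1 for b in beta if b >= C - tol)       # I2: BSVs
    return d * sv_sum / (2.0 * (1.0 - R**2)) + num_bsv

# Example: two SVs (0.2, 0.4), one BSV, d = 2, R = 0.5;
# dividing the result by m yields the estimate of E[R(beta)] in Equation (27).
bound = loo_error_bound([0.0, 0.2, 1.0, 0.4], [1.0, 1.0, 2.0, 1.0], C=1.0, R=0.5)
```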
