**Appendix A. Support Vector Clustering**

Support vector clustering (SVC) introduces a soft boundary as a tolerance mechanism, allowing outliers to lie outside the enclosing sphere as boundary support vectors. The algorithm is robust to noise and does not need the number of clusters to be specified in advance. However, its effectiveness depends on the selection of the kernel width coefficient *q* and the soft boundary constant *C*, and adjusting these parameters is time-consuming. SVC is formulated as follows

$$\begin{array}{ll}\underset{R,\mathbf{a},\boldsymbol{\xi}}{\min} & R^2 + C\sum\_{i=1}^{m}\xi\_i\\\text{s.t.} & \|\phi(\mathbf{x}\_i) - \mathbf{a}\|^2 \leq R^2 + \xi\_i,\quad \xi\_i \geq 0,\; i = 1,\dots,m\end{array} \tag{A1}$$

where the parameter *C* controls the trade-off with outliers, $C\sum\_{i=1}^{m}\xi\_i$ is a penalty term, and the slack variables $\xi\_i$ provide the tolerance. SVC looks for the smallest enclosing sphere of radius *R* under the constraints $\|\phi(\mathbf{x}\_i) - \mathbf{a}\|^2 \leq R^2 + \xi\_i$, where $\|\cdot\|$ is the Euclidean norm and $\mathbf{a}$ is the center of the hypersphere. We can use the Lagrange function to solve the problem
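The mapping $\phi$ enters only through a kernel $\kappa$; the usual choice in SVC is the Gaussian kernel $\kappa(\mathbf{x}\_i, \mathbf{x}\_j) = e^{-q\|\mathbf{x}\_i - \mathbf{x}\_j\|^2}$ with the width coefficient *q* mentioned above. A minimal NumPy sketch, assuming the Gaussian kernel (the function name and default *q* are illustrative, not from the original):

```python
import numpy as np

def gaussian_kernel(X, Y, q=1.0):
    """kappa(x_i, y_j) = exp(-q * ||x_i - y_j||^2) for all pairs.

    X: (m, d) array, Y: (n, d) array -> (m, n) kernel matrix.
    q is the kernel width coefficient from the text.
    """
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-q * sq_dists)
```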

$$L = R^2 + C\sum\_{i=1}^{m}\xi\_i - \sum\_{i=1}^{m}\mu\_i\xi\_i - \sum\_{i=1}^{m}\beta\_i\left(R^2 + \xi\_i - \left\|\phi(\mathbf{x}\_i) - \mathbf{a}\right\|^2\right)$$

Setting the derivatives of *L* with respect to *R*, $\mathbf{a}$, and $\xi\_i$ to zero yields $\sum\_{i}\beta\_i = 1$, $\mathbf{a} = \sum\_{i}\beta\_i\phi(\mathbf{x}\_i)$, and $\beta\_i = C - \mu\_i$. Substituting these back, the dual problem can be cast as follows

$$\begin{aligned} \underset{\boldsymbol{\beta}}{\max}\; & W = \sum\_{i} \beta\_{i} \kappa(\mathbf{x}\_{i}, \mathbf{x}\_{i}) - \sum\_{i} \sum\_{j} \beta\_{i} \beta\_{j} \kappa(\mathbf{x}\_{i}, \mathbf{x}\_{j})\\ \text{s.t.}\; & \sum\_{i}\beta\_{i} = 1,\quad 0 \le \beta\_{i} \le C \end{aligned}$$
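To make the dual concrete, here is a hedged sketch that solves it with a generic constrained optimizer (SciPy's SLSQP) as a stand-in for the SMO solver discussed at the end of this appendix; `solve_svc_dual` is an illustrative name, and feasibility requires $C \geq 1/m$:

```python
import numpy as np
from scipy.optimize import minimize

def solve_svc_dual(K, C):
    """Maximize W = sum_i beta_i K[i, i] - sum_{i,j} beta_i beta_j K[i, j]
    subject to sum_i beta_i = 1 and 0 <= beta_i <= C (requires C >= 1/m)."""
    m = K.shape[0]

    def neg_W(beta):                       # SciPy minimizes, so negate W
        return -(beta @ np.diag(K)) + beta @ K @ beta

    beta0 = np.full(m, 1.0 / m)            # feasible starting point
    res = minimize(neg_W, beta0, method="SLSQP",
                   bounds=[(0.0, C)] * m,
                   constraints=[{"type": "eq",
                                 "fun": lambda b: b.sum() - 1.0}])
    return res.x
```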

Thus, we can define the squared distance of the image of each point from the sphere center in the feature space

$$R^2(\mathbf{x}) = \left\| \phi(\mathbf{x}) - \mathbf{a} \right\|^2$$

Substituting $\mathbf{a} = \sum\_{i}\beta\_i\phi(\mathbf{x}\_i)$, $R^2(\mathbf{x})$ takes the following form

$$R^2(\mathbf{x}) = \kappa(\mathbf{x}, \mathbf{x}) - 2\sum\_{i} \beta\_i \kappa(\mathbf{x}, \mathbf{x}\_i) + \sum\_{i,j} \beta\_i \beta\_j \kappa(\mathbf{x}\_i, \mathbf{x}\_j) \tag{A2}$$
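Equation (A2) translates directly into code. This sketch reuses the `gaussian_kernel` helper assumed above; for clarity it recomputes the kernel matrix on each call, whereas a real implementation would cache $\kappa(\mathbf{x}\_i, \mathbf{x}\_j)$ and the constant third term:

```python
import numpy as np

def r_squared(x, X, beta, kernel):
    """Eq. (A2): squared feature-space distance of x from the center
    a = sum_i beta_i * phi(x_i)."""
    Kxx = kernel(x[None, :], x[None, :])[0, 0]   # kappa(x, x)
    Kxi = kernel(x[None, :], X)[0]               # kappa(x, x_i) for all i
    Kij = kernel(X, X)                           # kappa(x_i, x_j)
    return Kxx - 2.0 * (beta @ Kxi) + beta @ Kij @ beta
```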

The radius of the hypersphere is

$$R = \{ R(\mathbf{x}\_i) \mid \mathbf{x}\_i \text{ is a support vector} \}$$

Here, a point $\mathbf{x}\_i$ whose Lagrange multiplier satisfies $\beta\_i \in (0, C)$ is a support vector (SV) and lies on the sphere surface; a point with $\beta\_i = C$ lies outside the sphere and is a boundary support vector (BSV). SVC uses the adjacency matrix $A\_{ij}$ to identify the connected components. For two points $\mathbf{x}\_i$ and $\mathbf{x}\_j$,

$$A\_{ij} = \begin{cases} 0 & \exists\, \mathbf{x} = \mathbf{x}\_i + t(\mathbf{x}\_j - \mathbf{x}\_i),\; t \in [0, 1], \text{ s.t. } R^2(\mathbf{x}) > R^2 \\ 1 & \text{otherwise.} \end{cases}$$
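A hedged sketch of this adjacency test, sampling *v* points on each line segment. The squared radius is estimated here by averaging $R^2(\mathbf{x}\_i)$ over the support vectors ($0 < \beta\_i < C$, up to a numerical tolerance); the averaging and the tolerance are practical choices, not part of the original formulation:

```python
import numpy as np

def adjacency(X, beta, C, kernel, v=10, tol=1e-8):
    """A[i, j] = 1 iff all v sampled points on the segment x_i -> x_j
    stay inside the sphere, i.e. R^2(x) <= R^2."""
    sv = np.where((beta > tol) & (beta < C - tol))[0]     # support vectors
    R2 = np.mean([r_squared(X[i], X, beta, kernel) for i in sv])
    m = len(X)
    A = np.ones((m, m), dtype=int)
    for i in range(m):
        for j in range(i + 1, m):
            for t in np.linspace(0.0, 1.0, v):
                x = X[i] + t * (X[j] - X[i])
                if r_squared(x, X, beta, kernel) > R2:    # segment exits sphere
                    A[i, j] = A[j, i] = 0
                    break
    return A
```

With *v* sample points per segment and $m$ points overall, this performs the $O(vm^2)$ evaluations noted below.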

Finally, the clusters are defined as the connected components of the graph induced by $A\_{ij}$. The time complexity of computing the adjacency matrix is $O(vm^2)$, where *v* is the number of sample points per line segment. The quadratic programming problem can be solved by the SMO algorithm, whose memory requirements are low; it can even be implemented with $O(1)$ memory at the cost of reduced efficiency. The obvious shortcoming of SVC lies in the high cost of this cluster-labeling (partition) step.
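An illustrative end-to-end run tying the sketches above together; the toy data and the *q* and *C* values are arbitrary assumptions, and BSVs ($\beta\_i = C$) would typically be left unlabeled or assigned to the nearest cluster afterwards:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Two well-separated Gaussian blobs as toy data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])

q, C = 2.0, 0.5
kernel = lambda U, V: gaussian_kernel(U, V, q=q)

K = kernel(X, X)
beta = solve_svc_dual(K, C)
A = adjacency(X, beta, C, kernel)

# Cluster labels = connected components of the adjacency graph.
n_clusters, labels = connected_components(csr_matrix(A), directed=False)
```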
