**3. Statistical K-Means Algorithm**

The K-means algorithm on statistical manifolds, which we refer to as the SKM algorithm, consists of three parts: a local statistical method, the K-means algorithm, and the choice of a difference function. This section first introduces the *k*-nearest-neighbor local statistical method and then the details of the SKM algorithm.

#### *3.1. Local Statistical Method*

The point cloud is a sampling of some specified features in the objective world, each of which we consider to have the same properties within a small neighborhood. Mathematically, we obtain these neighborhood properties through local statistics. Specifically, we use the local statistics as the parameters of a parametric distribution: two different sets of local statistics determine two different distributions within the same parametric distribution family. This idea is equivalent to finding, for any point in the point cloud together with its neighbors (a subcloud of the point cloud), a distribution such that the subcloud is a sample of that distribution.

For the initial point cloud without any annotation, we have no reason to assume that its local statistics follow any particular distribution. We believe instead that the factors affecting the local distribution of point clouds in their natural background are complex enough that, by the Central Limit Theorem, the local statistics can be modeled as samples from a multivariate normal distribution. Therefore, we only need to compute the mean and covariance matrix of each point's local area to determine a normal distribution. In this way, the entire point cloud is projected to a parameter point cloud on the family of multivariate normal distributions, and the K-means algorithm is then applied to the parameter point cloud to cluster the original data. The data are thus classified by their differences in neighborhood densities [18–21].

For the selection of the neighborhood in the point cloud, we use the *k*-nearest-neighbor method: that is, for any positive integer *k*, find the *k* Euclidean-nearest neighbors of each point in the point cloud. This method reflects the local number density of the point cloud. Next, we introduce the selection method of *k*-nearest neighbors.

**Definition 8.** *Let $C_m = \{p_i \in \mathbb{R}^n \mid i = 1, 2, \cdots, m\}$ be a point cloud of scale $m$, abbreviated $C$. For any $p \in C_m$,*

$$k\text{-}N(p,k) = \left\{ p_j \in C_m,\; j \in \{i_1, \dots, i_k\} \;\middle|\; \|p_l - p\| \ge \|p_j - p\|,\ \forall l \notin \{i_1, \dots, i_k\} \right\}$$

*is called the k-nearest neighbor of $p$ in $C_m$, abbreviated as $k\text{-}N$, and $p \in k\text{-}N \subseteq C$.*
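Definition 8 can be sketched numerically. Below is a minimal NumPy version; the function name `k_nearest` is our own, not from the paper:

```python
import numpy as np

def k_nearest(cloud, p, k):
    """k-N(p, k) from Definition 8: the k points of `cloud` closest to p
    in Euclidean norm. The point p itself (distance 0) is included,
    matching p in k-N, a subset of C. Hypothetical helper name."""
    dists = np.linalg.norm(cloud - p, axis=1)
    return cloud[np.argsort(dists)[:k]]
```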

Denote by $\mu(k\text{-}N) = E[k\text{-}N(p,k)] - p$ and $\Sigma(k\text{-}N) = \mathrm{Cov}[k\text{-}N(p,k)]$ the mean offset and the covariance matrix, respectively, of the points of $k\text{-}N(p,k)$ relative to $p$. This defines the local statistical map

$$
\Psi_k: C \to \mathcal{N}_n, \tag{8}
$$

where $\Psi_k(p) := f(x;\, \mu(k\text{-}N), \Sigma(k\text{-}N)) = \frac{1}{\sqrt{(2\pi)^n \det(\Sigma)}} \exp\!\left(-\frac{(x-\mu)^T \Sigma^{-1} (x-\mu)}{2}\right)$. It is worth noting that we refer to the image of the point cloud $C$ under the local statistical map $\Psi_k$,

$$kC := \Psi_k[C], \qquad kC \subseteq \mathcal{N}_n,$$

as the parameter point cloud under the k-nearest neighbor method in this paper.
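The construction of the parameter point cloud can be sketched as follows: for each point, collect its $k$-nearest neighborhood, then record the mean offset and covariance matrix that identify a multivariate normal distribution. This is an illustrative sketch, assuming Euclidean distance and the neighborhood of Definition 8; the function name `local_statistics` is our own:

```python
import numpy as np

def local_statistics(cloud, k):
    """A sketch of the map Psi_k: for each point p, return the pair
    (mu(k-N), Sigma(k-N)) -- the mean offset E[k-N(p, k)] - p and the
    covariance matrix of its k-nearest neighbors. The list of pairs
    plays the role of the parameter point cloud kC."""
    params = []
    for p in cloud:
        dists = np.linalg.norm(cloud - p, axis=1)
        knn = cloud[np.argsort(dists)[:k]]  # k-N(p, k), includes p itself
        mu = knn.mean(axis=0) - p           # mean offset relative to p
        sigma = np.cov(knn, rowvar=False)   # n x n covariance matrix
        params.append((mu, sigma))
    return params
```

Each pair `(mu, sigma)` identifies one multivariate normal distribution, i.e., one point of $\mathcal{N}_n$; a subsequent K-means step would then cluster these parameter points rather than the raw coordinates.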
