*3.3. Unsupervised Machine Learning Models*

Unsupervised machine learning is a major branch of machine learning that can learn the intrinsic relationships within a dataset without data labels when dealing with practical problems. Its main applications are segmenting a dataset by shared attributes, detecting exceptions that do not fit any group, and simplifying datasets by aggregating variables with similar properties. Since the objective of this study is to explore the susceptibility of submarine landslides, an important unsupervised machine learning class named clustering, which contains k-means, spectral, and hierarchical clustering, was selected as the study method. Each clustering model was built after parameter selection with internal validation measures: the Calinski–Harabasz index [22], the silhouette index [23], and the Davies–Bouldin index [24]. The higher the Calinski–Harabasz score and the silhouette index, and the lower the Davies–Bouldin index, the more accurate the clustering result. The performance of the clustering results was then validated with external validation measures, for instance, liquefaction distribution, hydrodynamic action, and slope angle.
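As a minimal sketch of how the three internal validation indices can be computed, assuming the Python scikit-learn library (which implements all of them) and a synthetic placeholder dataset:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,   # higher score = better-separated clusters
    silhouette_score,          # higher score = better (range -1 to 1)
    davies_bouldin_score,      # lower score = better
)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))  # placeholder feature matrix

# Candidate clustering whose quality the indices evaluate
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print("Calinski-Harabasz:", calinski_harabasz_score(X, labels))
print("Silhouette:       ", silhouette_score(X, labels))
print("Davies-Bouldin:   ", davies_bouldin_score(X, labels))
```

In practice, such scores would be compared across candidate parameter settings (e.g., different cluster numbers) to select the final model, as described above.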

#### 3.3.1. k-Means

The k-means clustering algorithm is a typical unsupervised machine learning model that is widely used for the clustering analysis [25] of non-labeled data. The advantage of k-means is that it is easy to implement and to visualize the result. The number of clusters is the only parameter that needs to be specified beforehand. To build a k-means model:

(a) Specify the number of clusters *k* and randomly select *k* sample points as the initial cluster centers.

(b) Assign each sample point to its nearest cluster center and recalculate the center of mass of each cluster.

(c) Calculate the Euclidean distance between each sample point and the center of mass of its cluster:

$$dist(\mathbf{x}, \boldsymbol{\mu}) = \sqrt{\sum\_{i=1}^{n} (x\_{i} - \mu\_{i})^2}$$

where *x* is a sample point; *μ* is the center of mass of its cluster; *n* is the number of features in each sample point; and *i* indexes the features of sample point *x*.

(d) Summarize the total squared distances of all clusters:

$$\text{Clustering Sum of Squares (CSS)} = \sum\_{j=1}^{m} \sum\_{i=1}^{n} \left( x\_{ji} - \mu\_{i} \right)^{2}$$

$$\text{Total Clustering Sum of Squares} = \sum\_{l=1}^{k} \text{CSS}\_{l}$$

where *m* is the number of samples in a cluster; *j* indexes the samples in the cluster; *x_ji* is the *i*-th feature of sample *j*; and *k* is the number of clusters. Steps (b)–(d) are repeated until the cluster centers no longer change, which minimizes the total clustering sum of squares.
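The following sketch, assuming scikit-learn and synthetic placeholder data, mirrors the formulas above: it computes the per-cluster CSS and the total clustering sum of squares and checks that the total matches the `inertia_` attribute that scikit-learn's `KMeans` reports:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))  # placeholder samples

k = 3
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

# Distance from each sample to its assigned cluster center: dist(x, mu)
d = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# CSS per cluster, then the total clustering sum of squares
css = np.array([np.sum(d[km.labels_ == l] ** 2) for l in range(k)])
total_css = css.sum()

# scikit-learn stores the same total as the fitted model's inertia_
assert np.isclose(total_css, km.inertia_)
```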


#### 3.3.2. Spectral Clustering

Spectral clustering is another unsupervised machine learning model, which clusters through the eigenvectors of the Laplacian matrix of the sample data. Spectral clustering maps data from a high-dimensional space to a low-dimensional space and then uses other clustering algorithms to cluster in the low-dimensional space. Compared with k-means, spectral clustering uses a dimension-reduction algorithm, which makes it more suitable for high-dimensional data processing and more effective for sparse data processing. Spectral clustering outputs clusters *A*1, *A*2, ... , *Ak* by inputting *n* sample points *X* = {*x*1, *x*2, ..., *xn*} and the number of clusters *k*. In this model, the kernel function parameter and the cluster number are the influential parameters. The specific steps are below:

(a) Calculate the *n* ∗ *n* similarity matrix *W*. The available methods include the minimum proximity method, the k-proximity method, and the full-connection method; the full-connection method used in this study is described as:

$$s\_{ij} = s(x\_{i}, x\_{j}) = \exp\left(\frac{-\left\|x\_{i} - x\_{j}\right\|^{2}}{2\sigma^{2}}\right)$$

where *sij* is the (*i*, *j*) entry of the similarity matrix *W* and *σ* is the kernel function parameter, which controls the neighborhood width of the sample points.

(b) Calculate the degree matrix *D*:

$$d\_i = \sum\_{j=1}^{n} w\_{ij}$$

where *D* is the *n* ∗ *n* diagonal matrix formed with the *di* on its diagonal.

(c) Calculate the Laplacian matrix *L* = *D* − *W*.

(d) Calculate the eigenvectors corresponding to the *k* smallest eigenvalues of *L* and arrange them as the columns of a low-dimensional feature matrix, which maps the samples from the high-dimensional space to a low-dimensional space.

(e) Cluster the rows of the feature matrix with a conventional algorithm such as k-means to obtain the clusters *A*1, *A*2, ..., *Ak*.
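A compact sketch of these steps, assuming scikit-learn and SciPy with synthetic placeholder data; note that scikit-learn's `rbf` affinity applies the same full-connection kernel internally, with `gamma` = 1/(2*σ*²):

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))  # placeholder samples
sigma = 1.0                    # kernel function parameter (neighborhood width)

# Step (a): full-connection similarity matrix W (Gaussian kernel)
W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma**2))

# Step (b): degree matrix D with d_i = sum_j w_ij on the diagonal
D = np.diag(W.sum(axis=1))

# Steps (c)-(e) are carried out internally by SpectralClustering,
# which builds the same kernel when gamma = 1 / (2 * sigma**2)
labels = SpectralClustering(
    n_clusters=3, affinity="rbf", gamma=1 / (2 * sigma**2), random_state=0
).fit_predict(X)
```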


#### 3.3.3. Hierarchical Clustering

Hierarchical clustering is another unsupervised algorithm, which is based on hierarchical methods. Initially, each object is regarded as its own cluster; the clusters are then merged step by step according to some rule until the specified number of clusters is reached. The advantages are that the distance and similarity rules are easy to define with few restrictions, that the hierarchical relationships among the classes can be discovered, and that the data can be clustered into arbitrary shapes. Meanwhile, the disadvantages are that the computational complexity is high, that singular values have a great influence on the result, and that the algorithm is likely to cluster into chains.


To build the hierarchical clustering model, the effective parameters are the cluster number, the linkage, and the affinity: the linkage options are Ward, average, and complete, and the affinity options are Euclidean, Manhattan, and cosine. Ward, which generally provides the best performance but requires a large computation, can only be combined with the Euclidean affinity, whereas the average and complete linkages can also be combined with the Manhattan and cosine affinities. The parameters should be studied and the result validated before building the final hierarchical clustering model.
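A brief sketch of these parameter combinations, assuming scikit-learn and synthetic placeholder data; the distance argument of `AgglomerativeClustering` is named `metric` from scikit-learn 1.2 onward (earlier versions call it `affinity`):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))  # placeholder samples

# Ward linkage is only valid with the (default) Euclidean distance
ward_labels = AgglomerativeClustering(
    n_clusters=3, linkage="ward"
).fit_predict(X)

# Average and complete linkage also accept Manhattan and cosine distances
avg_labels = AgglomerativeClustering(
    n_clusters=3, linkage="average", metric="manhattan"
).fit_predict(X)
```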
