Submarine Landslide Susceptibility and Spatial Distribution Using Different Unsupervised Machine Learning Models

Du, Xing; Sun, Yongfu; Song, Yupeng; Xiu, Zongxiang; Su, Zhiming

doi:10.3390/app122010544


by Xing Du ^1,2,*, Yongfu Sun ^3, Yupeng Song ^1,*, Zongxiang Xiu ^1 and Zhiming Su ^1

^1 First Institute of Oceanography, MNR, Qingdao 266061, China
^2 College of Environmental Science and Engineering, Ocean University of China, Qingdao 266100, China
^3 National Deep Sea Center, Qingdao 266237, China
^* Authors to whom correspondence should be addressed.

Appl. Sci. 2022, 12(20), 10544; https://doi.org/10.3390/app122010544

Submission received: 1 September 2022 / Revised: 13 October 2022 / Accepted: 14 October 2022 / Published: 19 October 2022

(This article belongs to the Special Issue High Performance Computing and Artificial Intelligence for Geosciences)


Abstract

A submarine landslide is a well-known geohazard that can cause significant damage to offshore engineering facilities. Most standard prediction and mapping methods require expert knowledge, supervision, and fieldwork. The main objective of this research was to analyze the potential of unsupervised machine learning methods and to compare the performance of three unsupervised machine learning models (k-means, spectral clustering, and hierarchical clustering) in modeling submarine landslide susceptibility. Nine groups of geological factors, obtained through field surveys, were selected as the input parameters. To estimate submarine landslide susceptibility, each input factor was divided into three or four classes based on data features and environmental variables. Finally, the goodness-of-fit and accuracy of the models were validated with both internal metrics (Calinski–Harabasz index, silhouette index, and Davies–Bouldin index) and external metrics (existing landslide distribution, hydrodynamic distribution, and liquefaction distribution). All three models (k-means, spectral clustering, and hierarchical clustering) performed well in predicting submarine landslide susceptibility, and spectral clustering showed the greatest congruence with environmental geology parameters. Therefore, unsupervised machine learning models can be used in submarine landslide prediction studies, and the spectral clustering method performed best. Furthermore, machine learning can improve submarine landslide mapping in the future as models develop and as geological data related to submarine landslides are extended.

Keywords:

submarine landslide; machine learning; hazard susceptibility; spatial distribution

1. Introduction

A submarine landslide is a destructive marine geological disaster. Large-scale submarine landslides can cause long-distance migration of thousands of cubic kilometers of sediment [1], damage engineering facilities such as offshore oil production platforms [2] and submarine optical cables [3], and even cause tsunamis [4]. This can result in incidents such as communication failure and platform collapse, which pose a significant risk to human life and property. Therefore, there is a pressing need to assess submarine landslide stability before engaging in offshore engineering activities.

At present, the main research directions of submarine landslides include: using high-precision geophysical detection to identify and classify landslide morphology [5,6], carrying out stability calculations of submarine landslides by the numerical analysis method [7,8,9], and simulating the landslide process through physical model tests such as a conventional water tank or centrifuge [10,11]. Despite attempts through the above traditional studies, the research on risk assessment and categorization is still insufficient due to the complicated control circumstances of submarine landslides, the large number of trigger factors, and the difficulty of monitoring.

Machine learning and deep learning techniques have been proven to be powerful and promising tools in many geotechnical applications [12,13,14,15,16]. Chen et al. [12] designed landslide spatial models using maximum entropy, support vector machine, and artificial neural network methods. Tse et al. [15] used an unsupervised learning approach to study the synchroneity of past events in the South China Sea. Qi and Tang [16] used integrated metaheuristic and machine learning approaches for slope stability prediction. Deep learning convolutional neural networks [17] and support vector machines [18] are also used in landslide detection. Even though these methods performed well for landslide modeling in a given area, there is no conclusive information about which model is best for other regions. In addition, recently developed techniques for more accurately evaluating the predictive capability of landslide susceptibility models should be investigated further. At present, machine learning research on landslides mainly addresses landslides on land, and few studies address submarine landslides. Zoning submarine landslide hazards is still a difficult area of landslide research. On the other hand, machine learning excels at resolving nonlinear problems without the need for explicit mathematical relationships. Therefore, it is essential to investigate whether machine learning algorithms can be used for zoning submarine landslide hazards and how well various machine learning methods perform.

The primary purpose of this study is to offer an integrated strategy for assessing submarine landslide susceptibility that uses unsupervised machine learning models to evaluate landslide risk and partition the affected region. Submarine landslides in the Yellow River Estuary are selected for study and validation of the suggested method. Three machine learning models based on k-means, spectral clustering, and hierarchical clustering are developed and compared for performance evaluation.

2. Study Area Description

The Yellow River Estuary is located in the north of Shandong Peninsula, China (118.73° E–119.65° E, 38.1° N–38.3° N) (Figure 1). In this sea region, the water depth ranges from 0 to 18 m. When the water depth is less than 15 m, the predominant sediment type is silt combined with a minor quantity of silty sand; when the water depth is greater than 15 m, the predominant sediment type is silty clay. The wave height increases gradually with the increase in water depth and reaches the maximum value near the water depth of 10 m. As a consequence, wave forces on the seabed increase first, and then decrease with the increase in water depth. The seabed gradient ranges from 1/2000 to 1/500. Wave-induced liquefaction, which can cause seabed sediment to lose stability and slide, is the main geological disaster in the sea area.

There are two main reasons for selecting the Yellow River Estuary as the study area. First, detailed geological data of the study area, such as water depth, sediment types, waves, and currents, have been collected by the First Institute of Oceanography, MNR, and can be used in this study. Obtaining detailed geological environment data is of vital importance, because data on submarine landslides are difficult to acquire and often incomplete. Second, a great many submarine landslides and human activities are located in this sea area, which gives this study both research significance and practical significance for engineering safety guidance. Many micro-submarine landslides have been discovered in the study area since the 1990s. The results of the field geophysical investigation [19] show that landslides in the subaqueous delta zone of the Yellow River Estuary are mainly triggered by hydrodynamically induced seabed liquefaction. Furthermore, the SINOPEC Shengli Oilfield is in this sea area, and as a consequence, hundreds of offshore oil platforms and submarine cables are located here.

3. Data and Methods

3.1. Overall Workflow

The main objective of this paper is to classify submarine landslide hazards by region, using data on various submarine-landslide-hazard impact factors as the basis, after data preprocessing and machine learning modeling. The study can serve as an exploratory reference for the hazard classification of submarine landslides worldwide. The workflow of this study is shown in Figure 2 and includes the following specific steps:

Data collection.

The data of this study were obtained from long-term geophysical surveys and in situ monitoring in the study area by the First Institute of Oceanography, MNR [20]. After data collection, suitable geological factors should be determined as submarine-landslide-influencing parameters. Suitable impact parameters should cover the geological, hydrological, and human factors that influence landslides, and should be commonly used parameters that are readily available.

Influencing parameter determination.

After determining the factors influencing the submarine landslides, each data point was interpolated separately so that the values of all influencing factors were included at each study site in the study area. Subsequently, the data were classified into several categories according to the characteristics of different influencing factors, and the category distribution maps were drawn.

Classification-level data extraction.

The coordinate values of the study-point locations are defined and the categories of each factor are extracted from the impact-factor classification map obtained in the previous step. Thus, each study point corresponds to multiple classes of impact-factor parameters that can be used in the next step of unsupervised machine learning model training.

Establishing unsupervised machine learning models.

Establish 3 unsupervised machine learning models using k-means, spectral clustering, and hierarchical clustering. Each model is first run several times with different parameter values, and the parameter values that give the best prediction are then selected to build the final model.

Accuracy comparison.

Compare the accuracy and rationality of the different prediction results and choose the best one. Both mathematical test metrics and measured geological conditions should be used to test the models’ accuracy. The Calinski–Harabasz index, silhouette index, and Davies–Bouldin index are used to evaluate the mathematical accuracy. The liquefaction zonation is used to evaluate the geological rationality.

Influencing parameter analysis.

Study the importance of all the landslide-influencing parameters by excluding them individually using the best model and test the accuracies with evaluating indicators.

3.2. Landslide-Influencing Parameters

It is very important to choose the suitable influencing factors for submarine landslide assessment. There is no absolute standard parameter when classifying the hazards of submarine landslides. This issue remains one of the difficult problems in the field of research on submarine landslides. The reason is that there are too many influencing factors and it is difficult to obtain the corresponding parameters. From the point of view of geological analysis, geological factors, hydrodynamic factors, topographic factors, and human activities should be taken into account. As much information as possible should be collected to satisfy these four requirements.

In this study, the factors were carefully selected from the available choices based on the nature of submarine landslide occurrences, considering the characteristics of geology, hydrology, and geomorphology and the impact of human engineering activities. Nine factors were selected, namely, sediment type, slope, soil strength, water depth, wave height, maximum bottom current velocity, liquefaction, erosion, and human engineering activities (Figure 3). The research data for the 9 factors were obtained by the First Institute of Oceanography, MNR, China through geophysical sounding, drilling, and monitoring surveys in the Chengdao sea area of the Yellow River Estuary [18], and contain detailed information on various geological features of the study area. Each factor was divided into 3 or 4 classes based on the range of data, geological background, and experts’ experience in this study area. Finally, 1107 points were selected as the research sites (Figure 1), with longitudes from 118.75° E to 118.95° E, latitudes from 38.15° N to 38.28° N, and an interval of 0.005 degrees (41 × 27 = 1107 grid nodes). All the data used in this study were collected from projects of the First Institute of Oceanography, MNR.
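As a quick consistency check on the sampling grid, the following sketch builds a regular grid over the stated longitude and latitude bounds. The stated count of 1107 points is reproduced only with a 0.005° spacing (41 × 27 nodes), so that spacing is assumed here; the variable names are illustrative and not taken from the original study.

```python
import numpy as np

# Assumption: a regular grid with 0.005-degree spacing over the stated bounds.
lon = np.linspace(118.75, 118.95, 41)  # 41 nodes, 0.005 degrees apart
lat = np.linspace(38.15, 38.28, 27)    # 27 nodes, 0.005 degrees apart

# One row per study point: (longitude, latitude).
lon_grid, lat_grid = np.meshgrid(lon, lat)
points = np.column_stack([lon_grid.ravel(), lat_grid.ravel()])

print(points.shape)  # (1107, 2)
```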

Sediment type plays an important role in the study of submarine landslide susceptibility, as different types of sediments have different physical and mechanical properties, which affect how readily geological disasters occur. Studies have shown that sediment type has a large influence on landslide stability [21]. In this study area, sediments are divided into 4 classes: silty sand, silt, silty clay, and clay.

Slope angle is a significant factor in the development of submarine landslide susceptibility and the angles were calculated by the change in water depth. Slope angles are subdivided into 4 categories: <1/2000 radian, 1/2000–1/1000 radian, 1/1000–1/500 radian, and >1/500 radian. The places with large sea-bottom slopes are mainly located at 6 m and 10 m water-depth contours.

Soil strength affects the stability of the seabed and how readily a submarine landslide occurs. The greater the soil strength, the less likely a slide is to occur. Soil strength is divided into 0–50 kPa, 50–80 kPa, 80–110 kPa, and >110 kPa. The classification of soil strength is mainly based on data from boreholes in this study area.

Water depth, which was obtained by single-beam and multibeam bathymetric instruments, can influence the strength of waves acting on the seabed. It can be classified into 4 classes: 0–5 m, 5–10 m, 10–15 m, and >15 m.

Wave height is a very important factor because it represents the energy that a wave contains. It was collected by pressure wave and tide gauges. The wave height increases first with the depth of water, but there is no obvious increase after reaching a 9 m depth. The study area can be divided into 3 classes, which are 0.5–2.5 m, 2.5–4 m, and >4 m.

The maximum current velocity of the bottom determines the shear stress of the current on the seabed, which may cause erosion. Current velocity increases as the water depth increases and can be classified into 3 classes: 0–0.5 m/s, 0.5–1 m/s, and 1–1.5 m/s.

Liquefaction is the most serious geological hazard in the Yellow River Estuary. Hundreds of liquefaction zones, discovered by geophysical exploration, exist in the study area. They are mainly distributed at water depths between 6 m and 12 m, where the hydrodynamic action is strongest. Liquefaction is divided into 4 classes: not easily liquefied (liquefaction depth < 0.5 m), slightly liquefied (0.5 m < liquefaction depth < 2 m), moderately liquefied (2 m < liquefaction depth < 4 m), and seriously liquefied (liquefaction depth > 4 m). Seabed sediments slide easily after liquefaction because their bearing capacity is greatly reduced.

Erosion is divided into 4 classes, which are the stable zone (<0.02 m/s), slight zone (0.02–0.05 m/s), moderate zone (0.05–0.1 m/s), and serious zone (>0.1 m/s). The serious zone and moderate zone are mainly distributed in a water depth of less than 12 m.

Human engineering activities are mainly offshore production platforms, submarine pipelines, cables, and so on in this study area. They can be divided into 4 categories: core zone, buffer zone, potential-impact zone, and no-impact zone. The actual scope of various engineering structures is named the core zone. The buffer zone is the core area extending 500 m outwards. The potential-impact zone is where the buffer zone extends another kilometer outward, and other areas are the no-impact zones.
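To illustrate how continuous factor values can be turned into the class codes described above, the following sketch bins three of the factors (water depth, wave height, and maximum bottom current velocity) using the class boundaries given in the text. The helper name `to_class`, the sample values, and the treatment of values that fall exactly on a boundary are assumptions made for illustration only.

```python
import numpy as np

# Interior class boundaries taken from the text.
DEPTH_EDGES = [5.0, 10.0, 15.0]  # m   -> classes 1..4
WAVE_EDGES = [2.5, 4.0]          # m   -> classes 1..3
CURRENT_EDGES = [0.5, 1.0]       # m/s -> classes 1..3

def to_class(values, edges):
    """Return 1-based class codes; a value equal to an edge goes to the upper class."""
    return np.digitize(values, edges) + 1

# Hypothetical values at four study points.
depth = np.array([2.0, 7.5, 12.0, 17.0])
wave = np.array([1.0, 3.0, 4.5, 3.9])
current = np.array([0.2, 0.7, 1.2, 0.4])

# One row per study point, one column per factor.
features = np.column_stack([
    to_class(depth, DEPTH_EDGES),
    to_class(wave, WAVE_EDGES),
    to_class(current, CURRENT_EDGES),
])
print(features)
```

The resulting integer matrix has the same shape as the input expected by the clustering models: one row per study point and one column per factor.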

3.3. Unsupervised Machine Learning Models

Unsupervised machine learning is a major part of machine learning. It can study the intrinsic relationship of datasets without data labels when dealing with practical problems. The main applications of unsupervised learning are: segmenting a dataset by some shared attributes; detecting exceptions that are not suitable for any group; and simplifying datasets by aggregating variables with similar properties. Among these, the objective of this study is to explore the susceptibility of submarine landslides. As a consequence, an important unsupervised machine learning class named clustering, which contains k-means, spectral, and hierarchical clustering, was selected as the study method. Each clustering method was built after parameter selection with internal validation measures: the Calinski–Harabasz index [22], silhouette index [23], and Davies–Bouldin index [24]. The more precise the clustering result, the higher the Calinski–Harabasz score and silhouette index. The lower the Davies–Bouldin index, the more accurate the clustering result. The clustering results’ performances were validated with external validation measures; for instance, liquefaction distribution, hydrodynamic action, slope angle, etc.
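The direction of the three internal indices can be checked with a small sketch. It uses the scikit-learn metric functions on synthetic two-cluster data (not the study data) and compares a correct labeling with a randomly shuffled one; the helper name `internal_scores` is illustrative.

```python
import numpy as np
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

def internal_scores(X, labels):
    """Return the three internal validation indices for one labeling."""
    return {
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
        "silhouette": silhouette_score(X, labels),                # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
    }

# Synthetic stand-in data: two well-separated groups of 100 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 2)),
    rng.normal(10.0, 1.0, size=(100, 2)),
])
good_labels = np.repeat([0, 1], 100)        # matches the true grouping
bad_labels = rng.permutation(good_labels)   # same class sizes, random assignment

good = internal_scores(X, good_labels)
bad = internal_scores(X, bad_labels)
print(good)
print(bad)
```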

3.3.1. k-Means

The k-means clustering algorithm is a typical unsupervised machine learning model that is widely used for the clustering analysis [25] of unlabeled data. Its advantage is that it is easy to implement and its results are easy to visualize. The number of clusters is the only parameter that needs to be specified beforehand. To build a k-means model:

(a): Predetermine the number of clusters k.
(b): Randomly select k initial cluster centers µ₁, µ₂, …, µ_k.
(c): Assign every point to its nearest center to form clusters, and calculate the distance between each point and the center of its cluster:

$dist(x, μ) = \sqrt{\sum_{i = 1}^{n} {(x_{i} - μ_{i})}^{2}}$

where x is the sample point; μ is the center of the cluster; n is the number of features in each sample point; and i indexes the features of x.
(d): Sum the squared distances over all clusters:

$\begin{array}{l} \mathrm{Cluster\ Sum\ of\ Square\ (CSS)} = \sum_{j = 1}^{m} \sum_{i = 1}^{n} {(x_{j i} - μ_{i})}^{2} \\ \mathrm{Total\ Cluster\ Sum\ of\ Square} = \sum_{l = 1}^{k} CSS_{l} \end{array}$

where m is the number of samples in a cluster, j indexes the samples in that cluster, and $x_{j i}$ is the i-th feature of sample j.
(e): Recompute each cluster center as the point that minimizes the squared error to the data points of that cluster (the mean of the cluster), and move the center to that point.
(f): Repeat from step (c) until the total cluster sum of squares no longer changes or the maximum number of iterations is reached.
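Steps (a) to (f) can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the scikit-learn implementation used in the study; the function name, the random initialization from the samples, and the handling of empty clusters are assumptions.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means following steps (a)-(f); returns labels, centers, total CSS."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # (b) Pick k distinct samples as the initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    prev_total = np.inf
    for _ in range(max_iter):
        # (c) Squared Euclidean distance to every center; assign to the nearest.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # (d) Total cluster sum of squares for the current assignment.
        total = d2[np.arange(len(X)), labels].sum()
        # (f) Stop once the total no longer changes.
        if np.isclose(total, prev_total):
            break
        prev_total = total
        # (e) Move each center to the mean of its members
        #     (an empty cluster keeps its previous center).
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels, centers, total

# Synthetic stand-in data: two well-separated groups of 50 points.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(10.0, 0.5, size=(50, 2)),
])
labels, centers, total = kmeans(X, k=2)
print(centers)
```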

3.3.2. Spectral Clustering

Spectral clustering is another unsupervised machine learning model, which clusters data using the eigenvectors of the Laplacian matrix of the sample data. It maps data from a high-dimensional space to a low-dimensional space, and then uses another clustering algorithm to cluster in the low-dimensional space. Compared with k-means, spectral clustering includes a dimension reduction step, which makes it more suitable for high-dimensional data and more effective for sparse data. Spectral clustering takes n sample points $X = \{x_{1}, x_{2}, \dots, x_{n}\}$ and the number of clusters k as input, and outputs clusters $A_{1}, A_{2}, \dots, A_{k}$. In this model, the kernel function parameter and the cluster number are the influential parameters. The specific steps are below:

(a): Calculate the $n \times n$ similarity matrix $W$, which can be constructed with the ε-neighborhood method, the k-nearest-neighbor method, or the full-connection method. The full-connection method used in this study is:

$w_{i j} = s_{i j} = s (x_{i}, x_{j}) = \exp \left( - \frac{{‖x_{i} - x_{j}‖}^{2}}{2 σ^{2}} \right)$

where $s_{i j}$ is the similarity between $x_{i}$ and $x_{j}$, $w_{i j}$ is the corresponding entry of $W$, and $σ$ is the kernel function parameter, which controls the neighborhood width of the sample point.
(b): Calculate the degree matrix D:

$d_{i} = \sum_{j = 1}^{n} w_{i j}$

where D is the $n \times n$ diagonal matrix formed with $d_{i}$ .
(c): Calculate the Laplacian matrix $L = D - W$ .
(d): Calculate the eigenvalues of L and sort them from small to large, then take the first k eigenvalues and calculate their eigenvectors $u_{1}, u_{2}, \dots, u_{k}$ .
(e): Form the matrix $U = [u_{1}, u_{2}, \dots, u_{k}], U \in R^{n \times k}$ .
(f): Let $y_{i} \in R^{k}$ be the vector of row $i$ of $U$ , $i = 1, 2, \dots, n$ .
(g): Cluster the points $Y = \{y_{1}, y_{2}, \dots, y_{n}\}$ into clusters $C_{1}, C_{2}, \dots, C_{k}$ .
(h): Output clusters $A_{1}, A_{2}, \dots, A_{k}$ , among which $A_{i} = \{j | y_{j} \in C_{i}\}$ .
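The steps above can be sketched with NumPy for the Laplacian and its eigenvectors, and scikit-learn's `KMeans` for the final clustering of the rows of U. This is a minimal illustration on synthetic data that assumes a Gaussian kernel of the form exp(−‖x_i − x_j‖² / (2σ²)) and the unnormalized Laplacian L = D − W; it is not the `SpectralClustering` implementation used in the study, and the function name is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0, seed=0):
    """Unnormalized spectral clustering following steps (a)-(h)."""
    X = np.asarray(X, dtype=float)
    # (a) Fully connected similarity matrix with a Gaussian kernel.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # (b) Degree matrix D and (c) Laplacian L = D - W.
    D = np.diag(W.sum(axis=1))
    L = D - W
    # (d)-(e) eigh returns eigenvalues in ascending order, so the first k
    # columns are the eigenvectors of the k smallest eigenvalues.
    _, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :k]
    # (f)-(h) Cluster the rows of U; row i carries the label of sample i.
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(U)

# Synthetic stand-in data: two well-separated groups of 40 points.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(40, 2)),
    rng.normal(5.0, 0.5, size=(40, 2)),
])
labels = spectral_clustering(X, k=2)
print(labels)
```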

3.3.3. Hierarchical Clustering

Hierarchical clustering is another unsupervised algorithm that is based on hierarchical methods. In hierarchical clustering, each object is initially regarded as a cluster, and the clusters are then merged step by step according to some rule until the required number of clusters is reached. The advantages are: the distance measure and merging rule are easy to define and impose few restrictions, the hierarchical relationship of classes can be found, and clusters of other shapes can be formed. Meanwhile, the disadvantages are: the computational complexity is high, outliers can have a great influence, and the algorithm is likely to cluster into chains. To build a hierarchical clustering model:

(a): Each object is regarded as a class, and the distances between all pairs of classes are calculated;
(b): The two classes with the smallest distance are combined into a new class;
(c): Recalculate the distance between the new class and all other classes;
(d): Repeat (b) and (c) until all classes are finally merged into one class or the required number of classes is reached.

The effective parameters are cluster number, linkage, and affinity, of which linkage contains Ward, average, and complete; and affinity contains Euclidean, Manhattan, and cosine. Ward linkage can only be combined with the Euclidean affinity, whereas average linkage, which provides the best performance but requires a large amount of computation, can also be combined with Manhattan and cosine. The parameters should be studied and the results validated before building the final hierarchical clustering model.
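The linkage and affinity combinations can be explored with a short sketch. It uses SciPy's `linkage` and `fcluster` as a stand-in for the scikit-learn model used in the study (an assumption made to keep the example independent of scikit-learn's parameter naming), runs on synthetic data, and uses `"cityblock"`, which is SciPy's name for the Manhattan distance.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def agglomerative_labels(X, n_clusters, method, metric):
    """Build the merge tree bottom-up and cut it into at most n_clusters groups."""
    Z = linkage(X, method=method, metric=metric)
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Synthetic stand-in data: four well-separated groups of 30 points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in (0.0, 3.0, 6.0, 9.0)])

# Ward is only paired with the Euclidean distance; average and complete
# linkage are also tried with the Manhattan ("cityblock") distance.
candidates = [
    ("ward", "euclidean"),
    ("average", "euclidean"),
    ("average", "cityblock"),
    ("complete", "cityblock"),
]
results = {
    (method, metric): agglomerative_labels(X, 4, method, metric)
    for method, metric in candidates
}
for key, labels in results.items():
    print(key, np.bincount(labels)[1:])  # cluster sizes (labels start at 1)
```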

3.4. Validation

Cluster results of k-means, spectral clustering, and hierarchical clustering validated with different measures are shown in Figure 4. Panels (1)–(6) and (10)–(12) show that all three clustering methods scored best when the cluster number was 2, and that the scores deteriorated as the cluster number increased. From a purely mathematical point of view, the data should therefore be divided into two classes. However, geological requirements should also be considered as an important classification factor. This sea area can indeed be divided into two classes, because the seabed sediment type changes abruptly at a water depth of about 14 m.

Nevertheless, we need a more accurate range of submarine landslide susceptibility with acceptable mathematical accuracy. The cluster number of 3 is still insufficient to classify the study region because the classification result may be easily inferred from the hydrodynamic characteristics. As a result, the final cluster number is 4, which corresponds to the classifications very high, high, low, and very low.

As seen from (7) to (9), the clustering result performed best when gamma was equal to 0.01 with the Calinski–Harabasz index and gamma equal to about 1 using the silhouette index and Davies–Bouldin index. Therefore, the kernel function parameter gamma is 1. It is shown in (13)~(15) that the result performed best when the affinity method is Manhattan. The specific parameters used in the three cluster models can be seen in Table 1.
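A parameter sweep of this kind can be sketched with scikit-learn's `SpectralClustering`, whose RBF affinity is exp(−gamma·‖x − y‖²), so gamma corresponds to 1/(2σ²) for a Gaussian kernel of width σ. The data below are synthetic and the gamma values are illustrative; the sketch shows only the mechanics of the sweep and does not reproduce the scores in Figure 4.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic stand-in data: four groups of 40 points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.4, size=(40, 2)) for c in (0.0, 3.0, 6.0, 9.0)])

scores = {}
for gamma in (0.01, 0.1, 1.0):
    model = SpectralClustering(
        n_clusters=4, affinity="rbf", gamma=gamma, random_state=0
    )
    labels = model.fit_predict(X)
    scores[gamma] = {
        "calinski_harabasz": calinski_harabasz_score(X, labels),
        "silhouette": silhouette_score(X, labels),
        "davies_bouldin": davies_bouldin_score(X, labels),
    }

for gamma, s in scores.items():
    print(gamma, s)
```

When the indices disagree on the best value, as reported above, the final choice has to weigh them against each other; here two of the three indices favored gamma ≈ 1.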

All machine learning calculations in the article were carried out using the scikit-learn machine learning package [26], an open-source Python library, on a laptop running Windows 10 with 16 GB of RAM, an R 5800H CPU, and a 3060 GPU.

4. Results

This section outlines the clustering results of the submarine landslide susceptibility using k-means, spectral, and hierarchical models after parameter validation and selection. The study area was divided into four areas of different submarine landslide susceptibilities without real geological labels. To define the final labels, all the geological parameters should be considered. As mentioned in Section 3.2, the hydrodynamic force on the seabed first increases and then decreases as the water depth increases. Moreover, when the water depth is deeper than 15 m, the sediment type changes from silt to clay, which is difficult to disturb. Therefore, the submarine landslide susceptibility labeling principle of this area is as follows: the least serious part is the clay region; the second least serious part is the shallow water region; the most serious part is around the 10 m bathymetric contour; and the second most serious part is beside or around it.
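The labeling principle can be expressed as a simple rule on the mean water depth of each cluster. The sketch below is a hypothetical formalization that uses water depth alone, whereas the labeling in the study considers all geological parameters; the function name, the 10 m reference depth argument, and the sample values are assumptions.

```python
import numpy as np

def label_clusters(depth, cluster_ids, peak_depth=10.0):
    """Map four cluster ids to susceptibility labels from mean water depth."""
    depth = np.asarray(depth, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    mean_depth = {
        int(c): depth[cluster_ids == c].mean() for c in np.unique(cluster_ids)
    }
    if len(mean_depth) != 4:
        raise ValueError("this sketch expects exactly four clusters")
    remaining = set(mean_depth)
    names = {}
    # Deepest cluster: clay region, least susceptible.
    deepest = max(remaining, key=mean_depth.get)
    names[deepest] = "very low"
    remaining.remove(deepest)
    # Shallowest cluster: shallow water region, second least susceptible.
    shallowest = min(remaining, key=mean_depth.get)
    names[shallowest] = "low"
    remaining.remove(shallowest)
    # Cluster centered nearest the 10 m contour: most susceptible.
    nearest = min(remaining, key=lambda c: abs(mean_depth[c] - peak_depth))
    names[nearest] = "very high"
    remaining.remove(nearest)
    # The last cluster lies beside the most susceptible zone.
    names[remaining.pop()] = "high"
    return names

# Hypothetical water depths (m) and cluster ids for eight study points.
depth = np.array([2.0, 3.0, 8.0, 9.0, 12.5, 13.0, 17.0, 18.0])
cluster_ids = np.array([0, 0, 1, 1, 2, 2, 3, 3])
names = label_clusters(depth, cluster_ids)
print(names)
```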

As shown in Figure 5, the study area was divided into four parts of submarine landslide susceptibility using the k-means model. The very high-susceptibility part is located at a water depth of 5–11 m, and the edge is a discrete distribution. The high-susceptibility part is distributed at a water depth of 11–13 m. The low-susceptibility part is situated at a water depth of less than 5 m. The very low-susceptibility part is located at a water depth deeper than 13 m.

It can be seen from Figure 6 that the study area was divided into four parts of submarine landslide susceptibility using the spectral clustering model. The very high-susceptibility part is located at a water depth of 5–12 m. The high-susceptibility part is distributed at a water depth of 12–15 m. The low-susceptibility part is situated at a water depth of less than 5 m. The very low-susceptibility part is located at a water depth deeper than 15 m. The distribution of submarine landslide susceptibility obtained using spectral clustering is more continuous than that obtained using the k-means algorithm. Furthermore, the very high-susceptibility part is wider and the very low part is narrower in Figure 6 than in Figure 5.

As shown in Figure 7, the study area was divided into four parts of submarine landslide susceptibility using the hierarchical clustering model. The very high-susceptibility part is located at a water depth of 5–8 m. The high-susceptibility part is distributed at a water depth of 8–13 m. The low-susceptibility part is situated at a water depth of less than 5 m, and the very low-susceptibility part is located at a water depth deeper than 13 m. Comparing the results of the three algorithms shows that the very high-susceptibility part obtained using the hierarchical clustering model is much smaller than that obtained using the other two methods.

Generally, the low- and very low-susceptibility parts obtained by the three methods are very close. The main differences are in the size and distribution of the very high and high areas. The three unsupervised machine learning methods obtained two main common parts: one is a low-hazard region at water depths shallower than 5 m; the other is a very low-hazard region at water depths greater than about 13 m. The reason is that the hydrodynamic conditions are relatively weak in these two parts of the area, and the influence of the various geological-impact parameters on submarine landslides is also small; thus, the results obtained by different algorithms are more consistent.

5. Discussion

5.1. Model Performance Comparison

To test the performance of cluster results, both internal validation measures and external validation measures were used. For internal validation measures, we validated the accuracy of three different submarine landslide susceptibility models by using the Calinski–Harabasz index, silhouette index, and Davies–Bouldin index. The three indexes use different algorithms to examine the merits of the classification results from a mathematical perspective. As shown in Figure 8, the k-means model performed best under the evaluation of the Calinski–Harabasz index, whereas the spectral clustering model performed best when evaluating the silhouette index and Davies–Bouldin index. Therefore, spectral clustering has the best performance compared to k-means and hierarchical clustering in the internal validation measures.

In general, external causes of submarine landslides include earthquakes, gas hydrate dissociation, wave action, volcanic activity, tsunamis, etc. There are no severe geological phenomena such as earthquakes, tsunamis, or volcanoes in the study area, and the main external influence factor is the liquefaction of seabed soil caused by waves. Therefore, for the external validation measure, the three distributions of submarine landslide susceptibility were compared with the distribution of liquefaction depth (Figure 9). As shown in Figure 9, the area with the deepest liquefaction depth (ellipse A) is distributed at a water depth of 6–12 m. The area of ellipse A agrees well with the submarine landslide susceptibility results using k-means and spectral clustering, whereas the result obtained using hierarchical clustering is quite different from ellipse A. The area with a deep liquefaction depth (ellipse B) is distributed at a water depth of 12–15 m. It is in good agreement with the result obtained using spectral clustering, whereas the areas obtained by k-means and hierarchical clustering differ from the area of ellipse B. The area with a shallow liquefaction depth (ellipse C) is distributed at a water depth of less than 5 m, which agrees well with all the clustering results. The area with a very shallow liquefaction depth (ellipse D) is distributed at a water depth deeper than 15 m. It is in good agreement with the result obtained using spectral clustering, whereas the areas obtained by k-means and hierarchical clustering are larger than the area of ellipse D. As a result, the submarine landslide susceptibility results using spectral clustering performed better than those obtained using k-means and hierarchical clustering in the external validation.

In conclusion, all three models used in this study are capable of producing reasonable clustering results, with the spectral clustering model being the most precise when grouping the submarine landslide susceptibility. Although the best algorithm can be identified by internal verification methods, the scores of the different algorithms are too close to discriminate clearly between them. To obtain the best results, it is still necessary to use external verification methods at the same time.

5.2. Comparison of Model Results with Other Studies

In order to verify the accuracy of the model calculations in this paper, comparisons with the results of other studies are still needed. Therefore, we selected the results of a geophysical survey and the results of traditional GIS-based landslide analysis methods to compare and analyze with the conclusions of this paper.

In the 1980s, a comprehensive, integrated geophysical survey was conducted in the Yellow River Estuary waters [19]. The results of the survey showed that a large number of microslides existed on the leading edge of the delta at a water depth between 4 and 12 m in the study area. It can be seen from Figure 6 that, in the landslide hazard classification obtained using the spectral clustering algorithm, the areas with very high hazards are concentrated at a water depth range of 5–12 m. The model results are highly consistent with the actual survey results, indicating the accuracy and reliability of the algorithm in this paper.

The conventional GIS-based approach has also been used to study the stability of submarine landslides in the Yellow River Estuary [27]. Its results show that the most landslide-prone areas lie at water depths of 8–13 m and the more landslide-prone areas at water depths of 5–15 m, with an average water depth of 10 m. The landslide-hazard areas derived from the GIS results are consistent in overall distribution and trend with those derived in this paper, although the GIS-derived range is slightly smaller. Compared with the GIS results, the model results in this paper are closer to those of the actual geophysical survey.
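The depth ranges quoted above can be compared numerically. The following is a minimal sketch, assuming that the agreement between two depth ranges is measured by the Jaccard overlap of the intervals; this measure and the function name `interval_overlap` are illustrative choices, not part of the original analysis, which compares the maps qualitatively.

```python
def interval_overlap(a, b):
    """Jaccard overlap of two depth intervals given as (shallow, deep) in metres."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union


# Depth ranges quoted in the text (metres).
survey = (4.0, 12.0)          # microslides found by the geophysical survey [19]
spectral = (5.0, 12.0)        # very-high-hazard zone from spectral clustering
gis_most_prone = (8.0, 13.0)  # most prone zone in the GIS study [27]
gis_more_prone = (5.0, 15.0)  # more prone zone in the GIS study [27]

overlaps = {
    "spectral clustering": interval_overlap(survey, spectral),
    "GIS, most prone": interval_overlap(survey, gis_most_prone),
    "GIS, more prone": interval_overlap(survey, gis_more_prone),
}

for name, value in overlaps.items():
    print(f"{name}: overlap with survey = {value:.3f}")
```

Under this assumed measure, the spectral clustering range overlaps the survey range more than either GIS range does, which is consistent with the statement that the model results are closer to the survey.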

Therefore, the comparison with both the traditional method of submarine-landslide-hazard analysis and the actual geophysical survey results indicates that the unsupervised machine learning method used in this paper is reliable and stable.

Although only one research area was studied, the general geological formation conditions and triggering mechanisms of submarine landslides are comparable across regions, despite differences in specific causes and triggering variables. The research technique and hypothesis of this paper can therefore serve as a guide for the study of submarine landslide hazards elsewhere.

5.3. Importance of Landslide-Influencing Factors

To assess the significance of the influencing factors, each factor was removed in turn and the clustering was recalculated. The new clustering results were compared with the original clustering result using the Calinski–Harabasz index, silhouette index, and Davies–Bouldin index. This yields nine clustering results, one for each omitted factor, and their evaluation scores were normalized against the scores of the full nine-factor model. The higher the normalized ratio is, the less important the omitted factor is. The resulting order of importance does not represent an absolute ranking of the factors; it represents only their relative order in the study area.
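The normalization step can be sketched as follows. All index values below are hypothetical numbers chosen for illustration, not results from this study, and averaging the three normalized scores to obtain a single ranking is an assumption made for the sketch. Because a lower Davies–Bouldin index is better, its ratio is subtracted from 2 (the DAV series of Figure 10) so that all three scores trend in the same direction.

```python
# Hypothetical index values for illustration only (not the paper's results).
baseline = {"CH": 1200.0, "SI": 0.40, "DB": 0.90}  # model with all nine factors

# Scores of models recalculated with one factor removed (hypothetical).
without_factor = {
    "liquefaction": {"CH": 900.0, "SI": 0.30, "DB": 1.17},
    "water depth": {"CH": 1080.0, "SI": 0.36, "DB": 0.99},
    "slope angle": {"CH": 1260.0, "SI": 0.42, "DB": 0.855},
}


def normalized_scores(new, base):
    """Divide each reduced-model score by the nine-factor score.

    The Davies-Bouldin ratio is flipped as 2 - ratio so that, like CH and SI,
    a lower value means the removed factor mattered more.
    """
    return {
        "CH": new["CH"] / base["CH"],
        "SI": new["SI"] / base["SI"],
        "DAV": 2.0 - new["DB"] / base["DB"],
    }


scores = {
    factor: normalized_scores(values, baseline)
    for factor, values in without_factor.items()
}

# Assumption: rank by the mean of the three normalized scores, lowest first,
# so the factor whose removal degrades the clustering most comes first.
ranking = sorted(scores, key=lambda factor: sum(scores[factor].values()) / 3)

for factor in ranking:
    print(factor, {k: round(v, 3) for k, v in scores[factor].items()})
```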

The test results are shown in Figure 10, where CH is the normalized ratio of the Calinski–Harabasz index and SI is the normalized ratio of the silhouette index. DAV is obtained as 2 minus the normalized ratio of the Davies–Bouldin index so that it trends in the same direction as the other two indexes. As seen in Figure 10, the model without liquefaction obtained the lowest score, which means that liquefaction is the most important factor influencing landslides in the study area. Models that exclude water depth, wave height, or soil strength obtained higher scores than the model that excludes liquefaction, which makes these the second most important factors. These three factors may affect whether liquefaction happens, but none of them can predict it on its own. Consequently, they are less significant than liquefaction but more significant than the remaining factors. Sediment type, erosion, and maximum bottom current velocity are less important than the factors mentioned above. Sediment type correlates strongly with soil strength. However, the physical and mechanical properties of the same sediment may differ because of different depositional states and times, so soil strength has the greater impact on landslides. Erosion and maximum bottom current velocity have some influence on the stability and strength of sediments but are not decisive, and their importance is therefore relatively low.

The least important factors in this study area are slope angle and human engineering activities. According to previous studies [28,29], slope angle has an important influence on landslides, but slope angle varies too little within this study area to have much influence. As described in Section 3.2, the largest slope angles are >1/500 radian, which is about 0.11°. In contrast, Hance [30] compiled slope angles for 399 submarine landslides and found that the most frequent value was 3–4° and the average value was 5.8°. These values are far larger than the slopes in our study area, and the average is about 50 times the maximum value in the region; the slope angle is thus of negligible importance in the landslide evaluation of the study area. As for human engineering activities, some of them, such as hydrate exploitation [31,32], can play an important role in the formation of submarine landslides. However, the engineering activities in the study area are mainly offshore platforms and submarine pipelines, which are widely distributed, and their importance is therefore very low and can be ignored.
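The slope comparison can be checked with a short calculation. This sketch assumes that 1/500 radian is taken as the representative maximum slope of the study area; the variable names are illustrative.

```python
import math

# Representative maximum slope in the study area, taken as 1/500 radian.
max_slope_deg = math.degrees(1 / 500)

# Average slope angle of the 399 submarine landslide cases compiled by Hance [30].
mean_reported_deg = 5.8

# How many times larger the reported average is than the local maximum.
ratio = mean_reported_deg / max_slope_deg

print(f"maximum slope = {max_slope_deg:.3f} deg, ratio = {ratio:.1f}")
```

The conversion gives a slope of roughly 0.11° and a ratio of about 50, in line with the values stated in the text.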

In conclusion, the influencing variables affect submarine landslide risk assessment to different degrees, and their significance depends on how strongly their distribution correlates with landslides in the study area. The order of importance and degree of effect obtained in this research reflect only one study area; in other study areas, the factor-elimination technique described in this section must be applied to carry out a specific analysis.

6. Conclusions

In this paper, a submarine landslide susceptibility assessment was carried out using unsupervised machine learning models in the Yellow River Estuary, China. Nine influential factors were selected to analyze the susceptibility of submarine landslides based on terrain data and remote sensing images. We used different unsupervised machine learning models to classify landslide risk and discussed the accuracy of the models and the importance of each factor. The main conclusions are as follows:

(1): Unsupervised machine learning models can be used to study and assess submarine landslide susceptibility and provide high accuracy.
(2): Results using the spectral clustering method have the highest accuracy among k-means, spectral clustering, and hierarchical clustering after testing with both internal validation measures and external validation measures.
(3): In this study area, the order of importance of submarine-landslide-influencing factors is as follows: liquefaction, water depth, wave height, soil strength, sediment type, erosion, maximum current velocity of the bottom, slope angle, and human engineering activities. In different research areas, the importance of each impact factor is different, which needs specific analysis.

Due to the complexity of the factors that trigger submarine landslide hazards and the difficulty of monitoring submarine landslide sites, it is challenging to determine the precise location and hazard information of each landslide in the research region. Currently, unsupervised machine learning can only be conducted using limited data for semiquantitative description. For a more in-depth examination of submarine landslide hazards, accurate training data are necessary. In the future, greater emphasis must therefore be placed on the collection of field data for submarine landslide identification and monitoring.

Author Contributions

Conceptualization, Y.S. (Yongfu Sun); methodology, X.D.; program, X.D.; validation, Y.S. (Yupeng Song) and Z.X.; resources, Z.S.; data curation, Z.S.; writing—original draft preparation, X.D.; writing—review and editing, X.D.; visualization, X.D.; supervision, Y.S. (Yupeng Song); project administration, X.D. and Y.S. (Yongfu Sun); funding acquisition, X.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under contract No. 42102326; the Basic Scientific Fund for National Public Research Institutes of China under contract No. GY0222Q05; and the Shandong Provincial Natural Science Foundation, China under contract No. ZR2020QD073.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Micallef, A.; Masson, D.G.; Berndt, C.; Stow, D. Morphology and mechanics of submarine spreading: A case study from the Storegga Slide. J. Geophys. Res. 2007, 112, 739.
2. Liu, J.; Tian, J.; Ping, Y. Impact forces of submarine landslides on offshore pipelines. Ocean Eng. 2015, 95, 116–127.
3. Mosher, D.C.; Monahan, P.A.; Barrie, J.V.; Courtney, R.C. Coastal Submarine Failures in the Strait of Georgia, British Columbia: Landslides of the 1946 Vancouver Island Earthquake. J. Coast. Res. 2004, 20, 277–291.
4. Michael, A.F.; William, R.N.; Greene, H.G.; Homa, J.L.; Ray, W.S. Geology and tsunamigenic potential of submarine landslides in Santa Barbara Channel, Southern California. Mar. Geol. 2005, 224, 1–22.
5. Wang, W.; Wang, D.; Wu, S.; Völker, D.; Zeng, H.; Cai, G.; Li, Q. Submarine landslides on the north continental slope of the South China Sea. J. Ocean Univ. China 2018, 17, 83–100.
6. Ilstad, T.; De Blasio, F.V.; Elverhøi, A.; Harbitz, C.B.; Engvik, L.; Longva, O.; Marr, J.G. On the frontal dynamics and morphology of submarine debris flows. Mar. Geol. 2004, 213, 481–497.
7. El-Ramly, H.; Morgenstern, N.R.; Cruden, D.M. Probabilistic slope stability analysis for practice. Can. Geotech. J. 2002, 39, 665–683.
8. Griffiths, D.V.; Lane, P.A. Slope stability analysis by finite elements. Géotechnique 1999, 49, 387–403.
9. Ijaz, N.; Ye, W.; Rehman, Z.; Dai, F.; Ijaz, Z. Numerical Study on Stability of Lignosulphonate-Based Stabilized Surficial Layer of Unsaturated Expansive Soil Slope Considering Hydro-Mechanical Effect. Transp. Geotech. 2022, 32, 100697.
10. Bradshaw, A.S.; Tappin, D.R.; Rugg, D.A. The Kinematics of a Debris Avalanche on the Sumatra Margin. Int. Symp. Submar. Mass Mov. Conseq. 2010, 28, 117–125.
11. Schofield, A.N. Use of centrifugal model testing to assess slope stability. Rev. Can. Géotechnique 2011, 15, 14–31.
12. Chen, W.; Pourghasemi, H.R.; Kornejady, A.; Zhang, N. Landslide spatial modeling: Introducing new ensembles of ANN, MaxEnt, and SVM machine learning techniques. Geoderma 2017, 305, 314–327.
13. Marjanović, M.; Bajat, B.; Abolmasov, B.; Kovačević, M. Machine Learning and Landslide Assessment in a GIS Environment. In GeoComputational Analysis and Modeling of Regional Systems; Springer: Berlin/Heidelberg, Germany, 2018.
14. Pham, B.T.; Prakash, I.; Bui, D.T. Spatial prediction of landslides using a hybrid machine learning approach based on Random Subspace and Classification and Regression Trees. Geomorphology 2018, 303, 256–270.
15. Tse, K.C.; Chiu, H.; Tsang, M.; Li, Y.; Lam, E.Y. An unsupervised learning approach to study synchroneity of past events in the South China Sea. Front. Earth Sci. 2019, 13, 628–640.
16. Qi, C.; Tang, X. Slope stability prediction using integrated metaheuristic and machine learning approaches: A comparative study. Comput. Ind. Eng. 2018, 118, 112–122.
17. Ghorbanzadeh, O.; Blaschke, T.; Gholamnia, K.; Meena, S.; Tiede, D.; Aryal, J. Evaluation of Different Machine Learning Methods and Deep-Learning Convolutional Neural Networks for Landslide Detection. Remote Sens. 2019, 11, 196.
18. Hong, H.; Liu, J.; Zhu, A.X. A novel hybrid integration model using support vector machines and random subspace for weather-triggered landslide susceptibility assessment in the Wuning area (China). Environ. Earth Sci. 2017, 76, 652.
19. Yang, Z.S.; Chen, W.; Chen, Z. Subaqueous Landslides system in the Huanghe River (Yellow River) Delta. Oceanol. Limnol. Sinica 1994, 20, 573–581. Available online: http://en.cnki.com.cn/Article_en/CJFDTOTAL-HYFZ199406000.htm (accessed on 31 August 2022).
20. Sun, Y.F.; Hu, G.H.; Song, Y.P. Study on the Key Technology of Prediction, Evaluation and Prevention of Offshore Submarine Geological Hazards; First Institute of Oceanography, MNR: Qingdao, China, 2016.
21. Mahmood, K.; Kim, J.M.; Ashraf, M.; Ziaurrehman. The Effect of Soil Type on Matric Suction and Stability of Unsaturated Slope under Uniform Rainfall. KSCE J. Civ. Eng. 2016, 20, 1294–1299.
22. Łukasik, S.; Kowalski, P.A.; Charytanowicz, M.; Kulczycki, P. Clustering using flower pollination algorithm and Calinski-Harabasz index. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC), Vancouver, BC, Canada, 24–29 July 2016; pp. 2724–2728.
23. Thangavel, K.; Aranganayagi, S. Clustering Categorical Data Using Silhouette Coefficient as a Relocating Measure. In Proceedings of the International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2007), Sivakasi, India, 13–15 December 2007; pp. 13–17.
24. Davies, D.L.; Bouldin, D.W. A Cluster Separation Measure. IEEE Trans. Pattern Anal. Mach. Intell. 1979, 1, 224–227.
25. Macqueen, J. Some Methods for Classification and Analysis of Multi Variate Observations. Proc Berkeley Symp. Math. Stat. Probab. 1965, 5, 281–297. Available online: https://www.docin.com/p-542657058.html (accessed on 31 August 2022).
26. Pedregosa, F. Scikit-learn: Machine Learning in Python. JMLR 2011, 12, 2825–2830.
27. Xiao, P.; Li, A.L. Prediction of Submarine Landslide Stability Based on GIS in the Yellow River Subaqueous Delta. Geol. Sci. Technol. Inf. 2016, 35, 221–226.
28. Sun, Y.F.; Huang, B.L. A Potential Tsunami impact assessment of submarine landslide at Baiyun Depression in Northern South China Sea. Geoenviron. Disasters 2014, 1, 1–11.
29. Masson, D.G.; Harbitz, C.B.; Wynn, R.B.; Pedersen, G.; Løvholt, F. Submarine landslides: Processes, triggers and hazard prediction. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2006, 364, 2009–2039.
30. Hance, J.J. Development of a Database and Assessment of Seafloor Slope Stability Based on Published Literature. Ph.D. Thesis, University of Texas at Austin, Austin, TX, USA, 2003.
31. Gan, H.Y.; Wang, J.S.; Hu, G.W. Submarine Landslide Related to Natural Gas Hydrate within Benthal Deposit. J. Seismol. 2004, 24, 177–181.
32. Jiang, M.J.; Sun, C.; Crosta, G.B.; Zhang, W.C. A study of submarine steep slope failures triggered by thermal dissociation of methane hydrates using a coupled CFD-DEM approach. Eng. Geol. 2015, 190, 1–16.

Figure 1. Location map of the study area.

Figure 2. The general concept of the research method.

Figure 3. Maps of the landslide-influencing parameters in the study area. (a) Sediment type; (b) slope angle; (c) soil strength; (d) water depth; (e) wave height; (f) current velocity; (g) liquefaction; (h) erosion; (i) human engineering activities; (j) overlay map.

Figure 4. Cluster results of k-means, spectral clustering, and hierarchical clustering were validated with the Calinski–Harabasz index, silhouette index, and Davies–Bouldin index. (1)~(3) Validate cluster numbers of k-means. (4)~(6) Validate cluster numbers of spectral clustering. (7)~(9) Validate kernel function parameter of spectral clustering. (10)~(12) Validate cluster numbers of hierarchical clustering. (13)~(15) Validate affinity of hierarchical clustering. The red circle indicates the location where the model achieved the best prediction.

Figure 5. Distribution of submarine landslide susceptibility using k-means. The different colors represent different submarine landslide susceptibilities.

Figure 6. Distribution of submarine landslide susceptibility using spectral clustering.

Figure 7. Distribution of submarine landslide susceptibility using hierarchical clustering.

Figure 8. Comparison of clustering results validated with different internal validation measures. (a) Performance of different models under Calinski–Harabasz validation methods. (b) Performance of different models under silhouette validation methods. (c) Performance of different models under Davies–Bouldin validation methods. The higher the Calinski–Harabasz index and silhouette index are, the more accurate the clustering result will be. The lower the Davies–Bouldin index is, the more accurate the clustering result will be.

Figure 9. Distribution of liquefaction depth in the Yellow River Estuary. The area of ellipses A, B, C, and D represents the very deep, deep, shallow, and very shallow parts of liquefaction depth, respectively.

Figure 10. Normalized scores of the clustering results when each influence parameter is removed in turn. The normalized scores were obtained by dividing the scores of each eight-parameter model by the scores of the nine-parameter model.

Table 1. Parameters used in different models.

Model	Cluster Number	Gamma	Affinity	Linkage
k-means	4	None	None	None
Spectral	4	1	None	None
Hierarchical	4	None	Manhattan	Average

“None” indicates that the parameter is not required by the model.
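The settings in Table 1 could be expressed with scikit-learn [26] roughly as follows. This is a minimal sketch under stated assumptions: the synthetic data from `make_blobs` stand in for the nine factor layers (one row per grid cell), the standardization step and the `random_state` and `n_init` values are illustrative, and the check on the `metric` argument is only there because older scikit-learn releases name it `affinity`.

```python
import inspect

from sklearn.cluster import AgglomerativeClustering, KMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the nine factor layers (assumption, not survey data).
X, _ = make_blobs(n_samples=200, n_features=9, centers=4, random_state=0)
X = StandardScaler().fit_transform(X)

# Newer scikit-learn releases call the distance argument "metric";
# older ones call it "affinity". Pick whichever this installation accepts.
agglo_params = inspect.signature(AgglomerativeClustering).parameters
distance_arg = "metric" if "metric" in agglo_params else "affinity"

# Parameters taken from Table 1: four clusters, gamma = 1 for spectral
# clustering, Manhattan distance with average linkage for hierarchical.
models = {
    "k-means": KMeans(n_clusters=4, n_init=10, random_state=0),
    "spectral": SpectralClustering(n_clusters=4, gamma=1.0, random_state=0),
    "hierarchical": AgglomerativeClustering(
        n_clusters=4, linkage="average", **{distance_arg: "manhattan"}
    ),
}

# Fit each model and compute the three internal validation indices.
results = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    results[name] = {
        "labels": labels,
        "CH": calinski_harabasz_score(X, labels),  # higher is better
        "SI": silhouette_score(X, labels),         # higher is better
        "DB": davies_bouldin_score(X, labels),     # lower is better
    }

for name, r in results.items():
    print(f"{name}: CH={r['CH']:.1f}, SI={r['SI']:.3f}, DB={r['DB']:.3f}")
```

The scores printed for the synthetic data say nothing about the study area; the sketch only shows how the three models and three indices fit together.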

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
