Article

II-LA-KM: Improved Initialization of a Learning-Augmented Clustering Algorithm for Effective Rock Discontinuity Grouping

School of Resources and Safety Engineering, Central South University, Changsha 410083, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(20), 3195; https://doi.org/10.3390/math12203195
Submission received: 19 September 2024 / Revised: 9 October 2024 / Accepted: 10 October 2024 / Published: 12 October 2024
(This article belongs to the Special Issue Numerical Model and Artificial Intelligence in Mining Engineering)

Abstract
Rock mass discontinuities are an excellent information set for reflecting the geometric, spatial, and physical properties of the rock mass. Analyzing them with clustering algorithms is an important way to identify dominant orientations of structural surfaces and provides a scientific and theoretical basis for other rock mass engineering research. Traditional clustering algorithms often suffer from sensitivity to initialization and lack practical applicability, as discontinuity data are typically rough, low-precision, and unlabeled. To address these challenges, II-LA-KM, a learning-augmented clustering algorithm with improved initialization for rock discontinuity grouping, is proposed. Our method begins with heuristically selecting initial centers to ensure they are well-separated. Then, optimal transport is used to adjust these centers, minimizing the transport cost between them and other points. To enhance fault tolerance, a learning-augmented algorithm is integrated that iteratively reduces clustering costs, refining the initial results toward optimal clustering. Extensive experiments on a simulated artificial dataset and a real dataset from Woxi, Hunan, China, featuring both orientational and non-orientational attributes, demonstrate the effectiveness of II-LA-KM. The algorithm achieves 97.5% accuracy on the artificial dataset and successfully differentiates between overlapping groups. Its performance is even more pronounced on the real dataset, underscoring its robustness in handling complex and noisy data. These strengths make our approach highly beneficial for practical rock discontinuity grouping applications.

1. Introduction

Rock masses are geological formations consisting of various types of rocks that contain weak structural surfaces [1]. These structural surfaces (such as bedding planes, joints, faults, fractures, etc.) segment the rock mass, resulting in mechanical properties that are discontinuous, non-uniform, and anisotropic [2]. Generally, the properties of rock masses, such as their strength, deformability, and permeability, depend more on the characteristics of these structural surfaces than on the attributes of the intact rock within the rock mass [3]. The stability of the rock mass is closely linked to these structural surfaces; for instance, failure of rock slopes often involves structures like faults, folds, and shear zones [4,5]. Therefore, grouping rock mass structures with similar properties aids in understanding the mechanical properties of the rocks and provides theoretical support for the stability analysis of related engineering projects.
The traditional approach to grouping structural surfaces involves the following sequence of steps [6]: (a) acquiring discontinuity data, (b) analyzing the data using joint set rose plots, pole figures, and pole density contour maps, and (c) manually grouping the structural surfaces. A major drawback of this approach is its lack of objectivity. The results depend heavily on the analyst’s expertise, skills, and purposes, leading to potential variations in outcomes from different analysts [7]. To introduce objectivity into the analysis, scholars have applied methods from statistics, mathematics, and computer science to the grouping of rock mass structural surfaces. Cluster analysis, an intersection of these three fields, has significantly improved the objectivity and rationality of rock mass structural surface grouping results. Shanley and Mahtab [8] were the first to apply cluster analysis to rock discontinuity sets. They clustered the orientational data of a porphyry copper deposit using an algorithm that imposed minimal constraints on the derived partitions. Although this method clearly identified the main fracture patterns, these patterns showed consistency across different analysis partitions. This algorithm was refined by incorporating a cluster definition rejection criterion [9], but it remained limited, as it only analyzed the orientational attributes of discontinuities. However, rock mass discontinuities are characterized by both material and spatial properties. Thus, relying solely on orientation data cannot fully capture the features of discontinuities. To address this, scholars have explored new clustering algorithms based on the ISRM’s suggested methods, which consider orientation, spacing, persistence, roughness, wall strength, aperture, filling, seepage, number of sets, and block size as crucial features of rock mass discontinuities [10]. Dershowitz et al. [11] proposed a probabilistic assignment algorithm. This algorithm delineates and segments fracture sets by randomly reallocating fractures to different sets based on their similarity to previously assigned fractures. It is important to note that this assessment of similarity includes non-orientational attributes in the classification of fracture sets.
As algorithmic research advances, a diverse range of clustering analysis methodologies has emerged. Two prominent representatives in this field are the classic K-means (KM) and fuzzy C-means (FCM) algorithms. Subsequent research and developments [12,13] have largely built upon these foundational algorithms, exploring variants and enhancements to further refine clustering techniques. For example, Hammah and Curran [14] introduced the fuzzy K-means algorithm, also known as the FCM algorithm, which automatically identifies joint sets without preconceived assumptions about their number, effectively categorizing the dataset using auxiliary information. Zhou and Maerz [15] proposed a clustering analysis method based on multiple parameters.
These advancements highlight ongoing efforts to address various challenges in clustering algorithms. However, the accuracy of these methods is very sensitive to the selection of initial cluster centers. To address the selection problem, Rodriguez and Sitar [16] utilized a spectral clustering algorithm to identify rock discontinuity sets based on orientations. This method transforms the data into a new space for unbiased and accurate initial cluster center selection, then applies the KM algorithm for subsequent clustering. However, it can fall into locally optimal solutions. Xu et al. [17] introduced an enhanced FCM algorithm that uses a mutative-scale chaos optimization algorithm (COA) instead of Picard iterations, often achieving a global optimum and making outcomes less susceptible to the initial chaotic variables. Based on a simplified Xie-Beni clustering validity index and the hierarchical clustering method, Ma et al. [18] introduced an enhanced KM clustering approach characterized by accelerated convergence and increased clustering stability. In addition to these mainstream algorithms, Liu et al. [19] proposed a modified affinity propagation algorithm that identifies rock discontinuity sets using a negative sine-squared similarity measure, optimizing clustering by avoiding initial center selection issues. Recently, a method using 3D trace data [20] has also improved discontinuity plane identification, enhancing clustering accuracy. As scholars delve deeper into cluster analysis, intelligent optimization algorithms such as ant colony optimization [21], the firefly algorithm [22], the artificial fish swarm algorithm [23], and the whale optimization algorithm [24,25] have been independently applied to the analysis of discontinuity datasets. Although these methods offer performance improvements, they introduce additional hyperparameters and a larger optimization space, increasing algorithmic complexity, parameter sensitivity, and the risk of converging to local optima.
Despite the increasing capability of current algorithms in processing rock discontinuity datasets, with improvements in convergence speed and precision, these algorithms still suffer from a significant drawback: high sensitivity to the initial cluster centers, which can lead to erroneous results under poor initialization. To address this, a heuristic approach is employed to maximize the distance between initial centers. On this basis, the optimal transport algorithm is utilized to minimize the total transport cost between data points and initial centers, further optimizing the initialization. Moreover, existing datasets for the dominant grouping of rock structural surfaces are typically unlabeled, which is one reason the application of clustering algorithms has been limited. In such scenarios, the algorithms must possess a certain level of fault tolerance. Therefore, a learning-augmented method is introduced that allows a certain error rate in the initial clustering results while minimizing the clustering cost to generate the final cluster centers. By integrating these modules, II-LA-KM, a learning-augmented KM clustering algorithm with improved initialization, is presented. Experimental results on artificial and real rock discontinuity datasets demonstrate the outstanding robustness and fault tolerance of II-LA-KM, advancing the field of rock discontinuity grouping.

2. Methodology

2.1. Preliminaries

2.1.1. Rock Discontinuity Clustering

To thoroughly characterize the geometric, spatial, and physical properties of structural surfaces, this work builds on the experience of previous studies [6,26,27,28,29]. Five features are employed for the task of grouping structural surfaces: dip direction, dip angle, trace length, aperture, and joint infill. Dip direction and dip angle most directly describe the geometric state of a structural plane, making them the most commonly used features in grouping structural planes. Trace length describes the exposed length of a fracture on an observation plane. Aperture determines the opening of fractures within the rock, directly impacting the rock's permeability and structural strength; these two features are well-suited for characterizing the spatial features of structural planes. Infill within joints can significantly alter the mechanical and hydrological characteristics of the rock mass. Infill materials such as clay or minerals can reduce the rock's shear strength, affecting the stability and sealing of fractures, which is crucial for assessing and managing groundwater flow, pollutant dispersion, and the overall load-bearing capacity of the rock mass. These factors are direct manifestations of the physical characteristics of structural planes. The comprehensive analysis of these features provides a holistic perspective for understanding and predicting the behavior of the rock mass under natural conditions or during human interventions, such as mining and construction activities.
The grouping of rock discontinuities can be regarded as a clustering problem [30,31,32]. Representative features of a rock discontinuity form a vector $\mathbf{x} \in \mathbb{R}^m$, where $m$ is the number of features. In this work, five features are adopted: dip direction $\alpha$, dip angle $\beta$, trace length $l$, aperture $w$, and joint infill $j_f$. Since $\alpha$ and $\beta$ are spherical coordinates, as demonstrated in Figure 1, they define Cartesian coordinates $X_c = (x_c, y_c, z_c)$ as follows:

$$x_c = \cos\alpha \sin\beta, \quad y_c = \sin\alpha \sin\beta, \quad z_c = \cos\beta \quad (1)$$

where $x_c^2 + y_c^2 + z_c^2 = 1$. Therefore, a discontinuity can be represented by the feature vector $X = (X_c, l, w, j_f)$.
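As a concrete illustration, a minimal NumPy sketch of Equation (1) (the function name is ours, not from the paper):

```python
import numpy as np

def orientation_to_cartesian(alpha_deg, beta_deg):
    """Convert dip direction alpha and dip angle beta (degrees) to the
    unit Cartesian vector of Equation (1)."""
    a, b = np.radians(alpha_deg), np.radians(beta_deg)
    x = np.cos(a) * np.sin(b)
    y = np.sin(a) * np.sin(b)
    z = np.cos(b)
    return np.stack([x, y, z], axis=-1)  # rows lie on the unit sphere
```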
The KM algorithm is a widely used clustering method that partitions a dataset into k distinct, non-overlapping clusters by minimizing the variance within each cluster. It iteratively assigns data points to the nearest cluster center and updates the centers until convergence. Compared to soft clustering methods like FCM, the KM algorithm is straightforward and effective. However, its performance is highly dependent on the initialization of the cluster centers, which is a primary concern addressed in this paper.
Since the spherical coordinates $X_c$ are used, computing the spherical distance allows a more accurate measurement of the physical distance between rock discontinuities, preserving the true geometric characteristics of their curved surfaces. The spherical distance is defined as follows:

$$d^2(X_c, Y_c) = 1 - (X_c \cdot Y_c)^2 \quad (2)$$

where $X_c$ and $Y_c$ are the translated Cartesian vectors. The Euclidean distance is used as the similarity measure for the remaining features:

$$d^2(X, Y) = \sum_{i=1}^{m} (X_i - Y_i)^2 \quad (3)$$

where $X$ and $Y$ are two feature vectors of rock discontinuities, and $X_i$ is the $i$-th component of $X$.
Normalization in clustering algorithms eliminates the impact of different feature scales, enabling all features to be compared on the same scale and thus improving the accuracy and convergence speed of the algorithm. In rock discontinuity clustering, standard normalization is applied to $\hat{X} = (l, w, j_f)$, since $X_c$ is already within a reasonable range:

$$\hat{X}_{\mathrm{normalized}} = \frac{\hat{X} - \mu}{\sigma} \quad (4)$$

where $\mu$ and $\sigma$ are the mean and the standard deviation, respectively. The normalized data distribution is close to the standard normal distribution $\mathcal{N}(0, 1)$.
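A minimal sketch of Equations (2)–(4) in NumPy, under the same notation (helper names are ours):

```python
import numpy as np

def sphere_distance_sq(Xc, Yc):
    # Equation (2): insensitive to the sign of the pole vector
    return 1.0 - np.dot(Xc, Yc) ** 2

def euclidean_distance_sq(X, Y):
    # Equation (3)
    return float(np.sum((np.asarray(X) - np.asarray(Y)) ** 2))

def zscore(F):
    # Equation (4): standardize the non-orientational features (l, w, jf)
    F = np.asarray(F, dtype=float)
    return (F - F.mean(axis=0)) / F.std(axis=0)
```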

2.1.2. Optimal Transport

Optimal transport (OT) has been extensively developed and applied in previous research [33,34,35,36,37]. Given two probability measures $\mu, \nu$ on $\mathbb{R}^n$, the optimal transport problem is to find a transport plan $\pi$ that minimizes the cost of transporting mass from $\mu$ to $\nu$. Mathematically, this can be written as follows:

$$\min_{\pi \in \Pi(\mu, \nu)} \int_{\mathbb{R}^n \times \mathbb{R}^n} \mathrm{cost}(x, y) \, d\pi(x, y) \quad (5)$$

where $\mathrm{cost}(x, y)$ denotes the cost of transporting a unit of mass from $x$ to $y$, and $\Pi(\mu, \nu)$ denotes the set of all transport plans between $\mu$ and $\nu$. This problem can be solved with the Sinkhorn algorithm [38]. OT is exploited here to improve the initialization of clustering centers by minimizing the transport cost between data points and centers.
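For the discrete case used in this paper, a minimal NumPy sketch of the Sinkhorn iteration; the regularization strength `reg` and the iteration budget are free parameters of our sketch, not values from the paper:

```python
import numpy as np

def sinkhorn(mu, nu, M, reg=0.1, n_iter=500, tol=1e-9):
    """Entropy-regularized OT between histograms mu (n,) and nu (k,)
    with cost matrix M (n, k); returns the transport plan (n, k)."""
    K = np.exp(-M / reg)
    u = np.ones_like(mu)
    for _ in range(n_iter):
        u_prev = u
        v = nu / (K.T @ u)   # column scaling
        u = mu / (K @ v)     # row scaling
        if np.max(np.abs(u - u_prev)) < tol:
            break
    return u[:, None] * K * v[None, :]
```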

2.1.3. Learning-Augmented Method

Based on clustering with side information [39,40], the learning-augmented method [41,42] has been proposed for machine learning tasks [43,44,45]. Given a predictor that outputs the class of each data point, the input dataset can be divided into clusters, though this partition is not entirely accurate. The learning-augmented method aims to refine this partition, bringing it closer to the optimal clustering [42]. Specifically, the predicted result is $\Gamma = \Gamma_1 \cup \cdots \cup \Gamma_k$, and the optimal clustering is $\Gamma^* = \Gamma_1^* \cup \cdots \cup \Gamma_k^*$. For $k$ clusters, the learning-augmented method aims to minimize the clustering cost under an error rate $\epsilon$ such that:

$$|\Gamma_i \cap \Gamma_i^*| \geq (1 - \epsilon) \max\{|\Gamma_i|, |\Gamma_i^*|\} \quad (6)$$

Though there is no predictor for rock discontinuities, the learning-augmented method can refine the primary clustering result under an error rate $\epsilon$, making our approach robust and fault-tolerant.

2.2. Improved Initialization of Clustering Centers

The naive KM algorithm is highly sensitive to the initial clustering centers. To mitigate this issue, a heuristic initialization method inspired by previous studies [46,47] is introduced. The process starts by selecting an initial center at random from the dataset. After the first center is chosen, the subsequent centers are selected sequentially in an iterative process. During each iteration, the probability of selecting any given point as the next center is weighted by the point's distance from the already chosen centers. Specifically, points that are farther from the existing centers have a higher probability of being selected as a new center, which ensures that the centers are spread out across the data space and helps to improve clustering quality. This strategy ensures that the candidate centers $\hat{C}$ comprehensively cover and represent the entire dataset $D$.
After selecting the candidate centers, optimal transport is further employed to refine them using the Sinkhorn algorithm [38]. This algorithm iteratively adjusts a cost matrix through row and column normalization to satisfy a doubly stochastic condition. It effectively identifies an optimal transport plan that minimizes transportation costs while adhering to specified constraints, incorporating Shannon entropic regularization to enhance computational stability and efficiency. The cost matrix $M$ is defined as follows:

$$M[p, q] = \sum_{i=1}^{m} (D[p, i] - \hat{C}[q, i])^2 \quad (7)$$

where $p$ and $q$ are indices of $M$, and $m$ is the number of features. The modified initialization approach is provided in Algorithm 1.
The improved initialization is illustrated in Figure 2. Under the original random initialization, there is a certain likelihood of selecting points from the same cluster as initial centers, especially when some clusters contain a large number of points. This type of initialization may cause the algorithm's subsequent updates to get stuck in local optima, leading to poor clustering results, as reflected in the experimental results shown in Figure 3 and Figure 4. Using a heuristic method to spread out the initial centers can effectively avoid this issue. Moreover, on this basis, the optimal transport algorithm is employed to minimize the transport cost from data points to the initial centers, ensuring that the final initial centers represent the entire dataset rather than a few outliers. Clustering results obtained with such initial centers are more robust.
Algorithm 1 Improved Initialization of Clustering Centers
Require: dataset $D$ and clustering number $k$
 1: Initialize an empty set of candidate centers $\hat{C}$
 2: Select an initial center $\hat{c}_1$ randomly from the dataset $D$ and add it to $\hat{C}$
 3: for $i = 2$ to $k$ do
 4:     For each point $x \in D$, find the minimum squared distance $d(x, \hat{C})^2$ to any center in $\hat{C}$
 5:     Choose a new center $\hat{c}_i$ from $D$ with probability proportional to $d(x, \hat{C})^2$
 6:     Add $\hat{c}_i$ to $\hat{C}$
 7: end for
 8: Compute the cost matrix $M$ between the candidate centers in $\hat{C}$ and the points in $D$
 9: Apply the Sinkhorn algorithm to $M$ to obtain the optimal transport plan $\Pi$
 10: Use $\Pi$ to adjust the centers in $\hat{C}$ to minimize the overall cost to the dataset $D$
 11: return final initial centers $C$
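A compact NumPy sketch of Algorithm 1, assuming the `sinkhorn` helper above; realizing line 10 as a barycentric projection under the transport plan is our reading of the adjustment step, not a detail fixed by the paper:

```python
import numpy as np

def improved_init(D, k, reg=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = len(D)
    centers = [D[rng.integers(n)]]             # line 2: random first center
    for _ in range(1, k):                      # lines 3-7: distance-weighted seeding
        d2 = np.min([np.sum((D - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(D[rng.choice(n, p=d2 / d2.sum())])
    C_hat = np.array(centers)
    # lines 8-10: OT adjustment of the candidate centers, cost per Equation (7)
    M = np.sum((D[:, None, :] - C_hat[None, :, :]) ** 2, axis=2)
    plan = sinkhorn(np.full(n, 1.0 / n), np.full(k, 1.0 / k), M, reg)
    # barycentric projection: move each center to its plan-weighted mean
    return (plan.T @ D) / plan.sum(axis=0)[:, None]
```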

2.3. Learning-Augmented Refinement

After being initialized to a better state, the KM clustering algorithm can generate accurate and robust rock discontinuity grouping results. However, due to the inherent limitations of KM, there remains a possibility of erroneous outcomes. This motivates the further adoption of the learning-augmented method to approach optimal clustering. Given a dataset $D$ of size $m$, a primary clustering result $\Gamma$, and an error-tolerant rate $\epsilon$, the average of $D$ is denoted by $\bar{D}$, and the projection of the $i$-th cluster $\Gamma_i$ onto the $j$-th dimension is denoted by $\Gamma_{i,j}$. Since the optimal clustering is not accessible, the algorithm focuses on filtering out wrongly assigned discontinuities and choosing better centers. Assuming an error rate $\epsilon$, in each iteration a subset of size $(1-\epsilon) m_i$ with the minimum cost is found for each cluster (where $m_i$ is the size of $\Gamma_i$), and the clustering centers after learning-augmented refinement are updated to the averages of these subsets. The detailed process is provided in Algorithm 2. It has been proved that this algorithm works for $\epsilon < \frac{1}{2}$ and achieves a $\left(\frac{1+\epsilon}{1-\epsilon} + \frac{4\epsilon}{(1-2\epsilon)(1-\epsilon)}\right)$-approximation of the optimal clustering [42].
Algorithm 2 Learning-Augmented Refinement
Require: dataset $D$ with $m$ rock discontinuities, primary clustering result $\Gamma = \Gamma_1 \cup \cdots \cup \Gamma_k$, and error-tolerant rate $\epsilon$
 1: for $i = 1$ to $k$ do
 2:     for $j = 1$ to $n$ do
 3:         Let $\omega_{i,j}$ be the collection of all subsets of $(1-\epsilon) m_i$ discontinuities in $\Gamma_{i,j}$
 4:         For a collection $Z$, define $\mathrm{cost}(Z, \bar{Z}) = \sum_{z \in Z} z^2 - \frac{1}{|Z|} \big( \sum_{z \in Z} z \big)^2$
 5:         $F_{i,j} \leftarrow \operatorname{argmin}_{Z \in \omega_{i,j}} \mathrm{cost}(Z, \bar{Z})$
 6:     end for
 7:     Let $\tilde{c}_i = \big( \overline{F_{i,j}} \big)_{j \in [n]}$
 8: end for
 9: return final centers $\tilde{C} = \{ \tilde{c}_1, \ldots, \tilde{c}_k \}$
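Enumerating all subsets in line 3 is exponential. In one dimension, however, a minimum-cost subset of fixed size consists of consecutive values in sorted order, so the search reduces to sliding a window over the sorted projection. The following sketch uses this reduction; it is our efficient realization in the spirit of [42], not pseudocode from the paper:

```python
import numpy as np

def best_window_mean(values, keep):
    """Return the mean of the size-`keep` subset minimizing
    cost(Z) = sum z^2 - (sum z)^2 / |Z| (line 4 of Algorithm 2);
    the optimum is a contiguous window of the sorted values."""
    v = np.sort(values)
    s1 = np.concatenate([[0.0], np.cumsum(v)])        # prefix sums
    s2 = np.concatenate([[0.0], np.cumsum(v ** 2)])   # prefix sums of squares
    i = np.arange(len(v) - keep + 1)
    cost = (s2[i + keep] - s2[i]) - (s1[i + keep] - s1[i]) ** 2 / keep
    j = int(np.argmin(cost))
    return (s1[j + keep] - s1[j]) / keep

def refine_centers(D, labels, k, eps):
    """Dimension-wise learning-augmented refinement of cluster centers."""
    centers = np.empty((k, D.shape[1]))
    for i in range(k):
        cluster = D[labels == i]
        keep = max(1, int(np.ceil((1 - eps) * len(cluster))))
        for j in range(D.shape[1]):
            centers[i, j] = best_window_mean(cluster[:, j], keep)
    return centers
```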
The effects of the learning-augmented refinement are demonstrated in Figure 2. When applying clustering algorithms to complex or noisy datasets, the presence of data overlap and outliers may lead to suboptimal clustering results. The learning-augmented algorithm can iteratively identify and correct these errors, with the correction criterion being the optimal clustering that has the minimal clustering cost. Notably, the learning-augmented algorithm remains applicable even when the erroneous points are near the boundary between two clusters, as its objective is to directly minimize the clustering cost. The experimental results provided in Figure 5 verify the effectiveness of this algorithm.

2.4. Overall Algorithm Steps

Based on the improved initialization of Algorithm 1, the learning-augmented refinement of Algorithm 2, and the KM algorithm, a novel rock discontinuity clustering approach, II-LA-KM, is proposed. Given the dataset $D$, II-LA-KM generates clustering centers $\tilde{C}$, and each rock discontinuity is grouped with the closest center. The process steps are as follows and are shown in Figure 2 (a compact code sketch follows the list):
  • Determine the initial clustering number $k$. If the method is conducted on an artificial dataset, $k$ is fixed; otherwise, it is initialized to 2.
  • Translate $\alpha$ and $\beta$ to $X_c$ according to Equation (1).
  • Normalize the dataset $D$ according to Equation (4). The clustering result is more robust after appropriate normalization, since this avoids the influence of feature magnitudes.
  • Initialize an empty candidate center set $\hat{C}$ and add a random discontinuity from $D$. Then, sequentially add new centers that are as far from the existing ones as possible.
  • Leverage optimal transport to further minimize the transport cost between the centers and the dataset and obtain the final initial centers $C$.
  • Apply the KM algorithm to $D$ and $C$ to obtain the primary clustering result $\Gamma = \Gamma_1 \cup \cdots \cup \Gamma_k$.
  • Apply the learning-augmented refinement to $D$, $\Gamma$, and error rate $\epsilon$. The refined centers $\tilde{C}$ are the final clustering centers, and each discontinuity is grouped with the closest center.
  • For the real dataset, set $k = k + 1$ and re-run the approach until $k$ reaches the threshold. Determine the final clustering number through the evaluation metrics.
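Under the helper functions sketched in the previous sections, the whole pipeline can be condensed as follows; the column layout of `D_raw` and the use of scikit-learn's KMeans for the primary clustering step are our assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def ii_la_km(D_raw, k, eps):
    """D_raw columns (assumed): alpha, beta in degrees, l, w, jf."""
    Xc = orientation_to_cartesian(D_raw[:, 0], D_raw[:, 1])   # Equation (1)
    D = np.hstack([Xc, zscore(D_raw[:, 2:])])                 # Equation (4)
    C = improved_init(D, k)                                   # Algorithm 1
    km = KMeans(n_clusters=k, init=C, n_init=1).fit(D)        # primary clustering
    C_tilde = refine_centers(D, km.labels_, k, eps)           # Algorithm 2
    labels = np.argmin(((D[:, None, :] - C_tilde[None, :, :]) ** 2).sum(-1), axis=1)
    return C_tilde, labels
```

For the real dataset, this function would be called for each candidate $k$ in the scanned range, with the final $k$ chosen by the evaluation metrics of Section 3.1.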

3. Results

3.1. Experimental Setup

The experiments are conducted on two rock discontinuity datasets, an artificial dataset and a real dataset, to comprehensively validate the proposed II-LA-KM algorithm. For Algorithm 2, the error rates on the artificial and real datasets are set to 0.05 and 0.15, respectively. The cluster number is 5 for the artificial dataset and is determined via evaluation metrics for the real dataset. To better evaluate the performance of the proposed approach, the clustering results are compared with three widely used and effective baselines: K-means (KM), fuzzy C-means (FCM), and K-means++ (KM++) [46].
For a general assessment of the method, four metrics commonly used in related works are adopted. In clustering problems, the quality of clustering results is often reflected by the compactness within clusters and the separation between clusters. The XB index (XB) [48,49] and the silhouette score (SS) [50] are introduced to measure these two aspects. For the artificial dataset, clustering accuracy (ACC) is also used, since the data labels are available. The clustering cost (CC) directly evaluates the distance between points and centers and is used in the fault-tolerance experiment. The metrics are defined below:
$$XB = \frac{\sum_{i=1}^{m} \min_j \|x_i - c_j\|^2}{m \cdot \min_{k \neq l} \|c_k - c_l\|^2} \quad (8)$$

$$SS = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \quad (9)$$

$$ACC = \frac{m_{\mathrm{correct}}}{m} \quad (10)$$

$$CC = \sum_{i=1}^{n} \sum_{j=1}^{k} r_{ij} \|x_i - c_j\|^2 \quad (11)$$

where $a(\cdot)$ and $b(\cdot)$ are the average distance between discontinuities within a cluster and the average distance to discontinuities in the nearest neighboring cluster, $m_{\mathrm{correct}}$ is the number of correctly clustered discontinuities, and $r_{ij}$ is an indicator based on discontinuity $x_i$ and center $c_j$ (1 if $x_i$ is assigned to $c_j$ and 0 otherwise). Note that, to ensure the reliability of the results, the experiments are run 10 times under different random seeds, and the best and average values of each metric are recorded.
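A sketch of these metrics in Python; SS comes from scikit-learn, while XB and CC follow Equations (8) and (11) directly (ACC additionally requires ground-truth labels):

```python
import numpy as np
from sklearn.metrics import silhouette_score

def xb_index(D, centers, labels):
    # Equation (8): within-cluster compactness over minimal center separation
    num = np.sum((D - centers[labels]) ** 2)
    sep = min(np.sum((a - b) ** 2)
              for i, a in enumerate(centers) for b in centers[i + 1:])
    return num / (len(D) * sep)

def clustering_cost(D, centers, labels):
    # Equation (11), with r_ij encoded by the label assignment
    return float(np.sum((D - centers[labels]) ** 2))

# SS: silhouette_score(D, labels)
```

The numerator of `xb_index` assumes each point is labeled with its nearest center, which matches the inner minimum of Equation (8).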
As mentioned in Section 2.1.1, the dataset consists of five-dimensional features. The proposed algorithm directly utilizes the full set of five-dimensional features for clustering. Therefore, selecting only two dimensions (such as dip and dip direction) for visualization and analysis would be insufficient, as it may result in the loss of important information from other dimensions. To address this issue, Principal Component Analysis (PCA) was applied to reduce the original data to two dimensions ( f 1 and f 2 ), allowing for a clearer representation of all dimensions while preserving the relationships between data points. In the subsequent experimental results, unless otherwise specified, the default visualization will use the PCA-reduced data.
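Assuming `D` and `labels` from the pipeline sketch above, this projection is a one-liner with scikit-learn:

```python
from sklearn.decomposition import PCA

f = PCA(n_components=2).fit_transform(D)  # f[:, 0] is f1, f[:, 1] is f2
```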

3.2. Artificial Dataset

An artificial rock discontinuity dataset is created using Monte Carlo random sampling. The dataset contains five groups simulated with prescribed distributions. Specifically, dip direction $\alpha$ and dip angle $\beta$ follow a bi-normal distribution, trace length $l$ follows a normal distribution, aperture $w$ follows a uniform distribution, and joint infill $j_f$ is constant. The distribution parameters are provided in Table 1. The distribution of the rock discontinuity groups is shown in Figure 6, where the third and fourth groups are intentionally designed to overlap to some degree in all features, so as to demonstrate the advantages of our method in complex cases. Note that the joint infill is the same across groups, since group-wise differences in this feature would make the clustering overly simple.
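A minimal sketch of the Monte Carlo sampling for one group; the bi-normal distribution is approximated here by independent normals for $\alpha$ and $\beta$, and truncation to the ranges R of Table 1 is omitted, both being simplifications of ours:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_group(mu_ab, sigma_ab, mu_l, sigma_l, w_range, jf, n):
    """One group: (alpha, beta) normal, l normal, w uniform, jf constant."""
    alpha = rng.normal(mu_ab[0], sigma_ab[0], n)
    beta = rng.normal(mu_ab[1], sigma_ab[1], n)
    l = rng.normal(mu_l, sigma_l, n)
    w = rng.uniform(w_range[0], w_range[1], n)
    return np.column_stack([alpha, beta, l, w, np.full(n, jf)])

# e.g., group 5 with the parameters printed in Table 1:
g5 = sample_group((255, 40), (40, 30), 6.0, 0.40, (0.15, 0.25), 1.0, 20)
```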
The clustering results and quantitative metrics of the II-LA-KM algorithm and the baselines are given in Figure 3 and Tables 2 and 3. The results after dimension reduction are provided in Figure 3 and Table 2 for clarity. As mentioned in the preliminaries, dip direction and dip angle are representative features of rock discontinuities, so they are chosen to illustrate the results for simplicity. As shown in Figure 3 and Table 2, KM and FCM struggle to differentiate between G3 and G4 and even merge them into a single cluster, which in turn causes another group to be split into two parts. This is presumably due to inappropriate initialization and highlights the instability of KM and FCM. Although G1, G2, and G5, which are clearly separated from the other groups, are clustered accurately, KM++ also fails to provide precise clustering results for G3 and G4. In contrast, II-LA-KM successfully resolves the overlap between groups, achieving the best clustering results. For the quantitative analysis in Table 3, the metrics of KM and FCM are clearly worse than those of the others. Although KM++ has an accuracy close to that of the proposed approach, its XB and SS are both inferior, indicating weaker compactness within clusters and separation between clusters. II-LA-KM achieves the best performance across all metrics, demonstrating its strength in producing superior clustering results. Moreover, as indicated in Table 2, unlike KM and KM++, which favor placing centers near the average point of completely correct clusters, our approach is designed to identify the centers of the optimal clustering, which accounts for the enhanced quantitative performance.
Given the labels of the artificial dataset, the fault-tolerance capability of the proposed II-LA-KM can be evaluated. To mimic real cases, noise is added to the dataset by randomly changing the labels of marginal discontinuities within each group at a set of predefined proportions. Algorithm 2 is then applied to these noisy data, and the resulting clustering costs are shown in Figure 5. As indicated in Figure 5, the learning-augmented algorithm effectively corrects errors in the clustering results and moves them toward the optimal clustering. The improvement becomes more pronounced as the error rate increases to 0.3, but the degree of subsequent improvement diminishes. Presumably, as the error rate approaches 0.5, the difficulty of correcting errors increases significantly. This fault tolerance makes II-LA-KM perform better in realistic, complicated situations, as discussed in the next section.

3.3. Real Dataset

The real dataset for this paper is derived from various mining fields within four mining sections of a phosphate mine located in Woxi, Hunan, China. The Woxi mining area of Chenzhou Mining in Hunan lies on the southeastern margin of the Yangtze Block in western Hunan, within a structurally complex zone. The mining area is primarily composed of the Mesoproterozoic and Neoproterozoic Lengjiaxi and Banxi groups, with rock types including sandstone, slate, phyllite, and argillaceous rocks. The geological structure is highly complex, with well-developed fault zones, leading to significant fracturing of the rock mass. In some areas, weathering has further compromised rock stability, making the overall rock mass less stable. The ore body is hosted within the purple-red sandy slate of the Banxi Group, and its occurrence and extraction are strongly influenced by geological structures [51].
Data collection focused on sections of roadways and nearby tunnels that had not yet been supported after excavation, particularly those where the surrounding rock exhibits clearly stratified joint layers. Through a two-month field investigation of engineering and hydrogeological conditions, 237 discontinuity records were obtained, including dip direction, dip angle, trace length, aperture, and joint infill. The exploration site is shown in Figure 7. Notably, to ensure the representativeness and generalizability of the data, minor joint fractures were excluded during data collection. Since the real dataset is unlabeled, its distribution, estimated with a Gaussian kernel density, is shown in Figure 8.
The clustering results of the different methods, after the same preprocessing as for the artificial dataset, are provided in Figure 4 and Tables 4 and 5. To better illustrate the results, the centers with detailed feature values, without normalization or PCA, are also provided in Table 6. Four is selected as the cluster number according to the performance. II-LA-KM successfully clusters the complex real dataset. In contrast, KM++ and FCM misclassify some discontinuities from G0 to G1, reducing the compactness within G0. KM fails to distinguish between G0, G1, and G2, leading to clustering degradation. Furthermore, unlike the artificial dataset, where the clusters are distinctly separated, the real dataset features blurred inter-cluster boundaries. This results in KM outperforming FCM, which excels in the artificial dataset experiments, highlighting FCM's deficiencies in robustness. Table 5 shows the quantitative metrics of the approaches. On the real dataset, which is more complicated than the artificial one, II-LA-KM performs clearly better than the baselines across all metrics. This can be attributed to the robustness and fault tolerance of our algorithm, which provide a more pronounced advantage on the real dataset, suggesting that these capabilities are crucial for realistic scenarios.
As mentioned in the experimental setup, the cluster number $k$ is determined by the clustering performance. For a comprehensive evaluation, the XB and SS of II-LA-KM are measured for different $k$ values (2–6). Table 7 shows that the metrics stand out at $k = 4$ and $k = 5$. Since XB is markedly better (lower) and the clustering is more clearly defined at $k = 4$, $k$ is finally set to 4 in the real dataset experiments.

4. Discussion

In clustering algorithms, the number of clusters k is a critical parameter. In this work, the value of k is initially constrained within a predefined range and then determined based on clustering performance metrics. In practical applications, the value of k can also be informed by local geological prior knowledge. During the execution of the clustering algorithm, a fixed maximum number of iterations (set to 1000 in this work) is typically defined according to the size and complexity of the dataset. To accelerate the algorithm, if the difference between the results of two consecutive iterations falls below a specified threshold, the algorithm is allowed to terminate early and return the results.
In the learning-augmented algorithm, the preset error tolerance rate ϵ may impact the final results. Intuitively, the tolerance rate is related to the complexity of the data. In this work, the distribution of the artificial dataset is relatively well-defined, allowing for a smaller ϵ = 0.05 . In contrast, the real dataset exhibits greater complexity, leading to the use of ϵ = 0.15 . In practical applications, this value can be adjusted flexibly according to the characteristics of the dataset. However, it should be noted that setting ϵ too small may limit the algorithm’s ability to correct initial results, while setting it too large may result in incorrect modification of accurate results.

5. Conclusions

An improved clustering method that optimizes initial clustering results and enhances fault tolerance is presented in this paper. The method effectively addresses issues in rock structural surface clustering analysis caused by excessive parameter sensitivity, as well as by the imprecise and unlabeled nature of the data, which have previously limited the application of clustering algorithms. The presented method has facilitated the optimization of drilling and blasting operations at the phosphate mine, contributing to enhanced operational efficiency. After applying our method to the mining site in Woxi, Hunan, China, subsequent parameters can be set more scientifically and rationally, and mining efficiency has improved while safety is maintained. Given the high robustness and strong fault tolerance of the proposed method, applying it to large-scale real-world datasets is recommended, where it is expected to yield significantly better results than traditional methods.
There are certain limitations to our proposed method. Since the cluster center initialization and the learning-augmented refinement are additional steps beyond the clustering itself, the method requires more execution time and computational overhead. This increased complexity can be a drawback, particularly when dealing with large datasets or time-sensitive applications. Additionally, the improvement is often marginal when the initial clustering results are already near-optimal, so the benefits may be less pronounced in scenarios where traditional clustering methods perform adequately.
Despite these challenges, the potential of the proposed method remains significant. Future work will focus on addressing these limitations by optimizing the initialization and learning-augmented processes to reduce computational demands. Moreover, the advantages of our method in real-world applications will be further explored, where data variability and complexity present unique challenges and opportunities for our approach to demonstrate its strengths. By refining the algorithm and testing it in diverse practical contexts, its efficiency and effectiveness can be enhanced, making it a more robust approach for rock discontinuity groupings.

Author Contributions

Conceptualization, Y.X.; methodology, J.W. and M.W.; software, J.W., Y.X. and M.W.; validation, G.Z., X.Z. and Y.X.; formal analysis, X.Z.; investigation, G.Z.; resources, G.Z.; data curation, J.W. and Y.X.; writing—original draft preparation, Y.X.; writing—review and editing, Y.X., J.W. and M.W.; visualization, X.Z.; supervision, J.W.; project administration, G.Z.; funding acquisition, G.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (2018YFC0604606).

Data Availability Statement

The datasets utilized in this study can be obtained from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Priest, S.D. Discontinuity Analysis for Rock Engineering; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1993. [Google Scholar]
  2. Brady, B.H.; Brown, E.T. Rock Mechanics: For Underground Mining; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
  3. Giani, G.P. Rock Slope Stability Analysis; CRC Press: Boca Raton, FL, USA, 1992. [Google Scholar]
  4. Guzzetti, F.; Cardinali, M.; Reichenbach, P. The influence of structural setting and lithology on landslide type and pattern. Environ. Eng. Geosci. 1996, 2, 531–555. [Google Scholar] [CrossRef]
  5. Brideau, M.A. The Influence of Tectonic Structures on Rock Mass Quality and Implications for Rock Slope Stability. Master’s Thesis, Simon Fraser University, Burnaby, BC, Canada, 2005. [Google Scholar]
  6. Dong, F.R.; Wang, S.H.; Hou, Q.K. Study of multi-parameter dominant grouping method of rock mass discontinuity based on the principal component analysis. Rock Soil Mech. 2022, 43, 3. [Google Scholar]
  7. Hammah, R.E. Intelligent Delineation of Rock Discontinuity Data Using Fuzzy Cluster Analysis. Ph.D. Thesis, University of Toronto, Toronto, ON, Canada, 2000. [Google Scholar]
  8. Shanley, R.J.; Mahtab, M. Delineation and analysis of clusters in orientation data. J. Int. Assoc. Math. Geol. 1976, 8, 9–23. [Google Scholar] [CrossRef]
  9. Mahtab, M.; Yegulalp, T. A rejection criterion for definition of clusters in orientation data. In Proceedings of the ARMA US Rock Mechanics/Geomechanics Symposium, Berkeley, CA, USA, 25–27 August 1982. [Google Scholar]
  10. Brown, E.T.; International Society for Rock Mechanics. Rock Characterization, Testing & Monitoring: ISRM Suggested Methods; Pergamon Press: Oxford, UK, 1981. [Google Scholar]
  11. Dershowitz, W.; Busse, R.; Geier, J.; Uchida, M. A stochastic approach for fracture set definition. In Proceedings of the ARMA North America Rock Mechanics Symposium, Montreal, QC, Canada, 19–21 June 1996. [Google Scholar]
  12. Shirazi, A.; Hezarkhani, A.; Pour, A.B. Fusion of lineament factor (Lf) map analysis and multifractal technique for massive sulfide copper exploration: The Sahlabad area, East Iran. Minerals 2022, 12, 549. [Google Scholar] [CrossRef]
  13. Shirazi, A.; Hezarkhani, A.; Beiranvand Pour, A.; Shirazy, A.; Hashim, M. Neuro-Fuzzy-AHP (NFAHP) technique for copper exploration using Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) and geological datasets in the Sahlabad mining area, east Iran. Remote Sens. 2022, 14, 5562. [Google Scholar] [CrossRef]
  14. Hammah, R.; Curran, J. Fuzzy cluster algorithm for the automatic identification of joint sets. Int. J. Rock Mech. Min. Sci. 1998, 35, 889–905. [Google Scholar] [CrossRef]
  15. Zhou, W.; Maerz, N.H. Implementation of multivariate clustering methods for characterizing discontinuities data from scanlines and oriented boreholes. Comput. Geosci. 2002, 28, 827–839. [Google Scholar] [CrossRef]
  16. Jimenez-Rodriguez, R.; Sitar, N. A spectral method for clustering of rock discontinuity sets. Int. J. Rock Mech. Min. Sci. 2006, 43, 1052–1061. [Google Scholar] [CrossRef]
  17. Xu, L.M.; Chen, J.P.; Wang, Q.; Zhou, F.J. Fuzzy C-Means Cluster Analysis Based on Mutative Scale Chaos Optimization Algorithm for the Grouping of Discontinuity Sets. Rock Mech. Rock Eng. 2013, 46, 189–198. [Google Scholar] [CrossRef]
  18. Guo, H.; Kaiser, W.J.; Mocarski, E.S. Manipulation of apoptosis and necroptosis signaling by herpesviruses. Med. Microbiol. Immunol. 2015, 204, 439–448. [Google Scholar] [CrossRef]
  19. Liu, J.; Zhao, X.D.; Xu, Z.H. Identification of rock discontinuity sets based on a modified affinity propagation algorithm. Int. J. Rock Mech. Min. Sci. 2017, 94, 32–42. [Google Scholar] [CrossRef]
  20. Mehrishal, S.; Kim, J.; Song, J.J.; Sainoki, A. A semi-automatic approach for joint orientation recognition using 3D trace network analysis. Eng. Geol. 2024, 332, 107462. [Google Scholar] [CrossRef]
  21. Su, X.L.; Jia, X.J.; Xie, C.D.; Peng, K.C. Preparation of multipartite entangled states used for quantum information networks. Sci. China Phys. Mech. Astron. 2014, 57, 1210–1217. [Google Scholar] [CrossRef]
  22. Song, T.J.; Chen, J.P.; Zhang, W.; Song, S.Y. Clustering Analysis of Dominative Attitudes of Rock Mass Structural Plane Based on Firefly Algorithm. Dongb Daxue Xuebao/J. Northeast. Univ. 2015, 36, 284–287. [Google Scholar]
  23. Wang, S.H.; Ren, Y.P.; Chen, J.Z.; Zhang, Z.S. An Improved Fish Swarm Clustering Algorithm for Structural Grouping. J. Northeast. Univ. 2019, 40, 420. [Google Scholar]
  24. Hemasian-Etefagh, F.; Safi-Esfahani, F. Group-based whale optimization algorithm. Soft Comput. 2020, 24, 3647–3673. [Google Scholar] [CrossRef]
  25. Yi, X.; Feng, W.; Wu, W.; Zhou, Y.; Dong, S. An Effective Approach for Determining Rock Discontinuity Sets Using a Modified Whale Optimization Algorithm. Rock Mech. Rock Eng. 2023, 56, 6143–6155. [Google Scholar] [CrossRef]
  26. Liu, T.; Zheng, J.; Deng, J. A new iteration clustering method for rock discontinuity sets considering discontinuity trace lengths and orientations. Bull. Eng. Geol. Environ. 2021, 80, 413–428. [Google Scholar] [CrossRef]
  27. Liu, Y.; Chen, J.; Tan, C.; Zhan, J.; Song, S.; Xu, W.; Yan, J.; Zhang, Y.; Zhao, M.; Wang, Q. Intelligent scanning for optimal rock discontinuity sets considering multiple parameters based on manifold learning combined with UAV photogrammetry. Eng. Geol. 2022, 309, 106851. [Google Scholar] [CrossRef]
  28. Wu, W.; Feng, W.; Yi, X.; Zhao, J.; Zhou, Y. Sparrow search algorithm-driven clustering analysis of rock mass discontinuity sets. Comput. Geosci. 2024, 28, 615–627. [Google Scholar] [CrossRef]
  29. Hou, Q.; Wang, S.; Yong, R.; Xiu, Z.; Han, W.; Zhang, Z. A method for clustering rock discontinuities with multiple properties based on an improved netting algorithm. Geomech. Geophys. Geo-Energy Geo-Resour. 2023, 9, 23. [Google Scholar] [CrossRef]
  30. Xu, L.; Chen, J.; Wang, Q. Study of method for multivariate parameter dominant partitioning of discontinuities of rock mass. Yantu Lixue/Rock Soil Mech. 2013, 34, 189–195. [Google Scholar]
  31. Li, Y.; Wang, Q.; Chen, J.; Xu, L.; Song, S. K-means algorithm based on particle swarm optimization for the identification of rock discontinuity sets. Rock Mech. Rock Eng. 2015, 48, 375–385. [Google Scholar] [CrossRef]
  32. Wang, J.; Zheng, J.; Lü, Q.; Guo, J.; He, M.; Deng, J. A multidimensional clustering analysis method for dividing rock mass homogeneous regions based on the shape dissimilarity of trace maps. Rock Mech. Rock Eng. 2020, 53, 3937–3952. [Google Scholar] [CrossRef]
  33. Laclau, C.; Redko, I.; Matei, B.; Bennani, Y.; Brault, V. Co-clustering through optimal transport. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1955–1964. [Google Scholar]
  34. Del Barrio, E.; Cuesta-Albertos, J.A.; Matrán, C.; Mayo-Íscar, A. Robust clustering tools based on optimal transportation. Stat. Comput. 2019, 29, 139–160. [Google Scholar] [CrossRef]
  35. Chakraborty, S.; Paul, D.; Das, S. Hierarchical clustering with optimal transport. Stat. Probab. Lett. 2020, 163, 108781. [Google Scholar] [CrossRef]
  36. Fajgelbaum, P.D.; Schaal, E. Optimal transport networks in spatial equilibrium. Econometrica 2020, 88, 1411–1452. [Google Scholar] [CrossRef]
  37. Ge, Z.; Liu, S.; Li, Z.; Yoshie, O.; Sun, J. Ota: Optimal transport assignment for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 303–312. [Google Scholar]
  38. Sinkhorn, R. Diagonal equivalence to matrices with prescribed row and column sums. Am. Math. Mon. 1967, 74, 402–405. [Google Scholar] [CrossRef]
  39. Awasthi, P.; Balcan, M.F.; Voevodski, K. Local algorithms for interactive clustering. J. Mach. Learn. Res. 2017, 18, 1–35. [Google Scholar]
  40. Vikram, S.; Dasgupta, S. Interactive bayesian hierarchical clustering. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 2081–2090. [Google Scholar]
  41. Ergun, J.C.; Feng, Z.; Silwal, S.; Woodruff, D.P.; Zhou, S. Learning-Augmented k-means Clustering. In Proceedings of the Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, 25–29 April 2022. [Google Scholar]
  42. Nguyen, T.D.; Chaturvedi, A.; Nguyen, H.L. Improved Learning-augmented Algorithms for k-means and k-medians Clustering. In Proceedings of the Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
  43. Lin, H.; Luo, T.; Woodruff, D. Learning augmented binary search trees. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 13431–13440. [Google Scholar]
  44. Chen, J.; Silwal, S.; Vakilian, A.; Zhang, F. Faster fundamental graph algorithms via learned predictions. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 3583–3602. [Google Scholar]
  45. Mitzenmacher, M.; Vassilvitskii, S. Algorithms with predictions. Commun. ACM 2022, 65, 33–35. [Google Scholar] [CrossRef]
  46. Arthur, D.; Vassilvitskii, S. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, New Orleans, LA, USA, 7–9 January 2007; pp. 1027–1035. [Google Scholar]
  47. Kapoor, A.; Singhal, A. A comparative study of K-Means, K-Means++ and Fuzzy C-Means clustering algorithms. In Proceedings of the 2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT), Ghaziabad, India, 9–10 February 2017; pp. 1–6. [Google Scholar]
  48. Xie, X.L.; Beni, G. A validity measure for fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 13, 841–847. [Google Scholar] [CrossRef]
  49. Cui, X.; Yan, E.c. A clustering algorithm based on differential evolution for the identification of rock discontinuity sets. Int. J. Rock Mech. Min. Sci. 2020, 126, 104181. [Google Scholar] [CrossRef]
  50. Rousseeuw, P.J. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. J. Comput. Appl. Math. 1987, 20, 53–65. [Google Scholar] [CrossRef]
  51. Sun, J. Study on the Existing State and Process Mineralogy of Gold and Tungsten in Xiangxi Gold Deposit. Master’s Thesis, Central South University, Changsha, China, 2013. [Google Scholar]
Figure 1. Sphere representation of a rock discontinuity orientation.
Figure 2. The overall pipeline of II-LA-KM involves several key steps, including data preprocessing, center initialization, the learning-augmented algorithm, and clustering number determination.
Figure 3. Polar clustering diagram of baselines and II-LA-KM on the artificial dataset. While KM and FCM confuse G3 and G4, both our method and KM++ successfully distinguish them.
Figure 4. Polar clustering diagram of baselines and II-LA-KM on the real dataset.
Figure 5. Clustering costs of different error rate settings. The learning-augmented algorithm can effectively correct noisy data, and its error correction capability is strongest when the error rate is between 0.25 and 0.35.
Figure 6. Data distribution of the artificial dataset with group labels. The dataset contains five groups of rock discontinuities, where the third and the fourth groups are overlapping.
Figure 7. Exploration site of the real dataset.
Figure 8. Data distribution of the real dataset using a density estimate. Based on the data distribution, the rock discontinuities in the real dataset can be divided into four groups, with some overlap between the two groups on the left.
Table 1. Distribution statistical parameters of the artificial rock discontinuity dataset. The artificial dataset is simulated with these parameters using the Monte Carlo method. ($\alpha$, $\beta$) follow a bi-normal distribution, $l$ a normal distribution, $w$ a uniform distribution, and $j_f$ is constant.

| Group | $\alpha$ (°): $\mu$ / $\sigma$ / R | $\beta$ (°): $\mu$ / $\sigma$ / R | l (m): $\mu$ / $\sigma$ / R | w (cm): $\mu$ / R | $j_f$ | Num. |
|---|---|---|---|---|---|---|
| 1 | 135 / 80 / 115–164 | 55 / 20 / 47–67 | 0.5 / 0.10 / 0.23–1.21 | 1.0 / 0.8–1.18 | 1.0 | 70 |
| 2 | 300 / 120 / 281–326 | 30 / 24 / 19–41 | 2.0 / 0.40 / 0.63–3.50 | 1.2 / 1.11–1.30 | 1.0 | 40 |
| 3 | 30 / 75 / 11–44 | 70 / 40 / 50–86 | 3.7 / 0.45 / 2.00–5.22 | 0.5 / 0.31–0.70 | 1.0 | 40 |
| 4 | 45 / 120 / 19–73 | 65 / 40 / 56–81 | 2.5 / 0.18 / 1.83–3.19 | 0.7 / 0.51–0.79 | 1.0 | 30 |
| 5 | 255 / 40 / 238–271 | 40 / 30 / 28–51 | 6.0 / 0.40 / 5.26–7.51 | 0.2 / 0.15–0.25 | 1.0 | 20 |
Table 2. Clustering results after normalization of baselines and II-LA-KM on the artificial dataset. Each cell gives the clustering center ($f_1$, $f_2$) and size ($m_i$). Avg. denotes the average discontinuity of each group.

| Method | G1 | G2 | G3 | G4 | G5 |
|---|---|---|---|---|---|
| Avg. | −1.25, −0.42 (70) | −0.87, 1.11 (40) | 1.31, −0.23 (40) | 0.53, −0.49 (30) | 2.71, 0.44 (20) |
| KM | −1.08, −0.56 (36) | −0.87, 1.11 (40) | 0.97, −0.34 (70) | −1.43, −0.26 (34) | 2.71, 0.44 (20) |
| FCM | −1.23, −0.43 (70) | −0.69, 1.17 (25) | 0.97, −0.33 (70) | −1.13, −0.90 (15) | 2.68, 0.42 (20) |
| KM++ | −1.25, −0.42 (70) | −0.87, 1.11 (40) | 1.41, −0.22 (33) | 0.58, −0.45 (37) | 2.71, 0.44 (20) |
| Ours | −1.24, −0.45 (70) | −0.89, 1.12 (40) | 1.37, −0.20 (35) | 0.55, −0.48 (35) | 2.70, 0.43 (20) |
Table 3. Quantitative metrics of baselines and II-LA-KM on the artificial dataset. Metric and $\overline{\text{Metric}}$ denote the best and the average results of 10 runs, respectively. Bold denotes the best performance among all methods. ↑ and ↓ indicate that higher and lower values are better, respectively. Our method outperforms the baselines across all metrics.

| Method | ACC ↑ | XB ↓ | SS ↑ | $\overline{\text{ACC}}$ ↑ | $\overline{\text{XB}}$ ↓ | $\overline{\text{SS}}$ ↑ |
|---|---|---|---|---|---|---|
| KM | 0.830 | 0.744 | 0.565 | 0.815 | 0.773 | 0.532 |
| FCM | 0.850 | 0.569 | 0.615 | 0.850 | 0.638 | 0.590 |
| KM++ | 0.965 | 0.197 | 0.616 | 0.965 | 0.197 | 0.616 |
| Ours | **0.975** | **0.194** | **0.617** | **0.973** | **0.196** | **0.617** |
Table 4. Clustering results after normalization of baselines and II-LA-KM on the real dataset. Each cell gives the clustering center ($f_1$, $f_2$) and size ($m_i$).

| Method | G1 | G2 | G3 | G4 |
|---|---|---|---|---|
| KM | −0.23, −0.23 (79) | 2.09, 0.35 (32) | −1.18, −0.02 (94) | 1.95, 0.28 (32) |
| FCM | 0.09, −0.54 (65) | −1.24, 1.23 (55) | −1.19, −0.96 (55) | 2.04, 0.21 (62) |
| KM++ | −0.17, −0.29 (75) | −1.25, 1.32 (45) | −1.06, −1.02 (55) | 2.06, 0.29 (62) |
| Ours | 0.26, −0.75 (43) | −1.06, 0.83 (77) | −1.31, −1.08 (55) | 2.19, 0.09 (62) |
Table 5. Quantitative metrics of baselines and II-LA-KM on the real dataset. Metric and $\overline{\text{Metric}}$ denote the best and the average results of 10 runs, respectively. Bold denotes the best performance among all methods. ↑ and ↓ indicate that higher and lower values are better, respectively. Our method outperforms the baselines across all metrics.

| Method | XB ↓ | SS ↑ | $\overline{\text{XB}}$ ↓ | $\overline{\text{SS}}$ ↑ |
|---|---|---|---|---|
| KM | 1.274 | 0.311 | 1.284 | 0.306 |
| FCM | 0.263 | 0.450 | 0.263 | 0.448 |
| KM++ | 0.237 | 0.467 | 0.237 | 0.467 |
| Ours | **0.214** | **0.468** | **0.217** | **0.468** |
Table 6. Clustering results of II-LA-KM on the real dataset. The values are transformed back to the scale of the original dataset.

| Group | $\alpha$ (°) | $\beta$ (°) | l (m) | w (cm) | $j_f$ | Num. |
|---|---|---|---|---|---|---|
| 1 | 114.397 | 32.599 | 0.612 | 0.660 | 0.259 | 43 |
| 2 | 283.067 | 48.103 | 3.016 | 0.618 | 0.514 | 77 |
| 3 | 54.348 | 69.034 | 1.459 | 0.643 | 0.990 | 55 |
| 4 | 161.645 | 45.073 | 0.213 | 0.100 | 0.000 | 62 |
Table 7. Quantitative metrics of II-LA-KM on the real dataset when changing the cluster number k. Bold denotes the best performance.

| Clustering Number k | XB ↓ | SS ↑ |
|---|---|---|
| 2 | 0.289 | 0.432 |
| 3 | 0.368 | 0.404 |
| 4 | **0.214** | 0.468 |
| 5 | 0.274 | **0.488** |
| 6 | 0.359 | 0.446 |
