Machine Learning Model for Predicting the Height of the Water-Conducting Fracture Zone Considering the Influence of Key Stratum and Dip Mining Intensity

Che, Yuhang; Cui, Ximin; Wang, Yuanjian; Li, Peixian

doi:10.3390/w17020234


School of Geoscience and Surveying Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China


Water 2025, 17(2), 234; https://doi.org/10.3390/w17020234

Submission received: 20 December 2024 / Revised: 12 January 2025 / Accepted: 14 January 2025 / Published: 16 January 2025


Abstract

Predicting the height of the water-conducting fracture zone (WCFZ) is crucial for preventing water inrush and ensuring safe underground mining operations. In this study, we propose novel models that combine three boosting algorithms (CatBoost, XGBoost, and AdaBoost) with three intelligent optimization algorithms (SSA, HHO, and LEA). Key stratum data (the distance from the key stratum to the coal seam, DK, and the key stratum thickness, TK) and dip mining intensity data were added to the existing parameters for WCFZ height prediction. The main influence angle tangent, derived from the probability integral method, replaces the hard rock ratio coefficient. A total of 104 field records with eight input parameters were used, with WCFZ height as the dependent variable. The models were validated using five-fold cross-validation and evaluated with the root mean square error (RMSE), mean absolute error (MAE), coefficient of determination ($R^{2}$), and mean relative error (MRE). The Preference Ranking Organization Method for Enrichment Evaluations (PROMETHEE) was applied to rank the models. The CAT-HHO model demonstrated the best performance. Predictions made with this model under varying dip mining intensities showed an approximately linear relationship between dip mining intensity and WCFZ height. SHAP analysis identified mining thickness as the most influential factor, and removing the key stratum data from the models significantly reduced prediction accuracy. The results highlight the model's ability to improve WCFZ height prediction, offering insights for water inrush prevention in coal mining operations and guidance for applying machine learning to similar problems.

Keywords:

water-conducting fracture zone; machine learning; key stratum; mining intensity; intelligent optimization algorithms

1. Introduction

Coal plays a vital role as a primary energy source in China, accounting for approximately 70% of the country's total energy production and consumption, and is crucial for ensuring energy security. During underground coal mining, as goaves form, the overlying rock mass moves and creates collapse, fracture, and subsidence zones, collectively referred to as the "three zones." Together, the collapse and fracture zones form the water-conducting fracture zone (WCFZ) [1]. The development height of the WCFZ and its spatial relationship with the overlying aquifers are key factors in determining the risk of water inrush into coal mine tunnels [2]. When the WCFZ intersects the overlying aquifer, water can flow into the goaf, potentially causing roof water inrush and a drop in groundwater levels, both of which pose significant risks to the safety of coal mine operations.

In many geological settings, groundwater flows through networks of interconnected pores and fractures; it is recharged primarily by rainfall and discharged via streams, springs, or wells. Where a WCFZ exists, however, these natural pathways can be substantially altered. The WCFZ may create or enlarge conduits that accelerate groundwater movement, particularly if it connects previously isolated aquifers or extends from deeper strata to a surface recharge area. Such changes in flow can ultimately degrade water quality and undermine the sustainable management of groundwater resources.

Methods for predicting the height of a WCFZ include field measurements, theoretical calculations, empirical formulas, similar-material simulation, and numerical simulation. Of these, field measurements provide the most direct and accurate data, using techniques such as water injection leakage testing [3], transient electromagnetic methods [4], borehole imaging [5], microseismic monitoring [6], and electrical methods [7]. However, their application is often limited by time, site access, and cost.

Theoretical calculations are often based on rock mechanics models, in which the height of a WCFZ is predicted by modeling the deformation and failure of rock layers. Using rock mechanics, Guo and Zhu modeled the process by which "suspended" rock layers progress to stress-induced failure [8,9]; these models treat the rock mass as a combination of single-layer and multi-layer rock formations. Wu et al. introduced a thin plate theory-based method that calculates WCFZ height by incorporating the mechanical and structural properties of the overlying rock layers [10]. The deformation of the rock mass is governed primarily by the layer with the highest resistance to deformation. This is typically a firm rock layer at the base of the rock mass it controls, known as the key stratum, and it plays a crucial role in deformation and failure; studying the key stratum therefore provides valuable insights into the development of a WCFZ. Through similar-material model experiments and theoretical analysis, Miao and Wang demonstrated that the distance between the key stratum and the coal seam plays a critical role in controlling the development height of a WCFZ. When this distance exceeds a certain threshold (typically 7–10 times the mining thickness), the key stratum remains intact owing to its strength, effectively limiting further development of the WCFZ. Conversely, when the distance is less than this critical value, failure of the key stratum intensifies the development of the WCFZ, potentially extending it to the top of the bedrock or even to the surface [11,12]. Based on key stratum theory and the mining intensity of the overlying strata, Bai concluded that the overlying strata reach a state of critical failure during mining. Analyzing the contact state of the strata after critical failure, he noted that the height of the WCFZ does not increase beyond this point; instead, the degree of rock mass damage intensifies as shear stress increases [13].
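The distance criterion described by Miao and Wang can be turned into a quick screening check on the ratio DK/m. The Python sketch below is illustrative only. It assumes the 7–10 band acts as a sharp boundary, which simplifies the original finding, and the function name and labels are hypothetical.

```python
def key_stratum_position(dk, m, lower=7.0, upper=10.0):
    """Screen a key stratum by the ratio of its distance to the seam (DK) to mining thickness (m).

    Hypothetical helper: the 7-10 band follows the threshold quoted in the text and is
    treated here as a sharp boundary, which is a simplification.
    """
    if dk <= 0 or m <= 0:
        raise ValueError("dk and m must be positive")
    ratio = dk / m
    if ratio > upper:
        return "likely intact"   # far enough above the seam to limit WCFZ growth
    if ratio < lower:
        return "likely to fail"  # close enough that its failure can extend the WCFZ
    return "transitional"        # within the band: needs site-specific analysis


print(key_stratum_position(dk=60.0, m=4.0))  # ratio 15 -> "likely intact"
```

In practice the band would be a zone of uncertainty rather than a cutoff, which is why the sketch returns a separate "transitional" label instead of forcing a binary answer.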

Empirical formulas are among the most widely used methods in China. Liu, utilizing field data from over 200 boreholes across 27 mines, developed formulas specific to four rock types: strong, medium-strong, soft, and weathered soft rock, as shown in Table 1 [14].

However, the applicability of the empirical formula is generally limited, as it is only valid for cases where the single mining thickness ranges from 1 to 3 m and the total mining thickness does not exceed 15 m. To enhance the formula’s adaptability, Guo and Li proposed a regression-based empirical formula derived from field data that is applicable to varying mining face lengths and different mining areas [15,16,17].
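Liu's formulas share a common rational form in the total mining thickness, H = 100M/(aM + b) ± tolerance. The sketch below assumes the coefficient sets commonly cited in Chinese coal-mining regulations for the four rock types. They are not reproduced from Table 1 and should be checked against it before use. The sketch also enforces the validity limits stated above: 1–3 m per single mining slice and at most 15 m in total.

```python
# Assumed coefficient table (a, b, tolerance) for H = 100*M / (a*M + b) +/- tolerance,
# where M is the total mining thickness in metres. Verify against Table 1 before use.
COEFFS = {
    "strong": (1.2, 2.0, 8.9),
    "medium-strong": (1.6, 3.6, 5.6),
    "soft": (3.1, 5.0, 4.0),
    "weathered-soft": (5.0, 8.0, 3.0),
}


def empirical_wcfz_height(slices, rock_type):
    """Return (estimate, lower bound, upper bound) of the WCFZ height in metres.

    slices: thickness (m) of each mining slice; their sum is the total mining thickness M.
    """
    if not slices or any(not 1.0 <= t <= 3.0 for t in slices):
        raise ValueError("each single mining thickness must lie between 1 and 3 m")
    total = sum(slices)
    if total > 15.0:
        raise ValueError("total mining thickness must not exceed 15 m")
    a, b, tol = COEFFS[rock_type]
    h = 100.0 * total / (a * total + b)
    return h, h - tol, h + tol


# Hypothetical example: two 3 m slices in medium-strong overburden.
print(empirical_wcfz_height([3.0, 3.0], "medium-strong"))
```

The explicit range checks make the restriction discussed above visible. A 4 m single slice, or a thick seam mined in one pass, is rejected rather than silently extrapolated.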

Numerical simulations and physical similarity experiments have also been extensively used to study the height of WCFZs. These methods typically require validation through field data. By combining these approaches, Teng, Tan, and colleagues developed WCFZ height prediction models for specific mining areas and conditions, including comprehensive mining in subsiding loess regions, gob-side entry-driving mining faces, shallow coal seams, gently inclined extra-thick coal seams, and underwater thick coal seams [5,18,19,20,21]. Zhang et al. utilized the discrete element method to highlight that key strata significantly regulate the formation and propagation of WCFZ by influencing overburden deformation and fracture mechanisms [22]. Li et al. combined numerical simulations with field measurements to effectively simulate the progressive fracturing and layered fracturing behavior of strata, thereby enhancing the accuracy of WCFZ height predictions in the context of super great mining heights [23,24]. Meanwhile, numerical simulations play a pivotal role in hydrogeological research, not only in assessing groundwater quantity and quality but also in offering crucial insights into how a water-conducting fracture zone affects these resources and guiding potential remediation strategies [25,26].

With the growing application of machine learning in the mining field, researchers have developed various models to predict WCFZ height. Li introduced a sensitivity quantification model based on DPS to analyze influencing factors, while Xu proposed a combined optimization model integrating Random Forest, Bagging, XGBoost, and AdaBoost for improved prediction accuracy [27,28]. Gao applied the IRMO algorithm to optimize BP Neural Networks, establishing the IRMO-BP-NN model for Jurassic coal fields [29]. Additionally, Li utilized BOTDR technology and Grey Theory principles to model WCFZ subsidence [30].

Although these ensemble learning algorithms offer excellent predictive performance, they often face challenges in practical applications, such as the high complexity of hyperparameter tuning and the time-consuming nature of the optimization process. To address these issues, this study utilizes three machine learning algorithms—CatBoost [31], AdaBoost [32], and XGBoost [33]—and combines them with three intelligent optimization methods (SSA [34], HHO [35], LEA [36]). Currently used calculation parameters include the hard rock lithology proportion coefficient, mining depth, panel width, mining thickness, and coal seam dip angle. While these parameters can describe production conditions to some extent, they fail to capture the complex occurrence conditions of overlying strata and the specific layout of the mining panel. Relying solely on these parameters risks oversimplification. Therefore, by introducing four new parameters—the distance between the key stratum and the coal seam, thickness of the key stratum, mining intensity, and the tangent of the main influence angle—into the traditional WCFZ dataset, the model’s ability to accurately describe mining site conditions is enhanced, enriching the dataset with more detailed information. Based on this, a prediction model for the height of a WCFZ is established, and the model’s reliability and accuracy are evaluated. The developed model improves the precision of WCFZ height prediction.

2. Factors Affecting the Height of the WCFZ

2.1. Mining Thickness

As the sole parameter used to calculate the height of the WCFZ in empirical formulas, mining thickness has been demonstrated in subsequent studies to play a crucial role in predicting the WCFZ height [14]. Different mining thicknesses result in varying degrees of damage to the overlying rock strata. Greater mining thickness creates larger voids due to mining activity, leading to more severe damage to the overlying rock layers and, consequently, higher development of the WCFZ.

2.2. Mining Depth

As mining depth increases, the self-weight of the overlying rock strata also increases, resulting in higher original stress within the roof. This elevated stress causes more significant roof failure, intensifies the development of joints and fractures, and ultimately leads to an increase in the height of the WCFZ.

2.3. Dip Mining Intensity

During coal seam mining, the mining intensity along the strike direction transitions from subcritical to critical and eventually to supercritical mining. Along the dip, however, the mining intensity typically remains subcritical. Mining intensity is influenced by a combination of factors such as mining depth, rock properties, and panel dimensions. As mining intensity increases, the height of the WCFZ grows with the advancement of the working face until the coal seam reaches critical mining conditions. At this stage, the WCFZ height reaches its maximum and no longer increases with further mining. During this process, the WCFZ forms a characteristic arch-like structure.

Currently, four main methods are used to determine mining intensity: the width-to-depth ratio representation, the loose layer reduction method, the ratio of mining width to bedrock thickness, and the full subsidence angle method [37]. The first three methods do not adequately account for the influence of lithology and mining conditions on mining intensity. To accurately assess mining intensity, these methods require comparisons between calculated coefficients and actual mining and geological conditions.

In contrast, the full subsidence angle method determines the critical mining width based on the full subsidence angles of both the bedrock and the loose layer. Mining intensity is then evaluated as the ratio of the actual mining width to the critical width. This method provides clear geometric relationships and produces relatively accurate and intuitive results. The calculation process is illustrated in Figure 1.

Mining intensity is calculated with Equation (1):

MI = \frac{L}{L_{s}} = \frac{L}{2 \left(s_{s} \cot \psi_{s} + s_{j} \cot \psi_{j}\right)}

(1)

where $L$ is the actual mining width, $L_{s}$ is the critical mining width, $s_{s}$ is the thickness of the loose layer, $s_{j}$ is the thickness of the bedrock, $\psi_{s}$ is the full subsidence angle of the loose layer, and $\psi_{j}$ is the full subsidence angle of the bedrock.
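The full subsidence angle method reduces to a few lines of arithmetic. The critical width is twice the horizontal reach of the full subsidence angles through the loose layer and the bedrock, and mining intensity is the actual width divided by that critical width. The Python sketch below assumes angles in degrees; the function names and the example panel are hypothetical.

```python
import math


def _cot(deg):
    """Cotangent of an angle given in degrees."""
    return 1.0 / math.tan(math.radians(deg))


def mining_intensity(L, s_s, s_j, psi_s_deg, psi_j_deg):
    """Dip mining intensity MI = L / L_s from the full subsidence angle method.

    L: actual mining width (m); s_s, s_j: loose layer and bedrock thickness (m);
    psi_s_deg, psi_j_deg: full subsidence angles of the loose layer and bedrock (degrees).
    """
    L_s = 2.0 * (s_s * _cot(psi_s_deg) + s_j * _cot(psi_j_deg))  # critical mining width
    return L / L_s


# Hypothetical panel: 200 m wide under 100 m of loose layer and 200 m of bedrock.
mi = mining_intensity(200.0, 100.0, 200.0, psi_s_deg=45.0, psi_j_deg=60.0)
print(f"MI = {mi:.3f}")  # MI < 1 indicates subcritical mining
```

Values of MI below 1 correspond to subcritical extraction, MI = 1 to critical extraction, and MI above 1 to supercritical extraction. This matches the interpretation of mining intensity as the ratio of actual to critical width.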

2.4. Overlying Rock Properties

The uniaxial compressive strength of the roof rock and the interaction between soft and hard rock layers are critical factors influencing the height of the WCFZ. However, determining the overall strength of the roof is highly challenging. Previous studies have introduced a proportionality coefficient, denoted as $b$, to quantify this factor. This coefficient is the ratio of the cumulative thickness of hard rock above the coal seam to the height of the mining-affected area, which is taken as 18 to 28 times the mining thickness. It is calculated with Equation (2):

b = \frac{\sum h}{(18 \sim 28)\, m}

(2)

where $m$ represents the mining thickness, and $\sum h$ represents the cumulative thickness of hard rock within the mining-affected area.
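The coefficient $b$ can be computed from a stratigraphic column as sketched below. The sketch makes two assumptions that the text does not specify: the multiplier k defaults to 23, the midpoint of the 18–28 range, and a layer that straddles the top of the affected height is counted only up to that height. The function name and the example column are hypothetical.

```python
def hard_rock_ratio(layers, m, k=23.0):
    """Hard rock proportion coefficient b = (hard rock thickness within k*m) / (k*m).

    layers: (thickness_m, is_hard) tuples ordered upward from the coal seam roof.
    m: mining thickness (m); k: multiplier in the 18-28 range (default 23 is assumed).
    A layer straddling the top of the affected height is clipped at that height (assumption).
    """
    if not 18.0 <= k <= 28.0:
        raise ValueError("k is expected to lie between 18 and 28")
    affected = k * m
    top = 0.0   # height above the roof reached so far
    hard = 0.0  # cumulative hard rock thickness inside the affected height
    for thickness, is_hard in layers:
        if top >= affected:
            break
        counted = min(thickness, affected - top)
        if is_hard:
            hard += counted
        top += thickness
    return hard / affected


# Hypothetical column: 10 m sandstone, 20 m mudstone, 30 m sandstone; m = 2 m, k = 20.
print(hard_rock_ratio([(10.0, True), (20.0, False), (30.0, True)], m=2.0, k=20.0))
```

The sketch also shows the limitation raised in the next paragraph. Many different columns produce the same $b$, because only the summed hard rock thickness survives, not its position or arrangement.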

However, this method provides only a general expression of the roof's strength. Under certain conditions, different combinations of rock layers and hardness levels may yield similar values of $b$, so the approach fails to fully account for the complexity of rock layer interactions. Therefore, this study proposes replacing $b$ with two parameters commonly employed in surface subsidence prediction methods, the main influence angle tangent ($\tan\beta$) and the subsidence coefficient ($q$), to better express the degree of mining influence. These two parameters represent both subsurface and surface damage, as well as mining intensity, more accurately. Furthermore, they are directly influenced by the geological conditions of the rock layers and therefore capture the impact of varying geological conditions on rock layer displacement and damage more effectively.
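In the probability integral method, $\tan\beta$ fixes the main influence radius $r = H_{0}/\tan\beta$, and $q$ scales the maximum subsidence $W_{max} = q\, m \cos\alpha$. The sketch below evaluates the standard textbook subsidence profile over a semi-infinite extraction, $W(x) = \frac{W_{max}}{2}\left[\operatorname{erf}\left(\frac{\sqrt{\pi}\, x}{r}\right) + 1\right]$, to show how the two parameters enter. It is not part of this study's prediction model, and the function name and example values are hypothetical.

```python
import math


def pim_subsidence(x, H0, m, q, tan_beta, alpha_deg=0.0):
    """Surface subsidence W(x) above a semi-infinite extraction (probability integral method).

    x: horizontal distance from the inflection point, positive toward the goaf (m);
    H0: mining depth (m); m: mining thickness (m); q: subsidence coefficient;
    tan_beta: tangent of the main influence angle; alpha_deg: seam dip angle (degrees).
    """
    r = H0 / tan_beta                                  # main influence radius
    w_max = m * q * math.cos(math.radians(alpha_deg))  # maximum subsidence
    return 0.5 * w_max * (math.erf(math.sqrt(math.pi) * x / r) + 1.0)


# Hypothetical case: 300 m deep, 3 m thick, q = 0.8, tan(beta) = 2.0.
for x in (-150.0, 0.0, 150.0):
    print(x, round(pim_subsidence(x, 300.0, 3.0, 0.8, 2.0), 3))
```

A larger $\tan\beta$ shrinks $r$ and steepens the subsidence trough. This is why the parameter carries information about overburden stiffness that a single thickness ratio such as $b$ does not.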

2.5. Key Stratum

The key stratum governs the fracturing and movement of mining rock strata, directly influencing the evolution of the WCFZ. As previously mentioned, the hard rock ratio coefficient is derived from a mathematical homogenization approach, which overlooks the controlling role of the key stratum in the fracturing and movement of the overburden. Under specific mining conditions, this simplification can lead to significant discrepancies between the predicted and actual heights of the WCFZ, potentially resulting in water inrush accidents. As the working face advances, the WCFZ develops upward. When it encounters the key stratum, further fracture development is halted until the key stratum fractures and loses stability. At this point, the WCFZ continues to develop upward. In summary, the fracturing of key strata closer to the coal seam may facilitate the further upward development of the WCFZ, while key strata located farther from the coal seam tend to limit its upward development, as shown in Figure 2.

2.6. Dip Angle

Coal seams are typically categorized as horizontal–gentle, moderately inclined, or steeply inclined based on their dip angle. The dip angle plays a crucial role in determining the sliding behavior of fractured strata, which in turn affects the development process, spatial distribution, and maximum height of the WCFZ, as shown in Figure 3. By altering the movement patterns of collapsed overlying strata in the goaf, the dip angle influences the height of the WCFZ. This study focuses on predicting the WCFZ height under horizontal–gentle seam conditions. Given that most coal seams in the dataset have dip angles within 12°, and previous research indicates that the dip angle has a minimal effect compared to other parameters [2,38,39], its influence is excluded from this analysis.

2.7. Faults

Faults have a certain impact on the development and evolution of water-conducting fracture zones in underground coal mining. Mining-induced stress redistribution often triggers fault reactivation, resulting in the generation or expansion of fractures both vertically and laterally. This phenomenon is particularly pronounced when faults are connected to water-bearing strata, as reactivated faults can serve as conduits for groundwater flow and thereby heighten the risk of water inrush. Moreover, the geometry and scale of faults—specifically their dip angle, length, and associated hydrogeological conditions—can substantially affect the fracture zone’s height and lateral extent [40,41]. Due to the complexity and distinctive nature of fault-related impacts, not all mining areas possess faults capable of influencing the water-conducting fracture zone. Therefore, this study does not currently consider the effect of faults.

3. Material

3.1. The Model of WCFZ

Based on the previous analysis, this work selects the following parameters to establish a predictive model for estimating the height ($H$, m) of the WCFZ caused by coal mining: mining depth ($s$, m), mining intensity ($MI$), main influence angle tangent ($\tan\beta$), mining thickness ($m$, m), mining width ($L$, m), distance from the key stratum to the coal seam ($DK$, m), key stratum thickness ($TK$, m), and mining method ($MT$; fully mechanized mining or top-coal caving). The model for the WCFZ is shown in Figure 4. This study uses a dataset of 104 records of influencing factors and WCFZ heights, gathered from various working faces and boreholes. Given the relatively small sample size, five-fold cross-validation is adopted to obtain robust and reliable results. The dataset is divided into five subsets, and the model is trained on four subsets and validated on the remaining one in turn, so that each data point contributes to both training and validation.
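The five-fold scheme can be sketched in plain Python as follows. The shuffling seed and fold order are assumptions, since the text does not report them. In practice, `sklearn.model_selection.KFold(n_splits=5, shuffle=True, random_state=...)` produces folds of the same sizes.

```python
import random


def k_fold_indices(n_samples, k=5, seed=42):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # The first n_samples % k folds receive one extra sample, as in scikit-learn's KFold.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size


# With the 104 records used here, the validation folds hold 21, 21, 21, 21, and 20 records.
for train, val in k_fold_indices(104):
    print(len(train), len(val))
```

Each record appears in exactly one validation fold. Averaging the metrics over the five folds therefore uses every record once for validation, which matters with a dataset this small.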

In this study, to enhance the accuracy and reliability of predicting the height of the WCFZ caused by coal seam mining, three machine learning algorithms—AdaBoost, XGBoost, and CatBoost—are combined with three intelligent optimization algorithms—HHO, LEA, and SSA—to form nine distinct ensemble prediction models. These models are designed to leverage the strengths of each algorithm, overcome the limitations of individual models, and ultimately improve prediction accuracy. With the support of intelligent optimization algorithms, optimal solutions within the hyperparameter space can be efficiently identified, minimizing reliance on manual parameter tuning and offering a more time-effective alternative to traditional grid search methods. The optimized ensemble models facilitate more accurate parameter selection, making better use of the available dataset and further enhancing prediction precision. This approach provides a more accurate and efficient solution for predicting the height of a WCFZ induced by coal seam mining. The technical framework of the model is shown in Figure 5.

This study compares the nine models with traditional empirical formulas. To evaluate their performance comprehensively and objectively, the following key performance indicators are selected: RMSE, MAE, coefficient of determination ($R^{2}$), and MRE. These indicators offer insights from multiple perspectives, including prediction error, bias, model fit, and data interpretability, aiding in the identification of the most suitable model for predicting the height of the WCFZ for this dataset.

3.2. Database

The statistical analysis of the dataset used in this study is presented in Figure 6. Taking mining thickness as an example, the minimum and maximum values are approximately 1 m and 14.6 m, respectively, located at the bottom and top of the violin plot, and the median is around 6.1 m. The density is highest at about 5.0 m (the widest part of the violin plot), whereas values above 10 m show lower density because the dataset contains relatively few records with large mining thicknesses.

Figure 7 displays the distribution of each indicator and its relationship with the WCFZ height. The upper section of Figure 7 shows the pairwise correlations between the parameters and their correlation coefficients, with the number of "*" symbols indicating the degree of correlation. The pairwise relationships and correlation coefficients between each feature in the dataset and the WCFZ height are also calculated, which helps to explore the relationships and relative importance of the influencing factors. As shown in Figure 7, the height of the WCFZ is most strongly correlated with mining thickness ($m$), which is consistent with previous research and field engineering experience. The second most strongly correlated parameter is the distance from the main key stratum to the coal seam (DK), a factor that has been overlooked in prior studies. The lower part of Figure 7 illustrates the distribution of data for each parameter, with colors representing mining methods: red for fully mechanized mining and green for top-coal caving. The data distributions differ between the two mining methods; therefore, the mining method must be included as an input parameter in the model.

4. Methodology

4.1. Machine Learning Algorithms

4.1.1. Adaptive Boosting (AdaBoost)

AdaBoost (Adaptive Boosting), proposed by Freund and Schapire, is an iterative ensemble learning algorithm. Its core idea is to adjust the sample weights in each iteration so that the model focuses on samples that are difficult to classify, and then to combine the resulting weak classifiers into a strong classifier.

The objective function is defined as follows:

Obj = \sum_{i=1}^{n} w_{i} \exp\left(-y_{i} H(x_{i})\right)

(3)

where $w_{i}$ is the weight of the $i$-th sample, $y_{i} \in \{-1, +1\}$ is the true label, $H(x_{i}) = \sum_{t=1}^{T} \alpha_{t} h_{t}(x_{i})$ represents the ensemble model, and $\alpha_{t}$ is the weight of the $t$-th weak classifier.

The training process is as follows:

Initialize the sample weights:

w_{i}^{(1)} = \frac{1}{n}, \quad i = 1, 2, \dots, n

(4)

In each iteration, a weak classifier $h_{t}(x)$ is trained to minimize the weighted classification error:

\epsilon_{t} = \sum_{i=1}^{n} w_{i}^{(t)} I\left(h_{t}(x_{i}) \neq y_{i}\right)

(5)

Calculate the weight assigned to the weak classifier:

\alpha_{t} = \frac{1}{2} \ln\left(\frac{1 - \epsilon_{t}}{\epsilon_{t}}\right)

(6)

Update the sample weights in each iteration and renormalize them:

w_{i}^{(t+1)} = \frac{w_{i}^{(t)} \exp\left(-\alpha_{t} y_{i} h_{t}(x_{i})\right)}{Z_{t}}

(7)

where $I(\cdot)$ is the indicator function and $Z_{t}$ is a normalization factor that makes the updated weights sum to 1.
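One boosting round can be traced numerically. The sketch below computes the weighted error, the standard AdaBoost classifier weight $\alpha_{t} = \frac{1}{2}\ln\frac{1-\epsilon_{t}}{\epsilon_{t}}$, and the renormalized sample weights for labels in {−1, +1}. It illustrates the classification form described here. Regression variants differ; for example, scikit-learn's `AdaBoostRegressor` implements AdaBoost.R2. The function name is hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One boosting round for labels in {-1, +1}; weights are assumed to sum to 1.

    Returns the weighted error, the classifier weight, and the renormalized sample weights.
    """
    eps = sum(w for w, y, h in zip(weights, y_true, y_pred) if h != y)  # Equation (5)
    if not 0.0 < eps < 0.5:
        raise ValueError("weak learner must be better than chance and not perfect")
    alpha = 0.5 * math.log((1.0 - eps) / eps)  # standard AdaBoost classifier weight
    raw = [w * math.exp(-alpha * y * h) for w, y, h in zip(weights, y_true, y_pred)]
    z = sum(raw)  # normalization factor Z_t
    return eps, alpha, [w / z for w in raw]


# Four equally weighted samples; the weak learner misclassifies only the last one.
eps, alpha, new_w = adaboost_round([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
print(eps, round(alpha, 4), [round(w, 4) for w in new_w])
```

After this round, the misclassified sample's weight rises from 0.25 to 0.5 and each correctly classified sample drops to 1/6. The next weak learner is therefore pushed toward the hard case.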

4.1.2. Extreme Gradient Boosting (XGBoost)

XGBoost, introduced by Chen and Guestrin, is an ensemble learning algorithm based on gradient-boosted decision trees. As an advanced implementation of gradient boosting, it combines multiple weak learners through weighted aggregation to form a strong learner and performs well in classification, regression, and ranking tasks. A key advantage of XGBoost is that it improves computational efficiency and accuracy simultaneously. In each iteration, XGBoost uses first- and second-order gradient information of the objective function to construct a new decision tree, which accelerates model convergence and improves prediction accuracy.

The objective function of XGBoost integrates both the loss and regularization terms and is defined as follows:

Obj^{(r)} = \sum_{i=1}^{n} L\left(y_{i}, \hat{y}_{i}^{(r)}\right) + \sum_{k=1}^{r} \Omega(g_{k})

(8)

\Omega(g_{k}) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_{j}^{2}

(9)

where $n$ is the number of samples, $r$ is the number of trees built so far, $L(y_{i}, \hat{y}_{i}^{(r)})$ is the loss function (squared error in this work), $\Omega(g_{k})$ is the regularization term of the $k$-th tree $g_{k}$, which helps prevent overfitting, $T$ denotes the number of leaf nodes in the tree, $w_{j}$ is the weight of the $j$-th leaf node, and $\gamma$ and $\lambda$ are the regularization parameters.
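For a fixed tree structure, expanding the loss in Equation (8) to second order gives a closed-form optimum for each leaf weight under the penalty in Equation (9). XGBoost uses this optimum to score candidate splits. The sketch below works it out for the squared-error loss used in this work, assuming the common $\frac{1}{2}(y - \hat{y})^{2}$ convention (gradient $\hat{y} - y$, Hessian 1). It follows the textbook derivation rather than any code from this study, and the function name is hypothetical.

```python
def xgb_leaf(y_true, y_pred, lam=1.0, gamma=0.0):
    """Optimal leaf weight and the leaf's objective contribution under squared-error loss.

    Second-order objective for one leaf: G*w + 0.5*(H + lam)*w**2 + gamma,
    where G and H are the sums of gradients and Hessians of the samples in the leaf.
    """
    G = sum(p - t for t, p in zip(y_true, y_pred))  # gradients g_i = y_hat_i - y_i
    H = float(len(y_true))                           # Hessians h_i = 1 for this loss
    w_opt = -G / (H + lam)                           # minimizer of the quadratic in w
    obj = -0.5 * G * G / (H + lam) + gamma           # objective value at w_opt
    return w_opt, obj


# Three samples with current predictions of 0: the leaf moves them toward their mean,
# shrunk by lambda.
w, obj = xgb_leaf([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(w, obj)
```

The example shows the effect of $\lambda$. Without regularization, the optimal leaf weight would be the residual mean (2.0); with $\lambda = 1$ it shrinks to 1.5.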

4.1.3. Categorical Boosting (CatBoost)

CatBoost, proposed by Prokhorenkova et al., is a gradient boosting algorithm specifically optimized for handling categorical features. It addresses target leakage, and the resulting overfitting in categorical data processing, through ordered boosting and target statistic encoding. This method proved highly effective for the data used in this study. By computing statistics over permutations of the data, CatBoost ensures that the encoding of each sample depends only on samples that precede it in the permutation, which reduces the risk of overfitting.

The objective function of CatBoost, similar to that of other gradient-boosting algorithms, is defined as follows:

Obj(\Theta) = \sum_{i=1}^{n} L\left(y_{i}, \hat{y}_{i}^{(T)}\right) + \sum_{t=1}^{T} \Omega(f_{t})

(10)

\Omega(f_{t}) = \gamma T_{t} + \frac{1}{2} \lambda \lVert w_{t} \rVert^{2}

(11)

where $f_{t}$ is the $t$-th tree, $T_{t}$ is its number of leaves, and $w_{t}$ is its vector of leaf weights.

CatBoost encodes categorical features using the statistical properties of the target variable, thus avoiding the overfitting that can occur when the target values are used directly. The encoded value can be represented as

Enc(x_{i}) = \frac{\sum_{j \neq i,\ x_{j} = x_{i}} y_{j}}{count(x_{i}) - 1} + \epsilon

(12)

where $count(x_{i})$ is the number of samples in the same category as $x_{i}$ and $\epsilon$ is a small constant.
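Equation (12) is written in a leave-one-out form. The CatBoost paper instead describes an ordered target statistic: each sample is encoded using only the samples that precede it in a random permutation, smoothed toward a prior. The sketch below implements that ordered variant for a single categorical feature, such as the mining method $MT$. The prior, the smoothing strength `a`, and the function name are assumptions. The CatBoost library's own implementation differs in detail; for example, it uses several permutations.

```python
def ordered_target_stats(categories, targets, prior, a=1.0, order=None):
    """Ordered target statistic for one categorical feature (CatBoost-style sketch).

    Each sample is encoded from the targets of earlier samples of the same category
    in `order` (a random permutation in CatBoost), smoothed toward `prior` with weight `a`.
    """
    order = list(range(len(categories))) if order is None else order
    sums, counts = {}, {}
    encoded = [0.0] * len(categories)
    for i in order:
        c = categories[i]
        # Only samples seen before i contribute, so y_i never leaks into its own encoding.
        encoded[i] = (sums.get(c, 0.0) + a * prior) / (counts.get(c, 0) + a)
        sums[c] = sums.get(c, 0.0) + targets[i]
        counts[c] = counts.get(c, 0) + 1
    return encoded


# Hypothetical mining-method labels and WCFZ heights (m); the prior is the mean height.
mt = ["caving", "mechanized", "caving", "caving"]
h = [10.0, 20.0, 30.0, 40.0]
print(ordered_target_stats(mt, h, prior=sum(h) / len(h)))
```

The first sample of each category receives the prior, because no earlier evidence exists for it. Later samples drift toward the category mean as more history accumulates.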

4.2. Hyperparameter Optimization Algorithms

4.2.1. Sparrow Search Algorithm (SSA)

The SSA is inspired by the foraging behavior of sparrow populations, in particular the flexible division of roles between scouts and followers. Scouts explore a wider region of the solution space, whereas followers focus on local optimization. SSA also introduces an "alert mode" that simulates the flock's rapid response to potential threats, which enhances the algorithm's robustness.

The core mathematical model of SSA is as follows:

x_{i}^{t+1} = \begin{cases} x_{i}^{t} \cdot \exp\left(-\frac{\alpha t}{T}\right) + c_{1} \left(g_{best} - x_{i}^{t}\right), & \text{if } x_{i} \text{ is a scout} \\ x_{i}^{t} + c_{2} \left(x_{j}^{t} - x_{i}^{t}\right), & \text{if } x_{i} \text{ is a follower} \end{cases}

(13)

where $\alpha$ is the parameter that controls the convergence rate, $T$ is the maximum number of iterations, $c_{1}$ and $c_{2}$ are weight factors, $g_{best}$ is the global best position, and $x_{j}^{t}$ is the position of a randomly selected individual $j$.
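Equation (13) can be transcribed directly. The sketch below applies one update to a single sparrow, with positions stored as lists of floats. The default values of $\alpha$, $c_{1}$, and $c_{2}$ are placeholders, not settings reported in this study, and the function name is hypothetical.

```python
import math
import random


def ssa_update(x, t, T, g_best, population, is_scout, alpha=1.0, c1=0.5, c2=0.5, rng=random):
    """Apply Equation (13) to one sparrow position x at iteration t of T (placeholder defaults)."""
    if is_scout:
        decay = math.exp(-alpha * t / T)  # shrinks the scout's own position term over time
        return [xi * decay + c1 * (gi - xi) for xi, gi in zip(x, g_best)]
    x_j = rng.choice(population)  # position of a randomly selected individual j
    return [xi + c2 * (xj - xi) for xi, xj in zip(x, x_j)]


# Hypothetical 2-D positions, e.g. (learning rate, max depth) before decoding.
pop = [[0.10, 4.0], [0.05, 6.0], [0.20, 3.0]]
print(ssa_update(pop[0], t=10, T=50, g_best=[0.08, 5.0], population=pop, is_scout=True))
```

In a full optimizer, each candidate position would be decoded into hyperparameters and scored with the fitness function. The best position found so far becomes $g_{best}$ for the next iteration.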

4.2.2. Harris Hawks Optimization Algorithm (HHO)

HHO, proposed by Heidari et al., is a population-based optimization algorithm inspired by the cooperative hunting behavior of Harris hawks. By simulating how the hawks dynamically encircle their prey, it efficiently integrates global exploration with local exploitation.

The objective function of HHO is formulated as follows:

Obj(x) = f(prey) + \alpha \left(f(prey) - f(best)\right)

(14)

where $prey$ refers to the current position of the target solution, $best$ denotes the current best position, and $\alpha$ is a dynamic scaling factor. The hunting behavior is implemented in two distinct modes.

Global exploration phase: the hawk population explores the prey's position randomly, with the update rule defined as follows:

x_{new} = x_{current} + r_{1} \left(x_{best} - x_{current}\right)

(15)

where $r_{1}$ is a random factor.

Local exploitation phase: the hawk population gradually surrounds the prey using a Lévy flight strategy:

x_{new} = x_{current} + S_{HHO} \cdot L_{HHO}

(16)

where $S_{HHO}$ is the scaling factor and $L_{HHO}$ is the Lévy flight vector.
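The flight vector $L_{HHO}$ in Equation (16) is typically drawn with Mantegna's algorithm, which produces heavy-tailed steps: mostly small moves with occasional long jumps. The sketch below assumes β = 1.5 and a 0.01 step scale, values commonly used with HHO but not reported in this study. It also assumes an elementwise scaling vector $S_{HHO}$ supplied by the caller. All function names are hypothetical.

```python
import math
import random


def mantegna_sigma(beta=1.5):
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(dim, beta=1.5, scale=0.01, rng=None):
    """Levy flight vector: scale * u / |v|**(1/beta), with u ~ N(0, sigma^2) and v ~ N(0, 1)."""
    rng = rng or random.Random()
    sigma = mantegna_sigma(beta)
    return [scale * rng.gauss(0.0, sigma) / abs(rng.gauss(0.0, 1.0)) ** (1 / beta)
            for _ in range(dim)]


def hho_local_move(x, s, rng=None):
    """Equation (16): x_new = x + S * L, with an elementwise scaling vector S (assumed)."""
    step = levy_step(len(x), rng=rng)
    return [xi + si * li for xi, si, li in zip(x, s, step)]


print(hho_local_move([0.1, 5.0, 200.0], [1.0, 1.0, 1.0], rng=random.Random(0)))
```

Passing an explicit `random.Random` instance keeps runs reproducible. This helps when comparing the nine model combinations under identical random streams.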

4.2.3. Lotus Effect Optimization Algorithm (LEA)

The LEA mimics the self-cleaning ability of lotus leaves, simulating their rapid adaptability and efficient search mechanisms in challenging environments to perform global optimization. The algorithm is primarily used to solve high-dimensional, multimodal, and complex optimization problems. By combining global and local search strategies, it offers strong global exploration and precise local optimization, which helps it avoid becoming trapped in local optima. The positions of the solutions are updated according to the following two behaviors:

Global Search Phase

In the global search phase, LEA simulates the movement of water droplets on the lotus leaf surface to explore potential solutions. The goal at this stage is to explore the solution space extensively so as to avoid becoming trapped in local optima. The update formula is

X_{new} = X_{current} + \alpha \left(X_{best} - X_{current}\right) + \beta \left(X_{random} - X_{current}\right)

(17)

where $X_{current}$ is the position of the current solution, $X_{best}$ is the position of the global optimum, $X_{random}$ is a randomly selected solution, and $\alpha$ and $\beta$ are control factors that govern the relative importance of global exploration and local search.

Local Optimization Stage

The local exploitation phase simulates the cleaning ability of the microstructure on the lotus leaf surface. As the search narrows during the global search phase, the LEA transitions to the local exploitation phase, where the algorithm focuses on refining local solutions. The update formula is as follows:

X_{new} = X_{current} + \gamma \left(X_{best} - X_{current}\right) \left(1 - \exp(-\alpha t)\right)

(18)

where $\gamma$ is the factor that controls the intensity of local exploitation and thus the rate of local search, $t$ is the iteration number, which determines the rate of convergence, and $\exp(-\alpha t)$ is an exponential decay term that controls the shrinking of the search range.
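Both LEA update rules, Equations (17) and (18), are simple vector operations. The sketch below implements them as two functions. When to switch from the global to the local phase is not specified in the text, so it is left to the caller. The function names and example values are hypothetical.

```python
import math


def lea_global(x, x_best, x_rand, alpha, beta):
    """Equation (17): move toward the global best and toward a random solution."""
    return [xc + alpha * (xb - xc) + beta * (xr - xc) for xc, xb, xr in zip(x, x_best, x_rand)]


def lea_local(x, x_best, gamma, alpha, t):
    """Equation (18): contract toward the global best with factor gamma * (1 - exp(-alpha*t))."""
    factor = gamma * (1.0 - math.exp(-alpha * t))
    return [xc + factor * (xb - xc) for xc, xb in zip(x, x_best)]


# Hypothetical 1-D example: current 0, best 10, random partner 4.
print(lea_global([0.0], [10.0], [4.0], alpha=0.5, beta=0.25))
print(lea_local([0.0], [10.0], gamma=0.5, alpha=0.1, t=10))
```

As written in Equation (18), the fraction of the distance to $X_{best}$ covered in one step rises from 0 toward $\gamma$ as $t$ grows. The local phase therefore becomes progressively more exploitative.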

4.3. Model Evaluation

This study combined three machine learning algorithms with three hyperparameter optimization algorithms to form nine distinct combined prediction models. The models were trained and tested on the training and test sets, respectively, to evaluate their predictive performance. They were assessed using the root mean square error (RMSE), mean absolute error (MAE), coefficient of determination ($R^{2}$), and mean relative error (MRE), defined as follows:

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(H_{i} - \hat{H}_{i}\right)^{2}}

(19)

MAE = \frac{1}{n} \sum_{i=1}^{n} \left|H_{i} - \hat{H}_{i}\right|

(20)

MRE = \frac{1}{n} \sum_{i=1}^{n} \left|\frac{H_{i} - \hat{H}_{i}}{H_{i}}\right| \times 100\%

(21)

R^{2} = 1 - \frac{\sum_{i=1}^{n} \left(H_{i} - \hat{H}_{i}\right)^{2}}{\sum_{i=1}^{n} \left(H_{i} - \bar{H}\right)^{2}}

(22)

where $H_{i}$ and $\hat{H}_{i}$ represent the true and predicted values, respectively, $\bar{H}$ is the mean of the true values, and $n$ denotes the number of samples.
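Equations (19) to (22) translate directly into code. The plain-Python sketch below computes all four metrics for a set of WCFZ heights; the function name and the example heights are hypothetical. MRE is undefined if any true height is zero.

```python
import math


def evaluate(h_true, h_pred):
    """RMSE, MAE, MRE (%), and R^2 as defined in Equations (19)-(22)."""
    n = len(h_true)
    errors = [t - p for t, p in zip(h_true, h_pred)]
    sse = sum(e * e for e in errors)  # sum of squared errors
    mean = sum(h_true) / n
    return {
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MRE": 100.0 * sum(abs(e / t) for e, t in zip(errors, h_true)) / n,
        "R2": 1.0 - sse / sum((t - mean) ** 2 for t in h_true),
    }


# Hypothetical measured vs. predicted WCFZ heights (m).
print(evaluate([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))  # MAE = 7/3, R2 = 0.915
```

The example shows why MRE is weighted more heavily later in the paper. The same 2 m error counts as 20% on a 10 m zone but only 10% on a 20 m zone.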

To assess the models comprehensively, this study employs the Preference Ranking Organization Method for Enrichment Evaluations (PROMETHEE). PROMETHEE is a widely used multi-criteria decision analysis method that ranks alternatives according to pairwise preference relationships across several weighted evaluation criteria, providing a holistic assessment of model performance. Because it combines all criteria according to their relative importance, PROMETHEE reduces the bias that arises from relying on a single metric. Given the broad range of WCFZ heights, the relative error ($MRE$) is prioritized over the absolute error ($MAE$) so that predictions are effective across all height intervals: an absolute error of 10 m is minor for a 200 m zone but substantial for a 30 m zone. Therefore, $MRE$ is assigned a higher weight (0.4), while $R^{2}$, $MAE$, and $RMSE$ each receive a weight of 0.2.
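The study does not state which PROMETHEE preference function was used. As an assumption, the sketch below implements PROMETHEE II with a linear preference function whose threshold equals each criterion's range across the compared models. It illustrates how the 0.4/0.2/0.2/0.2 weights enter the ranking; it is not meant to reproduce the scores in Table 4.

```python
import numpy as np


def promethee_ii(scores, weights, maximize):
    """Net outranking flows (PROMETHEE II); larger means better-ranked.

    scores:   (n_alternatives, n_criteria) array of metric values
    weights:  criterion weights summing to 1
    maximize: one bool per criterion (True for R^2, False for error metrics)
    Assumption: linear preference scaled by each criterion's range.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Negate cost criteria so that a larger value is always preferred
    signed = np.where(np.asarray(maximize), scores, -scores)
    span = signed.max(axis=0) - signed.min(axis=0)
    span[span == 0] = 1.0  # constant criterion: all differences are zero anyway
    # diff[a, b, j]: scaled advantage of alternative a over b on criterion j
    diff = (signed[:, None, :] - signed[None, :, :]) / span
    pi = (np.clip(diff, 0.0, None) * weights).sum(axis=2)  # pi(a, b)
    n = scores.shape[0]
    phi_plus = pi.sum(axis=1) / (n - 1)   # leaving flow: how a outranks others
    phi_minus = pi.sum(axis=0) / (n - 1)  # entering flow: how others outrank a
    return phi_plus - phi_minus


# Criteria order: R^2, MAE, RMSE, MRE (weights from Section 4.3); toy values
toy = [[0.95, 5.0, 6.0, 4.8],
       [0.93, 6.0, 7.0, 5.5],
       [0.90, 8.0, 10.0, 7.0]]
phi = promethee_ii(toy, [0.2, 0.2, 0.2, 0.4], [True, False, False, False])
ranking = np.argsort(-phi)  # indices from best to worst
```

Net flows always sum to zero, so a score is only meaningful relative to the other models in the same comparison. Some implementations omit the 1/(n − 1) factor, which rescales the flows without changing the ranking.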

5. Results and Discussion

Combining the machine learning models with intelligent algorithms serves to optimize their hyperparameters. Preliminary experiments on this dataset showed that the hyperparameters with the greatest influence on model performance are subsample, learning_rate, max_depth, and n_estimators. For AdaBoost, only two hyperparameters are tuned: n_estimators and learning_rate. The search ranges are listed in Table 2.
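Swarm optimizers such as SSA, HHO, and LEA move candidates through a continuous space, so each candidate position must be decoded into valid hyperparameters. The sketch below encodes the Table 2 ranges. The linear mapping from a unit-interval position and the rounding of integer-valued parameters are assumptions, because the exact encoding is not described.

```python
import numpy as np

# Search ranges from Table 2; AdaBoost uses only n_estimators and learning_rate
SEARCH_SPACE = {
    "n_estimators": (100, 1000),
    "learning_rate": (0.01, 0.2),
    "max_depth": (3, 10),
    "subsample": (0.5, 1.0),
}
INTEGER_PARAMS = {"n_estimators", "max_depth"}


def decode_position(position, names):
    """Map a position in [0, 1]^d to hyperparameter values (hypothetical encoding)."""
    params = {}
    for value, name in zip(position, names):
        low, high = SEARCH_SPACE[name]
        # Clip to the unit interval, then scale linearly into [low, high]
        x = low + float(np.clip(value, 0.0, 1.0)) * (high - low)
        params[name] = int(round(x)) if name in INTEGER_PARAMS else x
    return params


ada_params = decode_position([0.5, 0.5], ["n_estimators", "learning_rate"])
```

Clipping keeps every decoded candidate inside the search space even when an optimizer's update overshoots the bounds.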

The hyperparameter optimization is guided by a fitness function that integrates the evaluation metrics and weights described in Section 4.3:

$$F(z) = 0.4 \cdot MRE_{norm} + 0.2 \cdot MAE_{norm} + 0.2 \cdot RMSE_{norm} - 0.2 \cdot R^{2}_{norm} \tag{23}$$

where $F(z)$ is the fitness value, and $MRE_{norm}$, $MAE_{norm}$, $RMSE_{norm}$, and $R^{2}_{norm}$ are the normalized values of $MRE$, $MAE$, $RMSE$, and $R^{2}$, respectively.

Normalization scales all metrics to the range [0, 1], so that differences in numerical magnitude do not distort the optimization and each metric contributes to the fitness function in proportion to its weight. The optimization algorithm minimizes $F(z)$ to identify the hyperparameter combination with the best trade-off among the evaluation metrics. Because the error terms enter with positive signs and $R^{2}_{norm}$ enters with a negative sign, reducing $F(z)$ drives the model towards lower prediction errors ($MRE$, $MAE$, and $RMSE$) and higher explanatory power ($R^{2}$), yielding balanced and robust performance.
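Equation (23) can be sketched as follows. The study states that each metric is scaled to [0, 1] but not how the scaling bounds are chosen. Here, min-max bounds are passed in explicitly as an assumption (the `BOUNDS` values are hypothetical), and values outside the bounds are clipped, so a negative $R^{2}$ maps to 0.

```python
import numpy as np

WEIGHTS = {"MRE": 0.4, "MAE": 0.2, "RMSE": 0.2, "R2": 0.2}


def min_max(value, low, high):
    """Scale value into [0, 1] using assumed bounds; out-of-range values are clipped."""
    if high == low:
        return 0.0
    return float(np.clip((value - low) / (high - low), 0.0, 1.0))


def fitness(metrics, bounds):
    """F(z) of Eq. (23): error terms add, the R^2 term subtracts; lower is better.

    metrics: dict with keys "MRE", "MAE", "RMSE", "R2"
    bounds:  dict mapping the same keys to (low, high) normalization bounds
    """
    norm = {k: min_max(metrics[k], *bounds[k]) for k in WEIGHTS}
    return (WEIGHTS["MRE"] * norm["MRE"]
            + WEIGHTS["MAE"] * norm["MAE"]
            + WEIGHTS["RMSE"] * norm["RMSE"]
            - WEIGHTS["R2"] * norm["R2"])


# Hypothetical bounds for illustration only
BOUNDS = {"MRE": (0.0, 30.0), "MAE": (0.0, 50.0), "RMSE": (0.0, 60.0), "R2": (0.0, 1.0)}
```

With these bounds, a perfect model scores −0.2, the lowest attainable value, and larger errors push $F(z)$ upwards.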

Each optimization algorithm was run for 50 iterations. This number was selected on the basis of several preliminary experiments to balance optimization effectiveness against computational cost: too few iterations risked missing good parameter combinations, while more iterations would prolong computation unnecessarily. In each iteration, the algorithm evaluated the fitness of candidate hyperparameter combinations and refined them, converging towards the best solution found. The optimized hyperparameters for each model are presented in Table 3.

To validate predictive performance, each trained model was applied to the same 20 randomly selected validation samples. The performance metrics $MRE$, $R^{2}$, $MAE$, and $RMSE$ were then calculated and compared with those of the traditional empirical formula. The prediction results and corresponding metrics are shown in Figure 8.

As shown in Figure 8, all nine combined models performed well on this dataset. Overall, the CatBoost-based combined models achieved the best performance (best values across the CatBoost variants: $R^{2}$ = 0.9504, $MAE$ = 5.4110, $RMSE$ = 5.9807, and $MRE$ = 4.69%), followed by the XGBoost-based models, while the AdaBoost-based models performed worst. Even the weakest combined model far outperformed the empirical formula ($R^{2}$ = −0.4147, $MAE$ = 44.4269, $RMSE$ = 59.4667, and $MRE$ = 28.64%); a negative $R^{2}$ means that the formula predicts worse than simply using the mean observed height. This suggests that combining intelligent algorithms with machine learning models, and replacing manual tuning with algorithmic hyperparameter search, is an effective way to improve model performance.

To further analyze the predictive performance of the models, the errors of each model's predictions are compared with those of the empirical formula in Figure 9. The empirical formula exhibits large errors, particularly when the WCFZ height exceeds 100 m. In contrast, the combined machine learning models are more reliable, with errors generally within [−10, 10] m apart from a few outliers, and they maintain relatively high accuracy even for larger WCFZ heights.

To compare and summarize the predictive performance of the models systematically, the PROMETHEE method was used to score and rank them. The PROMETHEE scores are relative to the evaluation dataset: a model's quality is determined by its rank relative to the other models, with higher scores indicating better performance, and the numerical value of a score has no intrinsic meaning. The ranking results are provided in Table 4, where the best value in each column is underlined. Based on these results, CAT-HHO is selected as the final prediction model.

To study the effect of dip mining intensity on the height of the WCFZ, working face 122106 of a certain mine was selected as the research subject. The parameters of this panel are shown in Table 5.

According to Equation (1), the mining intensity is calculated as follows:

$$MI = \frac{L}{L_{s}} = \frac{L}{2 (s_{s} \cot \psi_{s} + s_{j} \cot \psi_{j})} = \frac{350}{376.3} \approx 0.93$$
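The mining intensity calculation above can be reproduced with a short helper. Judging only from the structure of the denominator, the sketch assumes that $s_{s}$ and $s_{j}$ are lengths in metres paired with the angles $\psi_{s}$ and $\psi_{j}$ in degrees; their exact definitions belong to Equation (1). The usage reproduces the reported MI for working face 122106 from the critical width of 376.3 m.

```python
import math


def cot_deg(angle_deg):
    """Cotangent of an angle given in degrees."""
    return 1.0 / math.tan(math.radians(angle_deg))


def critical_width(s_s, psi_s_deg, s_j, psi_j_deg):
    """L_s = 2 (s_s cot psi_s + s_j cot psi_j), the denominator of the MI formula."""
    return 2.0 * (s_s * cot_deg(psi_s_deg) + s_j * cot_deg(psi_j_deg))


def mining_intensity(panel_width, crit_width):
    """Dip mining intensity MI = L / L_s; MI >= 1 means L reaches the critical width."""
    return panel_width / crit_width


# Working face 122106: L = 350 m, L_s = 376.3 m (values from the text)
mi_122106 = mining_intensity(350.0, 376.3)
```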

Using the CAT-HHO model, predictions were made for a range of mining intensities, and the results are presented in Figure 10. Apart from a few outliers, the relationship between mining intensity and WCFZ height is approximately linear; the fitted relationship is also shown in the figure. As mining intensity increases, the WCFZ height increases approximately linearly. This differs from the stepped increase reported in the literature [42], where the "steps" are primarily attributed to the layered structure of the overlying strata. Notably, the input data of this model do not include parameters reflecting the layered structure of the overlying strata, an aspect that warrants further study. Nevertheless, the WCFZ height consistently increases with dip mining intensity. From an engineering perspective, regulating dip mining intensity therefore appears to be a viable strategy for controlling the development height of the WCFZ.
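The mining intensity study varies one input while holding the others at the panel's values. The sketch below shows that procedure with NumPy. `predict` stands for any trained regressor's prediction function (for example, a fitted CatBoost model's `predict` method). The feature order and the toy predictor in the usage are hypothetical.

```python
import numpy as np


def sweep_mining_intensity(predict, base_features, mi_index, mi_values):
    """Predict WCFZ height over a grid of MI values and fit H = slope * MI + intercept.

    predict:       callable mapping an (n, d) array to n predicted heights
    base_features: the d input values of the panel, held fixed except for MI
    mi_index:      column position of MI in the feature vector (hypothetical layout)
    """
    mi_values = np.asarray(mi_values, dtype=float)
    grid = np.tile(np.asarray(base_features, dtype=float), (len(mi_values), 1))
    grid[:, mi_index] = mi_values  # only the MI column changes between rows
    heights = np.asarray(predict(grid), dtype=float)
    slope, intercept = np.polyfit(mi_values, heights, deg=1)  # least-squares line
    return heights, float(slope), float(intercept)


# Toy stand-in for a trained model: height rises linearly with column 0 (MI)
def toy_predict(X):
    return 100.0 * X[:, 0] + 20.0


heights, slope, intercept = sweep_mining_intensity(
    toy_predict, base_features=[0.93, 6.0], mi_index=0,
    mi_values=np.linspace(0.5, 1.5, 11))
```

Holding all other inputs fixed isolates the model's response to MI, which is what a fitted line like the one in Figure 10 summarizes.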

6. Sensitivity Analysis

To analyze how sensitive the WCFZ height prediction is to geological and mining factors, the SHAP method was used to evaluate the importance and contribution of each feature. Mining thickness is the most influential factor, while the key stratum parameters, namely the distance between the key stratum and the coal seam (DK) and the key stratum thickness (TK), also play critical roles. For effective prevention of water inrush disasters, it is therefore essential to consider not only coal seam thickness but also the relative position of the key stratum. Additionally, optimizing the working face width (L) and regulating mining intensity (MI) can further control the WCFZ height.
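Global importance maps such as Figure 11 are commonly built by averaging the absolute SHAP value of each feature over all samples. The sketch below assumes a SHAP matrix of shape (samples, features) has already been computed, for instance with `shap.TreeExplainer(model).shap_values(X)` for a tree ensemble. The toy matrix and feature names in the usage are illustrative only.

```python
import numpy as np


def global_importance(shap_values, feature_names):
    """Rank features by mean absolute SHAP value, largest first."""
    shap_values = np.asarray(shap_values, dtype=float)
    importance = np.abs(shap_values).mean(axis=0)  # one score per feature
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]


# Toy SHAP matrix: 2 samples x 3 features (not values from this study)
ranked = global_importance([[3.0, -1.0, 0.5],
                            [-2.0, 1.0, -0.5]], ["f1", "f2", "f3"])
```

Taking absolute values keeps positive and negative contributions from cancelling, so a feature that pushes predictions strongly in either direction ranks high.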

It should be noted that the sensitivity of the mining method is extremely low, which is inconsistent with previous studies [1]. This discrepancy arises because mining thickness and mining method are strongly correlated in the present dataset: greater mining thicknesses are usually associated with the integrated mining method, so the effect of the mining method is absorbed by mining thickness. As a result, the importance of the mining method is not fully reflected in predicting the WCFZ height.
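The collinearity argument can be checked directly. Encode the mining method as 0/1 and compute its Pearson correlation with mining thickness (the point-biserial correlation) to see how much the two inputs overlap. The encoding and toy values below are hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation; with a 0/1 variable this equals the point-biserial r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical panels: thickness in metres and method flag (1 = integrated mining)
thickness = [2.0, 3.0, 8.0, 10.0]
method = [0, 0, 1, 1]
r = pearson_r(thickness, method)
```

A value close to 1 indicates that a tree model can recover most of the method's information from thickness alone. This is consistent with a low SHAP importance for the mining method.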

As shown in Figures 7 and 11, the second most important factor after mining thickness is the distance between the key stratum and the coal seam. This finding underscores the crucial role of key stratum data in the prediction model. To further validate the importance of these data, comparison experiments were conducted in which the key stratum data were excluded from the models. Three models, CAT-SSA, XG-HHO, and ADA-LEA, were evaluated with and without key stratum data. The results are summarized in Table 6, where CAT-SSA-1, XG-HHO-1, and ADA-LEA-1 denote the models trained without key stratum data.

After the key stratum data (DK and TK) were removed, the evaluation metrics of all models deteriorated to varying extents, further highlighting the significant impact of key stratum data on model accuracy. These results suggest that including key stratum data allows the model to make more effective use of the input features and thus to represent mining site conditions more faithfully.

7. Conclusions

In this study, machine learning algorithms and intelligent optimization algorithms were combined to develop a WCFZ height prediction model that integrates key stratum data and dip mining intensity. The model uses eight input features and outputs the predicted WCFZ height. The SSA, LEA, and HHO algorithms were employed to optimize the models' hyperparameters, and model performance was evaluated with the PROMETHEE method using $MRE$, $R^{2}$, $MAE$, and $RMSE$ as criteria. The CAT-HHO model outperformed the other models, achieving $R^{2}$ = 0.9498, $MAE$ = 5.4110, $RMSE$ = 6.0564, and $MRE$ = 4.80%. This is a significant improvement over the empirical formula, which had $R^{2}$ = −0.4147, $MAE$ = 44.4269, $RMSE$ = 59.4667, and $MRE$ = 28.64%. These results demonstrate the feasibility of the proposed method for predicting WCFZ height. Additionally, SHAP values were used to analyze the influence of the model inputs on WCFZ height. Mining thickness (m) was identified as the most critical factor, consistent with field experience. The second most influential factor was the distance between the key stratum and the coal seam, a variable previously overlooked in WCFZ height prediction models. To further assess the importance of key stratum data, three models were trained on datasets without key stratum information; these models were less reliable, confirming the crucial role of the key stratum parameters (DK, TK) in accurately predicting WCFZ height. For the 122106 mining panel, predictions revealed an approximately linear relationship between dip mining intensity and the development height of the WCFZ. These findings highlight the importance of considering mining thickness, key stratum data, and dip mining intensity when formulating water inrush prevention measures. This study does not account for certain factors, such as the dip angle of coal seams, fault structures, and the hydrogeological properties of overlying aquifers, and more detailed investigations are needed to elucidate their effects. Nevertheless, the approach outlined here provides a solid conceptual framework for advancing research on the development of the water-conducting fracture zone. Future research should make more comprehensive use of geological and mining data to describe mining site conditions and collect more data from working faces to expand the dataset. It should also integrate advanced machine learning models and optimization algorithms to further improve the applicability and accuracy of WCFZ height prediction.

Author Contributions

Conceptualization, Y.C. and X.C.; methodology, Y.C. and X.C.; data curation, Y.C., P.L. and Y.W.; writing—original draft preparation, Y.C.; writing—review and editing, Y.C., Y.W., P.L. and X.C.; funding acquisition, X.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (grant nos. 52274169 and 52174160) and the Ecological-Smart Mines Joint Research Fund of the Natural Science Foundation of Hebei Province (grant no. E2020402086).

Data Availability Statement

Data will be made available on request.

Acknowledgments

We thank the anonymous reviewers for their constructive and valuable comments and for taking time from their busy schedules to provide guidance.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Wei, X.F.; Wang, J.G.; Ding, Y.J. Progress and development trend of clean and efficient coal utilization technology. Bull. Chin. Acad. Sci. 2019, 34, 409–416.
2. Liu, T. Coal Mine Ground Movement and Strata Failure, 1st ed.; China Coal Industry Publishing House: Beijing, China, 1981; pp. 165–214.
3. Gao, B.; Liu, Y.; Pan, J.; Yuan, T. Detection and analysis of the height of water flowing fractured zone in underwater mining. Chin. J. Rock Mech. Eng. 2014, 33, 3384–3390.
4. Chang, S.; Yang, Z.; Guo, C.; Ma, Z.; Wu, X. Dynamic monitoring of the water flowing fractured zone during the mining process under a river. Appl. Sci. 2018, 9, 43.
5. Chang, X.; Wang, M.; Zhu, W.; Fan, J.; Liu, M. Study on height development characteristics of water-conducting fracture zone in fully mechanized mining of shallow thick coal seam under water. Sustainability 2023, 15, 7370.
6. Luo, B.; Zhang, C.; Zhang, P.; Huo, J.; Liu, S. A Combined Method Utilizing Microseismic and Parallel Electrical Monitoring to Determine the Height of the Water-Conducting Fracture Zone in Shengfu Coal Mine. Water 2024, 16, 3047.
7. Su, B.Y.; Yue, J.H. Research of the electrical anisotropic characteristics of water-conducting fractured zones in coal seams. Appl. Geophys. 2017, 14, 216–224.
8. Guo, W.; Zhao, G.; Lou, G.; Wang, S. A new method of predicting the height of the fractured water-conducting zone due to high-intensity longwall coal mining in China. Rock Mech. Rock Eng. 2019, 52, 2789–2802.
9. Zhu, T.; Li, W.; Wang, Q.; Hu, Y.; Fan, K.; Du, J. Study on the height of the mining-induced water-conducting fracture zone under the Q2l loess cover of the Jurassic coal seam in northern Shaanxi, China. Mine Water Environ. 2020, 39, 57–67.
10. Wu, F.; Gao, Z.; Liu, H.; Yu, X.; Gu, H. Theoretical Discrimination Method of Water-Flowing Fractured Zone Development Height Based on Thin Plate Theory. Appl. Sci. 2024, 14, 6284.
11. Miao, X.; Cui, X.; Xu, J. The height of fractured water-conducting zone in undermined rock strata. Eng. Geol. 2011, 120, 32–39.
12. Wang, F.; Xu, J.; Chen, S.; Ren, M. Method to predict the height of the water-conducting fractured zone based on bearing structures in the overlying strata. Mine Water Environ. 2019, 38, 767–779.
13. Bai, E.; Guo, W.; Tan, Y.; Guo, M.; Wen, P.; Liu, Z.; Ma, Z.; Yang, W. Regional division and its criteria of mining fractures based on overburden critical failure. Sustainability 2022, 14, 5161.
14. State Administration of Work Safety; National Coal Mine Safety Administration; National Energy Administration. Specifications for Coal Pillar Retention and Compressed Coal Mining in Buildings, Water Bodies, Railways, and Main Shafts; China Coal Industry Publishing House: Beijing, China, 2017.
15. Guo, X.; Liu, Y.; Gu, Z. Detection and calculation of the height of the water flowing fractured zone of coal roof in Binchang mining area. J. Min. Strata Control Eng. 2023, 5, 91–100.
16. Li, B.; Wu, H.; Li, T. Height prediction of water-conducting fractured zone under fully mechanized mining based on weighted multivariate nonlinear regression. J. Min. Saf. Eng. 2022, 39, 536.
17. Feng, D.; Hou, E.; Xie, X.; Hou, P. Research on Water-conducting Fractured Zone height under the condition of large mining height in Yushen mining area, China. Lithosphere 2023, 2023, 8918348.
18. Teng, Y.; Yi, S.; Zhu, W.; Jing, S. Development patterns of fractured water-conducting zones under fully mechanized mining in wet-collapsible loess areas. Water 2022, 15, 22.
19. Tan, Y.; Xu, H.; Yan, W.; Guo, W.; Sun, Q.; Yin, D.; Zhang, Y.; Zhang, X.; Jing, X.; Li, X.; et al. Development law of water-conducting fracture zone in the fully mechanized caving face of gob-side entry driving: A case study. Minerals 2022, 12, 557.
20. Cao, J.; Su, H.; Wang, C.; Li, J. Research on the evolution and height prediction of WCFZ in shallow close coal seams mining. Geomech. Geophys. Geo-Energy Geo-Resour. 2023, 9, 128.
21. Zhou, Y.; Yu, X. Study of the evolution of Water-Conducting Fracture Zone in overlying rock of a fully mechanized caving face in gently inclined extra-thick coal seams. Appl. Sci. 2022, 12, 9057.
22. Zhang, X.; Hu, B.; Zou, J.; Liu, C.; Ji, Y. Quantitative characterization of overburden rock development pattern in the goaf at different key stratum locations based on DEM. Adv. Civ. Eng. 2021, 1, 8011350.
23. Li, M.; Zhang, J.; Huang, Y.; Gao, R. Measurement and numerical analysis of influence of key stratum breakage on mine pressure in top-coal caving face with super great mining height. J. Cent. South Univ. 2017, 24, 1881–1888.
24. Li, J.; Huang, Y.; Zhang, J.; Li, M.; Qiao, M.; Wang, F. The influences of key strata compound breakage on the overlying strata movement and strata pressure behavior in fully mechanized caving mining of shallow and extremely thick seams: A case study. Adv. Civ. Eng. 2019, 1, 5929635.
25. Song, Z.; Wang, Y.; Wang, J.; Huan, H.; Li, H. Design of Pump-and-Treat Strategies for Contaminated Groundwater Remediation Using Numerical Modeling: A Case Study. Water 2024, 16, 3665.
26. Alberti, L.; Antelmi, M.; Oberto, G.; La Licata, I.; Mazzon, P. Evaluation of fresh groundwater Lens Volume and its possible use in Nauru island. Water 2022, 14, 3201.
27. Li, X.; Li, Q.; Xu, X.; Zhao, Y.; Li, P. Multiple Influence Factor Sensitivity Analysis and Height Prediction of Water-conducting fracture zone. Geofluids 2021, 2021, 8825906.
28. Xu, C.; Zhou, K.; Xiong, X.; Gao, F.; Zhou, J. Research on height prediction of water-conducting fracture zone in coal mining based on intelligent algorithm combined with extreme boosting machine. Expert Syst. Appl. 2024, 249, 123669.
29. Gao, Z.; Jin, L.; Liu, P.; Wei, J. Height Prediction of Water-Conducting Fracture Zone in Jurassic Coalfield of Ordos Basin Based on Improved Radial Movement Optimization Algorithm Back-Propagation Neural Network. Mathematics 2024, 12, 1602.
30. Li, J.; He, Z.; Piao, C.; Chi, W.; Lu, Y. Research on Subsidence Prediction Method of Water-Conducting Fracture Zone of Overlying Strata in Coal Mine Based on Grey Theory Model. Water 2023, 15, 4177.
31. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased Boosting with Categorical Features. In Proceedings of the Neural Information Processing Systems 31 (NeurIPS 2018), Montréal, QC, Canada, 3–8 December 2018; Volume 31.
32. Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–137.
33. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
34. Xue, J.; Shen, B. A novel swarm intelligence optimization approach: Sparrow search algorithm. Syst. Sci. Control Eng. 2020, 8, 22–34.
35. Heidari, A.A.; Mirjalili, S.; Faris, H.; Aljarah, I.; Mafarja, M.; Chen, H. Harris hawks optimization: Algorithm and applications. Future Gener. Comput. Syst. 2019, 97, 849–872.
36. Dalirinia, E.; Jalali, M.; Yaghoobi, M.; Tabatabaee, H. Lotus effect optimization algorithm (LEA): A lotus nature-inspired algorithm for engineering design optimization. J. Supercomput. 2024, 80, 761–799.
37. Fang, J.; Huang, H.; Jin, T.; Bai, J.B. Movement rules of overlying strata around longwall mining in thin bedrock with thick surface soil. Chin. J. Rock Mech. Eng. 2008, 27, 2700–2706.
38. Liu, Q.; Liang, Z.; Zi, J. A SMOGN-based MPSO-BP model to predict the height of a hydraulically conductive fracture zone. Coal Geol. Explor. 2024, 52, 72–85.
39. Guo, W.; Lou, G. Definition and distinguishing method of critical mining degree of overburden failure. J. China Coal Soc. 2019, 44, 755–766.
40. Cao, Z.; Yang, X.; Li, Z.; Du, F. Evolution mechanism of water-conducting fractures in overburden under the influence of water-rich fault in underground coal mining. Sci. Rep. 2024, 14, 5081.
41. Wang, X.; Zhu, S.; Yu, H.; Liu, Y. Comprehensive analysis control effect of faults on the height of fractured water-conducting zone in longwall mining. Nat. Hazards 2021, 108, 2143–2165.
42. Du, F.; Gao, R. Development Patterns of Fractured Water-Conducting Zones in Longwall Mining of Thick Coal Seams: A Case Study on Safe Mining Under the Zhuozhang River. Energies 2017, 10, 11.

Figure 1. Illustration of the calculation for the critical width of critical mining.

Figure 2. Inhibiting effect of the key stratum on the development of the WCFZ [11].

Figure 3. The profile of the fractured water-conducting zone (Liu, 1981): (1) water-conducting failure zone; (2) caving zone [2].

Figure 4. The model of the WCFZ.

Figure 5. Technical framework of the prediction model research.

Figure 6. The data distribution of the WCFZ database.

Figure 7. The correlation analysis of various variables. (The number of ‘*, **, ***’ symbols represents the degree of correlation).

Figure 8. Prediction performance of the models and the empirical formula.

Figure 9. Error comparison of the models and the empirical model.

Figure 10. The relationship between the height of the WCFZ and mining intensity.

Figure 11. Importance map of global features.

Table 1. Formula for calculating the height of WCFZ in layered mining of thick coal seams.

Rock Type	UCS/MPa	Water Conductivity	Representative Rock	Calculation Formula (Liu) [2]	Calculation Formula (1)
Hard and strong	>40	High or good	Quartz sandstone, limestone, sandy shale, conglomerate	$H = \frac{100 \sum M}{1.2 \sum M + 2.0} \pm 8.9$	$H = 30 \sqrt{\sum M} + 10$
Medium hard	20–40	Medium or worse	Sandstone, argillaceous limestone, sandy shale, shale	$H = \frac{100 \sum M}{1.6 \sum M + 3.6} \pm 5.6$	$H = 20 \sqrt{\sum M} + 10$
Soft and weak	10–20	Low or bad	Mudstone, argillaceous sandstone	$H = \frac{100 \sum M}{3.1 \sum M + 5.0} \pm 4.0$	$H = 10 \sqrt{\sum M} + 5$
Weathered soft and weak	<10	Low or bad	Bauxitic rock, weathered mudstone, clay, sandy clay	$H = \frac{100 \sum M}{5.0 \sum M + 8.0} \pm 3.0$	-

where $\sum M$ represents the cumulative mining thickness (m).

Table 2. Hyperparameter search space.

Hyperparameters	Min	Max
n_estimators	100	1000
learning_rate	0.01	0.2
max_depth	3	10
subsample	0.5	1

Table 3. The optimized hyperparameters of each model.

Model	learning_rate	n_estimators	subsample	max_depth
CAT-HHO	0.1817	275	0.6744	8
CAT-SSA	0.1055	354	0.4950	5
CAT-LEA	0.0963	199	0.6472	9
XG-LEA	0.1391	285	0.5290	3
XG-HHO	0.0450	458	0.6382	4
XG-SSA	0.1133	141	0.5474	7
ADA-LEA	0.0277	588	-	-
ADA-SSA	0.1209	367	-	-
ADA-HHO	0.1770	120	-	-

Table 4. Evaluation scores and rankings of different prediction models.

Model	$R^{2}$	$M A E$	$R M S E$	$M R E$	Score	Rank
CAT-HHO	0.9498	5.4110	6.0564	4.80%	1.2251	1
CAT-SSA	0.9504	5.5320	5.9807	5.01%	1.2065	2
CAT-LEA	0.9469	5.6284	6.7283	4.69%	1.1894	3
XG-LEA	0.9352	6.2956	7.1946	5.62%	1.0373	4
XG-HHO	0.9309	6.2587	7.7799	5.27%	1.023	5
ADA-LEA	0.9386	7.0475	8.8222	6.55%	0.9428	6
XG-SSA	0.9229	7.1573	8.8989	6.01%	0.8494	7
ADA-SSA	0.9242	7.8036	9.2554	6.64%	0.7901	8
ADA-HHO	0.9112	8.7173	11.0537	7.79%	0.4976	9
Empirical formula	−0.4147	44.4269	59.4667	28.64%	−8.7611	10

Table 5. Mining technical parameters of working face 122106.

m/m	s	s_l	Full Subsidence Angle/°	DK/m	TK/m	tanβ	L/m
6	325	45	40.2	215	13.25	4.35	350

Table 6. Evaluation scores and rankings of prediction models with and without key stratum data.

Model	$R^{2}$	$MAE$	$RMSE$	$MRE$	Score
CAT-SSA	0.950426	5.532034	5.980704	5.01%	2.2688
XG-HHO	0.930867	6.258662	7.779931	5.27%	0.9759
ADA-LEA	0.938585	7.04749	8.822191	5.55%	0.4534
CAT-SSA-1	0.927963	7.778905	8.135209	8.01%	−0.0167
XG-HHO-1	0.922523	6.83779	9.399699	10.27%	−0.8649
ADA-LEA-1	0.898977	9.432286	11.31085	8.55%	−1.1664

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Che, Y.; Cui, X.; Wang, Y.; Li, P. Machine Learning Model for Predicting the Height of the Water-Conducting Fracture Zone Considering the Influence of Key Stratum and Dip Mining Intensity. Water 2025, 17, 234. https://doi.org/10.3390/w17020234


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
