Explainable Sinkhole Susceptibility Mapping Using Machine-Learning-Based SHAP: Quantifying and Comparing the Effects of Contributing Factors in Konya, Türkiye

Bilgilioğlu, Süleyman Sefa; Gezgin, Cemil; Iban, Muzaffer Can; Bilgilioğlu, Hacer; Gündüz, Halil Ibrahim; Arslan, Şükrü

doi:10.3390/app15063139


by Süleyman Sefa Bilgilioğlu ¹,*, Cemil Gezgin ¹, Muzaffer Can Iban ², Hacer Bilgilioğlu ³, Halil Ibrahim Gündüz ¹ and Şükrü Arslan ⁴

¹ Department of Geomatics Engineering, Aksaray University, Aksaray 68100, Türkiye
² Department of Geomatics Engineering, Mersin University, Mersin 33343, Türkiye
³ Department of Geological Engineering, Aksaray University, Aksaray 68100, Türkiye
⁴ Ministry of Interior, Disaster and Emergency Management Presidency, Konya Provincial Directorate of Disaster and Emergency, Konya 42100, Türkiye
* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(6), 3139; https://doi.org/10.3390/app15063139

Submission received: 12 February 2025 / Revised: 7 March 2025 / Accepted: 12 March 2025 / Published: 13 March 2025


Abstract

Sinkholes, naturally occurring formations in karst regions, represent a significant environmental hazard, threatening infrastructure, agricultural lands, and human safety. In recent years, machine learning (ML) techniques have been extensively employed for sinkhole susceptibility mapping (SSM). However, the lack of explainability inherent in these methods remains a critical issue for decision-makers. In this study, sinkhole susceptibility in the Konya Closed Basin was mapped using an interpretable machine learning model based on SHapley Additive exPlanations (SHAP). The Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM) algorithms were employed, and the interpretability of the model results was enhanced through SHAP analysis. Among the compared models, the RF model demonstrated the highest performance, achieving an accuracy of 95.5% and an AUC score of 98.8%, and was consequently selected for the development of the final susceptibility map. SHAP analyses revealed that factors such as proximity to fault lines, mean annual precipitation, and bicarbonate concentration difference are the most significant variables influencing sinkhole formation. Additionally, specific threshold values were quantified, and the critical effects of these contributing factors were analyzed in detail. This study underscores the importance of employing eXplainable Artificial Intelligence (XAI) techniques in natural hazard modeling, using SSM as an example, thereby providing decision-makers with a more reliable and comparable risk assessment.

Keywords:

sinkhole susceptibility mapping; explainable artificial intelligence; SHAP; random forest; Konya; karstic hazards

1. Introduction

Sinkholes, a term commonly used for collapse dolines, are natural features with sizes and depths ranging from centimeters to meters. They occur in karstic regions containing carbonate and evaporitic rocks, which cover approximately 20% of the Earth’s surface [1]. Sinkhole formation is controlled by many topographic, geological–tectonic, environmental–anthropogenic, hydrogeological, and climatic (meteorological) factors. Examples of these determining factors include the surface and groundwater levels that enable the dissolution of soluble carbonate and evaporitic karst rocks; the flow direction and hydro-chemical properties of water; precipitation; evaporation; stratification that facilitates the movement of water in rocks; porosity and permeability; and cracks and fractures [1,2]. Sinkholes, whose formation may also be triggered by anthropogenic effects, pose a serious risk to residential areas and human life, as well as to agricultural areas, energy investment and production areas, transmission lines, pipelines, and transportation networks.

The spatial distribution of conditioning factors that cause the initiation and development of subsidence processes on the surface has an important role in understanding the formation mechanism, determining the areas suitable for sinkhole formation, assessing the relevant spatial hazard and preventing potential damage, as well as in better managing land use activities in areas prone to sinkhole formation. Susceptibility maps show the relative probability of sinkhole formation at any location in the future, while hazard maps show the probability of sinkhole formation in each hazardous area [3,4]. Therefore, in order to accurately predict the location of possible sinkholes, it is necessary to determine the conditioning factors that may cause sinkhole formation and to collect data on these factors. Sinkhole susceptibility studies aim to evaluate the relationship between existing sinkhole formations and conditioning factors, determine the effects of these factors in sinkhole formation, and map the spatial distribution of potentially susceptible areas [5]. However, due to the influence of many distinct factors, it is challenging to model sinkhole formations in temporal and spatial dimensions [6,7,8]. For this reason, the modeling of sinkhole susceptibility maps is a complex process, as in other types of natural phenomena, and performing these operations with classical methods can be a very time-consuming and puzzling task for the decision-makers. Geographic Information System (GIS), an advantageous tool in collecting and analyzing spatial data, has been used effectively in solving these and many similar spatial problems for many years. Especially in recent years, various methods have been developed to produce sinkhole susceptibility maps.

Models developed for SSM can be broadly classified into qualitative, semi-quantitative, and quantitative methods. Qualitative methods are based on expert judgment and field observations; however, due to their inherent subjectivity, they can produce varying results when applied by different researchers [9,10]. Although these methods are particularly useful in data-scarce regions, they are limited in terms of consistency and repeatability. Semi-quantitative methods integrate both qualitative and quantitative approaches. One of the most frequently used methods in this category is the Analytic Hierarchy Process (AHP). AHP is considered a semi-quantitative method because it quantifies the weighting of factors based on expert judgment [11,12,13,14]. Additionally, other multi-criteria decision-making methods, such as the Best–Worst Model (BWM), can also be classified within this category [15]. Quantitative methods encompass deterministic, statistical, and ML models. Deterministic models attempt to analyze sinkhole formation based on hydrogeological and geotechnical data within the framework of physical and mechanical principles. However, the practical application of these methods is limited by their high demand for detailed field data and their restricted scalability over large areas [9,16]. Statistical methods, on the other hand, investigate the relationships between sinkhole inventories and conditioning factors, usually through data-driven approaches. Techniques such as Frequency Ratio (FR) [17,18,19], Logistic Regression (LR) [16,20,21], Linear Discriminant Analysis (LDA) [22], and Weight of Evidence (WoE) [6,23] have been widely employed in generating sinkhole susceptibility maps. Nevertheless, these approaches generally assume linear relationships and are insufficient for modeling the complex interactions among factors. Natural hazard processes, especially geomorphological phenomena such as sinkhole formation, are characterized by complex and nonlinear dynamics. Consequently, traditional statistical methods often fail to adequately represent the interactions among the complicated geological, hydrogeological, and environmental factors involved [22]. Recently, machine learning (ML) models have proven to be a powerful alternative, addressing the shortcomings of traditional methods by effectively modeling nonlinear relationships, managing large datasets, and providing improved accuracy. Techniques such as Random Forest (RF) [7,17,24], the Maximum Entropy (MaxEnt) model [25,26], Support Vector Machines (SVMs) [27], Artificial Neural Network (ANN) [28], and Naive Bayes (NB) [16] have been applied to model nonlinear processes and generate more reliable and scalable sinkhole susceptibility maps.

Managers and planners face a substantial obstacle when implementing ML algorithms in real-world situations: issues of interpretability, explainability, transparency, scientific consistency, and the integration of knowledge into the ML approach [29]. These models are frequently perceived as opaque, meaning that their decision-making mechanisms, which result from training on vast datasets with intricate algorithms, are poorly understood [30]. Studies suggest that ML methods are the primary choice for researchers and decision-makers when it comes to evaluating and mapping sinkhole susceptibility. Furthermore, newer ML algorithms demonstrate superior computational efficiency without encountering the limitations observed in models generated by traditional statistical approaches. These advancements mitigate challenges related to prediction accuracy and utility, addressing concerns such as data quantity and quality and the inherent spatial–temporal patterns of the phenomenon and conditioning factors [16]. Nevertheless, comprehending and interpreting model results remains crucial. The primary goal of eXplainable Artificial Intelligence (XAI) is to tackle the problem of non-transparent artificial intelligence (AI) systems by creating models that are easy to understand and interpret. XAI focuses on designing interactive visualization tools and frameworks that enable users to analyze and engage with AI systems, thereby clarifying how decisions are generated. The advantages of XAI are extensive, as it promotes confidence in AI technologies, ensures responsible and fair application, and helps detect and mitigate embedded biases. Implementing transparent methodologies, such as SHapley Additive exPlanations (SHAP), improves the reliability of ML models in critical applications by offering intuitive explanations of their results, thereby increasing accountability and user trust [31,32,33].

This study introduces an innovative approach by implementing an explainable ML method to map sinkhole susceptibility in the Konya Closed Basin, Türkiye. The main objective involves clarifying the contributions of topographic, geological–tectonic, hydrogeological, climatic, and environmental–anthropogenic factors to the susceptibility model. It includes assessing their respective significance, as well as grasping the fundamental rationale behind particular decisions. This paper aims to clarify how ML methods produce specific results in predicting sinkhole occurrences, a novel endeavor in the sinkhole literature. The research also aims to analyze model results using diverse SHAP plots. In summary, key objectives include (i) developing a spatial ML framework for mapping sinkhole susceptibility; (ii) analyzing the relationship between the model and conditioning factors through SHAP outputs; and (iii) investigating spatial variation in model results for sinkhole susceptibility prediction in the study area.

To achieve these objectives, the research employs ensemble ML algorithms, including Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), to develop the sinkhole susceptibility map. These algorithms are selected for their ability to prioritize critical factors and ensure accuracy in SSM development [34], while their explainability via SHapley Additive exPlanations (SHAP) enhances model transparency [35,36]. After training and evaluating the algorithms, the final susceptibility map is generated using the top-performing model. The choice of ensemble methods is further justified by their robustness against noise, mitigation of overfitting, and improved model stability, which collectively enhance the reliability of susceptibility assessments [37]. By analyzing SHAP outputs, this study quantifies spatial variations in model results and investigates the interplay between conditioning factors and susceptibility predictions, addressing both the explainability and practical applicability of the framework.

This study makes a dual contribution to the existing body of literature. Firstly, while RF has been widely utilized in natural hazard susceptibility studies, its application in the SSM domain remains limited [17]. Moreover, to the best of our knowledge, the performance of newer decision-tree-based ensemble algorithms, namely XGBoost and LightGBM, has not been explored for SSM generation. The primary aim is to comparatively evaluate the predictive performance of these three ensemble algorithms and discern their responsiveness in predicting sinkhole occurrences. Secondly, this research pioneers the integration of machine learning-based SSM generation with XAI, specifically the SHAP technique. SHAP enables analysts to understand which factors contribute more to sinkhole formation predictions, offering insights into how these factors influence single predictions [38]. This study employs a large geo-data frame with 34 independent variables adopted from previous studies and collected from 597 sinkhole locations. By bridging the black box gap, SHAP provides informative statistics and plots that reveal, among other things, the positive or negative contribution of each input factor, interaction effects in the model, and the impact of varying feature values on outcomes, including potential threshold effects. Identifying threshold values is crucial for environmental management, offering insights into how the impact of each factor changes beyond a specific distance, density, or amount [39]. In short, through the application of decision tree-based ensemble ML models and SHAP, this study advances the understanding of sinkholes by providing comprehensive insights into the natural environment factors, and their values, that contribute most to sinkhole formation.

Numerous saucepan-shaped and cylindrical sinkholes, numbering in the hundreds, have been documented and identified within the Konya Closed Basin. These formations exhibit dimensions spanning from a few meters to tens of meters in width [40]. Given the increasing threat of sinkholes in the Konya Closed Basin, there is a pressing need to focus on Turkey’s sinkhole risk management strategy. Specifically, the escalation of agricultural endeavors leads to a swift decline in groundwater levels due to extensive irrigation pumping. With over 50,000 active water wells in the area, the risk factors for sinkholes in Konya are heightened. Given the region’s significance for solar energy, intensive agricultural practices, and upcoming power plant projects, it becomes crucial to assess the probability of sinkholes and explore methods for prediction and prevention [41]. Hence, it is crucial to establish a reliable and efficient framework for SSMs, exemplified by an explainable ML model. This model plays a vital role in helping decision-makers comprehend model outputs, pinpoint significant parameters, and adeptly mitigate the risk of sinkhole incidents.

2. Study Area and Data

The Konya Closed Basin is one of the largest inland water basins in Türkiye and hosts a significant groundwater reservoir. Intensive agricultural activities, industrialization, and population growth in the basin have increased the demand for groundwater, leading to an average decline of 40 m in the groundwater level over the past 40 years due to severe extraction. This situation has intensified the degradation of soluble karst rocks in the region, leading to widespread sinkhole formation. Sinkholes generally develop within Miocene–Pliocene lacustrine limestones, which are derived from older Triassic, Jurassic, and Cretaceous carbonate rocks and are currently connected to subsurface karst formations that remain covered at the surface [41,42,43,44]. Hydrogeological studies indicate that discontinuities between these systems play a conductive role in groundwater movement and accelerate the karstification process. Especially along the Sultaniye Plain, located north of Karapınar, recent sinkhole collapses have intensified due to the decline in groundwater levels, rendering this area one of the critical zones in terms of risk. In recent years, a serious increase in the number of new sinkhole formations has been noticed within the boundaries of Konya Province [45]. As a result of the decline in groundwater levels, sinkholes that once occurred at higher elevations are now shifting to lower-lying areas. This shift poses a threat to rural settlements, agricultural lands, major transportation routes, and energy infrastructure. Since the 2000s, there has been a significant increase in the frequency of sinkhole formations. By 2017, 299 sinkholes had been recorded. Since 2018, more than 20 new sinkholes have occurred annually, and as of 2022 a total of more than 610 active sinkholes have been observed in the basin [46].

Under the initiatives of the Disaster and Emergency Management Authority (AFAD) and the Konya Provincial Directorate, supported by the Konya Plain Project (KOP) Administration and carried out by the Geological Surveys Department of the General Directorate of Mineral Research and Exploration (MTA), areas prone to sinkhole formation have been identified in the region, taking into account the lithological units and geotechnical characteristics of the provinces of Konya, Karaman, Aksaray, and Niğde. In this project, regions where sinkhole development is or is not possible have been identified based on geological and hydrogeological data [47]. In this study, the area designated as “sinkhole-prone” within the scope of that project and located within the boundaries of Konya Province has been selected as the study area; it covers an area of 16,325.88 km² (Figure 1).

The study area is formed by the Late Miocene–Pliocene Insuyu Formation and the Quaternary–Holocene Hotamış Formation. The Insuyu Formation consists of limestone, dolomitic limestone, clayey limestone, and marls, and is highly susceptible to karstic processes. In addition, the normal faults that developed in the region during the Neotectonic period increase the brittle nature of the rocks, accelerating karstic processes and groundwater movement. In the Quaternary–Holocene Hotamış Formation, there is a transition from coarse clastics near the coast to finer-grained siliceous, clayey, and carbonaceous layers towards deeper areas. This formation contains evaporitic rocks in its upper layers, and the presence of these soluble rocks that react with water leads to the formation of collapse sinkholes. In some areas, subsidence occurring in the underlying Insuyu Formation also affects the overlying Hotamış Formation, leading to surface deformations [47].

2.1. Sinkhole Inventory Map

A total of 597 sinkholes were identified in the study area from historical records and field studies. These point data were transferred to the GIS, and a sinkhole inventory map was created. In addition, 597 non-sinkhole samples were randomly selected to cover the whole region. Across the resulting 1194 inventory records, sinkhole points are assigned the value “1” and non-sinkhole points the value “0”. Finally, 70% of both the sinkhole and non-sinkhole samples were randomly selected for training and 30% for testing.

2.2. Conditioning Factors

Many conditioning factors, which affect each other and are difficult to predict, play a role in the formation and development of sinkholes [8,13]. According to studies of karstification mechanisms, the main driving factors affecting sinkhole formation include geological conditions, topographic and geomorphological factors, hydrological factors, and anthropogenic activities [7]. In sinkhole susceptibility modeling, it is important to identify the sinkhole conditioning factors and build a geographic database of them. However, there is no common view on how to group the factors that cause sinkholes, and these factors may differ depending on the geological, lithological, and hydrogeological characteristics of the study area [48]. The selection of factors contributing to sinkhole formation in susceptibility studies is based on a comprehensive literature review and on knowledge and experience from previous studies conducted in the region. The conditioning factors selected within the scope of this study are divided into five classes: topographical, meteorological, geological and tectonic, environmental and anthropological, and hydrogeological factors (Table 1). All data from the different sources were projected to the Universal Transverse Mercator (UTM) coordinate system (zone 36 N), converted to raster format, and resampled to a pixel size of 20 m × 20 m.

2.2.1. Geological and Tectonic Factors

The geological and tectonic factors that caused the formation of sinkholes were chosen as proximity to faults (PTF) and the lithology of the study area (LTG), and these factors were evaluated in the same group within the scope of this study. As it is known, sinkholes are formed as a result of the erosion of lithological units containing permeable and soluble rocks such as limestone, dolomite, and travertine, due to the penetration of water [49,50]. Lithology, which examines the composition, structural features, origin, and distribution of rocks in the earth’s crust, is seen as the main factor of sinkhole formations because it contains information about the properties of underground rocks, such as dissolution potential, porosity, and permeability [21,25]. As mentioned above, a high density of faults and cracks in carbonate rocks can create permeability conditions suitable for the formation of sinkholes by causing the formation of cavities for the movement of groundwater. Therefore, areas close to fault zones are considered favorable regions for sinkhole formation [44,51]. Lithology and fault data were digitized from 1/25,000 scale geological maps and revised by field studies (Figure 2).
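As a rough illustration of how a proximity-to-faults (PTF) layer can be derived from digitized fault lines, the sketch below computes the Euclidean distance from each 20 m cell centre to the nearest fault segment. The fault coordinates, grid origin, and function names are hypothetical, and the text does not state which GIS tool was actually used, so this is only one plausible way to build such a layer.

```python
import math

def point_segment_distance(px, py, ax, ay, bx, by):
    """Shortest planar distance (metres) from point P to the segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: A and B coincide
        return math.hypot(px - ax, py - ay)
    # Projection parameter of P onto AB, clamped to the segment ends
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def proximity_raster(segments, x0, y0, n_rows, n_cols, cell=20.0):
    """Distance from every cell centre to the nearest segment.

    (x0, y0) is the lower-left corner of the grid; row 0 is the southernmost row.
    """
    grid = []
    for r in range(n_rows):
        cy = y0 + (r + 0.5) * cell
        row = []
        for c in range(n_cols):
            cx = x0 + (c + 0.5) * cell
            row.append(min(point_segment_distance(cx, cy, *s) for s in segments))
        grid.append(row)
    return grid

# Hypothetical fault trace running west-east along y = 0 (UTM-like metres)
faults = [(0.0, 0.0, 100.0, 0.0)]
ptf = proximity_raster(faults, x0=0.0, y0=0.0, n_rows=3, n_cols=5)
```

The brute-force loop is adequate for a small demonstration; a raster covering the full study area at 20 m resolution would call for a spatial index or a distance-transform routine instead.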

2.2.2. Hydrogeological Factors

Hydrogeological factors, such as the surface and groundwater levels, the movement and strength of water in the region, and the hydro-chemical and physical properties of groundwater, have a significant impact on the formation and development of sinkholes. Within the scope of this study, 14 factors were used, covering the change in groundwater level and the geochemical properties of groundwater, namely bicarbonate (DBC), carbonate (DCR), dissolved CO₂ (DDC), dissolved O₂ (DDO), conductivity (DCT), calcium (DCA), chloride (DCH), magnesium (DMG), pH (DPH), potassium (DPT), sodium (DSD), sulfate (DSU), and total ion exchange (DTI). The formation and development of sinkholes are related to the dissolution of rocks, surface or groundwater movement, and other geo-environmental conditions [52]. The groundwater level and its chemical components are essential parameters for determining potential sinkhole areas, as they carry information about the dissolution rate of rocks and the location of aquifer compression areas [44]. For the hydrogeological factors listed above, geochemical analyses were carried out on water samples collected in both the wet and dry spells from 519 water wells distributed homogeneously across the study area; groundwater levels were also measured in both spells. Surfaces for each parameter were created for both spells with the inverse distance weighting (IDW) interpolation method, and the difference between the dry and wet spells was determined by subtracting the surfaces from each other. The resulting hydrogeological maps are given in Figure 3.
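The interpolate-then-subtract procedure described above can be sketched as follows. This is a minimal illustration only: the well coordinates and concentrations are invented, the distance exponent of 2 is a common default rather than a value reported in the study, and the dry-minus-wet sign convention is an assumption, since the text does not say which surface was subtracted from which.

```python
import math

def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (xi, yi, value) samples."""
    num = den = 0.0
    for xi, yi, vi in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return vi  # query point coincides with a well: return its measurement
        w = 1.0 / d ** power
        num += w * vi
        den += w
    return num / den

# Hypothetical wells: (x, y, bicarbonate concentration) in each spell
wet = [(0.0, 0.0, 200.0), (100.0, 0.0, 300.0)]
dry = [(0.0, 0.0, 260.0), (100.0, 0.0, 320.0)]

def seasonal_difference(x, y):
    """Dry-spell surface minus wet-spell surface at (x, y) (assumed sign convention)."""
    return idw(x, y, dry) - idw(x, y, wet)

dbc_mid = seasonal_difference(50.0, 0.0)  # midpoint between the two wells
```

Evaluating `seasonal_difference` at every 20 m cell centre would yield a difference raster analogous to the layers described for each hydrogeological parameter.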

2.2.3. Topographical Factors

Within the scope of this study, the topographic factors that directly or indirectly affect the formation of sinkholes were determined as follows: proximity to drainage (PDR), slope (SLP), aspect (ASP), curvature (CRV), plan curvature (PLC), profile curvature (PRC), elevation (ELV), stream power index (SPI), and topographic wetness index (TWI). Proximity to drainage, slope, plan curvature, profile curvature, and elevation are considered influencing factors in sinkhole formation because of their relationship with the flow rate and direction of surface and groundwater and with the proportion of surface water that is not completely absorbed by the soil in the relevant area [53]. While the topographic wetness index identifies hollows where water can generally accumulate and the soil may become more saturated, the stream power index is a measure of the erosive power of flowing water [54,55]. Aspect directly affects soil moisture and evaporation and indirectly reflects the water accumulation capacity of the slopes, as well as affecting the rainfall, wind, and vegetation of the relevant area [56,57,58]. The digital elevation model (DEM) of the study area was produced from the contour lines of 1/25,000 scale standard topographic maps, and the other topographic parameters listed above were derived from it (Figure 4).
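The text does not give the formulas used for the two indices, so the sketch below uses their commonly cited definitions, TWI = ln(a / tan β) and SPI = a · tan β, where a is the specific catchment area and β the local slope. The function names and the small slope floor that keeps TWI finite on flat cells are illustrative assumptions, not details taken from the study.

```python
import math

def twi(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index, ln(a / tan(beta)).

    Slopes below `min_slope_deg` are raised to that floor (an assumption)
    so that flat cells do not cause a division by zero.
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area / math.tan(beta))

def spi(specific_catchment_area, slope_deg):
    """Stream power index, a * tan(beta)."""
    return specific_catchment_area * math.tan(math.radians(slope_deg))

# Hypothetical cell: 100 m of upslope area per unit contour width
gentle = (twi(100.0, 2.0), spi(100.0, 2.0))
steep = (twi(100.0, 30.0), spi(100.0, 30.0))
```

For the same upslope area, the gentler cell has the higher TWI (water tends to accumulate) and the lower SPI (less erosive power), which matches the qualitative description of the two indices.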

2.2.4. Meteorological Factors

The climatic factors chosen to produce the sinkhole susceptibility map are monthly average precipitation (MAP), temperature (MAT), and water vapor pressure (MAW). Heavy rainfall can increase the potential for sinkhole formation, as it accelerates the dissolution of rocks by causing surface water to infiltrate underground. Long dry periods with little precipitation and high temperatures cause rapid evaporation of groundwater, which can lower the groundwater level and cause the rocks at the surface to collapse [59,60,61]. Because of the importance of groundwater levels in sinkhole formation, climatic factors should be taken into account in sinkhole susceptibility studies. These data were produced from the observations of 32 meteorological stations obtained from the Turkish General Directorate of Meteorology. Because the length of the historical record differs between stations, data covering the period between January 2018 and December 2022 were used so that all stations are evaluated over the same time period. The maps produced in this context are shown in Figure 5.

2.2.5. Environmental and Anthropological Factors

Well density (WDS), land use (LDU), proximity to settlements (PST) and roads (PRD), NDVI, and soil depth (SDP) are the environmental and anthropological factors selected to produce the sinkhole susceptibility map. Due to the relationship between sinkhole formations and high groundwater use, the density of water wells in a region is frequently used in sinkhole susceptibility studies because it expresses the level of use and movement of groundwater resources in that area. Land use and NDVI can significantly affect the processes that lead to the development of sinkholes because they reflect the natural hydrological and lithological characteristics of the region [48]. In addition, roads and settlements play an important role as factors in the development of sinkholes, as the increase in loads and pressure on the Earth can accelerate the process of sinkhole formation [41]. Soil depth is also an important parameter for sinkhole formation and development, as the sinkhole formation process may accelerate in regions where the soil depth to the bedrock is shallow, and a thicker layer on the surface may slow down this process [61]. Produced maps of the factors mentioned above are given in Figure 6.

3. Methodology

3.1. Multicollinearity Assessment

The reliability and accuracy of the developed SSMs depend critically on the careful selection of conditioning factors that closely align with site-specific environmental variables. This necessitates a rigorous assessment of multicollinearity to detect and address interdependencies among the selected variables, ensuring robust model performance. This step is crucial before initiating classifier training to circumvent the curse of dimensionality. Multicollinearity, if not addressed, can lead to several issues such as overfitting, model instability, incorrect interpretation of feature importance, computational difficulties, and redundancy in the dataset. Overfitting occurs when the model performs well on training data but poorly on unseen data due to high correlation among predictors [62]. Model instability arises when small changes in the data lead to large fluctuations in the model’s coefficients, making the results unreliable [63]. Incorrect interpretation of feature importance can mislead decision-making, especially in explainable AI methods like SHAP [64]. Computational difficulties may arise in matrix inversion processes, particularly in linear models, leading to numerical instability [65]. Finally, redundancy in the dataset increases model complexity without adding meaningful information, reducing the model’s efficiency [66].

Scholars commonly employ a multicollinearity assessment to identify and address problematic conditioning factors, as high correlations among predictors can lead to unreliable model outcomes and misinterpretation of feature importance [67,68]. Specifically, multicollinearity can inflate the variance of regression coefficients, making it difficult to assess the individual contribution of each predictor. To mitigate these issues, we conducted a multicollinearity assessment using two primary measures: the Variance Inflation Factor (VIF) and Tolerance (TOL). VIF quantifies how much the variance of an estimated regression coefficient increases due to multicollinearity, while TOL measures the proportion of variance in a predictor that is not explained by other predictors. If any chosen conditioning factor exhibits a VIF surpassing 10 and/or a TOL below 0.1, it indicates the presence of multicollinearity within the dataset, necessitating the removal of these factors until VIF and TOL values achieve acceptable levels. This approach ensures that the final set of predictors is free from significant interdependencies, thereby enhancing the model’s robustness and interpretability.
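The two measures follow directly from their definitions: for predictor j, VIF_j = 1 / (1 − R_j²) and TOL_j = 1 − R_j² = 1 / VIF_j, where R_j² is obtained by regressing predictor j on all the other predictors. The sketch below is a minimal NumPy illustration on synthetic data; the function names, the synthetic factors, and the drop-the-worst-first removal order are assumptions, since the text only states that factors are removed until the thresholds are met.

```python
import numpy as np

def vif_and_tolerance(X):
    """Return (vif, tol) arrays for the columns of X, shape (n_samples, n_features).

    Assumes no column is constant.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vif = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + remaining predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
        vif[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vif, 1.0 / vif

def drop_collinear(X, names, vif_max=10.0):
    """Drop the highest-VIF column until every VIF <= vif_max.

    VIF > 10 is equivalent to TOL < 0.1, so one threshold covers both criteria.
    """
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        vif, _ = vif_and_tolerance(X)
        worst = int(np.argmax(vif))
        if vif[worst] <= vif_max:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return names

# Synthetic factors: "c" is almost a copy of "a", "b" is independent
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)
X = np.column_stack([a, b, c])

vif, tol = vif_and_tolerance(X)
kept = drop_collinear(X, ["a", "b", "c"])
```

In this synthetic case the near-duplicate pair produces very large VIF values, one of the two columns is removed, and the independent factor is retained.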

In addition, we calculated Pearson’s correlation coefficients to supplement the multicollinearity assessment. These coefficients range from −1 to +1, where values approaching +1 signify a strong positive correlation between two factors, while values near −1 denote a strong negative correlation. Values near zero suggest no linear relationship between the two variables. In numerous prior studies, +0.8 and −0.8 have been commonly adopted as thresholds, meaning that coefficients beyond these values may give rise to multicollinearity problems [69].
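A pairwise screen against the ±0.8 threshold can be sketched in a few lines. The factor values below are invented for illustration and are not taken from the study's dataset; only the threshold comes from the text.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def high_correlation_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical samples of three conditioning factors
factors = {
    "ELV": [1000, 1010, 1025, 1040, 1060],
    "MAT": [12.0, 11.9, 11.6, 11.5, 11.1],
    "SLP": [2.0, 5.0, 1.0, 4.0, 3.0],
}
pairs = high_correlation_pairs(factors)
```

With these made-up values, only the elevation and temperature pair exceeds the threshold (a strong negative correlation), so it is the only pair flagged for review.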

3.2. Model Selection and Classification Scheme

ML models are widely used for various types of problems in natural hazard susceptibility mapping, including classification, clustering, and regression [2,10,70]. Classification algorithms are particularly suitable for predicting discrete outcomes, such as the presence or absence of sinkholes, while clustering algorithms are used to group similar data points based on their features, and regression models are employed to predict continuous variables, such as the size or depth of sinkholes [71,72]. In this study, we focused on classification because the data are labeled: each sample is classified as either “sinkhole” or “non-sinkhole”. Classification algorithms are specifically designed to handle labeled data and predict discrete class labels, making them suitable for our objective of mapping sinkhole susceptibility. In contrast, clustering algorithms are typically used for unsupervised learning tasks with unlabeled data. Since our dataset already contained predefined labels, we employed supervised classification algorithms to ensure high accuracy and interpretability in predicting sinkhole susceptibility.

In the development of susceptibility maps for natural hazards using ML models, the data about previous incidents (inventory data) plays a crucial role in training and testing the model. In the training phase, the ML model learns from this inventory data to find patterns and understand how various factors contribute to the occurrence of natural hazards. Later, in the testing phase, the model’s accuracy is checked using a new set of data that it has not seen before (not used during training). This helps the analyst see how well the model can predict natural hazard susceptibility in different situations [73,74].

To make a thorough SSM, it is crucial to include both locations with sinkholes and those without. We therefore carefully chose locations without sinkholes to match the number of locations with sinkholes in our data, ensuring even coverage across the entire study area. These non-sinkhole locations were selected randomly, which is essential for keeping a balance in the distribution of positive (sinkhole) and negative (non-sinkhole) samples in the dataset [16].

While no specific standard dictates the proportion of data allocated to training and testing subsets for ML models, a prevailing practice involves designating 70% of the inventory data for training and model refinement, with the remaining 30% reserved for testing and assessing model performance. In alignment with this convention, this study adheres to a similar split. Additionally, ensuring an equitable distribution of positive and negative samples in both the training and test datasets is crucial. Imbalances in class distribution can adversely impact the performance of classifier algorithms, underscoring the importance of maintaining symmetry in sample representation [11,75]. In this study, 597 sinkhole locations and an equal number of non-sinkhole locations were used for analysis in the GIS environment, with their respective labels as “0” and “1”. Of the resulting 1194 samples, partitioned into approximately 70% training and 30% test data, 396 samples are designated for testing (198 positive, 198 negative samples), while 798 samples serve as training data (399 positive, 399 negative samples).
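A class-balanced split of this kind can be sketched as follows. This is only an illustration using the Python standard library, not the authors' procedure (the study used scikit-learn, where `train_test_split` with its `stratify` argument serves the same purpose). The function name `stratified_split`, the seed, and the label convention (1 for positive, 0 for negative) are assumptions; the test fraction is chosen to reproduce the reported 198 test samples per class out of the 597 per class implied by the 1194-sample total.

```python
import random


def stratified_split(labels, test_fraction, seed=0):
    """Split sample indices so that every class keeps the same test fraction."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


# 1194 samples in total, half positive and half negative (assumed 1/0 labels).
labels = [1] * 597 + [0] * 597
# Assumption: a per-class test fraction of 198/597 (about 0.33) matches the reported counts.
train_idx, test_idx = stratified_split(labels, test_fraction=198 / 597)
n_test_positive = sum(labels[i] for i in test_idx)
n_train_positive = sum(labels[i] for i in train_idx)
```

Splitting each class separately is what guarantees that the training and test sets both remain perfectly balanced.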

This research develops an SSM by leveraging ensemble algorithms, recognized for their efficacy in handling high-dimensional and intricate datasets. The ensemble algorithms employed in this study encompass Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). The rationale for selecting ensemble algorithms in generating an SSM is threefold. Firstly, these algorithms possess a unique capability to discern and rank significant factors, aiding in the identification of crucial influencers for accurate SSM development [34]. Additionally, their explainability through techniques like SHAP enhances transparency in understanding the decision-making process [76,77]. Secondly, ensemble algorithms exhibit robustness against dataset noise, effectively mitigating overfitting and ensuring the generalizability of the SSM across diverse scenarios. This approach capitalizes on individual algorithm strengths while compensating for weaknesses, contributing to a more comprehensive and accurate sinkhole susceptibility assessment [78]. Thirdly, ensemble algorithms enhance model stability by aggregating predictions from multiple base models, particularly advantageous in the context of sinkhole susceptibility mapping where uncertainties and data variations exist. This mitigates the impact of individual model variability, resulting in more reliable predictions [79].

The training-test split and the evaluation of model performance were carried out with the Python-based scikit-learn library. The RF model was trained using scikit-learn, while the other models were trained using the XGBoost and Microsoft LightGBM libraries. The hyperparameters of all classifiers were optimized using the Optuna framework, a Bayesian optimization tool that implements the Tree-structured Parzen Estimator algorithm. Optuna addresses inefficiencies in conventional hyperparameter tuning methods like grid search and random search by improving computational performance and minimizing the potential for suboptimal parameter selection. This approach explores the hyperparameter space more effectively and reduces the risk of settling on a poor local optimum [80].

3.2.1. Random Forest (RF)

Leveraging decision trees while enhancing their accuracy, Random Forest (RF) stands out as a highly effective ML method employed for both regression and classification purposes [81]. RF is acknowledged as a robust classification approach, and its training is carried out through the bagging method. In the initial phase of training, a bootstrapped dataset is generated. The training process involves the use of a specific subset of variables, where decision trees are constructed, and a data point is assigned a class based on the majority of votes garnered from a varying number of decision trees. The most basic classification model involves the partitioning of variables based on the predictor variables of the parent node, resulting in the refinement of the child nodes. Three core hyperparameters shape the performance of the decision tree model: the number of trees (NT), the number of splits (NS), and the depth (d). NT determines the total trees in the ensemble, where each tree votes to classify data points. While a higher NT can improve accuracy, it also increases computational complexity. NS specifies the minimum samples required to split a node; overly high NS values risk underfitting by restricting splits before nodes achieve purity. In RF models, decision trees generate splits to isolate distinct classes. Although more splits enhance data variability capture, excessive splitting may cause overfitting. Balancing these parameters is essential for optimizing model accuracy and generalization.

The RF classifier produces its output as follows:

\hat{Y} = \frac{1}{N} \sum_{k = 1}^{N} h_{k} (X)

where h_{k} (X) represents the prediction of the kth decision tree in the ensemble, N is the number of trees, and X denotes the input feature vector.
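As a small illustration of how per-tree outputs are combined, the sketch below averages hypothetical binary tree votes and also reports the majority class. The function name `forest_predict` and the vote values are assumptions for illustration; library implementations may instead average per-tree class probabilities.

```python
from collections import Counter


def forest_predict(tree_predictions):
    """Combine per-tree class votes (0 or 1) into a mean vote and a majority label."""
    n_trees = len(tree_predictions)
    mean_vote = sum(tree_predictions) / n_trees  # average of the tree outputs h_k(X)
    majority_label = Counter(tree_predictions).most_common(1)[0][0]
    return mean_vote, majority_label


# Hypothetical votes of five trees for one sample (1 = sinkhole, 0 = non-sinkhole).
toy_votes = [1, 1, 0, 1, 0]
mean_vote, majority_label = forest_predict(toy_votes)
```

With an odd number of trees the majority label is unambiguous; here three of five trees vote for class 1, so the mean vote is 0.6.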

3.2.2. eXtreme Gradient Boosting Machine (XGBoost)

XGBoost, a widely recognized advanced ML algorithm, achieves high performance in classification by systematically assembling weak learners, typically decision trees, in a sequential manner, enhancing their accuracy through progressive optimization iterations [82]. What sets XGBoost apart is that it adds a regularization term to the objective to make the model more robust and prevent overfitting. It also uses a second-order Taylor expansion of the loss function, which gives a more accurate approximation of the objective at each boosting step. XGBoost is designed to work quickly on large sets of data and for tasks where building a model fast without sacrificing accuracy is needed. The algorithm focuses on finding the best balance between accurately predicting outcomes and avoiding unnecessary complexity that could lead to overfitting.

A key distinction lies in understanding the differences between bagging (e.g., RF) and boosting (e.g., XGBoost). Bagging operates by aggregating predictions from multiple independent base learners (trees) to reduce variance and stabilize outcomes. In contrast, boosting constructs learners sequentially, where each subsequent model addresses errors from prior iterations to reduce bias.

The XGBoost algorithm’s central goal is to optimize a regularized loss function defined as follows:

L (Φ) = \sum_{i} l ({\hat{y}}_{i}, y_{i}) + \sum_{k} Ω (f_{k})

(1)

In Equation (1), the first term quantifies the difference between the observed target y_{i} and the predicted value {\hat{y}}_{i}. The second term, detailed in Equation (2), penalizes model complexity to prevent overfitting:

Ω (f) = γ T + \frac{1}{2} λ {‖w‖}^{2}

(2)

Here, T is the number of terminal tree leaves, w denotes the leaf weights, and γ and λ are regularization parameters that control the penalty strength. XGBoost iteratively minimizes this objective function at each step t, as shown in Equation (3):

L^{(t)} = \sum_{i = 1}^{n} l (y_{i}, {\hat{y}}_{i}^{(t - 1)} + f_{t} (x_{i})) + Ω (f_{t})

(3)
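To make the penalty term concrete, the following sketch evaluates Ω(f) = γT + ½λ‖w‖² for a hypothetical three-leaf tree. It also includes the closed-form leaf weight −G/(H + λ) that follows from the standard second-order approximation of the objective, where G and H are the sums of first and second derivatives of the loss over the samples in a leaf. That expression comes from the general XGBoost derivation rather than from this study, and all numeric values are illustrative.

```python
def tree_penalty(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lambda * ||w||^2 for a single tree."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


def optimal_leaf_weight(grad_sum, hess_sum, lam):
    """Leaf weight minimizing the second-order objective: -G / (H + lambda)."""
    return -grad_sum / (hess_sum + lam)


# Hypothetical tree with three leaves and illustrative regularization settings.
penalty = tree_penalty([0.5, -0.25, 0.1], gamma=1.0, lam=2.0)
# Hypothetical gradient and Hessian sums for the samples falling in one leaf.
leaf_weight = optimal_leaf_weight(grad_sum=-4.0, hess_sum=6.0, lam=2.0)
```

A larger λ shrinks the leaf weight toward zero, and a larger γ makes every additional leaf more expensive, which is how the two parameters discourage overly complex trees.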

3.2.3. Light Gradient Boosting Machine (LightGBM)

Light Gradient Boosting Machine (LightGBM) is a gradient boosting framework designed for efficient classification on large datasets. One key innovation is Gradient-based One-Side Sampling (GOSS), which keeps the training instances with large gradients and randomly samples those with small gradients, reducing the cost of finding splits on large-scale data. LightGBM also departs from the level-wise tree construction that XGBoost uses by default and instead grows trees leaf-wise, always splitting the leaf that yields the largest loss reduction. This approach enhances computational efficiency, particularly in datasets with complex feature relationships and substantial size [83].

The algorithm’s iterative process is formalized in Equation (4):

F_{n} (x) = α_{0} f_{0} (x) + α_{1} f_{1} (x) + \dots + α_{n} f_{n} (x)

(4)

Here, the model initializes with n decision trees, assigning uniform weights (1 / n) to training samples. Each weak learner f (x) is trained to determine its optimal coefficient α. The classifier iteratively refines sample weights and integrates new learners until the final ensemble F_{n} (x) is achieved.

3.3. Performance Metrics

Various measures are employed to assess classifier performance. True Positives (TP) are the instances correctly labeled as positive, and True Negatives (TN) are the instances correctly labeled as negative. False Positives (FP) are instances mistakenly labeled as positive, and False Negatives (FN) are instances mistakenly labeled as negative. These metrics are computed using a confusion matrix, forming the foundation for evaluating performance. For a single class C_i, the terms TP_i, FN_i, TN_i, and FP_i are utilized to evaluate class-specific metrics. The numerical values for C_i are instrumental in calculating the following metrics:

Sensitivity = {TP}_{i} / ({TP}_{i} + {FN}_{i})

Specificity = {TN}_{i} / ({TN}_{i} + {FP}_{i})

Accuracy = ({TP}_{i} + {TN}_{i}) / ({TP}_{i} + {TN}_{i} + {FP}_{i} + {FN}_{i})

Precision = {TP}_{i} / ({TP}_{i} + {FP}_{i})

F1-score = 2 * Precision * Sensitivity / (Precision + Sensitivity)

Additionally, two widely used metrics for binary classification are the Receiver Operating Characteristic (ROC) curve and the Area under the ROC Curve (AUC). The ROC curve assesses the trade-off between True Positive Rate (TPR) and False Positive Rate (FPR) as the decision threshold varies, while the AUC summarizes the model’s overall ability to separate the two classes across all thresholds:

TPR (Sensitivity) = TP / (TP + FN)

FPR = FP / (FP + TN)

AUC values surpassing 0.9 reflect classification accuracy nearing optimal levels, signifying a model with highly robust predictive capability.
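One way to see what the AUC measures is its rank interpretation: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it that way on hypothetical labels and scores; it is an illustration only, and `roc_auc` is a name introduced here rather than a library function.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities for six samples.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(toy_labels, toy_scores)
```

Eight of the nine positive-negative pairs are ordered correctly in this toy case, giving an AUC of 8/9. The pairwise loop is quadratic in the sample count and is meant for clarity, not for large datasets.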

3.4. Enhancing Model Explainability Through SHAP

In this study, the SHAP (SHapley Additive exPlanations) technique, rooted in game theory, is employed to explain the class outputs predicted by the ML classifiers. SHAP computes feature importance scores from Shapley values: rather than evaluating each feature in isolation, it examines how each feature contributes to a prediction across all possible combinations of features, giving a complete view of how features work together in the model’s decision-making. SHAP thus provides a principled and easy-to-understand way to examine the dynamics behind the predictions made by ML models [84]. The SHAP technique has been instrumental in improving the clarity of spatial predictive models. Its impact has been documented in various applications, including modeling landslides [38,85,86], floods [35,87], earthquakes and soil liquefaction [88,89], wildfires [31,74,90], snow avalanches [69,91], and erosion [92].

In essence, the Shapley value [93], as expressed in Equation (5), quantifies the importance of a feature i within the context of a broader set of features.

ϕ_{i} = \frac{1}{|N|!} \sum_{S \subseteq N \setminus \{i\}} |S|! (|N| - |S| - 1)! [f (S \cup \{i\}) - f (S)]

(5)

The Shapley value (ϕ_{i}) for a feature i is derived by evaluating all feature subsets (S) of the complete set (N) that exclude i and determining the feature’s significance through its average marginal contribution across every possible ordering of the features. Mathematically, this involves averaging the difference in the model’s prediction when feature i is included versus excluded from a subset S, weighted by the probability of S occurring. This approach inherently accounts for the interdependencies and ordering of features, which is critical when addressing correlated predictors, as the sequence of feature inclusion can alter the model’s behavior [94].
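The subset-weighted average in the Shapley definition can be checked by brute force on a tiny example. The sketch below enumerates every subset for a hypothetical three-feature value function; the feature names, the value function, and the helper `shapley_values` are assumptions for illustration. Exact enumeration grows exponentially with the number of features, which is why practical SHAP implementations rely on model-specific or approximate algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value_fn):
    """Exact Shapley values by enumerating every subset S of the other players."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|N| - |S| - 1)! / |N|! shared by all subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi[i] = total
    return phi


def toy_value(subset):
    """Hypothetical model output: additive effects of a and b plus one interaction."""
    value = 0.0
    if "a" in subset:
        value += 2.0
    if "b" in subset:
        value += 1.0
    if "a" in subset and "b" in subset:
        value += 1.0  # interaction term, split equally between a and b
    return value


phi = shapley_values(["a", "b", "c"], toy_value)
```

Here the interaction of 1.0 is shared equally, so a receives 2.5, b receives 1.5, and the irrelevant feature c receives 0; the three values sum to the difference between the full-model output and the empty-set output.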

In this study, the SHAP framework was applied via the Python-based SHAP library to explain ML models. The library provides robust analytical tools, including intuitive visualizations to elucidate how individual features influence predictions, thereby enhancing transparency in model decision-making [64].

4. Results

4.1. Multicollinearity Results

The calculated VIF and TOL values indicate the existence of multicollinearity problems in specific factors (Table S1). We reached this conclusion by noticing that some tolerance values were less than 0.1, and some VIF values were higher than 10. To address this, we followed a step-by-step elimination process. Starting with the factor linked to the highest VIF value, we removed factors until we reached a point where no factors were causing multicollinearity issues. After this elimination process, we retained 23 factors, which can be seen in Table S2. Among the retained factors, the highest VIF was 7.29, and the lowest TOL was 0.14. Importantly, we found that these retained factors were suitable for use in classification processes because none of them had issues with multicollinearity.
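The two reported extremes can be cross-checked with a short sketch, assuming the standard definitions TOL = 1 − R² (with R² obtained by regressing one factor on all the others) and VIF = 1/TOL. The function names below are introduced for illustration only.

```python
def tolerance_from_r2(r_squared):
    """TOL = 1 - R^2, where R^2 comes from regressing one factor on all the others."""
    return 1.0 - r_squared


def vif_from_tolerance(tolerance):
    """VIF is the reciprocal of the tolerance."""
    return 1.0 / tolerance


def has_multicollinearity(vif, tolerance, vif_limit=10.0, tol_limit=0.1):
    """Apply the thresholds used in the text: VIF above 10 or TOL below 0.1."""
    return vif > vif_limit or tolerance < tol_limit


# Reported extremes after elimination: highest VIF 7.29, lowest TOL 0.14.
implied_tolerance = 1.0 / 7.29
flagged = has_multicollinearity(7.29, implied_tolerance)
```

Under these definitions a VIF of 7.29 corresponds to a tolerance of about 0.137, which rounds to the reported 0.14 and sits on the acceptable side of both thresholds.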

To reinforce the findings of the multicollinearity assessment, Pearson’s correlation coefficients were computed for each pair of factors in the dataset. As illustrated in Figure S1, the factor pair with the highest correlation value is DTI–DCH, registering at 0.73. Across the matrix, it is evident that the majority of factor pairs exhibit no substantial correlation, with partially correlated pairs falling below the selected threshold of 0.8.

To sum up, two factor selection methods were employed: the correlation coefficient and the multicollinearity assessment. The latter offers a comprehensive analysis of how factors interact, complementing the relationship insights provided by the correlation coefficient. Trimming a dataset from a larger set of factors to a more focused subset, such as from 34 down to 23, is a frequent practice used to better tailor the factors to suit the ML-based classification framework. This process ensures that the resulting data suit the study region, enhancing their relevance and usability. Candidates for elimination were scrutinized carefully so that key factors highlighted in the literature, such as PTF, were retained for their importance in model performance and analysis.

4.2. Hyperparameters Tuning

ML algorithms require careful optimization of their unique hyperparameters to maximize classifier performance. In this research, hyperparameter tuning was conducted using the Optuna framework [95], which automates the exploration of predefined parameter ranges (search space) to identify optimal configurations. By iteratively assessing combinations of hyperparameters, Optuna systematically refines model settings, enhancing accuracy on the target dataset. The finalized hyperparameters derived from this optimization process are presented below:

RF: {n_estimators: 822}, {min_samples_split: 4}, {max_depth: 13}, {min_samples_leaf: 4}
XGBoost: {n_estimators: 829}, {eta (learning_rate): 0.010394}, {max_depth: 15}, {subsample: 0.992398}
LightGBM: {n_estimators: 470}, {eta (learning_rate): 0.010939}, {max_depth: 12}, {subsample: 0.851277}

4.3. Predictive Performance of Classifier Models

The performance of the ML models was analyzed using confusion matrices (Figure 7a–c), from which key performance indicators such as sensitivity, specificity, accuracy, precision, and F1-score were derived, as aggregated in Figure 7d. Additionally, ROC curves and corresponding AUC values are illustrated in Figure 7e. Among the evaluated algorithms, the RF classifier demonstrated the strongest performance, attaining a test accuracy of 95.5%, marginally exceeding XGBoost and LightGBM, which both achieved 94.9%. This represents an improvement of 0.6 percentage points over the other ensemble methods. RF also attained the highest precision (96.4%), specificity (96.5%), and F1-score (95.4%), with XGBoost trailing closely in precision (95.4%). While all classifiers exceeded 94% across metrics, signifying robust predictive capability, RF exhibited a slight edge in overall accuracy. Further analysis of AUC values (Figure 7e) highlights the strong discriminatory power of all models, with RF achieving the highest AUC (98.8%), followed by LightGBM (98.5%) and XGBoost (98.3%). Higher AUC values reflect a model’s ability to minimize misclassification, with RF showing marginally superior class separation in this case study.

Analysis of the classifiers’ confusion matrices highlighted the RF model’s precision in distinguishing sinkhole and non-sinkhole instances within the test dataset. In the sinkhole category, RF correctly identified 187 out of 198 samples while achieving 191 correct classifications among 198 non-sinkhole samples. This corresponds to classification accuracies of 94.4% for sinkholes and 96.5% for non-sinkholes, underscoring RF’s balanced proficiency across both classes. Owing to its consistent superiority in accuracy and reliability, the RF model was selected to generate the final SSM and to undergo SHAP-based interpretation, enabling deeper insights into its decision logic.
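The reported RF metrics can be re-derived from the confusion-matrix counts given above. The sketch below takes TP = 187 and TN = 191 from the text and infers FN = 198 − 187 = 11 and FP = 198 − 191 = 7; the function name `binary_metrics` is introduced here for illustration.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


# Counts for the RF test set: 187 of 198 sinkholes and 191 of 198 non-sinkholes correct.
rf_metrics = binary_metrics(tp=187, fn=11, tn=191, fp=7)
rf_percent = {name: round(100 * value, 1) for name, value in rf_metrics.items()}
```

Rounded to one decimal place, these counts reproduce the reported accuracy (95.5%), precision (96.4%), specificity (96.5%), and F1-score (95.4%).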

4.4. SHAP-Driven Feature Significance Analysis

Figure 8a illustrates a detailed SHAP summary visualization, mapping individual sinkhole instances and their SHAP values relative to input predictors. After training the RF classifier, factors were prioritized based on their influence, with the x-axis denoting SHAP values (magnitude and direction of impact) and the y-axis listing the predictors. Each dot corresponds to a sinkhole sample, colored by predictor magnitude—lighter hues (low values) to darker hues (high values). The RF model identified PTF, MAP, and DBC as the three most critical predictors.

Figure 8b further quantifies predictor importance using mean absolute SHAP values, emphasizing their relative contributions to sinkhole prediction. Higher mean absolute SHAP values signify stronger influence, with PTF, MAP, and DBC again dominating. Notably, DBC behaved in the opposite direction to the other two leading predictors: elevated DBC values corresponded to positive SHAP values (third row of Figure 8a), implying that higher bicarbonate differences amplify sinkhole susceptibility. Conversely, lower PTF and MAP values (lighter dots) aligned with positive SHAP values, indicating that reduced PTF/MAP levels heighten sinkhole likelihood. The analysis also revealed ASP, DTI, and SDP as the least impactful factors. While SHAP summary plots clarify directional influence, they lack granularity on how predictor magnitudes affect outcomes. To address this, SHAP dependence plots are recommended to dissect nonlinear relationships between feature values and model predictions.

4.5. SHAP Dependence Plots

This study involves the creation of SHAP dependence plots (Figure 9) for all input factors. These plots serve to elucidate the association between factor values and their respective SHAP values. The utilization of dependence plots aims to unveil potential threshold effects by analyzing the variations in SHAP values based on different factor values.

Geological–tectonic factors reveal that samples with PTF values ranging from 56 to 1847 m yield positive SHAP values in the study area. Notably, sinkhole samples in close proximity to faults exhibit a positive contribution to sinkhole formation. Conversely, locations farther from faults exhibit lower sinkhole formation probability. The SHAP values associated with the LTG factor indicate that only specific lithological formations (Ngi, Ngv, Qh, and some Mof formations) contribute significantly to sinkhole formation.

In the context of the sole climatic factor, MAP values between 33 and 36 mm yield positive SHAP values, suggesting that the region’s minimum and maximum precipitation levels do not significantly contribute to sinkhole formation. Instead, moderate precipitation levels contribute positively to sinkhole susceptibility.

Environmental–anthropogenic factors also play a crucial role, with WDS values exceeding 0.48 indicating a higher likelihood of sinkhole formation. The PST factor reveals that locations in close proximity to settlements (within 4000 m) generate predominantly positive SHAP values, implying that closer proximity to settlements increases sinkhole susceptibility. Similar trends are observed with the PRD factor, where locations near roads (within 1000 m) exhibit positive SHAP values, indicating an elevated susceptibility to sinkholes. The NDVI factor values above 0.20 yield positive SHAP values, indicating that samples located in areas with healthy vegetation contribute positively to sinkhole formation. SDP values, excluding soil depth class A (very deep soils with a depth exceeding 150 cm), generally contribute positively to sinkhole susceptibility.

Examining topographic factors, the ELV factor does not exhibit a distinct threshold effect. SHAP values peak between elevations of 1070 and 1080 m, while lower elevations near 1000 m are scattered around zero SHAP values. Regarding the SLP factor, most slope levels exceeding 0.2 generate positive SHAP values, with steeper slopes contributing higher SHAP values. Conversely, flat locations predominantly yield negative SHAP values, indicating low susceptibility to sinkholes. The ASP factor, deemed the least influential, lacks a clear threshold effect. All aspect directions contribute similarly to both positive and negative SHAP values. However, samples with a northeast (NE) direction predominantly yield positive SHAP values. PRC values close to zero primarily result in negative SHAP values, but as PRC values approach +1 (indicating more convex locations), SHAP values increase. Analyzing PDR values, samples within the 0 to 300 m range exhibit a nearly equal distribution of negative and positive SHAP values. Between 300 and 2100 m, a clear positive SHAP value trend emerges, with locations farther from drainage generating negative SHAP values. In essence, locations with distances to drainage between 300 and 2100 m contribute more significantly to sinkhole formation.

Considering hydrogeological factors, positive SHAP values are observed when DBC exceeds 0.68, DPH is less than −0.19, DDO is less than −0.01, and DCH is greater than 0. Additionally, most DDC values above 10.1, DMG values ranging from 0.12 to 1.74, DSU values between 0 and 0.6, and GWD values exceeding 4.71 contribute positively to sinkhole formation. Conversely, most DTI and DPT values scattered around 0 yield more negative SHAP values, although their dependence plots lack a clear threshold effect.
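Ranges such as those reported above can in principle be read off programmatically from the points of a dependence plot. The sketch below is a crude illustration on hypothetical values and is not necessarily how the thresholds in this study were obtained: it simply returns the span of feature values whose SHAP value is positive. Real dependence plots are scattered, so such a span may still contain samples with negative SHAP values.

```python
def positive_shap_range(feature_values, shap_values):
    """Return (min, max) of the feature values whose SHAP value is positive."""
    positive = [x for x, s in zip(feature_values, shap_values) if s > 0]
    if not positive:
        return None
    return min(positive), max(positive)


# Hypothetical distances (m) and SHAP values; not taken from the study data.
toy_distance = [50, 400, 1200, 1800, 2500, 4000]
toy_shap = [0.08, 0.05, 0.03, 0.01, -0.02, -0.06]
positive_range = positive_shap_range(toy_distance, toy_shap)
```

In this toy case the positive contributions span 50 to 1800 m, after which the SHAP values turn negative.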

4.6. Generated Sinkhole Susceptibility Map

This study presents a sinkhole susceptibility map generated using SHAP values. The study area consists of a total of 40,213,167 pixels, with SHAP values calculated for each criterion at every pixel. The SHAP value of each criterion in the overall model was multiplied by the corresponding SHAP value of that criterion at each pixel, and a susceptibility index was computed for the entire area.

The susceptibility indices, representing the likelihood of sinkhole formation, were then classified into distinct categories to facilitate interpretation and visualization of the results. Several classification methods, such as equal interval, quantile, and geometric interval, are commonly used in susceptibility mapping. However, the Jenks Natural Breaks Classification Method was selected for this study due to its ability to identify natural groupings within the data and effectively highlight significant variations in susceptibility levels [96]. Unlike equal interval or quantile methods, which may produce arbitrary or misleading class boundaries, the Jenks method maximizes the differences between classes while minimizing the variance within each class, ensuring a natural distribution of susceptibility values. The calculated susceptibility indices were classified into five categories using the Jenks Natural Breaks Classification Method. These categories are defined as very low susceptibility, low susceptibility, moderate susceptibility, high susceptibility, and very high susceptibility. The sinkhole susceptibility map generated for the study area is visualized in Figure 10.
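The criterion behind natural breaks, minimizing the within-class sum of squared deviations, can be sketched with a small dynamic program. This is an illustrative implementation on hypothetical values with three classes rather than five; GIS software may use a different or approximate algorithm, and the cubic-time loop below would not scale to tens of millions of pixels without sampling.

```python
def jenks_breaks(values, n_classes):
    """Upper class limits minimizing the within-class sum of squared deviations."""
    data = sorted(values)
    n = len(data)
    # Prefix sums give the squared deviation of any slice data[i:j] in O(1).
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for idx, v in enumerate(data):
        prefix[idx + 1] = prefix[idx] + v
        prefix_sq[idx + 1] = prefix_sq[idx] + v * v

    def ssd(i, j):
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting the first j sorted values into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i
    # Walk back through the stored split points to recover the class upper limits.
    breaks, j = [], n
    for k in range(n_classes, 0, -1):
        breaks.append(data[j - 1])
        j = split[k][j]
    return breaks[::-1]


# Hypothetical susceptibility index values forming three obvious groups.
toy_index = [1, 2, 3, 10, 11, 12, 30, 31, 32]
toy_breaks = jenks_breaks(toy_index, n_classes=3)
```

For the toy values the class upper limits fall at 3, 12, and 32, which are exactly the gaps a reader would pick by eye; equal-interval classes on the same data would instead cut at roughly 11.3 and 21.7, leaving the middle class almost empty.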

The susceptibility map given in Figure 10 reveals the high-resolution spatial distribution of sinkhole formation potential within the study area. The obtained classes provide detailed information for identifying susceptible areas, and the SHAP-based approach enables the calculation of susceptibility indices not only with pixel-based data but also by associating them with variable effects across the model. An illustration of the spatial distribution of the susceptibility classes (see Figure 11) reveals that a significant portion of the study area falls within the moderate and high susceptibility categories. According to the susceptibility map, 15.75% of the study area exhibits very low susceptibility, 18.14% displays low susceptibility, 20.53% is moderately susceptible, 25.99% is highly susceptible, and 19.59% is very highly susceptible.

5. Discussion

In this study, SHAP-based machine learning models were employed to generate SSMs, and the spatial distribution of sinkhole formation potential within the study area was examined. The primary objective of this study is to clarify the spatial effects of the factors contributing to sinkhole formation and detail their impact on the model. The results are largely consistent with similar studies in the literature, thereby highlighting the advantages of the adopted methodology. SHAP analysis has provided detailed insights into the effects of factors on the model at both the global and local scale, and this approach has enhanced the transparency of model outcomes compared to traditional methods. This study provides significant insights, particularly in terms of performance comparisons of machine learning models, variable selection processes, and sustainable land management.

5.1. Comparative Performance of Machine Learning Models in SSM

Performance comparisons among machine learning models have shown that the Random Forest (RF) algorithm delivers the highest performance in sinkhole susceptibility analysis. RF outperformed other popular models such as XGBoost and LightGBM by approximately 0.6 percentage points, achieving a test accuracy of 95.5%. Additionally, the RF algorithm achieved the highest values in metrics such as precision (96.4%), specificity (96.5%), and F1 score (95.4%), producing more consistent results in predicting sinkhole formations compared to the other models. Previous sinkhole susceptibility studies that utilized machine learning models likewise reported high performance for the RF model [7,17,25]. These results highlight that RF is an effective method for sinkhole susceptibility analysis, demonstrating high accuracy, low misclassification rates, and repeatability. Similarly, the literature indicates that RF performs well on heterogeneous and complex datasets in particular [81,97]. Although other models, such as XGBoost (AUC of 98.3%) and LightGBM (AUC of 98.5%), exhibited comparable performance, the consistency and interpretability of RF were the key advantages highlighted in this study. Moreover, the fact that all models achieved accuracy above 94% demonstrates that the selected methods generally possess high predictive capacity. The findings obtained in this study support the effectiveness of machine learning models in the creation of sinkhole susceptibility maps. Multicollinearity issues negatively affect the performance of machine learning models, while the variable selection process enhances model accuracy [98]. The 34 criteria used in the study were examined through multicollinearity analyses (VIF, TOL, and Pearson correlation), and 11 criteria with high linear relationships were removed from the model.
By including only the significant independent variables in the model, this process enhanced both the overall accuracy of the model and the reliability of the results obtained through SHAP analysis. This method highlights the importance of thoughtfully selecting variables, particularly when modeling intricate karstic processes. In karstic formations, the complex relationships between topographic, hydrogeological, geological, and environmental factors make the variable selection process critical [2]. The results obtained from multicollinearity analyses in this study have made the model less complex and more interpretable. This, in turn, has enhanced the effectiveness of explainable AI methods, such as SHAP analysis.

5.2. Explainable AI in Sinkhole Susceptibility: Insights from SHAP

The “black-box” nature of machine learning (ML) models has long been a critical issue in natural hazard susceptibility mapping, as it limits the interpretability and trustworthiness of model predictions [30,64]. To address this challenge, explainable artificial intelligence (XAI) techniques, such as SHapley Additive exPlanations (SHAP), have emerged as powerful tools for enhancing model transparency [32,84]. SHAP provides both global and local explanations by quantifying the contribution of each input feature to the model’s predictions, making it particularly valuable for understanding complex, nonlinear relationships in geospatial data [38,85]. In this study, we leverage SHAP to transition from traditional “black-box” models to an explainable framework, marking a significant advancement in sinkhole susceptibility mapping. By applying SHAP, we not only identify the most influential factors but also reveal how specific values of these factors contribute to sinkhole formation at both regional and local scales. This dual-level explanation provides a more reliable basis for decision-making, particularly in the Konya Closed Basin, where understanding the underlying drivers of sinkhole formation is crucial for sustainable land-use planning and risk mitigation. The following analysis explores the SHAP-based interpretation of key factors influencing sinkhole susceptibility, offering actionable insights for stakeholders and policymakers.

According to the results obtained from the SHAP analysis, the five key factors were identified as proximity to fault lines, annual average rainfall, bicarbonate difference, proximity to drainage lines, and well density. Sinkhole formation reaches its highest levels in areas up to approximately 2000 m from fault lines. The literature also indicates that fault lines create fractured zones, which accelerate groundwater movement and, consequently, increase the risk of karst collapse [2]. Additionally, in karst regions, fault zones are highly permeable, allowing water to rapidly infiltrate underground, which in turn enhances rock dissolution [99]. The positive contribution of annual average rainfall between 33 and 36 mm is consistent with findings that steady water flow accelerates karst dissolution and triggers the sinkhole formation process [100]. Moreover, it has been stated that both low and excessive rainfall in karst systems directly affect dissolution and mechanical erosion processes [101]. In particular, changes in seasonal rainfall patterns and sudden rainfall following periods of extreme drought disrupt the stability of underground voids, enhancing sinkhole formation [102]. Another key factor, a bicarbonate difference above 0.68, indicates that changes in the chemical composition of water contribute directly to sinkhole formation by dissolving carbonate rocks more rapidly. Previous studies have also noted that the high bicarbonate content in groundwater weakens ground stability by corroding carbonate rocks [103]. Changes in groundwater chemistry, in particular, accelerate karstic processes, which in turn increase the risk of sudden collapse in the long term. Areas up to approximately 2000 m from drainage lines fall within the group of areas most susceptible to sinkhole formation.
It has been stated that drainage systems alter water flow paths, leading to sudden fluctuations in groundwater levels, and that the interaction between drainage lines and groundwater movement accelerates the formation of karstic cavities, increasing sinkhole susceptibility [104]. Well density is one of the most important factors directly influencing sinkhole formation. Specifically, uncontrolled wells drilled for agricultural irrigation lead to significant declines in groundwater levels, destabilizing the ground and increasing sinkhole formation. In the literature, the decline in groundwater levels caused by excessive water extraction is considered one of the key processes that accelerate the collapse of karstic cavities [105]. Especially in areas with intensive agricultural production, the widespread cultivation of crops with high water demands, such as sugar beets, corn, and alfalfa, has led to a noticeable increase in sinkholes. Additionally, as stated by many researchers, uncontrolled and excessive groundwater use, along with unplanned agricultural activities, has long-term effects on the sinkhole formation process and disrupts regional hydrogeological balances [106]. These findings emphasize the need to reassess agricultural water usage and land management policies in the Konya Closed Basin, home to millions of people. In particular, it is crucial to regularly monitor groundwater extraction, explore alternative water sources, and promote sustainable agricultural practices in the region.
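One plausible way to read threshold values such as the approximately 2000 m distance reported above from SHAP output is to bin a feature and find the bins whose mean SHAP value is positive. The sketch below illustrates this under stated assumptions: the function name `positive_shap_range`, the bin width, and all numbers are hypothetical and synthetic, and the text does not specify the exact procedure the authors used.

```python
def positive_shap_range(values, shap_vals, bin_width):
    """Return (low, high) bounds spanning the feature bins with positive mean SHAP.

    The result spans the first to the last positive bin and does not check
    that the positive bins are contiguous. Returns None if no bin is positive.
    """
    bins = {}
    for v, s in zip(values, shap_vals):
        key = int(v // bin_width)
        bins.setdefault(key, []).append(s)
    positive = sorted(k for k, ss in bins.items() if sum(ss) / len(ss) > 0)
    if not positive:
        return None
    return positive[0] * bin_width, (positive[-1] + 1) * bin_width


# Synthetic example: distance to the nearest fault (m) and a made-up SHAP value
distance_m = [250, 750, 1200, 1800, 2300, 3100, 4500, 6000]
shap_distance = [0.21, 0.18, 0.12, 0.05, -0.03, -0.08, -0.11, -0.12]

threshold_range = positive_shap_range(distance_m, shap_distance, bin_width=500)
print(threshold_range)
```

With these synthetic values, the positive-contribution range ends at the upper edge of the last bin whose mean SHAP value is above zero. The resolution of such a threshold therefore depends on the chosen bin width.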

5.3. Factors Influencing Sinkhole Formation

Geological and tectonic factors, one of the five main factor groups, emerged as the most influential in sinkhole formation. As explained in detail above, PTF was identified as one of the key factors, indicating that sinkhole occurrence increases significantly in areas close to fault lines. As emphasized by [102], fault lines create zones of weakness in karst units, reconfigure groundwater flow paths, and accelerate dissolution processes, ultimately triggering karstic phenomena. Additionally, LTG was associated with high susceptibility levels in areas dominated by carbonate rocks. The susceptibility of carbonate rocks to dissolution leads to more frequent sinkhole occurrences in these regions [45,99]. These findings align with the observed distribution of sinkholes in the study area. Furthermore, a negative correlation was observed between lithology (LTG) and elevation (ELV), suggesting that sinkhole formation is more prevalent in areas with lower elevations and specific lithological characteristics. This finding is consistent with previous studies that have highlighted the role of lithology and elevation in controlling sinkhole susceptibility [4,52].

Hydrogeological factors play a critical role in explaining the nonlinear effects of the model, clearly demonstrating the impact of changes in GWL and GWD on karst processes. In particular, DBC was identified as the most influential variable among the hydrogeochemical parameters in the SHAP analysis. It is known that differences in bicarbonates, which influence carbonate solubility, accelerate karst processes and lead to sinkhole formation [107]. Similarly, other hydrogeochemical variables, such as DDO and DPH, indicate that changes in the chemical balance of groundwater trigger karst processes. These findings highlight the role of hydrogeochemical processes in sinkhole formation within karst systems. Additionally, the strong positive correlations observed between chloride (DCH), magnesium (DMG), and sulfate (DSU) concentrations suggest that these ions play a significant role in the dissolution of carbonate and evaporitic rocks. Magnesium, in particular, accelerates the dissolution of carbonate rocks, while chloride and sulfate ions increase the salinity of groundwater, further enhancing the dissolution process [2,99]. These findings align with previous studies that have emphasized the importance of hydrogeochemical processes in sinkhole formation [100,104]. Furthermore, a negative correlation was observed between pH (DPH) and bicarbonates (DBC), implying that a decrease in pH tends to coincide with higher bicarbonate values, which is consistent with accelerated dissolution of carbonate rocks under more acidic conditions [103,104]. This finding is consistent with previous studies that have highlighted the role of pH in controlling the solubility of carbonate rocks in karst systems. Moreover, a negative correlation was identified between potassium (DPT) and pH (DPH), suggesting that an increase in potassium concentrations may be associated with lower groundwater pH, creating more acidic conditions that enhance rock dissolution [100,101].
These findings underscore the complex interplay between hydrogeochemical factors in sinkhole formation and highlight the importance of considering these interactions in susceptibility assessments.
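The pairwise relationships discussed above are expressed as Pearson correlation coefficients. As a minimal sketch of how such a coefficient is obtained, the following code computes it for two short lists. The values assigned to `dph` and `dbc` are synthetic placeholders chosen to be perfectly anti-correlated and are not the study's measurements.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Synthetic placeholder values for two hydrogeochemical difference variables
dph = [0.1, 0.2, 0.3, 0.4, 0.5]
dbc = [0.9, 0.7, 0.5, 0.3, 0.1]

r = pearson_r(dph, dbc)
print(round(r, 3))
```

A coefficient near -1 or +1 indicates a strong linear association between two variables. It describes how they vary together and does not by itself establish which variable drives the other.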

Topographic factors play a significant role in determining the spatial distribution of sinkhole formation. Variables such as ELV and SLP showed moderate importance in the SHAP analysis. These findings indicate that karst processes are generally more active in areas with lower elevations and gentle slopes. Especially in areas with gentle slopes, the accumulation of water and its infiltration into the ground can accelerate karst dissolution processes [2]. Furthermore, factors related to water flow, such as PDR and SPI, illustrate the impact of surface waters on the dissolution of karstic rocks. This finding supports the conclusion of [99], who indicated that areas near drainage lines are highly sensitive to sinkhole formation. Similarly, many studies in the literature have emphasized that water flow pathways contribute to the expansion of karstic cavities and the formation of sinkholes [4,14]. A positive correlation was observed between slope (SLP) and elevation (ELV), suggesting that sinkhole formation is more likely in areas with gentle slopes and lower elevations. This finding aligns with previous studies that have highlighted the importance of topographic factors in controlling sinkhole susceptibility [4,52].

Meteorological factors play a crucial role in understanding the climatic conditions that contribute to sinkhole formation. MAP was identified as one of the most influential factors in the SHAP analysis. An increase in precipitation elevates groundwater levels, accelerating the dissolution of karstic rocks. This finding is supported by [108], which emphasizes the direct impact of rainfall on karstic processes. In addition, variables such as MAT and MAW indicate that temperature and evaporation rates influence groundwater movement. In particular, high temperatures can increase evaporation rates, leading to a drop in groundwater levels and the collapse of karstic cavities [4,44]. This aligns with the observed sinkhole formations in the study area.

Environmental and anthropogenic factors play a significant role in explaining the impact of human activities on karstic processes. WDS emerged as a highly significant factor in the SHAP analysis. Intensive groundwater use, particularly through wells drilled for agricultural irrigation, leads to a decline in groundwater levels and the deterioration of karstic cavities. This finding supports the view that groundwater extraction accelerates sinkhole formation [40,41,44]. Moreover, the variables PST and PRD demonstrate that areas with high human activity exhibit increased sinkhole susceptibility. This reflects the adverse effects of increased surface load on karstic rocks [1,59]. The NDVI variable, representing vegetation cover, indicates that vegetation slows karstic processes. A positive correlation was observed between well density (WDS) and proximity to settlements (PST), suggesting that areas with high human activity and intensive groundwater use are more susceptible to sinkhole formation. This finding aligns with previous studies that have highlighted the role of anthropogenic factors in controlling sinkhole susceptibility [40,41].

5.4. Implications for Sustainable Land Management

The findings of this study provide valuable insights for sustainable land management and, consequently, for human life. One of the most influential factors in sinkhole formation is the uncontrolled and excessive consumption of groundwater resources. Agricultural activities and increased urban water consumption due to population growth directly contribute to the decline in groundwater levels in the region, expediting karst processes. The widespread cultivation of water-intensive crops such as corn, sugar beets, and alfalfa, which are not suited to the climatic conditions of the basin, further exacerbates this process. Re-evaluating and improving water management policies is critical to reducing or preventing the damage caused by geological disasters such as sinkholes. Many researchers have highlighted the importance of promoting modern irrigation methods, supporting crops with low water consumption, and using water resources more conscientiously [23,41,109]. Moreover, it is recommended to limit urbanization in sinkhole-prone areas and to regulate agricultural activities in these regions. In future studies, incorporating field observations and temporal analyses will make the model more comprehensive. In particular, monitoring long-term groundwater level changes and investigating their impacts on sinkhole formation are expected to further enhance the predictive capacity of future models. In addition, integrating climate change scenarios into the model could help predict future sinkhole formation risks more accurately. Research of this kind will contribute significantly to the development of sustainable land management strategies in karst regions.

5.5. Enhancing Risk Management Strategies

The findings of this study have significant implications for risk management, particularly in areas prone to sinkhole formation. By utilizing SHAP-based machine learning models to assess sinkhole susceptibility, this research offers an effective framework for identifying and mitigating risks associated with sinkholes in karst regions. The model’s ability to provide a clear and interpretable analysis of the factors contributing to sinkhole formation is invaluable for decision-makers and urban planners. One of the key implications is the ability to prioritize areas at high risk of sinkhole formation. This can guide land-use planning, zoning regulations, and infrastructure development, ensuring that vulnerable areas are either avoided or subjected to enhanced monitoring and precautionary measures. For example, regions identified as being at high risk due to factors such as proximity to fault lines, excessive groundwater extraction, or vulnerable geological formations can be subject to stricter building codes, groundwater management policies, and environmental monitoring programs. Additionally, understanding the dynamic nature of sinkhole susceptibility through SHAP analysis can help local authorities develop early warning systems. These systems could monitor changes in groundwater levels, precipitation patterns, and other contributing factors to predict potential sinkhole events before they occur. This proactive approach would significantly reduce the risks to human life and infrastructure, enhancing community resilience to sinkhole-related disasters. In conclusion, the application of machine learning models like the one developed in this study provides a powerful tool for risk management in karstic regions. By integrating these models into risk assessment frameworks and decision-making processes, authorities can better manage sinkhole risks and promote sustainable development in vulnerable areas.

5.6. Limitations and Future Studies

While this study has made significant progress in sinkhole susceptibility mapping using explainable machine learning models, certain limitations must be acknowledged. In particular, geological factors such as the contact surface between soluble and insoluble rocks, the dip angle of rock layers, and the thickness of rock layers were not included in the model due to data unavailability. These factors are important parameters that could influence sinkhole formation but could not be evaluated within the scope of the current dataset. In this study, groundwater levels and hydrogeological data were measured during both dry and wet seasons, and the differences between these periods were analyzed. This approach allowed the model to partially capture the dynamic processes influencing sinkhole formation. However, the lack of long-term and continuous monitoring data may limit the model’s ability to fully capture temporal variations. Future research should focus on collecting missing geological and hydrological data to improve model performance. Additionally, incorporating long-term monitoring of climate change scenarios and groundwater fluctuations could provide more accurate predictions of future sinkhole occurrences. Investigating the effects of human activities, such as urbanization and agricultural practices, using time-series data would also enhance our understanding of sinkhole formation mechanisms. Finally, the development of real-time monitoring systems and early warning mechanisms could be a significant step in reducing sinkhole risks. Such systems could track critical factors like groundwater levels and land subsidence, providing timely alerts to authorities before sinkhole formation occurs. These future research directions will further enhance the accuracy and practical applicability of sinkhole susceptibility models, contributing to more effective risk management and sustainable land-use planning in karst regions.

6. Conclusions

This study aimed to model sinkhole susceptibility in the Konya Closed Basin using an explainable artificial intelligence (XAI) approach. For this purpose, models created using the RF, XGBoost, and LightGBM algorithms were compared, and the RF model, with the highest accuracy (95.5%), was selected to generate the final susceptibility map. SHAP analyses showed that geological–tectonic (proximity to faults, lithology), hydrogeological (bicarbonate difference, groundwater level change), and climatic (annual average precipitation) factors are the most influential in sinkhole formation. Specific threshold values were also identified. Sinkhole formation increases significantly within a distance of up to 2000 m from faults. Annual average precipitation between 33 and 36 mm is a critical factor contributing to the increase in sinkhole formation in the region. In addition, a bicarbonate difference exceeding 0.68 indicates that changes in groundwater chemistry directly contribute to sinkhole formation. These findings provide important insights for urban planning, infrastructure investments, agricultural water management, and the sustainability of human life. In particular, the SHAP-based explainability approach has made sinkhole susceptibility analyses easier for decision-makers to interpret and has made the model outputs more understandable. This study demonstrates the importance of using XAI techniques in natural disaster modeling and enables more reliable decision-making in risk management strategies. In conclusion, this research provides a robust explainable machine learning framework to better understand sinkhole formation in karst areas and minimize potential risks that threaten sustainability.
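For readers less familiar with the two metrics used to compare the models, the sketch below shows how accuracy and AUC can be computed from predicted probabilities. The labels and probabilities are synthetic, the 0.5 decision threshold is an assumption, and the function names are illustrative rather than taken from the study's code.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of samples whose thresholded prediction matches the true label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic labels (1 = sinkhole, 0 = non-sinkhole) and predicted probabilities
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

acc = accuracy(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
print(f"accuracy = {acc:.3f}, AUC = {auc:.3f}")
```

Accuracy depends on the chosen decision threshold, whereas AUC summarizes how well the model ranks sinkhole locations above non-sinkhole locations across all thresholds, which is why the two are usually reported together.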

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app15063139/s1, Table S1: Multi-collinearity test results involving all criteria; Table S2: Multi-collinearity test results involving selected criteria after the sequential elimination; Figure S1: The Pearson’s correlation coefficients.

Author Contributions

Conceptualization, S.S.B., M.C.I., Ş.A. and C.G.; methodology, S.S.B. and M.C.I.; software, S.S.B. and M.C.I.; validation, S.S.B., C.G. and M.C.I.; formal analysis, S.S.B., C.G. and M.C.I.; investigation, S.S.B., C.G., M.C.I., H.B. and H.I.G.; data curation, S.S.B., C.G., H.B. and H.I.G.; writing—original draft preparation, S.S.B., C.G., M.C.I., H.B., H.I.G. and Ş.A.; writing—review and editing, S.S.B., C.G., M.C.I., H.B., H.I.G. and Ş.A.; visualization, S.S.B., C.G. and M.C.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Konya Provincial Directorate of Turkish Disaster and Emergency Management Presidency (AFAD) within the framework of the “Determination of Sinkhole Locations Project” [grant no. 2020K-138637].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors thank the journal editor and anonymous reviewers for their constructive comments. Special thanks go to the Konya Provincial Directorate of AFAD (Turkish Disaster and Emergency Management Presidency) for project funding.

Conflicts of Interest

The authors declare no conflicts of interest.

References

De Waele, J.; Gutiérrez, F. Karst Hydrogeology, Geomorphology and Caves; John Wiley & Sons: Hoboken, NJ, USA, 2022; pp. 1–912.
Gutiérrez, F.; Parise, M.; De Waele, J.; Jourde, H. A Review on Natural and Human-Induced Geohazards and Impacts in Karst. Earth-Sci. Rev. 2014, 138, 61–88.
Esposito, C.; Belcecchi, N.; Bozzano, F.; Brunetti, A.; Marmoni, G.M.; Mazzanti, P.; Romeo, S.; Cammillozzi, F.; Cecchini, G.; Spizzirri, M. Integration of Satellite-Based A-DInSAR and Geological Modeling Supporting the Prevention from Anthropogenic Sinkholes: A Case Study in the Urban Area of Rome. Geomat. Nat. Hazards Risk 2021, 12, 2835–2864.
Galve, J.P.; Bonachea, J.; Remondo, J.; Gutiérrez, F.; Guerrero, J.; Lucha, P.; Cendrero, A.; Gutiérrez, M.; Sánchez, J.A. Development and Validation of Sinkhole Susceptibility Models in Mantled Karst Settings. A Case Study from the Ebro Valley Evaporite Karst (NE Spain). Eng. Geol. 2008, 99, 185–197.
D’Angella, A.; Canora, F.; Spilotro, G. Sinkholes Susceptibility Assessment in Urban Environment Using Heuristic, Statistical and Artificial Neural Network (ANN) Models in Evaporite Karst System: A Case Study from Lesina Marina (Southern Italy). In Engineering Geology for Society and Territory—Volume 5: Urban Geology, Sustainable Planning and Landscape Exploitation; Springer: Cham, Switzerland, 2015; pp. 411–414.
Perrin, J.; Cartannaz, C.; Noury, G.; Vanoudheusden, E. A Multicriteria Approach to Karst Subsidence Hazard Mapping Supported by Weights-of-Evidence Analysis. Eng. Geol. 2015, 197, 296–305.
Wang, G.; Hao, J.; Wen, H.; Cao, C. A Random Forest Model of Karst Ground Collapse Susceptibility Based on Factor and Parameter Coupling Optimization. Geocarto Int. 2022, 37, 15548–15567.
Xie, Y.; Zang, B.; Liu, Y.; Liu, B.; Zhang, C.; Lin, Y. Evaluation of the Karst Collapse Susceptibility of Subgrade Based on the AHP Method of ArcGIS and Prevention Measures. Water 2022, 14, 1432.
Galve, J.P.; Gutiérrez, F.; Remondo, J.; Bonachea, J.; Lucha, P.; Cendrero, A. Evaluating and Comparing Methods of Sinkhole Susceptibility Mapping in the Ebro Valley Evaporite Karst (NE Spain). Geomorphology 2009, 111, 160–172.
Merghadi, A.; Yunus, A.P.; Dou, J.; Whiteley, J.; ThaiPham, B.; Bui, D.T.; Avtar, R.; Abderrahmane, B. Machine Learning Methods for Landslide Susceptibility Studies: A Comparative Overview of Algorithm Performance. Earth-Sci. Rev. 2020, 207, 103225.
Kumar, M.; Singh, P.; Singh, P. Machine Learning and GIS-RS-Based Algorithms for Mapping the Groundwater Potentiality in the Bundelkhand Region, India. Ecol. Inform. 2023, 74, 101980.
Bilgilioğlu, S.S.; Bilgilioğlu, H. Aksaray Ili Obruk Duyarlılık Haritasının Coğrafi Bilgi Sistemleri (CBS) ve Analitik Hiyerarşi Süreci (AHS) Yöntemleri Ile Oluşturulması. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilim. Derg. 2023, 12, 612–625.
Wu, Y.; Jiang, X.; Guan, Z.; Luo, W.; Wang, Y. AHP-Based Evaluation of the Karst Collapse Susceptibility in Tailai Basin, Shandong Province, China. Environ. Earth Sci. 2018, 77, 436.
Taheri, K.; Gutiérrez, F.; Mohseni, H.; Raeisi, E.; Taheri, M. Sinkhole Susceptibility Mapping Using the Analytical Hierarchy Process (AHP) and Magnitude-Frequency Relationships: A Case Study in Hamadan Province, Iran. Geomorphology 2015, 234, 64–79.
Maleki, M.; Salman, M.; Sahebi Vayghan, S.; Szabo, S. GIS-Based Sinkhole Susceptibility Mapping Using the Best Worst Method. Spat. Inf. Res. 2023, 31, 537–545.
Taheri, K.; Shahabi, H.; Chapi, K.; Shirzadi, A.; Gutiérrez, F.; Khosravi, K. Sinkhole Susceptibility Mapping: A Comparison between Bayes-Based Machine Learning Algorithms. Land Degrad. Dev. 2019, 30, 730–745.
Elmahdy, S.I.; Mohamed, M.M.; Ali, T.A.; Abdalla, J.E.D.; Abouleish, M. Land Subsidence and Sinkholes Susceptibility Mapping and Analysis Using Random Forest and Frequency Ratio Models in Al Ain, UAE. Geocarto Int. 2022, 37, 315–331.
Yilmaz, I. GIS Based Susceptibility Mapping of Karst Depression in Gypsum: A Case Study from Sivas Basin (Turkey). Eng. Geol. 2007, 90, 89–103.
Kim, Y.J.; Nam, B.H.; Jung, Y.H.; Liu, X.; Choi, S.; Kim, D.; Kim, S. Probabilistic Spatial Susceptibility Modeling of Carbonate Karst Sinkhole. Eng. Geol. 2022, 306, 106728.
Kim, K.; Kim, J.; Kwak, T.Y.; Chung, C.K. Logistic Regression Model for Sinkhole Susceptibility Due to Damaged Sewer Pipes. Nat. Hazards 2018, 93, 765–785.
Zhou, G.; Yan, H.; Chen, K.; Zhang, R. Spatial Analysis for Susceptibility of Second-Time Karst Sinkholes: A Case Study of Jili Village in Guangxi, China. Comput. Geosci. 2016, 89, 144–160.
Mohammady, M.; Pourghasemi, H.R.; Amiri, M.; Tiefenbacher, J.P. Spatial Modeling of Susceptibility to Subsidence Using Machine Learning Techniques. Stoch. Environ. Res. Risk Assess. 2021, 35, 1689–1700.
Ozdemir, A. Investigation of Sinkholes Spatial Distribution Using the Weights of Evidence Method and GIS in the Vicinity of Karapinar (Konya, Turkey). Geomorphology 2015, 245, 40–50.
Bausilio, G.; Annibali Corona, M.; Di Martire, D.; Guerriero, L.; Tufano, R.; Calcaterra, D.; Di Napoli, M.; Francioni, M. Comparison of Two Machine Learning Algorithms for Anthropogenic Sinkhole Susceptibility Assessment in the City of Naples (Italy). In Geotechnical Engineering for the Preservation of Monuments and Historic Sites III—Proceedings of the 3rd International Issmge TC301 Symposium, Naples, Italy, 22–24 June 2022; CRC Press: Boca Raton, FL, USA, 2022; pp. 1112–1123.
Bianchini, S.; Confuorto, P.; Intrieri, E.; Sbarra, P.; Di Martire, D.; Calcaterra, D.; Fanti, R. Machine Learning for Sinkhole Risk Mapping in Guidonia-Bagni Di Tivoli Plain (Rome), Italy. Geocarto Int. 2022, 37, 16687–16715.
Sarı, F.; Yalçın, M. Maximum Entropy Model-Based Spatial Sinkhole Occurrence Prediction in Karapınar, Turkey. Kuwait J. Sci. 2023, 50, 1–15.
Muili, O.; Babaie, H.A. Sinkhole Susceptibility Analysis Using Machine Learning for West Central Florida. Master’s Thesis, Georgia State University, Atlanta, GA, USA, 2024.
Nefeslioglu, H.A.; Tavus, B.; Er, M.; Ertugrul, G.; Ozdemir, A.; Kaya, A.; Kocaman, S. Integration of an InSAR and ANN for Sinkhole Susceptibility Mapping: A Case Study from Kirikkale-Delice (Turkey). ISPRS Int. J. Geo-Inf. 2021, 10, 119.
Roscher, R.; Bohn, B.; Duarte, M.F.; Garcke, J. Explainable Machine Learning for Scientific Insights and Discoveries. IEEE Access 2020, 8, 42200–42216.
Rai, A. Explainable AI: From Black Box to Glass Box. J. Acad. Mark. Sci. 2020, 48, 137–141.
Abdollahi, A.; Pradhan, B. Explainable Artificial Intelligence (XAI) for Interpreting the Contributing Factors Feed into the Wildfire Susceptibility Prediction Model. Sci. Total Environ. 2023, 879, 163004.
Ghaffarian, S.; Taghikhah, F.R.; Maier, H.R. Explainable Artificial Intelligence in Disaster Risk Management: Achievements and Prospective Futures. Int. J. Disaster Risk Reduct. 2023, 98, 104123.
Roussel, C.; Böhm, K. Geospatial XAI: A Review. ISPRS Int. J. Geo-Inf. 2023, 12, 355.
Yao, J.; Yao, X.; Zhao, Z.; Liu, X. Performance Comparison of Landslide Susceptibility Mapping under Multiple Machine-Learning Based Models Considering InSAR Deformation: A Case Study of the Upper Jinsha River. Geomat. Nat. Hazards Risk 2023, 14, 2212833.
Aydin, H.E.; Iban, M.C. Predicting and Analyzing Flood Susceptibility Using Boosting-Based Ensemble Machine Learning Algorithms with SHapley Additive ExPlanations. Nat. Hazards 2023, 116, 2957–2991.
Iban, M.C.; Sekertekin, A. Machine Learning Based Wildfire Susceptibility Mapping Using Remotely Sensed Fire Data and GIS: A Case Study of Adana and Mersin Provinces, Turkey. Ecol. Inform. 2022, 69, 101647.
Sharma, N.; Saharia, M.; Ramana, G.V. High Resolution Landslide Susceptibility Mapping Using Ensemble Machine Learning and Geospatial Big Data. Catena 2024, 235, 107653.
Dahal, A.; Lombardo, L. Explainable Artificial Intelligence in Geoscience: A Glimpse into the Future of Landslide Susceptibility Modeling. Comput. Geosci. 2023, 176, 105364.
Chen, Y.; Zhang, X.; Grekousis, G.; Huang, Y.; Hua, F.; Pan, Z.; Liu, Y. Examining the Importance of Built and Natural Environment Factors in Predicting Self-Rated Health in Older Adults: An Extreme Gradient Boosting (XGBoost) Approach. J. Clean. Prod. 2023, 413, 137432.
Doǧan, U.; Yilmaz, M. Natural and Induced Sinkholes of the Obruk Plateau and Karapınar-Hotamış Plain, Turkey. J. Asian Earth Sci. 2011, 40, 496–508.
Sarı, F.; Kahveci, M.; Somay-Altas, M.; Tuşat, E. Evaluating Sinkhole Formation with Multicriteria Decision Analysis: A Case Study in Karapınar-Konya, Turkey. Arab. J. Geosci. 2021, 14, 281.
Günay, G.; Çörekçioǧlu, I.; Övül, G. Geologic and Hydrogeologic Factors Affecting Sinkhole (Obruk) Development in Central Turkey. Carbonates Evaporites 2011, 26, 3–9.
Ozdemir, A. Sinkhole Susceptibility Mapping Using a Frequency Ratio Method and GIS Technology Near Karapınar, Konya-Turkey. Procedia Earth Planet. Sci. 2015, 15, 502–506.
Ozdemir, A. Sinkhole Susceptibility Mapping Using Logistic Regression in Karapınar (Konya, Turkey). Bull. Eng. Geol. Environ. 2016, 75, 681–707.
Eren, Y.; Parlar, Ş.; Coşkuner, B.; Arslan, Ş. Geological and Morphological Features of the Karapınar Sinkholes (Konya, Central Anatolia, Türkiye). J. Earth Sci. 2024, 35, 1654–1668.
Arık, F.; Bilgilioğlu, S.S.; İban, M.C.; Delikan, A.; Göçmez, G.; Döğen, A.; Kansun, G.; Gezgin, C.; Bilgilioğlu, H.; Dülger, A. Obruk Teknik Kılavuz; Tosun, Y., Çoruk, F., Arslan, Ş., Akkaya, Y., Gökkaya, E., Gökkaya, E., Eds.; Paradigma Akademi Yayınları: Konya, Türkiye, 2023; ISBN 978-625-6822-12-2.
Törk, K.; Güner, İ.N.; Erduran, B.; Yılmaz, N.P.; Sülükçü, S.; Yeleser, L.; Ateş, Ş.; Mutlu, G.; Sertel, N.; Keleş, S.; et al. Konya Ovası Projesi (KOP) Bölgesinde (Konya, Karaman, Aksaray, Niğde) Karstik Çöküntü Alanlarının Belirlenmesi ve Tehlike Değerlendirmesi Projesi (Report No. 263); Türkiye General Directorate of Mineral Research and Exploration: Ankara, Türkiye, 2019.
Calligaris, C.; Devoto, S.; Galve, J.P.; Zini, L.; Pérez-Peña, J.V. Integration of Multi-Criteria and Nearest Neighbour Analysis with Kernel Density Functions for Improving Sinkhole Susceptibility Models: The Case Study of Enemonzo (NE Italy). Int. J. Speleol. 2017, 46, 191–204.
Partington, G. Developing Models Using GIS to Assess Geological and Economic Risk: An Example from VMS Copper Gold Mineral Exploration in Oman. Ore Geol. Rev. 2010, 38, 197–207.
Subedi, P.; Subedi, K.; Thapa, B.; Subedi, P. Sinkhole Susceptibility Mapping in Marion County, Florida: Evaluation and Comparison between Analytical Hierarchy Process and Logistic Regression Based Approaches. Sci. Rep. 2019, 9, 7140.
Milanovic, P. Water Resources Engineering in Karst; CRC Press: Boca Raton, FL, USA, 2004.
Margiotta, S.; Negri, S.; Parise, M.; Valloni, R. Mapping the Susceptibility to Sinkholes in Coastal Areas, Based on Stratigraphy, Geomorphology and Geophysics. Nat. Hazards 2012, 62, 657–676.
Kim, Y.J.; Nam, B.H.; Youn, H. Sinkhole Detection and Characterization Using LiDAR-Derived DEM with Logistic Regression. Remote Sens. 2019, 11, 1592.
Zevenbergen, L.W.; Thorne, C.R. Quantitative Analysis of Land Surface Topography. Earth Surf. Process. Landf. 1987, 12, 47–56.
Wilson, J.P.; Gallant, J.C. Digital Elevation Data Sources and Structures. In Terrain Analysis; John Wiley and Sons: New York, NY, USA, 2000; pp. 3–5.
Zuo, Y.; Jiang, S.; Wu, S.; Xu, W.; Zhang, J.; Feng, R.; Yang, M.; Zhou, Y.; Santosh, M. Terrestrial Heat Flow and Lithospheric Thermal Structure in the Chagan Depression of the Yingen-Ejinaqi Basin, North Central China. Basin Res. 2020, 32, 1328–1346.
Jiang, S.; Zuo, Y.; Yang, M.; Feng, R. Reconstruction of the Cenozoic Tectono-Thermal History of the Dongpu Depression, Bohai Bay Basin, China: Constraints from Apatite Fission Track and Vitrinite Reflectance Data. J. Pet. Sci. Eng. 2021, 205, 108809.
Yu, H.; Arabameri, A.; Costache, R.; Crăciun, A.; Arora, A. Land Subsidence Susceptibility Assessment Using Advanced Artificial Intelligence Models. Geocarto Int. 2022, 37, 18067–18093.
Santo, A.; Del Prete, S.; Di Crescenzo, G.; Rotella, M. Karst Processes and Slope Instability: Some Investigations in the Carbonate Apennine of Campania (Southern Italy). Geol. Soc. Spec. Publ. 2007, 279, 59–72.
Lumongsod, R.M.G.; Ramos, N.T.; Dimalanta, C.B. Mapping the Karstification Potential of Central Cebu, Philippines Using GIS. Environ. Earth Sci. 2022, 81, 449.
Green, J.; Marken, W.; Alexander, C.; Alexander, S. Karst Unit Mapping Using Geographic Information System Technology, Mower County, Minnesota, USA. Environ. Geol. 2002, 42, 457–461.
James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer: New York, NY, USA, 2021.
Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer: New York, NY, USA, 2009.
Lundberg, S.M.; Allen, P.G.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
Belsley, D.A.; Kuh, E.; Welsch, R.E. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity; Wiley: New York, NY, USA, 1980; Volume 32.
Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
Chen, W.; Zhang, S.; Li, R.; Shahabi, H. Performance Evaluation of the GIS-Based Data Mining Techniques of Best-First Decision Tree, Random Forest, and Naïve Bayes Tree for Landslide Susceptibility Modeling. Sci. Total Environ. 2018, 644, 1006–1018.
Hong, H.; Liu, J.; Zhu, A.X. Modeling Landslide Susceptibility Using LogitBoost Alternating Decision Trees and Forest by Penalizing Attributes with the Bagging Ensemble. Sci. Total Environ. 2020, 718, 137231.
Iban, M.C.; Bilgilioglu, S.S. Snow Avalanche Susceptibility Mapping Using Novel Tree-Based Machine Learning Algorithms (XGBoost, NGBoost, and LightGBM) with EXplainable Artificial Intelligence (XAI) Approach. Stoch. Environ. Res. Risk Assess. 2023, 37, 2243–2270.
Gutiérrez, F. Sinkhole Hazards. In Oxford Research Encyclopedia of Natural Hazard Science; Oxford University Press: Oxford, UK, 2016.
Rane, P.R.; Vincent, S. Landslide Susceptibility Mapping Using Machine Learning Algorithms for Nainital, India. Eng. Sci. 2022, 17, 142–155.
Doctor, K.Z.; Doctor, D.H.; Kronenfeld, B.; Wong, D.W.S.; Brezinski, D.K. Predicting Sinkhole Susceptibility in Frederick Valley, Maryland Using Geographically Weighted Regression; American Society of Civil Engineers: Reston, VA, USA, 2008; pp. 243–256.
Khosravi, K.; Rezaie, F.; Cooper, J.R.; Kalantari, Z.; Abolfathi, S.; Hatamiafkoueieh, J. Soil Water Erosion Susceptibility Assessment Using Deep Learning Algorithms. J. Hydrol. 2023, 618, 129229.
Tran, T.T.K.; Janizadeh, S.; Bateni, S.M.; Jun, C.; Kim, D.; Trauernicht, C.; Rezaie, F.; Giambelluca, T.W.; Panahi, M. Improving the Prediction of Wildfire Susceptibility on Hawaiʻi Island, Hawaiʻi, Using Explainable Hybrid Machine Learning Models. J. Environ. Manag. 2024, 351, 119724.
Razavi-Termeh, S.V.; Sadeghi-Niaraki, A.; Seo, M.B.; Choi, S.M. Application of Genetic Algorithm in Optimization Parallel Ensemble-Based Machine Learning Algorithms to Flood Susceptibility Mapping Using Radar Satellite Imagery. Sci. Total Environ. 2023, 873, 162285. [Google Scholar] [CrossRef] [PubMed]
Lyu, H.M.; Yin, Z.Y. Flood Susceptibility Prediction Using Tree-Based Machine Learning Models in the GBA. Sustain. Cities Soc. 2023, 97, 104744. [Google Scholar] [CrossRef]
Wang, M.; Li, Y.; Yuan, H.; Zhou, S.; Wang, Y.; Adnan Ikram, R.M.; Li, J. An XGBoost-SHAP Approach to Quantifying Morphological Impact on Urban Flooding Susceptibility. Ecol. Indic. 2023, 156, 111137. [Google Scholar] [CrossRef]
Dietterich, T.G. Ensemble Methods in Machine Learning. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2000; LNCS; Volume 1857, pp. 1–15. [Google Scholar] [CrossRef]
Altman, N.; Krzywinski, M. Points of Significance: Ensemble Methods: Bagging and Random Forests. Nat. Methods 2017, 14, 933–934. [Google Scholar] [CrossRef]
Lin, N.; Zhang, D.; Feng, S.; Ding, K.; Tan, L.; Wang, B.; Chen, T.; Li, W.; Dai, X.; Pan, J.; et al. Rapid Landslide Extraction from High-Resolution Remote Sensing Images Using SHAP-OPT-XGBoost. Remote Sens. 2023, 15, 3901. [Google Scholar] [CrossRef]
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar] [CrossRef]
Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 3149–3157. [Google Scholar]
Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.I. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat. Mach. Intell. 2020, 2, 56–67. [Google Scholar] [CrossRef]
Zhang, J.; Ma, X.; Zhang, J.; Sun, D.; Zhou, X.; Mi, C.; Wen, H. Insights into Geospatial Heterogeneity of Landslide Susceptibility Based on the SHAP-XGBoost Model. J. Environ. Manag. 2023, 332, 117357. [Google Scholar] [CrossRef]
Teke, A.; Kavzoglu, T. Exploring the Decision-Making Process of Ensemble Learning Algorithms in Landslide Susceptibility Mapping: Insights from Local and Global Explainable AI Analyses. Adv. Space Res. 2024, 74, 3765–3785. [Google Scholar] [CrossRef]
Pradhan, B.; Lee, S.; Dikshit, A.; Kim, H. Spatial Flood Susceptibility Mapping Using an Explainable Artificial Intelligence (XAI) Model. Geosci. Front. 2023, 14, 101625. [Google Scholar] [CrossRef]
Jas, K.; Dodagoudar, G.R. Explainable Machine Learning Model for Liquefaction Potential Assessment of Soils Using XGBoost-SHAP. Soil Dyn. Earthq. Eng. 2023, 165, 107662. [Google Scholar] [CrossRef]
Jena, R.; Pradhan, B.; Gite, S.; Alamri, A.; Park, H.J. A New Method to Promptly Evaluate Spatial Earthquake Probability Mapping Using an Explainable Artificial Intelligence (XAI) Model. Gondwana Res. 2023, 123, 54–67. [Google Scholar] [CrossRef]
Iban, M.C.; Aksu, O. SHAP-Driven Explainable Artificial Intelligence Framework for Wildfire Susceptibility Mapping Using MODIS Active Fire Pixels: An In-Depth Interpretation of Contributing Factors in Izmir, Türkiye. Remote Sens. 2024, 16, 2842. [Google Scholar] [CrossRef]
Liu, Y.; Chen, X.; Yang, J.; Li, L.; Wang, T. Snow Avalanche Susceptibility Mapping from Tree-Based Machine Learning Approaches in Ungauged or Poorly-Gauged Regions. Catena 2023, 224, 106997. [Google Scholar] [CrossRef]
Gholami, H.; Mohammadifar, A.; Fitzsimmons, K.E.; Li, Y.; Kaskaoutis, D.G. Modeling Land Susceptibility to Wind Erosion Hazards Using LASSO Regression and Graph Convolutional Networks. Front. Environ. Sci. 2023, 11, 1187658. [Google Scholar] [CrossRef]
Shapley, L.S. Stochastic Games*. Proc. Natl. Acad. Sci. USA 1953, 39, 1095–1100. [Google Scholar] [CrossRef]
Ullah, I.; Liu, K.; Yamamoto, T.; Zahid, M.; Jamal, A. Modeling of Machine Learning with SHAP Approach for Electric Vehicle Charging Station Choice Behavior Prediction. Travel Behav. Soc. 2023, 31, 78–92. [Google Scholar] [CrossRef]
Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-Generation Hyperparameter Optimization Framework. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631. [Google Scholar] [CrossRef]
Jenks, G. The Data Model Concept in Statistical Mapping. Int. Yearb.Cartogr. 1967, 7, 186–190. [Google Scholar]
Rodriguez-Galiano, V.F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J.P. An Assessment of the Effectiveness of a Random Forest Classifier for Land-Cover Classification. ISPRS J. Photogramm. Remote Sens. 2012, 67, 93–104. [Google Scholar] [CrossRef]
Dormann, C.F.; Elith, J.; Bacher, S.; Buchmann, C.; Carl, G.; Carré, G.; Marquéz, J.R.G.; Gruber, B.; Lafourcade, B.; Leitão, P.J.; et al. Collinearity: A Review of Methods to Deal with It and a Simulation Study Evaluating Their Performance. Ecography 2013, 36, 27–46. [Google Scholar] [CrossRef]
Ford, D.; Williams, P. Karst Hydrogeology and Geomorphology; John Wiley & Sons: Hoboken, NJ, USA, 2013; pp. 1–562. [Google Scholar] [CrossRef]
Klimchouk, A. Hypogene Speleogenesis: Hydrogeological and Morphogenetic Perspective. In KIP Monographs; National Cave and Karst Research Institute: Carlsbad, NM, USA, 2007. [Google Scholar]
White, W.B. Karst Hydrology: Recent Developments and Open Questions. Eng. Geol. 2002, 65, 85–105. [Google Scholar] [CrossRef]
Parise, M.; Gunn, J. Natural and Anthropogenic Hazards in Karst Areas: An Introduction. Geol. Soc. Spec. Publ. 2007, 279, 1–3. [Google Scholar] [CrossRef]
Palmer, A.N. Origin and Morphology of Limestone Caves. Geol. Soc. Am. Bull. 1991, 103, 1–21. [Google Scholar] [CrossRef]
Worthington, S.R.H. A comprehensive strategy for understanding flow in carbonate aquifers. In Karst Modeling; Palmer, A.N., Palmer, M.V., Sasowsky, I.D., Eds.; Karst Water Initiative: Charlottesville, VA, USA, 1999; pp. 30–37. [Google Scholar]
Galloway, D.L.; Jones, D.R.; Ingebritsen, S.E. Land Subsidence in the United States; U.S. Geological Survey: Reston, VA, USA, 1999. [Google Scholar] [CrossRef]
Foster, S.S.D.; Chilton, P.J. Groundwater: The Processes and Global Significance of Aquifer Degradation. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 2003, 358, 1957–1972. [Google Scholar] [CrossRef] [PubMed]
Ghasemizadeh, R.; Hellweger, F.; Butscher, C.; Padilla, I.; Vesper, D.; Field, M.; Alshawabkeh, A. Review: Groundwater Flow and Transport Modeling of Karst Aquifers, with Particular Reference to the North Coast Limestone Aquifer System of Puerto Rico. Hydrogeol. J. 2012, 20, 1441–1461. [Google Scholar] [CrossRef]
Fiorillo, F.; Doglioni, A. The Relation between Karst Spring Discharge and Rainfall by Cross-Correlation Analysis (Campania, Southern Italy). Hydrogeol. J. 2010, 18, 1881–1895. [Google Scholar] [CrossRef]
Çankal, G.; Alkın, R.C. İklim Değişikliği, Bilinçsiz Tarım ve Afet Yönetimi: Karapınar Obruklarına Bir Bakış. Afet Ve Risk Derg. 2024, 7, 410–425. [Google Scholar] [CrossRef]

Figure 1. Study area.

Figure 2. Geological and tectonic factors (a) LTG; (b) PTF.

Figure 3. Hydrogeological factors: (a) GWL; (b) GWD; (c) DPT; (d) DSD; (e) DSU; (f) DMG; (g) DCA; (h) DCH; (i) DBC; (j) DTI; (k) DDC; (l) DDO; (m) DCT; (n) DPH.

Figure 4. Topographical factors: (a) ELV; (b) SLP; (c) ASP; (d) CRV; (e) PLC; (f) PRC; (g) PDR; (h) SPI; (i) TWI.

Figure 5. Meteorological factors: (a) MAP; (b) MAT; (c) MAW.

Figure 6. Environmental and anthropological factors: (a) WDS; (b) LDU; (c) PST; (d) PRD; (e) NDVI; (f) SDP.

Figure 7. Performance evaluation of the trained ML models for SSM: (a–c) confusion matrices for RF, XGBoost, and LightGBM, respectively; (d) key metrics, including sensitivity, specificity, accuracy, precision, and F1 score; (e) ROC curves and Area Under the Curve (AUC) results.

Figure 8. (a) SHAP summary plot for the RF classifier; (b) global feature importance of the RF classifier by absolute SHAP values.

Figure 9. SHAP dependence plots, arranged by importance rank: (a) PTF; (b) MAP; (c) DBC; (d) DPH; (e) DDO; (f) LTG; (g) WDS; (h) DCH; (i) DDC; (j) DMG; (k) DSU; (l) DPT; (m) PST; (n) ELV; (o) PRD; (p) SLP; (q) GWD; (r) NDVI; (s) PDR; (t) SDP; (u) DTI; (v) PRC; (w) ASP.

Figure 10. Sinkhole susceptibility map.

Figure 11. Areal distribution of susceptibility classes.

Table 1. Conditioning factors.

Factors		Abbreviation	Data Source	Explanation
Topographic	Elevation	ELV	General Directorate of Mapping	It is produced from contour lines obtained as vectors from standard topographic maps.
	Slope	SLP
	Aspect	ASP
	Curvature	CRV
	Plan curvature	PLC
	Profile curvature	PRC
	Proximity to drainage	PDR
	Stream power index	SPI
	Topographic wetness index	TWI
Meteorological	Monthly average precipitation	MAP	Observations of 32 meteorological stations obtained from the Turkish General Directorate of Meteorology
	Monthly average temperature	MAT
	Monthly average water vapor pressure	MAW
Environmental and anthropological	Well density	WDS	General Directorate of State Hydraulic Works (DSI)	It was produced by performing density analysis (14,317 water wells).
	Land use	LDU	Corine Dataset	It is produced from vector data.
	Proximity to settlements	PST	Environmental Plan obtained from Turkey Ministry of Environment and Urbanization	It is produced from vector data.
	Proximity to roads	PRD	Environmental Plan obtained from Turkey Ministry of Environment and Urbanization	It is produced from vector data.
	NDVI	NDVI	Landsat satellite image	It is produced from Landsat satellite image.
	Soil depth	SDP	The Ministry of Agriculture and Forestry	It is produced from vector data.
Geological and tectonic	Lithology	LTG	General Directorate of Mineral Research and Exploration and field studies	It is produced from vector data.
	Proximity to faults	PTF		It is produced from vector data.
Hydrogeological	Groundwater level	GWL	Laboratory studies	It is produced from geochemical analyses of 519 water well samples.
	Groundwater decline (dry and wet spell)	GWD
	Differences in Potassium (dry and wet spell)	DPT
	Differences in Sodium (dry and wet spell)	DSD
	Differences in Sulfate (dry and wet spell)	DSU
	Differences in Magnesium (dry and wet spell)	DMG
	Differences in Calcium (dry and wet spell)	DCA
	Differences in Chloride (dry and wet spell)	DCH
	Differences in Bicarbonate (dry and wet spell)	DBC
	Differences in Total ion (dry and wet spell)	DTI
	Differences in Dissolved CO₂ (dry and wet spell)	DDC
	Differences in Dissolved O₂ (dry and wet spell)	DDO
	Differences in Conductivity (dry and wet spell)	DCT
	Differences in pH (dry and wet spell)	DPH

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

