Utilizing Machine Learning Algorithms for the Development of Gully Erosion Susceptibility Maps: Evidence from the Chotanagpur Plateau Region, India

Hasanuzzaman, Md; Shit, Pravat Kumar; Alqadhi, Saeed; Almohamad, Hussein; Hasher, Fahdah Falah ben; Abdo, Hazem Ghassan; Mallick, Javed

doi:10.3390/su16156569


by Md Hasanuzzaman ¹,², Pravat Kumar Shit ¹, Saeed Alqadhi ³, Hussein Almohamad ⁴, Fahdah Falah ben Hasher ⁵, Hazem Ghassan Abdo ⁶ and Javed Mallick ³,*

¹ PG Department of Geography, Raja N. L. Khan Women’s College (Autonomous), Gope Palace, Midnapore 721102, India

² Research Centre in Natural and Applied Science, Raja N. L. Khan Women’s College (Autonomous), Vidyasagar University, Midnapore 721102, India

³ Department of Civil Engineering, College of Engineering, King Khalid University, P.O. Box 394, Abha 61411, Saudi Arabia

⁴ Department of Geography, College of Languages and Human Sciences, Qassim University, Buraydah 51452, Saudi Arabia

⁵ Department of Geography and Environmental Sustainability, College of Humanities and Social Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia

⁶ Geography Department, Faculty of Arts and Humanities, Tartous University, Tartous P.O. Box 2147, Syria

* Author to whom correspondence should be addressed.

Sustainability 2024, 16(15), 6569; https://doi.org/10.3390/su16156569 (registering DOI)

Submission received: 22 May 2024 / Revised: 3 July 2024 / Accepted: 29 July 2024 / Published: 31 July 2024

(This article belongs to the Special Issue Applications of GIS and Remote Sensing in Soil Environment Monitoring 2nd Edition)


Abstract:

Gully erosion is a serious environmental threat that compromises soil health, damages agricultural land, and destroys vital infrastructure. Identifying regions prone to gully erosion demands careful selection of an appropriate machine learning algorithm, because the complex interplay of environmental factors contributing to gully formation requires a nuanced analytical approach. To develop the most accurate Gully Erosion Susceptibility Map (GESM) for India’s Raiboni River basin, this study applied two machine learning algorithms: Extreme Gradient Boosting (XGBoost) and Random Forest (RF). The analysis integrated 24 potential control factors and a dataset of 200 samples, evenly balanced between gullied and non-gullied locations. To assess multicollinearity among the 24 variables, we employed two techniques: the Information Gain Ratio (IGR) test and Variance Inflation Factors (VIF). Elevation, land use, river proximity, and rainfall most influenced the basin’s GESM. Rigorous tests validated XGBoost and RF model performance. XGBoost surpassed RF (ROC 86% vs. 83.1%). Quantile classification yielded a GESM with five levels, from very high to very low. Our findings reveal that roughly 12% of the basin area is severely affected by gully erosion, underscoring the critical need for targeted interventions in these highly susceptible areas. Furthermore, our analysis of gully characteristics revealed a predominance of V-shaped gullies, likely in an active developmental stage, supported by a mean Shape Index (SI) value of 0.26 and a mean Erosiveness Index (EI) of 0.33. This research demonstrates the potential of machine learning to pinpoint areas susceptible to gully erosion. These insights can help policymakers make informed decisions regarding sustainable land management practices.

Keywords:

gully erosion; Shape Index; Extreme Gradient Boost; Raiboni River; Random Forest; Erosiveness Index

1. Introduction

Among the many forms of environmental degradation, gully erosion stands out as a particularly insidious threat [1]. This water-driven process of soil destruction has emerged as a global concern, affecting about 2 billion hectares, or 55% of degraded land worldwide [2]. Its repercussions extend beyond soil loss to the destruction of vital habitats, the pollution of water sources, the siltation of lakes, and an elevated risk of devastating floods [3]. Moreover, gully erosion significantly reduces the productivity of agricultural land and damages infrastructure [4]. In India, gully erosion is particularly severe and contributes to extensive land degradation. Soil is washed away at a rate of 16.4 tons per hectare annually, leading to an annual soil loss of approximately 5 billion metric tons [5]. Such massive soil erosion not only diminishes the fertility and productivity of agricultural lands but also leads to siltation of water bodies, reduced water quality, and increased vulnerability of infrastructure. The socio-economic impact is profound: it affects the livelihoods of farmers, increases maintenance costs for public and private infrastructure, and poses safety risks to communities living in erosion-prone areas. To manage the negative consequences of gully erosion effectively, it is vital to map susceptibility accurately [6]. Tackling gully erosion is key to achieving sustainable soil management and securing vital resources.

Predicting gully erosion susceptibility has driven the development of various models. These approaches fall into three main categories: machine learning (ML) techniques, multi-criteria decision-making frameworks, and traditional multivariate and bivariate statistical models [7,8,9,10,11]. ML algorithms have outperformed conventional methods in four key areas: uncovering complex patterns, processing vast amounts of data, revealing subtle correlations, and minimizing human bias. This has significantly enhanced our ability to forecast and understand erosion processes [12]. A significant strength of these algorithms is their ability to improve their accuracy over time. This allows them to recognize complicated changes and a wide variety of scenarios, which is particularly valuable where data are limited [13]. Compared with other methods, ML models hold a particular advantage in evaluating how climate-induced changes in runoff patterns will affect gully erosion [14]. The adaptability and precision of these models make them valuable tools for understanding and managing gully erosion, offering enhanced predictive capabilities and more effective mitigation strategies. By leveraging the strengths of machine learning, researchers and practitioners can better anticipate and respond to the dynamic factors driving gully erosion, ultimately contributing to more sustainable land management practices.

Choosing the right ML model is essential for creating an accurate GESM, as model performance can vary with the specific environmental risk involved [15,16]. The effectiveness of an ML model in predicting gully erosion susceptibility hinges on its ability to capture the characteristics of the landscape and the environmental factors contributing to erosion. Model performance can be significantly influenced by several key factors, including soil type, land use patterns, topography, and climatic conditions. A thorough evaluation and comparison of candidate models is therefore essential to identify the one best suited to the specific context. This selection process not only enhances the precision of the GESM but also ensures that the model can reliably predict erosion-prone areas, aiding targeted mitigation efforts and sustainable land management. Integrating ML techniques with comprehensive environmental data allows researchers to create robust susceptibility maps that improve our understanding and management of gully erosion risks. For this study, we utilized two widely accepted and robust machine learning techniques: the XGBoost algorithm and the RF method. RF was chosen for its widespread and versatile application in evaluating diverse environmental risks, including groundwater [15,16], landslide [17], and flood susceptibility [18,19]. We opted for the XGBoost model because of its proven effectiveness in addressing slope-related geo-environmental hazards, particularly in landslide susceptibility studies [20,21]. This research compares the performance of these two established ML algorithms, an approach that has not been widely explored in this specific region.

The RF model stands out as a robust machine learning algorithm for GESM because of its capacity to manage a large number of input variables. It offers numerous benefits, such as high accuracy, the ability to capture non-linear relationships, adept handling of missing data, identification of crucial variables, and resistance to overfitting [22]. In contrast, the XGBoost model stands out for its scalability. It can run more than ten times faster than conventional approaches, making it an invaluable asset in situations with limited memory resources [23]. It effectively manages sparse data, large datasets, and instance weights, providing precise estimates through a weighted quantile sketch technique and advanced tree-learning algorithms [24]. Applied to GESM, the XGBoost model reduces model variance and bias, thereby combining weak learners into a strong one [24]. In this study, we placed strong emphasis on the realism of the GESM. We conducted field surveys in two distinct periods. The first field survey was carried out before running the models to create a gully inventory map, while the second aimed to verify the models’ output. During the second survey, we measured the geometric parameters of ephemeral gullies, such as the width–depth ratio, Shape Index, and Erosiveness Index. This allowed us to gauge the likely level of activity in the gully regions identified through the GESM. Typically, such studies focus solely on generating susceptibility maps. We additionally attempt to characterize the dynamics of the gully regions, adding a new dimension to this type of work and addressing a significant research gap.

The Raiboni basin in West Bengal, India, was selected because it faces significant soil erosion, primarily caused by gullying [5]. Developing effective mitigation and prevention strategies requires a thorough GESM and monitoring of erosion impact severity. A comprehensive review of the existing literature revealed significant gaps in GESM research, particularly in the application of machine learning techniques. Previous studies have largely overlooked machine learning approaches, failed to focus on gully-dominant areas, and neglected the analysis of gully geometrical parameters. Our review also indicates that, to date, no previous research has analyzed any aspect of gully erosion in this small basin. This lack of prior work underscores the opportunity our study presents to contribute foundational knowledge about gully erosion processes in the basin. Hence, the aims of this study are as follows: (i) to produce a high-resolution gully erosion susceptibility map using RF and XGBoost models and (ii) to assess gully-dominant areas and gully geometrical parameters in the study basin. The results provide valuable spatial data for prioritizing soil conservation efforts through targeted gully erosion management strategies. This information can play a significant role in protecting the environment and ecosystems within the basin. Moreover, this research contributes to the academic understanding of gully erosion issues in this area, expanding the existing knowledge on the subject.

2. Materials and Methods

2.1. Study Area

Our research focused on the Raiboni River basin in West Bengal, eastern India. The Raiboni is a right-bank tributary of the larger Subarnarekha River. Spanning from 86°43′51.12″ E to 86°46′4.81″ E and 22°06′48.13″ N to 22°12′37.13″ N, the basin covers 55.432 km² of diverse landscape (Figure 1). Located on the fringe of the Chotanagpur Plateau, it experiences high gully erosion due to its hilly terrain and diverse landforms [25]. The basin’s geological history is shaped by significant depositional events. During the Tertiary period, laterite formations emerged. Subsequently, the Quaternary period witnessed extensive alluvium accumulation. This process resulted in the development of vast alluvial plains, particularly in the lower floodplain, where deposits can reach depths of around 5 m. These deposits likely originated during the Pleistocene to Holocene epochs [26]. Geological surveys indicate that lateritic soil dominates 85% of the basin’s landscape. Two distinct types of laterite are present in this region: plain laterite along the rivers and upland laterite in more distant areas. Of the remaining soil, 13% is yellowish-brown fine sediment (sand, silt, clay) and 2% is calcareous. The subtropical monsoon climate features distinct dry and wet seasons that affect rainfall and temperature. The year can be divided into four main seasons: (a) Winter (December–February): characterized by low temperatures, low humidity, and scarce rainfall. (b) Pre-monsoon (March–May): sees minimal rain with high temperatures and evaporation rates. (c) Monsoon (June–September): brings the majority of the annual rainfall (around 82%) alongside high temperatures. (d) Post-monsoon (October–November): features a steady decline in both rainfall and temperature.
The basin receives a mean of 1500 mm of rainfall annually. A linear regression of rainfall data from 1980 to 2023 (y = −2.137x + 5704) revealed no statistically significant change in rainfall. This is further supported by the low coefficient of determination (R² = 0.005), which indicates a negligible relationship between time and rainfall. The region is classified as having a classic tropical monsoon climate, which is also reflected in its high average annual temperature of 26 °C [25]. Shallow soil, steep slopes, and sparse vegetation contribute to the region’s high surface runoff coefficient, ranging from 0.4 to 0.6 [27]. This, coupled with the area’s unique geomorphology, increases surface runoff and accelerates soil erosion during the rainy season [27]. Unsustainable agricultural practices, deforestation, and urban expansion all worsen gully erosion in the region [25]. The prevalence of gully erosion poses substantial threats to soil fertility, environmental stability, water quality, and agriculture [28]. Soil erosion not only reduces land productivity but also increases downstream sedimentation, harming aquatic ecosystems [29]. Management strategies prioritize soil conservation, sustainable land use, and community awareness to lessen gully erosion, maintain ecological balance, and ensure the region’s long-term health [30].

2.2. Methodology

Figure 2 presents a workflow diagram that encapsulates the research methodology employed in this study. This visual representation provides a clear and concise overview of the entire research approach.

2.2.1. Inventory of Gully Erosion Locations

The first and pivotal step in developing GESMs is creating a gully inventory map [4]. To create this map, we used DEM data with a 12.5 m resolution and Google Earth Pro software (Version 7.3.2.5495) to delineate and map existing gullies. A field survey in November–January 2024 verified the accuracy of the gully inventory map. This ground verification used high-precision GPS devices, specifically the Garmin GPS etrex10 (Shreeji Instruments, Ahmedabad, India), to validate the location and extent of the mapped gullies. We randomly selected 100 gullies in the research region. Our field surveys revealed that the gullies exhibited a range of sizes: average depths varied from 2.14 m to 9.13 m, and average lengths spanned from 0.041 km to 0.512 km. To prepare these data for the models, we digitized the gullies as polygons and converted them into point features for incorporation into the ML algorithms. To create a robust training dataset, we also randomly selected 100 non-gully locations that met the models’ requirements. This approach ensures that the models are trained on a balanced representation of both gully and non-gully areas [6]. The research therefore employed 200 random sample points, comprising 100 gully points and 100 non-gully points. Gully locations were labeled ‘1’ and non-gully locations ‘0’. The dataset was then divided into training and testing subsets [31] (Figure 1). Following common practice in machine learning, and to support the models’ generalizability, we allocated 30% (60 points) of the dataset to the testing set and used the remaining 70% (140 points) to train the models. The datasets were computed and the models executed using ArcGIS (version 10.4) and R software (version 4.2.0).
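The labeling and 70/30 split described above can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study. The point identifiers, the function name `build_balanced_split`, the seed, and the choice to split each class separately (a stratified split) are assumptions made for the example, since the text does not state how the split was drawn.

```python
import random


def build_balanced_split(gully_points, non_gully_points, test_fraction=0.3, seed=42):
    """Label points (1 = gully, 0 = non-gully) and split each class separately."""
    rng = random.Random(seed)
    train, test = [], []
    for label, points in ((1, gully_points), (0, non_gully_points)):
        shuffled = list(points)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test += [(point, label) for point in shuffled[:n_test]]
        train += [(point, label) for point in shuffled[n_test:]]
    return train, test


# Hypothetical identifiers standing in for the mapped sample locations.
gullies = [f"G{i}" for i in range(100)]
non_gullies = [f"N{i}" for i in range(100)]

train_set, test_set = build_balanced_split(gullies, non_gullies)
print(len(train_set), len(test_set))  # 140 60
```

Splitting each class separately keeps the 50/50 balance in both subsets; a purely random split of all 200 points would only preserve it on average.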

2.2.2. Gully Erosion Conditioning Factors

Gully development is controlled by numerous natural parameters, and the precise identification of erosion-prone areas hinges on the judicious selection of the appropriate contributing factors. This study identified 24 parameters that control gully erosion based on previous research findings, extensive field surveys, and available data [6,18,19,31,32], for computing the GESMs (Table 1). The factors selected for the present work are as follows: geology, LULC, elevation, geomorphology, slope characteristics, soil properties, proximity to roads and water features, drainage patterns, vegetation indices, rainfall, and TWI. The related parameters are depicted in Figure 3 and tabulated in Table 1. A comprehensive discussion of these factors was provided in the Supplementary Document (SD) S1.

2.2.3. Multicollinearity Assessment

We assessed the 24 gully erosion factors for multicollinearity, a condition in which predictor variables are highly correlated with one another. Strong interdependence makes it difficult to isolate the unique effect of each factor on the outcome and can lead to inaccurate or misleading model results [33]. To identify and address multicollinearity, we employed two common techniques: Variance Inflation Factors (VIFs) and the Information Gain Ratio (IGR). Generally, tolerance values below 0.1 or VIF values above 10 suggest a potential multicollinearity issue [34]. IGR, summarized here by the average merit (AM), ranks factors by their influence on gully formation; a higher AM indicates a greater impact [35]. To confirm the robustness of our approach, we combined VIF, Pearson’s correlation coefficients, IGR, and tolerance criteria. This comprehensive strategy gave us confidence that the chosen factors were suitable for the models and would not lead to misleading results.
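To make the two diagnostics concrete, the sketch below computes tolerance and VIF from an R² value (tolerance = 1 − R², VIF = 1 / tolerance) and a plain gain ratio for a categorical factor. It is illustrative only: the toy values are hypothetical, the two-predictor shortcut (R² equals the squared Pearson correlation) does not generalize to 24 factors, and the average merit reported in this study may be computed differently (for example, averaged over cross-validation folds).

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def tolerance_and_vif(r_squared):
    """Tolerance = 1 - R^2 and VIF = 1 / tolerance for one predictor."""
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(factor_classes, labels):
    """Information gain of a categorical factor divided by its split information."""
    n = len(labels)
    groups = {}
    for factor_class, label in zip(factor_classes, labels):
        groups.setdefault(factor_class, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(factor_classes)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical values for two factors at five sample points.
factor_a = [1, 2, 3, 4, 5]
factor_b = [2, 1, 4, 3, 5]
r = pearson_r(factor_a, factor_b)              # r is about 0.8
tolerance, vif = tolerance_and_vif(r ** 2)     # tolerance about 0.36, VIF about 2.78
print(round(tolerance, 2), round(vif, 2))
```

With more than two predictors, R² for a given factor comes from regressing it on all the others, which is what statistical packages do when reporting VIF.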

2.2.4. Random Forest (RF)

RF, a powerful ensemble ML algorithm developed by Breiman [36], is known for its accuracy. It works by building a large number of decorrelated decision trees. To fine-tune the RF model for optimal performance [37], several key parameters were adjusted. Our analysis revealed two parameters that significantly affect the model’s performance:

The ‘mtree’ parameter: consistent with previous research [38], a value of 3 yielded the most effective results in this study.

Minimum node size (‘min.node.size’): a minimum of 5 data points per terminal node produced optimal results.

Moreover, we optimized the number of factors considered for splitting at each decision point within the trees.

RF algorithms build upon the strengths of Classification and Regression Trees (CART) by incorporating a method called bootstrap aggregation, bagging for short. Bagging involves creating multiple random subsets of the original data. These subsets, called bootstrap replicates, are similar in size to the original dataset, ensuring a good representation of the overall data [39]. By creating diverse training sets through bagging (where some data points appear more and some less), Random Forest reduces bias and eliminates the need for a separate validation set [40]. We optimized the model’s performance by testing different hyperparameter combinations. We found a ‘mtree’ value of 3 and a ‘min.node.size’ of 5 yielded the best results. We also explored splitting rules and capped the number of trees at 500. This hyperparameter tuning ensured our Random Forest model was finely calibrated for superior gully erosion prediction.
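The bagging step can be illustrated with a short Python sketch. It draws bootstrap replicates of the 140 training points and records which points are left out of each replicate (the out-of-bag points), which is why a separate validation set is not strictly needed. The function name and seed are hypothetical, and no trees are grown here; the sketch only shows the resampling.

```python
import random


def bootstrap_replicate(n_samples, rng):
    """Draw one bootstrap replicate: n indices sampled with replacement."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)
n_training_points = 140  # size of the training subset reported above
n_trees = 500            # one replicate per tree

oob_fractions = []
for _ in range(n_trees):
    in_bag, oob = bootstrap_replicate(n_training_points, rng)
    oob_fractions.append(len(oob) / n_training_points)

mean_oob = sum(oob_fractions) / len(oob_fractions)
print(f"mean out-of-bag share: {mean_oob:.3f}")
```

On average a little over a third of the points are absent from any given replicate (the theoretical share approaches 1/e, about 0.37), so every tree has unseen data on which its predictions can be checked.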

2.2.5. Extreme Gradient Boosting (XGBoost)

XGBoost, a tree-boosting algorithm, excels in speed, handling large datasets, and achieving high accuracy [41]. Unlike bagging-based ensembles such as RF, it builds decision trees sequentially. Each tree corrects errors from the previous ones, focusing on cases where earlier predictions were poor. This iterative process refines predictions, culminating in a combined output from all the trees [41].

XGBoost boasts a wider range of tunable parameters compared to other tree-based models. These adjustments help prevent overfitting, enhance consistency, and improve overall precision [33]. We employed established tuning practices suggested by Boehmke and Greenwell [37], including setting parameters like maximum training rounds, tree depth, learning rate, and various regularization techniques.
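The sequential, error-correcting idea behind boosting can be shown with a small Python sketch. It is plain gradient boosting for squared error with one-split trees (stumps), not XGBoost itself: the regularization terms, second-order updates, and sparsity handling that distinguish XGBoost are omitted. The toy data, the learning rate of 0.3, and the function names are assumptions made for the example.

```python
def fit_stump(x, residuals):
    """Return (threshold, left_value, right_value) minimising squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keep both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]


def boost(x, y, n_rounds, learning_rate=0.3):
    """Fit stumps one after another, each on the current residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        threshold, left_value, right_value = fit_stump(x, residuals)
        predictions = [
            pi + learning_rate * (left_value if xi <= threshold else right_value)
            for xi, pi in zip(x, predictions)
        ]
    return predictions


def mse(y, predictions):
    """Mean squared error between labels and predictions."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, predictions)) / len(y)


# Hypothetical one-factor toy data: slope (degrees) and gully label (1/0).
slope = [2, 4, 6, 8, 10, 12, 14, 16]
label = [0, 0, 0, 1, 0, 1, 1, 1]

baseline = mse(label, [sum(label) / len(label)] * len(label))
after_1 = mse(label, boost(slope, label, n_rounds=1))
after_20 = mse(label, boost(slope, label, n_rounds=20))
print(baseline, after_1, after_20)
```

The training error falls as rounds are added, which is also why the number of rounds, tree depth, and learning rate need tuning: without limits, the ensemble eventually fits noise in the training set.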

Additionally, we employed ArcGIS software (version 10.8) to determine the basin’s gully-dominant area by intersecting the final gully susceptibility maps generated by the RF and XGBoost models, specifically focusing on the areas classified as having a very high susceptibility class.
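Conceptually, this intersection keeps only the cells that both models place in the very high class. The Python sketch below shows the idea on two tiny hypothetical class grids; the class code 5 for "very high" and the grid values are assumptions, and the study itself performed this step in ArcGIS.

```python
VERY_HIGH = 5  # assumed class code for the very high susceptibility class


def gully_dominant_mask(rf_classes, xgb_classes, target=VERY_HIGH):
    """Mark cells (1) that both classified maps assign to the target class."""
    return [
        [int(a == target and b == target) for a, b in zip(rf_row, xgb_row)]
        for rf_row, xgb_row in zip(rf_classes, xgb_classes)
    ]


# Hypothetical 3 x 3 classified rasters (1 = very low ... 5 = very high).
rf_map = [[5, 4, 3], [5, 5, 2], [1, 2, 5]]
xgb_map = [[5, 5, 3], [4, 5, 2], [1, 1, 5]]

mask = gully_dominant_mask(rf_map, xgb_map)
print(mask)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```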

2.2.6. Method to Measure Geometric Parameters

This research involved two rounds of ground truthing. The first field survey was undertaken to corroborate the accuracy of the sample points marked on the inventory map. The second survey, carried out between January and March 2024, focused on validating the final results. In this survey, we visited 50 points identified on the gully-dominant map to check them against actual ground conditions. Among them, we selected 20 gullies from the gully-dominant area patches to measure geometric parameters. The measured parameters were total area (m²), width of the one-fourth depth (WQD), width of the half depth (WHD), total width (WT), depth of the half right side (DRH), depth of the half left side (DLH), average depth (D), width–depth ratio (W/D), Erosiveness Index (EI), and Shape Index (SI). All geometric parameters were measured following the scheme shown in Figure 4. The SI and EI were calculated using Equations (1) and (2), respectively [42,43].

\text{Shape Index (SI)} = \frac{WQD}{WT}

(1)

\text{Erosiveness Index (EI)} = \frac{\text{Area}}{WT \times \max(DR, DL)}

(2)
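Equations (1) and (2) translate directly into code. The Python sketch below evaluates both indices for one hypothetical cross-section and applies the shape classes reported in the Results (V-shaped for SI < 0.4, intermediate between 0.4 and 0.6, U-shaped above 0.6). The measurement values are invented for illustration, and assigning the exact boundary values 0.4 and 0.6 to the intermediate class is an assumption, since the text leaves them unspecified.

```python
def shape_index(wqd, wt):
    """Equation (1): width at one-fourth depth divided by total width."""
    return wqd / wt


def erosiveness_index(area, wt, dr, dl):
    """Equation (2): area divided by total width times the larger side depth."""
    return area / (wt * max(dr, dl))


def classify_shape(si):
    """Shape classes as reported in the Results; boundary handling is assumed."""
    if si < 0.4:
        return "V-shaped"
    if si <= 0.6:
        return "intermediate"
    return "U-shaped"


# Hypothetical cross-section: widths and depths in metres, area in square metres.
si = shape_index(wqd=0.5, wt=2.0)
ei = erosiveness_index(area=0.4, wt=2.0, dr=0.6, dl=0.5)
print(si, classify_shape(si), round(ei, 2))  # 0.25 V-shaped 0.33
```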

2.2.7. Method of Model Validation

To gauge model performance, we employed a suite of metrics evaluating each model’s ability to distinguish gullied from stable areas. These included the root mean square error (RMSE), mean absolute error (MAE), and Kappa index for precision assessment. We also used R-squared and receiver operating characteristic (ROC) curves to measure overall effectiveness. This evaluation was conducted on both training and testing datasets to ensure robust model validation. Within the Gully Erosion Susceptibility Map (GESM), correctly classified non-erosion areas were considered true negatives (TN), while correctly identified gullies were true positives (TP). Conversely, areas mistakenly classified as gullies were false positives (FP), and missed gullies were false negatives (FN). The number of misclassified pixels was determined by subtracting the sum of TP and TN from the total number of pixels [19,44]. This allowed us to quantify the misclassification rate, which is crucial for assessing the performance of our gully erosion susceptibility models. Equations (3)–(6) define accuracy, the Kappa index, and the error rates (RMSE and MAE) used to quantify model performance.

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(3)

\text{Kappa}\,(k) = \frac{P_{c} - P_{exp}}{1 - P_{exp}}

(4)

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_{ei} - X_{oi})^{2}}

(5)

MAE = \frac{1}{n} \sum_{i=1}^{n} |X_{ei} - X_{oi}|

(6)

where P_{c} is the proportion of pixels classified correctly as gully-eroded or non-gully-eroded; P_{exp} denotes the expected agreement; X_{oi} and X_{ei} are the i^{th} observed and model-estimated values, respectively; and n is the number of data points [34].

The ROC curve illustrates the trade-off between sensitivity and specificity across different decision thresholds. We quantified overall model performance using the Area Under the Curve (AUC). Our analysis employed the pROC package in R to generate ROC curves and compute AUC values. An AUC closer to 1.0 signifies superior discriminatory power, with 1.0 indicating flawless classification [45].
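The metrics in Equations (3)–(6) and the AUC can be reproduced on a toy example. The Python sketch below is an illustration, not the study's R code: the six observed labels and model scores are hypothetical, the 0.5 threshold used to obtain hard classes is an assumption, and the AUC is computed with the rank-based (Mann–Whitney) formulation rather than with the pROC package.

```python
import math


def confusion_counts(observed, predicted):
    """Return (TP, TN, FP, FN) for binary labels (1 = gully, 0 = non-gully)."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def kappa(tp, tn, fp, fn):
    """Observed agreement corrected for the agreement expected by chance."""
    n = tp + tn + fp + fn
    p_c = (tp + tn) / n
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (p_c - p_exp) / (1 - p_exp)


def rmse(observed, estimated):
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(observed, estimated)) / len(observed))


def mae(observed, estimated):
    return sum(abs(e - o) for o, e in zip(observed, estimated)) / len(observed)


def auc(observed, scores):
    """Share of (gully, non-gully) pairs ranked correctly; ties count as half."""
    pos = [s for o, s in zip(observed, scores) if o == 1]
    neg = [s for o, s in zip(observed, scores) if o == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical observed labels and model scores for six test points.
observed = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

counts = confusion_counts(observed, predicted)
print(counts)                       # (2, 2, 1, 1)
print(round(accuracy(*counts), 3))  # 0.667
print(round(kappa(*counts), 3))     # 0.333
print(round(auc(observed, scores), 3))  # 0.889
```

Accuracy and Kappa depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric are reported together.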

3. Results

3.1. Influence of the Key Factors on GESM

We investigated potential multicollinearity (redundancy) among the 24 chosen factors using two techniques: VIF and IGR. The outputs in Table 2 show no multicollinearity concerns, as all VIF values are below 10 (equivalently, all tolerance values exceed 0.1). The statistical tests suggest a low degree of collinearity among these factors, meaning they are relatively independent. Table 2 also reports the average merit (AM) of each factor, indicating its relative importance for GESM in the Raiboni River basin. The factors with the highest AM values, namely elevation (0.75), land use and land cover (0.71), proximity to rivers (0.70), and rainfall (0.69), exert the strongest influence on an area’s susceptibility to gully erosion. Conversely, distance from lineament (DFL; AM: 0.21) has the least influence, followed in increasing order of influence by curvature (AM: 0.25), Topographic Wetness Index (TWI; AM: 0.34), slope length (AM: 0.37), and slope aspect (AM: 0.39). These findings identify the key drivers of gully erosion in this region and can support more effective management strategies.

3.2. Gully Erosion Susceptibility Mapping

GESMs are vital tools for environmental protection and sustainable development. They help us understand erosion patterns and develop strategies to mitigate their impact. Leveraging 24 factors influencing gully formation, this study developed a GESM using both RF and XGBoost models (Figure 5). ArcGIS software’s quantile classification method was leveraged to generate well-defined gully erosion risk zones.

This method allowed us to classify susceptibility into five zones: very low (VL), low (L), moderate (M), high (H), and very high (VH). Examining each zone provided a detailed picture of how vulnerable different parts of the study area are to gully erosion. Figure 5 presents the GESMs generated by the RF and XGBoost models, visualizing the predicted susceptibility levels across the Raiboni River basin. The comparison revealed slight differences in the spatial distribution of risk zones. The XGBoost model assigned 11.44% of the study area to the Very High (VH) class and 19.23% to the High (H) class, compared with 12.49% and 20.86%, respectively, for the RF model. In the Moderate (M) class, the RF model covered 45.13% of the area versus 47.94% for XGBoost. Both models agreed on the overall distribution of susceptibility zones, with the Low (L) and Very High (VH) classes covering similar areas (around 10–12%). Table 3 reveals a distinct pattern: areas classified as having lower susceptibility exhibit a correspondingly lower density of actual gullies. Both models were effective in identifying gully-prone areas within the basin, and their outputs aligned well with the observed erosion patterns, supporting their reliability and predictive power. These GESMs serve as tools for environmental management, enabling conservation efforts to target high-risk areas precisely. By pinpointing vulnerable locations, they facilitate the development of tailored, sustainable land management strategies. This proactive approach not only helps mitigate current erosion issues but also contributes to long-term environmental protection and the overall ecological health of the basin.
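The quantile classification used to derive the five zones can be sketched in a few lines of Python. The idea is to choose break values so that each class holds roughly the same number of cells. The susceptibility scores below are hypothetical, and the break rule is a simplified stand-in; the ArcGIS implementation may place breaks slightly differently.

```python
LABELS = ("VL", "L", "M", "H", "VH")


def quantile_breaks(values, n_classes=5):
    """Upper bounds of the first n_classes - 1 classes, with equal counts per class."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_classes - 1] for k in range(1, n_classes)]


def classify(value, breaks, labels=LABELS):
    """Assign the first class whose upper bound is not exceeded."""
    for upper, label in zip(breaks, labels):
        if value <= upper:
            return label
    return labels[-1]


# Hypothetical susceptibility scores for ten raster cells.
scores = [0.91, 0.12, 0.47, 0.33, 0.78, 0.05, 0.62, 0.26, 0.84, 0.55]

breaks = quantile_breaks(scores)
classes = [classify(s, breaks) for s in scores]
print(breaks)   # [0.12, 0.33, 0.55, 0.78]
print(classes)  # ['VH', 'VL', 'M', 'L', 'H', 'VL', 'H', 'L', 'VH', 'M']
```

Because quantile breaks equalize cell counts, class areas depend on the distribution of scores rather than on fixed susceptibility thresholds, which is worth keeping in mind when comparing class percentages between models.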

3.3. Analysis of the Geometric Parameters of Gullies

The detailed results of the geometric parameters for the 20 measured gullies are presented in Table 4. These parameters include the total area (m²), WQD, WHD, WT, DRH, DLH, average depth (D), width–depth ratio (W/D), EI, and Shape Index (SI). Measurements were conducted for the 1st, 2nd, and 3rd orders of the gullies, and the averaged results across the three orders are provided.

The results indicate that the gullies in this basin had a mean area of 5.61 m², a mean depth of 0.50 m, a mean width of 11.23 m, and a mean width–depth ratio of 23.21. The Shape Index (SI) classifies gullies as V-shaped for SI < 0.4, intermediate for 0.4 < SI < 0.6, and U-shaped for SI > 0.6 [38,39]. Our study yielded a mean SI value of 0.26, suggesting the gullies are predominantly V-shaped and actively eroding. Furthermore, the EI value approaches 0 for actively developing gullies and 1 for stable, mature ones [39]. With a mean EI of 0.33, our findings point to an active developmental stage for these gullies. Thus, it can be concluded that the active gully regions have been accurately identified by both ML methods.

3.4. Validation of Models

To assess the efficacy of the XGBoost and Random Forest models, we employed a comprehensive suite of statistical tests on both training and testing datasets. Our evaluation toolkit included ACC, MAE, Kappa index, R-squared (R²), and RMSE. The XGBoost model demonstrated superior performance across all metrics when applied to the training dataset (Table 5). It achieved an impressive accuracy of 87.5%, a Kappa index of 0.85, and a high R² value of 0.88. Additionally, it exhibited the lowest RMSE (0.14) and MAE (0.11), underscoring its exceptional predictive capabilities for gully erosion. Notably, both models showed consistent performance across training and testing datasets when evaluated using identical statistical metrics. This consistency highlights their robustness and suggests strong generalization capabilities, indicating reliable applicability to unseen data. While XGBoost outperformed Random Forest, both models proved highly effective in predicting gully erosion susceptibility, offering valuable tools for erosion management and landscape conservation efforts.

Figure 6 depicts the ROC curves, which visually assess model performance. These curves show the trade-off between incorrectly classifying non-gully areas as gullies (false positive rate) and correctly identifying gullies (true positive rate) at various thresholds. Both models performed well at delineating erosion risk zones, achieving high area under the curve (AUC) values: 86% for XGBoost and 83.1% for RF. AUC values reflect the models’ ability to differentiate between gully-prone and non-gully areas. A higher AUC indicates better discrimination, and XGBoost holds a small edge.
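The AUC can be read as the probability that a randomly chosen gullied point receives a higher susceptibility score than a randomly chosen non-gullied point. The sketch below computes it directly from that pairwise definition; the function name `roc_auc` and the toy scores are hypothetical and are not taken from the study's data.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (gully, non-gully) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    correct = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = gullied point, 0 = non-gullied point.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]
auc = roc_auc(labels, scores)
print(auc)
```

The pairwise form is quadratic in the number of points, which is acceptable for a sample of a few hundred locations; larger datasets would normally use a rank-based implementation from a statistics library.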

A January–March 2024 field survey corroborated the models’ accuracy. Within areas identified by the models as having a high prevalence of gullies, we randomly selected 50 sample points for evaluation. The field survey findings revealed that 68% of the area consisted of the actual gully area (AGA), while 14% comprised rock exposure areas caused by deforestation and human activity (REABDHA). Fallow lands accounted for 11%, and the remaining 7% encompassed other land cover classes. The images presented in Figure 7 provide visual documentation of gully erosion occurring across different localities within the study area (namely, Nekra Gudri, Chhahazari Fulberia, Bugbugi Khal, Chilli Cholai Vati, Fuldihi, Asna Sol, Kakharusole, etc.).

4. Discussions

Gully erosion severely threatens environments, damaging landscapes, endangering people, and harming agriculture. It causes habitat loss, soil erosion, water pollution, infrastructure destruction, and flooding. Traditional analysis methods often suffer from uncertainty and overfitting [46]. This study presents a robust approach to GESM in the Raiboni basin, leveraging two advanced ML algorithms, XGBoost and RF. Our model incorporates an extensive set of 24 key factors influencing gully formation, setting it apart from many previous studies through its comprehensive scope and methodological rigor. We employed IGR and VIF techniques to assess and mitigate multicollinearity among the factors. Our analysis, using average merit (AM), identified elevation (AM = 0.75), LULC (0.71), distance from the rivers (0.70), and rainfall (0.69) as the most critical influences on gully erosion in the basin (Table 2). Conversely, distance from lineaments (AM = 0.21) emerged as the least influential factor. This research offers a powerful explanatory model that combines diverse data types, robust validation methods, and a thorough set of controlling factors, providing a more nuanced understanding of gully erosion dynamics in the region. Our results align with research across diverse landscapes and climates that has consistently emphasized the importance of such controlling factors in gully erosion development, including studies in China [47], India [48], Ethiopia [49], and Iran [32].
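The factor ranking described above can be reproduced from the values in Table 2. The sketch below sorts the 24 factors by average merit and screens the VIF column; the variable names are hypothetical, and the VIF limit of 5 is an assumed, commonly used cut-off rather than a threshold stated in this section.

```python
# (factor, VIF, average merit) transcribed from Table 2.
table2 = [
    ("Elevation", 3.87, 0.75), ("Slope", 1.83, 0.49),
    ("Slope length", 1.14, 0.37), ("Slope aspect", 1.17, 0.39),
    ("Curvature", 1.72, 0.25), ("Drainage density", 2.49, 0.47),
    ("Distance from the river", 3.34, 0.70),
    ("Distance from the lineament", 1.44, 0.21),
    ("TWI", 2.43, 0.34), ("Rainfall", 3.64, 0.69),
    ("NDVI", 2.55, 0.57), ("LULC", 2.48, 0.71),
    ("Distance from the road", 1.71, 0.61), ("Lithology", 3.22, 0.48),
    ("Geomorphology", 1.68, 0.63), ("Soil organic density", 3.41, 0.59),
    ("Bulk density", 3.88, 0.51), ("Clay content", 3.47, 0.40),
    ("Coarse fragments", 1.91, 0.63), ("Sand", 2.76, 0.64),
    ("Silt", 2.90, 0.68), ("CEC", 2.29, 0.43),
    ("Nitrogen", 1.87, 0.47), ("Soil organic carbon", 3.49, 0.65),
]

VIF_LIMIT = 5.0  # assumed cut-off for problematic multicollinearity

# Rank factors from most to least influential by average merit.
ranked = sorted(table2, key=lambda row: row[2], reverse=True)
top_four = [name for name, _, _ in ranked[:4]]
least_influential = ranked[-1][0]

# Screen for factors whose VIF exceeds the assumed limit.
flagged = [name for name, vif, _ in table2 if vif > VIF_LIMIT]
max_vif = max(vif for _, vif, _ in table2)

print(top_four, least_influential, flagged, max_vif)
```

With the Table 2 values, no factor exceeds the assumed limit (the largest VIF is 3.88, for bulk density), which is consistent with the statement that multicollinearity was not a concern.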

We evaluated the performance of both the Random Forest (RF) and XGBoost models using a robust set of statistical tests (ROC analysis, RMSE, MAE, R², accuracy, and Kappa) applied to the training and testing data. While both models performed well, XGBoost achieved a slightly higher accuracy in predicting gully erosion susceptibility zones. This contrasts with previous studies by Saha et al. [50], Avand et al. [51], and Hosseinalizadeh et al. [52], who found RF to be superior. Our findings suggest that XGBoost can outperform RF for GESM development under specific conditions, emphasizing the importance of model selection for optimal results within a particular study area. We sought the model that best aligned with our research goal of identifying gully erosion zones. XGBoost has proven effective in landslide [53] and flood prediction [54], and it also demonstrates promise in this application, as further supported by Yang et al. [55] and Hasanuzzaman et al. [56].

The XGBoost model identified slightly less area (11.44% Very High [VH], 19.23% High [H]) in the most susceptible classes compared to the RF model (12.49% VH, 20.86% H). Both models identified a significant portion of the study area, roughly 12%, as highly susceptible to gully erosion, falling within the ‘very-high gully-erosion area’ or ‘gully-dominant area’ classifications (Figure 8). Our analysis of gully characteristics revealed a dominance of V-shaped gullies, likely in an active development stage. This is supported by the average Shape Index (SI) value of 0.26 (indicating V-shaped) and the mean Erosivity Index (EI) of 0.33 (suggesting active development). Hence, it can be deduced that both machine learning methodologies adeptly pinpointed areas undergoing active gully erosion. A January–March 2024 field survey with GPS verification confirmed model accuracy. Of the surveyed area, 68% was actual gully area (AGA), with an additional 14% classified as REABDHA. Fallow lands made up 11%, while the remaining 7% comprised other land cover classifications. This field validation process is crucial, as it demonstrates how well the model’s predictions align with real-world conditions. To the best of our knowledge, this study represents pioneering research in this region, with no prior governmental or private reports available for comparison. Given this novelty, we placed significant emphasis on rigorous field verification. We conducted two separate field surveys to ensure accuracy and reliability. Furthermore, our study distinguishes itself by quantifying the activity level of gully erosion in the affected areas through the application of the Shape and Erosivity Indexes, providing a more comprehensive analysis than previous research.
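The class shares quoted above follow directly from the areas in Table 3. As a quick arithmetic check, the sketch below recomputes the percentages from the areas in km² and adds the combined Very High plus High share for each model; the helper name `share` is hypothetical.

```python
# Class areas in km² transcribed from Table 3.
areas = {
    "XGBoost": {"Very High": 6.34, "High": 10.66, "Moderate": 26.57,
                "Low": 5.51, "Very Low": 6.35},
    "RF": {"Very High": 6.92, "High": 11.56, "Moderate": 25.01,
           "Low": 5.62, "Very Low": 6.31},
}
BASIN_AREA = 55.42  # total mapped area in km² (Table 3)


def share(model, *classes):
    """Percentage of the basin covered by the given classes for one model."""
    total = sum(areas[model][c] for c in classes)
    return round(100 * total / BASIN_AREA, 2)


xgb_vh = share("XGBoost", "Very High")
rf_vh = share("RF", "Very High")
xgb_top_two = share("XGBoost", "Very High", "High")
rf_top_two = share("RF", "Very High", "High")

print(xgb_vh, rf_vh, xgb_top_two, rf_top_two)
```

On these figures, the two most susceptible classes together cover roughly 31% of the basin under XGBoost and 33% under RF.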

The findings of this research indicate that XGBoost is particularly well suited to GESM. The resulting map does more than support general management strategies: it enables interventions targeted at the most susceptible zones. Combined with community involvement and attention to ecological benefits, such interventions encourage sustainable land use and help minimize the long-term impact of gully erosion on soil and land. While our study faced limitations due to the scarcity of high-resolution data and prior research on gully erosion in this region, it lays valuable groundwork for future investigations. This research paves the way for further studies by establishing a robust methodology for model selection and validation in similar contexts.

5. Conclusions

Our research produced a precise GESM for the Raiboni River basin. We leveraged two advanced ML techniques, XGBoost and RF, integrating 24 key factors that drive gully formation to achieve this high-accuracy model. To ensure these factors were not excessively correlated, we used VIF and IGR techniques, finding no multicollinearity issues. The analysis identified elevation, LULC, river proximity, and rainfall as the strongest drivers of gully erosion, while the distance to lineaments had minimal impact. Robust tests (RMSE, MAE, Kappa, R², ACC, ROC) assessed model performance on training and testing data. While both the XGBoost and RF models performed admirably and were well-suited to this task, the XGBoost model exhibited a slight edge in predicting gully erosion susceptibility zones within the GESM. Both models agreed that approximately 12% of the basin area faces a high risk of gully erosion, forming zones dominated by gully activity. This crucial finding underscores the urgent need for targeted management techniques in these vulnerable areas. Furthermore, our analysis of gully characteristics revealed a dominance of V-shaped gullies, likely in an active developmental stage. This is supported by the average Shape Index (SI) value of 0.26 (indicating V-shaped) and the mean Erosivity Index (EI) of 0.33 (suggesting active development). These findings suggest that both machine learning methods effectively identified regions with actively developing gullies. Focusing efforts on these vulnerable areas allows decision-makers to implement customized and sustainable programs and policies effectively. By addressing the specific needs and conditions of these regions, such initiatives can significantly reduce the future impacts of gully erosion on local communities. This targeted approach not only helps protect the residents, but also ensures the long-term stability and resilience of the affected landscapes, fostering a more sustainable and secure environment.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/su16156569/s1, Supplementary document (SD) S1. References [57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72] are cited in Supplementary Materials.

Author Contributions

Conceptualization, M.H., P.K.S., F.F.b.H. and J.M.; Data curation, M.H.; Formal analysis, M.H. and P.K.S.; Funding acquisition, S.A. and F.F.b.H.; Investigation, S.A.; Methodology, M.H. and P.K.S.; Project administration, S.A., H.A. and F.F.b.H.; Resources, S.A. and H.A.; Software, P.K.S.; Supervision, P.K.S., S.A., H.G.A. and J.M.; Validation, P.K.S.; Visualization, H.A.; Writing—original draft, M.H. and P.K.S.; Writing—review and editing, S.A., H.A., F.F.b.H., H.G.A. and J.M. All authors have read and agreed to the published version of the manuscript.

Funding

Funding for this research was given under award number RGP2/442/44 by the Deanship of Scientific Research, King Khalid University, Ministry of Education, Kingdom of Saudi Arabia; and Princess Nourah bint Abdulrahman University researchers supporting project number PNURSP2024R675, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Sow, S.A. Dynamic Geomorphology: Systemic Analysis of Continental Water Morphodynamics by Gully. Eur. Sci. J. 2020, 16, 78–98.
2. Hassen, G.; Bantider, A. Assessment of Drivers and Dynamics of Gully Erosion in Case of Tabota Koromo and Koromo Danshe Watersheds, South Central Ethiopia. Geoenviron. Disasters 2020, 7, 5.
3. Amiri, M.; Pourghasemi, H.R.; Ghanbarian, G.A.; Afzali, S.F. Assessment of the Importance of Gully Erosion Effective Factors Using Boruta Algorithm and Its Spatial Modeling and Mapping Using Three Machine Learning Algorithms. Geoderma 2019, 340, 55–69.
4. Chen, W.; Lei, X.; Chakrabortty, R.; Pal, S.C.; Sahana, M.; Janizadeh, S. Evaluation of Different Boosting Ensemble Machine Learning Models and Novel Deep Learning and Boosting Framework for Head-Cut Gully Erosion Susceptibility. J. Environ. Manag. 2021, 284, 112015.
5. Majhi, A.; Nyssen, J.; Verdoodt, A. What Is the Best Technique to Estimate Topographic Thresholds of Gully Erosion? Insights from a case study on the permanent gullies of Rarh plain, India. Geomorphology 2021, 375, 107547.
6. Rahmati, O.; Tahmasebipour, N.; Haghizadeh, A.; Pourghasemi, H.R.; Feizizadeh, B. Evaluation of Different Machine Learning Models for Predicting and Mapping the Susceptibility of Gully Erosion. Geomorphology 2017, 298, 118–137.
7. Arabameri, A.; Rezaei, K.; Pourghasemi, H.R.; Lee, S.; Yamani, M. GIS-Based Gully Erosion Susceptibility Mapping: A Comparison among Three Data-Driven Models and AHP Knowledge-Based Technique. Environ. Earth Sci. 2018, 77, 628.
8. Azareh, A.; Rahmati, O.; Rafiei-Sardooi, E.; Sankey, J.B.; Lee, S.; Shahabi, H.; Ahmad, B.B. Modelling Gully-Erosion Susceptibility in a Semi-Arid Region, Iran: Investigation of Applicability of Certainty Factor and Maximum Entropy Models. Sci. Total Environ. 2019, 655, 684–696.
9. Igwe, O.; John, U.I.; Solomon, O.; Obinna, O. GIS-Based Gully Erosion Susceptibility Modeling, Adapting Bivariate Statistical Method and AHP Approach in Gombe Town and Environs Northeast Nigeria. Geoenviron. Disasters 2020, 7, 32.
10. Razavi-Termeh, S.V.; Sadeghi-Niaraki, A.; Choi, S.M. Gully Erosion Susceptibility Mapping Using Artificial Intelligence and Statistical Models. Geomat. Nat. Hazards Risk 2020, 11, 821–844.
11. Mehmood, Q.; Qing, W.; Chen, J.; Yan, J.; Ammar, M.; Rahman, G. Susceptibility Assessment of Single Gully Debris Flow Based on AHP and Extension Method. Civ. Eng. J. 2021, 7, 6.
12. Mohebzadeh, H.; Biswas, A.; Rudra, R.; Daggupati, P. Machine Learning Techniques for Gully Erosion Susceptibility Mapping: A Review. Geosciences 2022, 12, 429.
13. Ghorbanzadeh, O.; Shahabi, H.; Mirchooli, F.; Valizadeh Kamran, K.; Lim, S.; Aryal, J.; Jarihani, B.; Blaschke, T. Gully Erosion Susceptibility Mapping (GESM) Using Machine Learning Methods Optimized by the Multi Collinearity Analysis and K-Fold Cross-Validation. Geomat. Nat. Hazards Risk 2020, 11, 1653–1678.
14. Liu, G.; Arabameri, A.; Santosh, M.; Nalivan, O.A. Optimizing Machine Learning Algorithms for Spatial Prediction of Gully Erosion Susceptibility with Four Training Scenarios. Environ. Sci. Pollut. Res. 2023, 30, 46979–46996.
15. Hasanuzzaman, M.; Mandal, M.H.; Hasnine, M.; Shit, P.K. Groundwater Potential Mapping Using Multi-Criteria Decision, Bivariate Statistic and Machine Learning Algorithms: Evidence from Chota Nagpur Plateau, India. Appl. Water Sci. 2022, 12, 58.
16. Thanh, N.N.; Chotpantarat, S.; Trung, N.H.; Ngu, N.H. Mapping Groundwater Potential Zones in Kanchanaburi Province, Thailand by Integrating of Analytic Hierarchy Process, Frequency Ratio, and Random Forest. Ecol. Indic. 2022, 145, 109591.
17. Zhou, X.; Wen, H.; Zhang, Y.; Xu, J.; Zhang, W. Landslide Susceptibility Mapping Using Hybrid Random Forest with GeoDetector and RFE for Factor Optimization. Geosci. Front. 2021, 12, 101211.
18. Mosavi, A.; Golshan, M.; Janizadeh, S.; Choubin, B.; Melesse, A.M.; Dineva, A.A. Ensemble Models of GLM, FDA, MARS, and RF for Flood and Erosion Susceptibility Mapping: A Priority Assessment of Sub-Basins. Geocarto Int. 2022, 37, 2541–2560.
19. Hasanuzzaman, M.; Shit, P.K.; Bera, B.; Islam, A. Characterizing Recurrent Flood Hazards in the Himalayan Foothill Region through Data-Driven Modelling. Adv. Space Res. 2023, 71, 5311–5326.
20. Sahin, E.K. Assessing the Predictive Capability of Ensemble Tree Methods for Landslide Susceptibility Mapping Using XGBoost, Gradient Boosting Machine, and Random Forest. SN Appl. Sci. 2020, 2, 1308.
21. Kavzoglu, T.; Teke, A. Predictive Performances of Ensemble Machine Learning Algorithms in Landslide Susceptibility Mapping Using Random Forest, Extreme Gradient Boosting (XGBoost) and Natural Gradient Boosting (NGBoost). Arab. J. Sci. Eng. 2022, 47, 7367–7385.
22. Sun, D.; Shi, S.; Wen, H.; Xu, J.; Zhou, X.; Wu, J. A Hybrid Optimization Method of Factor Screening Predicated on GeoDetector and Random Forest for Landslide Susceptibility Mapping. Geomorphology 2021, 379, 107623.
23. Janizadeh, S.; Vafakhah, M.; Kapelan, Z.; Mobarghaee Dinan, N. Hybrid XGboost Model with Various Bayesian Hyperparameter Optimization Algorithms for Flood Hazard Susceptibility Modeling. Geocarto Int. 2022, 37, 8273–8292.
24. Sahin, E.K. Implementation of Free and Open-Source Semi-Automatic Feature Engineering Tool in Landslide Susceptibility Mapping Using the Machine-Learning Algorithms RF, SVM, and XGBoost. Stoch. Environ. Res. Risk Assess. 2023, 37, 1067–1092.
25. Shit, P.K.; Maity, R. Rill Hydraulics - An Experimental Study on Gully Basin in Lateritic Upland of Paschim Medinipur, West Bengal, India. J. Geogr. Geol. 2012, 4, 4.
26. Ghosh, S.; Guchhait, S.K. Characterization and evolution of laterites in West Bengal: Implication on the geology of northwest Bengal Basin. Transactions 2015, 37, 93–119.
27. Samanta, R.K.; Bhunia, G.S.; Shit, P.K. Spatial Modelling of Soil Erosion Susceptibility Mapping in Lower Basin of Subarnarekha River (India) Based on Geospatial Techniques. Model. Earth Syst. Environ. 2016, 2, 99.
28. Wang, X.; Zhang, T.; Cao, L.; Liang, Y. Erosion and Global Change. Europe 2016, 93, 39.
29. Tsegaye, L.; Bharti, R. Assessment of the effects of agricultural management practices on soil erosion and sediment yield in Rib watershed, Ethiopia. Int. J. Environ. Sci. Technol. 2023, 20, 503–520.
30. Dharmawan, I.W.; Pratiwi; Siregar, C.A.; Narendra, B.H.; Undaharta, N.K.; Sitepu, B.S.; Sukmana, A.; Wiratmoko, M.D.; Abywijaya, I.K.; Sari, N. Implementation of Soil and Water Conservation in Indonesia and Its Impacts on Biodiversity, Hydrology, Soil Erosion and Microclimate. Appl. Sci. 2023, 13, 7648.
31. Hitouri, S.; Meriame, M.; Ajim, A.S.; Pacheco, Q.R.; Nguyen-Huy, T.; Bao, P.Q.; ElKhrachy, I.; Varasano, A. Gully Erosion Mapping Susceptibility in a Mediterranean Environment: A Hybrid Decision-Making Model. Int. Soil Water Conserv. Res. 2024, 12, 279–297.
32. Garosi, Y.; Sheklabadi, M.; Pourghasemi, H.R.; Besalatpour, A.A.; Conoscenti, C.; Van Oost, K. Comparison of Differences in Resolution and Sources of Controlling Factors for Gully Erosion Susceptibility Mapping. Geoderma 2018, 330, 65–78.
33. Chen, W.; Xie, X.; Wang, J.; Pradhan, B.; Hong, H.; Bui, D.T.; Duan, Z.; Ma, J. A Comparative Study of Logistic Model Tree, Random Forest, and Classification and Regression Tree Models for Spatial Prediction of Landslide Susceptibility. CATENA 2017, 151, 147–160.
34. Khosravi, K.; Shahabi, H.; Pham, B.T.; Adamowski, J.; Shirzadi, A.; Pradhan, B.; Dou, J.; Ly, H.-B.; Gróf, G.; Ho, H.L. A Comparative Assessment of Flood Susceptibility Modeling Using Multi-Criteria Decision-Making Analysis and Machine Learning Methods. J. Hydrol. 2019, 573, 311–323.
35. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
36. Chen, W.; Li, Y.; Xue, W.; Shahabi, H.; Li, S.; Hong, H.; Wang, X.; Bian, H.; Zhang, S.; Pradhan, B.; et al. Modeling Flood Susceptibility Using Data-Driven Approaches of Naïve Bayes Tree, Alternating Decision Tree, and Random Forest Methods. Sci. Total Environ. 2020, 701, 134979.
37. Boehmke, B.; Greenwell, B.M. Hands-On Machine Learning with R; CRC Press: Boca Raton, FL, USA, 2019.
38. Xu, R.; Lin, H.; Lü, Y.; Luo, Y.; Ren, Y.; Comber, A. A Modified Change Vector Approach for Quantifying Land Cover Change. Remote Sens. 2018, 10, 1578.
39. Valdez, M.C.; Chang, K.-T.; Chen, C.-F.; Chiang, S.-H.; Santos, J.L. Modelling the Spatial Variability of Wildfire Susceptibility in Honduras Using Remote Sensing and Geographical Information Systems. Geomat. Nat. Hazards Risk 2017, 8, 876–892.
40. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
41. Taghizadeh-Mehrjardi, R.; Schmidt, K.; Amirian-Chakan, A.; Rentschler, T.; Zeraatpisheh, M.; Sarmadian, F.; Valavi, R.; Davatgar, N.; Behrens, T.; Scholten, T. Improving the Spatial Prediction of Soil Organic Carbon Content in Two Contrasting Climatic Regions by Stacking Machine Learning Models and Rescanning Covariate Space. Remote Sens. 2020, 12, 1095.
42. Deng, Q.; Qin, F.; Zhang, B.; Wang, H.; Luo, M.; Shu, C.; Liu, H.; Liu, G. Characterizing the Morphology of Gully Cross-sections Based on PCA: A Case of Yuanmou Dry-Hot Valley. Geomorphology 2015, 228, 703–713.
43. Islam, A.; Sarkar, B.; Das, B.C.; Barman, S.D. Assessing Gully Asymmetry Based on Cross-Sectional Morphology: A Case of Gangani Badland of West Bengal, India. Gully Eros. Stud. India Surround. Reg. 2020, 69–92.
44. Hong, H.; Pradhan, B.; Xu, C.; Bui, D.T. Spatial Prediction of Landslide Hazard at the Yihuang Area (China) Using Two-Class Kernel Logistic Regression, Alternating Decision Tree and Support Vector Machines. CATENA 2015, 133, 266–281.
45. Robin, X.; Turck, N.; Hainard, A.; Tiberti, N.; Lisacek, F.; Sanchez, J.C.; Müller, M. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform. 2011, 12, 77.
46. Telikani, A.; Tahmassebi, A.; Banzhaf, W.; Gandomi, A.H. Evolutionary Machine Learning: A Survey. ACM Comput. Surv. (CSUR) 2021, 54, 161.
47. Huang, D.; Su, L.; Zhou, L.; Tian, Y.; Fan, H. Assessment of Gully Erosion Susceptibility Using Different DEM-Derived Topographic Factors in the Black Soil Region of Northeast China. Int. Soil Water Conserv. Res. 2023, 11, 97–111.
48. Gayen, A.; Pourghasemi, H.R.; Saha, S.; Keesstra, S.; Bai, S. Gully Erosion Susceptibility Assessment and Management of Hazard-Prone Areas in India Using Different Machine Learning Algorithms. Sci. Total Environ. 2019, 668, 124–138.
49. Setargie, T.A.; Tsunekawa, A.; Haregeweyn, N.; Tsubo, M.; Fenta, A.A.; Berihun, M.L.; Sultan, D.; Yibeltal, M.; Ebabu, K.; Nzioki, B.; et al. Random Forest–Based Gully Erosion Susceptibility Assessment across Different Agro-Ecologies of the Upper Blue Nile Basin, Ethiopia. Geomorphology 2023, 431, 108671.
50. Saha, S.; Roy, J.; Arabameri, A.; Blaschke, T.; Tien Bui, D. Machine Learning-Based Gully Erosion Susceptibility Mapping: A Case Study of Eastern India. Sensors 2020, 20, 1313.
51. Avand, M.; Janizadeh, S.; Naghibi, S.A.; Pourghasemi, H.R.; Khosrobeigi Bozchaloei, S.; Blaschke, T. A Comparative Assessment of Random Forest and K-Nearest Neighbor Classifiers for Gully Erosion Susceptibility Mapping. Water 2019, 11, 2076.
52. Hosseinalizadeh, M.; Kariminejad, N.; Chen, W.; Pourghasemi, H.R.; Alinejad, M.; Behbahani, A.M.; Tiefenbacher, J.P. Gully Headcut Susceptibility Modeling Using Functional Trees, Naïve Bayes Tree, and Random Forest Models. Geoderma 2019, 342, 1–11.
53. Parra, F.; González, J.; Chacón, M.; Marín, M. Modeling and evaluation of the susceptibility to landslide events using machine learning algorithms in the province of Chañaral, Atacama region, Chile. Sustainability 2023, 15, 16806.
54. Wei, A.; Yu, K.; Dai, F.; Gu, F.; Zhang, W.; Liu, Y. Application of tree-based ensemble models to landslide susceptibility mapping: A comparative study. Sustainability 2022, 14, 6330.
55. Yang, A.; Wang, C.; Pang, G.; Long, Y.; Wang, L.; Cruse, R.M.; Yang, Q. Gully erosion susceptibility mapping in highly complex terrain using machine learning models. ISPRS Int. J. Geo-Inf. 2021, 10, 680.
56. Hasanuzzaman, M.; Adhikary, P.P.; Shit, P.K. Gully erosion susceptibility mapping and prioritization of gully-dominant sub-watersheds using machine learning algorithms: Evidence from the Silabati River (tropical river, India). Adv. Space Res. 2024, 73, 1653–1666.
57. Arabameri, A.; Pourghasemi, H.R. Spatial modeling of gully erosion using linear and quadratic discriminant analyses in GIS and R. In Spatial Modeling in GIS and R for Earth and Environmental Sciences; Elsevier: Amsterdam, The Netherlands, 2019; pp. 299–321.
58. Arabameri, A.; Cerda, A.; Pradhan, B.; Tiefenbacher, J.P.; Lombardo, L.; Bui, D.T. A methodological comparison of head-cut based gully erosion susceptibility models: Combined use of statistical and artificial intelligence. Geomorphology 2020, 359, 107136.
59. Choubin, B.; Rahmati, O.; Tahmasebipour, N.; Feizizadeh, B.; Pourghasemi, H.R. Application of fuzzy analytical network process model for analyzing the gully erosion susceptibility. In Natural Hazards GIS-Based Spatial Modeling Using Data Mining Techniques; Springer: Berlin, Germany, 2019; pp. 105–125.
60. Cui, L.; Li, X.; Lin, J.; Guo, G.; Zhang, X.; Zeng, G. The mineralization and sequestration of soil organic carbon in relation to gully erosion. Catena 2022, 214, 106218.
61. Frankl, A.; Vanmaercke, M.; Nyssen, J.; Poesen, J. Gully prevention and rehabilitation: A review. In Proceedings of the 8th International Symposium on Gully Erosion (ISGE), Townsville, Australia, 21–27 July 2019; p. 67.
62. Lana, J.C.; Castro, P.D.T.A.; Lana, C.E. Assessing gully erosion susceptibility and its conditioning factors in southeastern Brazil using machine learning algorithms and bivariate statistical methods: A regional approach. Geomorphology 2022, 402, 108159.
63. Li, Y.; Mo, Y.Q.; Are, K.S.; Huang, Z.; Guo, H.; Tang, C.; Abegunrin, T.P.; Qin, Z.; Kang, Z.; Wang, X. Sugarcane planting patterns control ephemeral gully erosion and associated nutrient losses: Evidence from hillslope observation. Agric. Ecosyst. Environ. 2021, 309, 107289.
64. Nhu, V.H.; Janizadeh, S.; Avand, M.; Chen, W.; Farzin, M.; Omidvar, E.; Shirzadi, A.; Shahabi, H.J.; Clague, J.; Jaafari, A.; et al. GIS-based gully erosion susceptibility mapping: A comparison of computational ensemble data mining models. Appl. Sci. 2020, 10, 2039.
65. Rahmati, O.; Haghizadeh, A.; Pourghasemi, H.R.; Noormohamadi, F. Gully erosion susceptibility mapping: The role of GIS-based bivariate statistical models and their comparison. Nat. Hazards 2016, 82, 1231–1258.
66. Roy, P.; Chakrabortty, R.; Chowdhuri, I.; Malik, S.; Das, B.; Pal, S.C. Development of different machine learning ensemble classifier for gully erosion susceptibility in Gandheswari Watershed of West Bengal, India. Mach. Learn. Intell. Decis. Sci. 2020, 1–26.
67. Senanayake, S.; Pradhan, B. Predicting soil erosion susceptibility associated with climate change scenarios in the Central Highlands of Sri Lanka. J. Environ. Manag. 2022, 308, 114589.
68. Wei, Y.; Cai, C.; Guo, Z.; Wang, J. Linkage between aggregate stability of granitic soils and the permanent gully erosion in subtropical China. Soil Tillage Res. 2022, 221, 105411.
69. Wen, Y.; Kasielke, T.; Li, H.; Zepp, H.; Zhang, B. A case-study on history and rates of gully erosion in Northeast China. Land Degrad. Dev. 2021, 32, 4254–4266.
70. Wen, H.; Ni, S.; Wang, J.; Cai, C. Changes of soil quality induced by different vegetation restoration in the collapsing gully erosion areas of southern China. Int. Soil Water Conserv. Res. 2021, 9, 195–206.
71. Zabihi, M.; Mirchooli, F.; Motevalli, A.; Darvishan, A.K.; Pourghasemi, H.R.; Zakeri, M.A.; Sadighi, F. Spatial modelling of gully erosion in Mazandaran Province, northern Iran. Catena 2018, 161, 1–13.
72. Zhu, P.; Zhang, G.; Zhang, B. Soil saturated hydraulic conductivity of typical revegetated plants on steep gully slopes of Chinese Loess Plateau. Geoderma 2022, 412, 115717.

Figure 1. Study area. (a) Location of India, (b) location of West Bengal, (c) location of testing and training dataset in the Raiboni River basin.

Figure 2. Workflow diagram of the present study.

Figure 3. Distribution of twenty-four key factors used in this research: (a) elevation, (b) slope, (c) slope length, (d) slope aspect, (e) curvature, (f) drainage density, (g) distance from the river, (h) distance from lineament, (i) TWI, (j) distance from the road, (k) NDVI, (l) rainfall, (m) lithology, (n) geomorphology, (o) LULC, (p) soil organic density, (q) bulk density, (r) clay content, (s) coarse fragments, (t) sand, (u) silt, (v) cation exchange capacity, (w) nitrogen, and (x) soil organic carbon.

Figure 4. Parameters describing the cross-sectional morphology of the gully. Note: width of the one-fourth depth (WQD), width of the half depth (WHD), total width (WT), depth of the half right side (DRH), depth of the half left side (DLH), average depth (D). Source: based on Deng et al. [42].

Figure 5. Final gully erosion susceptibility maps using: (a) the RF and (b) XGBoost models.

Figure 6. Evaluation of the accuracy of the XGBoost and RF models using ROC analysis.

Figure 7. Photographs captured of the gullies during the subsequent field investigations: (a–c) during the gully geometrical parameters survey; (d,e) rock exposure areas caused by deforestation and human activity (REABDHA); (f) agriculture practices in the gully; and (g–i) fallow lands (FL).

Figure 8. The gully-dominant area of the Raiboni River Basin and the selected gully for measuring geometric parameters.

Table 1. Data sources and spatial resolution of the 24 factors used in this study.

Sl No	Category	Data Source	Resolution
1	ALOS PALSAR DEM (elevation)	https://search.asf.alaska.edu	12.5 × 12.5 m
2	Slope	Extracted from DEM	12.5 × 12.5 m
3	Slope length	Extracted from DEM	12.5 × 12.5 m
4	Slope aspect	Extracted from DEM	12.5 × 12.5 m
5	Curvature	Extracted from DEM	12.5 × 12.5 m
6	Drainage density (DD)	Extracted from DEM	12.5 × 12.5 m
7	Distance from the river (DFR)	Extracted from DEM	12.5 × 12.5 m
8	Distance from the lineament (DFL)	Extracted from DEM	12.5 × 12.5 m
9	Topographic Wetness Index (TWI)	Extracted from DEM	12.5 × 12.5 m
10	Rainfall	WorldClim website	885.67 × 885.67 m
11	NDVI	Satellite image (USGS website)	30 × 30 m
12	Land Use and Land Cover (LULC)	Satellite image (USGS website)	30 × 30 m
13	Distance from the road (DR)	https://www.openstreetmap.org	30 × 30 m
14	Lithology	Geological Survey of India (bhukosh.gsi.gov.in)	30 × 30 m
15	Geomorphology	Geological Survey of India (bhukosh.gsi.gov.in)	30 × 30 m
16	Soil organic density (SOD)	https://soilgrids.org	250 × 250 m
17	Bulk density	https://soilgrids.org	250 × 250 m
18	Clay Content in Soil (SC)	https://soilgrids.org	250 × 250 m
19	Coarse fragments	https://soilgrids.org	250 × 250 m
20	Sand	https://soilgrids.org	250 × 250 m
21	Silt	https://soilgrids.org	250 × 250 m
22	Cation exchange capacity (CEC)	https://soilgrids.org	250 × 250 m
23	Nitrogen	https://soilgrids.org	250 × 250 m
24	Soil organic carbon (SOC)	https://soilgrids.org	250 × 250 m

Table 2. Results of the VIF and IGR (average merit) tests for the influencing factors.

No	Influencing Factor	VIF	Average Merit (AM)
1	DEM (elevation)	3.87	0.75
2	Slope	1.83	0.49
3	Slope length	1.14	0.37
4	Slope aspect	1.17	0.39
5	Curvature	1.72	0.25
6	Drainage density (DD)	2.49	0.47
7	Distance from the river (DFR)	3.34	0.70
8	Distance from the lineament (DFL)	1.44	0.21
9	Topographic Wetness Index (TWI)	2.43	0.34
10	Rainfall	3.64	0.69
11	NDVI	2.55	0.57
12	Land Use and Land Cover (LULC)	2.48	0.71
13	Distance from the road (DR)	1.71	0.61
14	Lithology	3.22	0.48
15	Geomorphology	1.68	0.63
16	Soil organic density (SOD)	3.41	0.59
17	Bulk density	3.88	0.51
18	Clay Content in Soil (SC)	3.47	0.40
19	Coarse fragments	1.91	0.63
20	Sand	2.76	0.64
21	Silt	2.90	0.68
22	Cation exchange capacity (CEC)	2.29	0.43
23	Nitrogen	1.87	0.47
24	Soil organic carbon (SOC)	3.49	0.65

Table 3. Gully erosion susceptibility area (in km² and %) of the study area by the RF and the XGBoost algorithms.

Class	XGBoost Area (km²)	XGBoost Area (%)	RF Area (km²)	RF Area (%)
Very High	6.34	11.44	6.92	12.49
High	10.66	19.23	11.56	20.86
Moderate	26.57	47.94	25.01	45.13
Low	5.51	9.94	5.62	10.14
Very Low	6.35	11.46	6.31	11.39
Total	55.42	100	55.42	100

Table 4. Statistical description of the geometric parameters of gullies.

Gully Name	Total Area (m²)	D (m)	DLH (m)	DRH (m)	WT (m)	WHD (m)	WQD (m)	W/D	SI	EI
G1	4.31	0.35	0.34	0.35	11.68	6.13	3.02	33.37	0.26	0.3
G2	4.76	0.35	0.36	0.34	12.01	6.39	3.13	34.31	0.26	0.29
G3	5.63	0.38	0.39	0.37	11.39	6.03	2.94	29.97	0.26	0.31
G4	5.26	0.36	0.35	0.36	11.88	5.87	2.91	33.00	0.24	0.28
G5	5.71	0.44	0.43	0.44	10.38	5.41	2.71	23.59	0.26	0.33
G6	4.36	0.37	0.36	0.37	10.91	4.2	2.73	29.49	0.25	0.31
G7	6.14	0.6	0.58	0.61	13.45	6.85	2.57	22.42	0.19	0.28
G8	5.89	0.47	0.46	0.48	10.54	5.64	2.73	22.43	0.26	0.35
G9	6.01	0.5	0.49	0.51	11.66	5.46	2.91	23.32	0.25	0.39
G10	5.66	0.56	0.55	0.57	11.33	5.74	2.93	20.23	0.26	0.31
G11	6.39	0.61	0.62	0.59	11.78	5.34	5.79	19.31	0.49	0.33
G12	4.85	0.42	0.41	0.43	8.29	4.31	2.11	19.74	0.25	0.25
G13	5.51	0.48	0.49	0.47	8.94	4.33	2.12	18.63	0.24	0.34
G14	5.29	0.47	0.46	0.48	8.89	4.29	2.22	18.91	0.25	0.35
G15	6.37	0.67	0.67	0.67	11.21	5.67	2.91	16.73	0.26	0.29
G16	6.11	0.57	0.59	0.55	10.97	5.88	2.75	19.25	0.25	0.34
G17	5.75	0.5	0.51	0.49	9.43	4.5	2.19	18.86	0.23	0.36
G18	6.34	0.67	0.66	0.67	12.71	6.38	3.09	18.97	0.24	0.37
G19	6.17	0.69	0.68	0.69	13.96	6.73	3.55	20.23	0.25	0.41
G20	5.69	0.62	0.61	0.63	13.25	6.69	3.29	21.37	0.25	0.39
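
The derived columns of Table 4 can be cross-checked against the measured ones. The sketch below assumes, based only on the numbers in the table and not on a definition given in this section, that W/D equals WT divided by D and that SI equals WQD divided by WT. It also recomputes the mean SI and EI. All names in the code are illustrative.

```python
# Rows transcribed from Table 4: (gully, D, WT, WQD, W/D, SI, EI).
TABLE4 = [
    ("G1", 0.35, 11.68, 3.02, 33.37, 0.26, 0.30),
    ("G2", 0.35, 12.01, 3.13, 34.31, 0.26, 0.29),
    ("G3", 0.38, 11.39, 2.94, 29.97, 0.26, 0.31),
    ("G4", 0.36, 11.88, 2.91, 33.00, 0.24, 0.28),
    ("G5", 0.44, 10.38, 2.71, 23.59, 0.26, 0.33),
    ("G6", 0.37, 10.91, 2.73, 29.49, 0.25, 0.31),
    ("G7", 0.60, 13.45, 2.57, 22.42, 0.19, 0.28),
    ("G8", 0.47, 10.54, 2.73, 22.43, 0.26, 0.35),
    ("G9", 0.50, 11.66, 2.91, 23.32, 0.25, 0.39),
    ("G10", 0.56, 11.33, 2.93, 20.23, 0.26, 0.31),
    ("G11", 0.61, 11.78, 5.79, 19.31, 0.49, 0.33),
    ("G12", 0.42, 8.29, 2.11, 19.74, 0.25, 0.25),
    ("G13", 0.48, 8.94, 2.12, 18.63, 0.24, 0.34),
    ("G14", 0.47, 8.89, 2.22, 18.91, 0.25, 0.35),
    ("G15", 0.67, 11.21, 2.91, 16.73, 0.26, 0.29),
    ("G16", 0.57, 10.97, 2.75, 19.25, 0.25, 0.34),
    ("G17", 0.50, 9.43, 2.19, 18.86, 0.23, 0.36),
    ("G18", 0.67, 12.71, 3.09, 18.97, 0.24, 0.37),
    ("G19", 0.69, 13.96, 3.55, 20.23, 0.25, 0.41),
    ("G20", 0.62, 13.25, 3.29, 21.37, 0.25, 0.39),
]

# Assumed relations (inferred from the table values): W/D = WT / D, SI = WQD / WT.
max_wd_gap = max(abs(wt / d - wd) for _, d, wt, _, wd, _, _ in TABLE4)
max_si_gap = max(abs(wqd / wt - si) for _, _, wt, wqd, _, si, _ in TABLE4)

mean_si = sum(row[5] for row in TABLE4) / len(TABLE4)
mean_ei = sum(row[6] for row in TABLE4) / len(TABLE4)

print(f"Largest |WT/D - reported W/D|: {max_wd_gap:.4f}")
print(f"Largest |WQD/WT - reported SI|: {max_si_gap:.4f}")
print(f"Mean SI: {mean_si:.2f}, mean EI: {mean_ei:.2f}")
```

Under these assumed relations every row reproduces its reported W/D and SI to within rounding, and the column means round to 0.26 (SI) and 0.33 (EI), the averages quoted in the abstract.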

Table 5. RMSE, MAE, R², accuracy (ACC), and Kappa statistics for the RF and XGBoost algorithms on the training and testing data.

Metric	RF Training	RF Testing	XGBoost Training	XGBoost Testing
Accuracy (%)	83.7	83.2	87.5	88.7
Kappa index (K)	0.80	0.81	0.85	0.83
MAE	0.15	0.19	0.11	0.13
RMSE	0.17	0.15	0.14	0.15
R²	0.81	0.80	0.88	0.86
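
For readers unfamiliar with the statistics in Table 5, the sketch below shows one standard way to compute them for a binary gully/non-gully classifier. It is not the authors' code. The labels and probabilities are toy values invented for illustration, the 0.5 threshold is an assumption, and so is the choice to compute MAE, RMSE, and R² on predicted probabilities rather than on class labels.

```python
import math


def validation_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, Cohen's kappa, MAE, RMSE and R² for binary labels (0/1)."""
    n = len(y_true)
    y_pred = [1 if p >= threshold else 0 for p in y_prob]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / n

    # Cohen's kappa: observed agreement corrected for chance agreement.
    # (Undefined when chance agreement equals 1; not handled in this sketch.)
    p_chance = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    kappa = (accuracy - p_chance) / (1 - p_chance)

    # Error-based statistics on the predicted probabilities.
    errors = [t - p for t, p in zip(y_true, y_prob)]
    ss_res = sum(e * e for e in errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    return {"accuracy": accuracy, "kappa": kappa, "mae": mae, "rmse": rmse, "r2": r2}


# Toy data (hypothetical, not from the study): 1 = gullied, 0 = non-gullied.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]

metrics = validation_metrics(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Higher accuracy, kappa, and R² and lower MAE and RMSE indicate a better fit, which is how the RF and XGBoost columns of Table 5 are compared.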

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

