**Narges Javidan 1, Ataollah Kavian 1,\*, Hamid Reza Pourghasemi 2, Christian Conoscenti <sup>3</sup> and Zeinab Jafarian <sup>4</sup>**


Received: 14 September 2019; Accepted: 29 October 2019; Published: 6 November 2019

**Abstract:** Soil erosion is a serious problem affecting numerous countries, especially, gully erosion. In the current research, GIS techniques and MARS (Multivariate Adaptive Regression Splines) algorithm were considered to evaluate gully erosion susceptibility mapping among others. The study was conducted in a specific section of the Gorganroud Watershed in Golestan Province (Northern Iran), covering 2142.64 km2 which is intensely influenced by gully erosion. First, Google Earth images, field surveys, and national reports were used to provide a gully-hedcut evaluation map consisting of 307 gully-hedcut points. Eighteen gully erosion conditioning factors including significant geoenvironmental and morphometric variables were selected as predictors. To model sensitivity of gully erosion, Multivariate Adaptive Regression Splines (MARS) was used while the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC), drawing ROC curves, efficiency percent, Yuden index, and kappa were used to evaluate model efficiency. We used two different scenarios of the combination of the number of replications, and sample size, including 90%/10% and 80%/20% with 10 replications, and 70%/30% with 5, 10, and 15 replications for preparing gully erosion susceptibility mapping (GESM). Each one involves a various subset of both positive (presence), and negative (absence) cases. Absences were extracted as randomly distributed individual cells. Therefore, the predictive competency of the gully erosion susceptibility model and the robustness of the procedure were evaluated through these datasets. Results did not show considerable variation in the accuracy of the model, with altering the percentage of calibration to validation samples and number of model replications. Given the accuracy, the MARS algorithm performed excellently in predictive performance. The combination of 80%/20% using all statistical measures including SST (0.88), SPF (0.83), E (0.79), Kappa (0.58), Robustness (0.01), and AUC (0.84) had the highest performance compared to the other combinations. Consequently, it was found that the performance of MARS for modelling gully erosion susceptibility is quite consistent while changes in the testing and validation specimens are executed. The intense acceptable prediction capability of the MARS model verifies the reliability of the method employed for use of this model elsewhere and gully erosion studies since they are qualified to quickly generating precise and exact GESMs (gully erosion sensitivity maps) to make decisions and management edaphic and hydrologic features.

**Keywords:** gully erosion susceptibility; GIS; robustness; MARS algorithm

#### **1. Introduction**

Soil erosion through water is found to be a drastic soil destruction process, which accounts for about one billion hectares worldwide [1], resulting in low plant development, filling reservoirs and valleys, geo-environmental degradation, degradation of a large part of the soil, and siltation of watercourses [2–4]. One of the soil degradation processes is gully erosion and serves as the most intricate erosion phenomena [5], usually stimulated or exacerbated integrating extreme rainstorms and unwise land exploitation [6]. Such erosion contains wide varieties of small processes, including head-cut, fluting, piping, continuous cracking progress, and mass flow [7,8]. Generally, the increasing attention towards analysis of gully erosion indicates the necessity to enhance our awareness regarding its consequences and condition agents that change due to various factors [6]. Gullies involve complex pathways adjusted through the variability of correlated variables including soil texture, lithology, land use and plant canopy, climate, and topography [9]. As gully erosion is a threshold phenomenon [8], various studies have emphasized characterizing the topographic as well as hydraulic conditions to forecast and assess the starting gullies susceptibility mapping [10], to a threshold approach where characterization and positions of erosion processes might as well as be anticipated by using bivariate to multivariate methods. Such techniques provide scholars the ability to describe soil erosion processes, through evaluating space distribution of the gully erosion forms (the consequences) compared to some predictors (geological and environmental factors). As for geomorphology, actuarial methods are enormously used to evaluate landslide susceptibility mapping [11–17].

The large number of studies have at the same time used the probabilistic method to map erosion sensitivity and other hazards. As well as bivariate methods [18–20], various multivariate actuarial approaches were considered to satisfy this end including logistic regression [21,22], classification and regression trees [23,24], and multivariate adaptive regression splines [25–28]. Gutiérrez et al. [27] used the Multivariate Adaptive Regression Splines (MARS) model to predict gully creation locations. The results showed that this model has good performance in geomorphic research.

Gully erosions have the highest sediment production potential. In recent decades, gully erosion has developed in most watersheds of Iran. Gullies are an important sediment source and often cause environmental problems [29]. Due to their damages, such as loss of productive capacity and significant land degradation, high sediment discharge and sediment yields, which can transport both pollutants and nutrients, reducing the water capacity of the reservoirs and damage to the infrastructure and transport routes, prediction of susceptible areas is, therefore, considered essential in management of watersheds [30]. The result of gully erosion study demonstrates the susceptibility to erosion over a country, providing beneficial information for remediation strategies and establishing land use plans [29].

In ongoing research, we adopted the multivariate adaptive regression splines [31] as a multivariate actuarial method for analyzing, assessing, and forecasting the local incidence of gully erosion pathways. The MARS model serves as the most common actuarial method that previously proved to offer credible patterns of gully sensitivity. The reason behinds choosing MARS models for the forecasting gully erosion are as follows: (1) possibility for modeling curvilinear association among the conditioning factors and gully incidence; (2) it allows working with various outcome variables and may manipulate data from different measures; (3) according to previous studies in this area, that any studies have applied this model for evaluating its ability and robustness for gully susceptibility.

This area in Gorganrood has witnessed gully erosion that caused many issues in this area and led organizations to reassess the Weirdness of adopted constant genesis strategies (CONRWMGP 2009). Astonishingly, the major initial place for the erosion phenomenon is cutting down trees positioned at the upper part of Gorganrood watershed and land-use changes (CONRWMGP 2009). From 1990 to 2005 due to the presence of loess soils in the northern of Golestan Province, 430,000 ha of these areas were affected by erosion. Soil erosion in this province is 5–6 tons/ha/year in forest areas (Department of Golestan Natural Resources and Watershed management). Gully erosion in Maravetape and Kalale counties leads to the loss of soil, the imposition of large costs, reduced agricultural potential, and has caused the migration of people in the villages of this region and exacerbated soil erosion pathways

influence farming system efficiency. Consequently, to describe a robust model to evaluate the sensitivity of the territory to the development of gully pathways is necessary and Gorganrood watershed as the susceptible area will be the focus of the present study.

The difference between this study and previous studies who used the MARS model is applying two scenarios of the combination of number of replications and sample size, including 90%/10% and 80%/20% with 10 replications, and 70%/30% with 5, 10, and 15 replications for preparing gully erosion susceptibility mapping (GESM) and assessing their performance by MARS algorithm. Each one involves a various subset of both positive and negative cases. Absences are extracted as randomly distributed individual cells. Therefore the predictive competency of the gully erosion susceptibility model and the robustness of the procedure were evaluated through these datasets.

Therefore, the main scope of the current study is gully erosion modeling based on the MARS technique and explores the ability and robustness of the MARS model to forecast the incidence of gully erosion by various data sets and assessment measures. This study enriches the systematic assessment of the MARS model to map gully erosion susceptibility among others. This study tried to investigate gully erosion susceptibility through the MARS model and analyzed the performance and accuracy of this technique for zoning gully erosion.

The result of this study demonstrates ways of offering beneficial insight for remediation strategies and founding land exploitation projects by erosion susceptibility.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The area under research belongs to Gorganrood basin, related to Golestan Province located in north-eastern Iran. This area accounts for 2142.64 km2 and is intensely influenced by gully erosion. Its coordinates span over 37◦18 –37◦52 N and 55◦18 –56◦10 E (Figure 1). Topographically, this region is considered as plain. The mean height ranges between 56 and 2165 m. Its prevalent soil textures are Silty-loamy (approximately 53.7%) and Silty-clay-loamy (nearly 45.3%) soils. The major land uses in this region include pasturelands (40.1%), agriculture (farming) (29.4%), and forest (18.4%). The mean annual rainfall is approximately between 460 and 603 mm. The average minimum and maximum temperatures are 11 and 18.5 ◦C, respectively [32]. In the last decade, this area has been challenged with natural hazards and has faced intensive gully erosion. Therefore, this study area was selected as a potential gully erosion-prone area. Figure 2 presents some Google Earth images of gully erosion in the research field.

**Figure 1.** Study area and topographical characteristics. (**a**) Iran, (**b**) Gorganrood Watershed, (**c**) Study area.

**Figure 2.** Some photographs of the gully erosion zoned on study area.

#### *2.2. Methodology*

Figure 3 indicates a flow-diagram for the method applied for the application of multivariate adaptive regression splines model for gully erosion sensitivity zoning developed for this specific research in the north of Iran. As shown, the flowchart consists of five steps: (1) preparing thematic layers or 18 effective conditioning factors; (2) selection of factors using a multi collinearity test; (3) applying two scenarios of the combination of the number of replications, and sample size, including 90%/10% and 80%/20% with 10 replications, and 70%/30% with 5, 10, and 15 replications; (4) gully erosion susceptibility modeling using the MARS technique; (5) validation of the susceptibility maps using the ROC-AUC (Area Under the Receiver Operating Characteristic) curve, efficiency percent, Yuden index, and kappa.

#### 2.2.1. Gully Erosion Inventory Mapping

An essential stage to zoning is to create a hazard evaluation for hazard zones [22]. The gully erosion inventory for the present section of Gorganrood Watershed was provided by Google Earth images, field surveys, and national and regional reports from different organizations. The present map constitutes a set of incidences (307 hedcut points). While designing statistical plans, the training set must differ from sets applied in the validation part [33]. To distinguish training points from the validation points a random dividing algorithm [29,34] was used. In this research, two scenarios were used: these scenarios were selected after altering different sample sizes and the number of replications, including 90%/10% and 80%/20% with 10 replications. To assess the robustness of the model's data sensitivity [8,22,35], 5, 10, and 15 sample data sets, (replicates) for 70%/30% sample size, were prepared through randomly multi-extracting of various data sets in the calibration and validation subsets [36]. Every set was adjusted through addition to positives (i.e., pixels having hedcut points) an equal number of randomly selected negative points, corresponding to pixels without hedcut [37].

**Figure 3.** Flow diagram for the research method.

## 2.2.2. Gully Erosion Predictor Variables (GEPV)

It is essential to determine the effective factors on natural hazards and man-made fatalities in order to performing gully erosion susceptibility maps have great importance [38]. Good knowledge of the main gully erosion-related factors is required to recognize the susceptible areas. Such contributors always apply in studies analyzing gully erosion. Therefore, such factors were chosen from past studies [34,39]. In this study, to generate and exhibit such data grid, ArcGIS 10.5 and a system for automated geoscientific analyses (SAGA) software were used. For the application of the MARS model, all agents were transformed into a raster network with 30 × 30 m grid pixel. All conditioning factors were primarily continuous, and some of them (litologhy, soil, and land use) were classified within different categories based on expert knowledge and literature review [40–42].

The predicting factors used in this work are (a) digital elevation model (m), (b) aspect map, (c) slope percent, (d) curvature of profile, (e) curvature of plan, (f) land use (LU), (g) soil texture, (h) Topographic wetness index (TWI), (i) distance to streams (m), (j) distance to roads (m), (k) drainage

density, (l) annual rainfall (mm), (m) stream power index, (n) relative slope position, (o) lithological formation, (p) K factor, (q) Melton ruggedness number, (r) topographic position index.

Digital contour data obtained from the Department of Natural Resources Management of Iran was applied. A DEM (Digital Elevation Model) (Figure 4a) of the research field characterized with a 30 m pixel was generated. Drawing upon DEM, physiographical and geomorphological grids such as aspect (Figure 4b), slope percent (Figure 4c) as well as curvature layers were extracted using ArcGIS 10.5. The slope percent includes a large section of the intrinsic views and is an important factor as it affects drainage density, surface runoff, influences, vegetation structure the soil erosion, soil moisture, and geomorphological processes [9,43–47]. Slope aspect is another important factor related to precipitation, snow meltwater, land cover, soil moisture patterns, and physiographic trends [48–52]. The suitable geomorphological data may be inferred via curvature assessment [53–55]. Three categories, convex, concave, and flat, were used to develop the slope curvature map. Positive curvature exhibits convex (>+0.1), negative curvature depicts concave (<−0.1), and zero curvature represents flat ((−0.1)–(+0.1)). In addition, profile and plan curvatures (Figure 4d,e) include various negative as well as positive quantities and indicate various description in every measure. Negative as well as positive ones in profile curvature denote convexity (increasing flow velocity) and concavity (reducing flow velocity), respectively. In contrast, negative and positive values in the plan curvature imply concavity (flow convergence) as well as the convexity (flow divergence) [54,56]. Those near to zero denote neutral curvature in both conditions.

LU has a substantial contribution in geomorphological and hydrological pathways through direct or indirect influences on evapotranspiration, infiltration, run-off generation, and sediment dynamics [52,57]. The other hand, agricultural activities have an important impact on gully erosion development as well as genesis [58]. The land-use map in the region in 1:100,000 scale was prepared by the Natural Resources department of Golestan Province and manipulated by Google Earth images. The land-use of the region consists of residential areas (urban), range and farming, forestlands, rangelands, farming, and lake (Figure 4f).

Soil texture (Figure 4g) usually is recognized as a substantial limiting factor in the process of infiltration and overflow and it is effective on gully occurrence [59–61]. Through digitizing the soil texture map of Golestan Province (1:100,000 scale) obtained from the Agriculture Department, Iran the aforementioned layer was prepared. The soil texture in the study area consists of sandy-loam, clay-loam, sandy-clay-loam, silty-clay, silty-clay-loam, as well as silty-loam.

Moore, Grayson, and Ladson [62] and Grabs et al. [63] mentioned that TWI (Topographic wetness index) represents the spatial distribution of wetness conditions, and inclination of gravitation to transport water to downstream. This factor was prepared using Equation (1):

$$\text{TWI} = \ln\left(\frac{\infty}{\tan\beta}\right) \tag{1}$$

where α denotes the aggregated upstream area leaving a point (contour points length) and tan β is point angle. Here, GIS 10.5 software was used for TWI mapping and ranges from 1.20 to 22.92 (Figure 4h).

Distance to streams (Figure 4i) serves as a key determinant because of its important effect on flow magnitude as well as gully erosion [29]. Based on field studies, gully erosions are diffused typically close the linear aspects in particular roads. Undoubtedly, road making imposes an adverse effect on hill sustainability at which flow may be appropriate for gullies [22,64].

Layers of the proximity were produced using the Euclidean metric function in ArcGIS 10.5 and ranged from 0 to 11,720 m for roads (Figure 4j), and 0–15,080 m for streams. The national topographic map at the scale of 1:50,000 was used to extract routes and streams.

The drainage density (Figure 4k) is found to be a great factor which plays an important role in the incidence of many hazards [65]. Several factors such as the structure and nature of the soil characteristics, geological beds, infiltration rate, plant cover condition, and slope degree [66] affect drainage networks. To turn the drainage network model to a reasonable value, the drainage density was specified via an extension of "line density" in ArcGIS 10.5 software.

**Figure 4.** *Cont.*

**Figure 4.** Gully conditioning factor maps. (**a**) Digital Elevation Model (DEM) (m), (**b**) Aspect, (**c**) Slope (%), (**d**) Profile curvature, (**e**) Plan curvature, (**f**) Land use, (**g**) Soil texture, (**h**) Topographic Wetness Index, (**i**) Distance to stream (m), (**j**) Distance to roads (m), (**k**) Drainage density (**l**) Rainfall (mm), (**m**) Stream Power Index, (**n**) Relative slope position (**o**) lithology, (**p**) K factor, (**q**) Melton ruggedness number, (**r**) Topographic Position Index.

The yearly average precipitation map (Figure 4l) of Gorganrood Basin was developed according to precipitation data obtained from the Golestan province Regional Water Organization. The above-mentioned map was developed using the 53 gauges and statistical period of 2009–2016 based on Inverse Distance Weight (IDW) interpolation method [67] (Equation (2)). This map ranges from 460 to 603 mm/year. The rainfall map was developed in 30 × 30 m in ArcGIS 10.5 as an input layer for susceptibility assessment of gully erosion.

$$\lambda \mathbf{i} = \frac{\text{Di}^{-\alpha}}{\sum\_{i=1}^{n} \text{Di}^{-\alpha'}} \text{ } \tag{2}$$

where, λi represents point i weight, Di denotes the space between the point i and the point of unknown, and α implies weighing power [67].

The stream power index (SPI) (Figure 4m) is an index of the water flow erosive power according to the hypothesis that flow is relative to the particular watershed [68].

$$\text{SPI} = \text{As} \times \tan \sigma\_{\prime} \tag{3}$$

where, As is the particular watershed area per meter and r is the gradient of slope in degrees. The SPI index is the most important factor adjusting slope erosion processes, as erosive power of flow directly affect s river cutting and slope to erosion [68]. The regions with great river power measures have excessive erodibility because it indicates potential energy for limiting sediment [69].

Relative slope position (RSP), (Figure 4n), as a tool, could calculate several terrain indices from the digital elevation model. General information on the computational concept can be found in [70].

Lithology units (Figure 4o), have a dominant contribution in specifying gully erosion sensitivity [9,50,52,71,72] as gully erosion relies on the lithology properties and different lithological units display significant differences in erosion instability. In this research, the lithological map for the region was generated as the present geological maps with a 1:100,000 scale obtained from the Geological Survey Department, Iran. This area of Gorganrood Watershed is full of various types of outcrop formations and divided into 10 classes (Table 1).


**Table 1.** Lithology of the Gorganrood watershed.

Soil erosion potential (k-factor) (Figure 4p) influences soil persistence to run-off force or rainfall impact [73]. The soil erodibility (K) in Universal Soil Loss Equation (USLE), is measured by the texture, organic matter, infiltration, and soil structure.

A simplified flow accumulation measure, computed as discrepancy among max and min elevation in the watershed divided by the square root of the watershed area is Melton ruggedness number (MRN) (Figure 4q). The measurement is done for every grid pixel, hence minimum elevation equals elevation in the cell's location. Flow measurement is easily performed with Deterministic 8 given the inconsistent nature of a single maximum elevation [74–76].

Topographic position index (TPI) (Figure 4r) serves as an approach widely applied to evaluate topographic slope location, and to zone ordination automation. This function produces single-band raster characterized quantities measured upon elevation. TPI is an abbreviation for Topographic Position Index, in turn, described as the difference between the main pixel and the average of its adjacent pixels [77].

## *2.3. Multi-Collinearity Test*

The above-mentioned factors were used to consider the effect of correlation among them as the independent variable. If both predictor variables are intensely related, it is a problemin the modelling process. The issue is named collinearity. The VIF (variance inflation factor) and Tolerance include both significant measures of multi-collinearity recognition. Indeed, VIF is simply the reciprocal of tolerance, on the other hand, tolerance is 1-R2 for the variable regression versus predictors, deprived of the dependent variable [78]. A VIF of five or 10 and above and/or a tolerance of less than 0.20 or 0.10 indicates a multi-collinearity problem [79,80].

#### *2.4. Multivariate Adaptive Regression Splines (MARS Model)*

MARS can adapt complicated, non-linear associations among the independent and dependent measures although providing an explainable model [81]. The MARS algorithm has been applied in geomorphology to forecast and mapping the incidence of gullies [25–27] and landslides [17,38,82].

This technique functions via dividing magnitudes of the explanatory predictors into areas and through the fitting, for every area, a linear regression equation. "Knots" divides quantities among regions, whereas the phrase "basis function" (BF) denotes implying every various condition factor interval (dependents). BFs are functions of the following:

$$\max(0, \, \mathbf{x} - \mathbf{k}) \text{or}$$

$$\max(0, \, \mathbf{k} - \mathbf{x})\_{\prime}$$

where k denotes constant equal to a knot and x is an independent variable. The MARS may be stated generally as below:

$$\mathbf{y} = (\mathbf{x}) = \alpha + \sum\_{\mathbf{n}=1}^{N} \beta\_{\mathbf{n}} \mathbf{h}\_{\mathbf{n}}(\mathbf{x}),\tag{4}$$

where y is predictor variable anticipated via function f(x), N denotes terms number, who is shaped by a coefficient βn, α is a constant, and hn (x) represents a separate basis function or multiplication of BFs. MARS analysis was carried out by the Earth version of the R software [83–85]

#### Evaluation of the Model

The evaluation pathway comprises the fitting degree assessment (goodness of fit), robustness, and prediction skills of the model [22,86]. The goodness of fit demonstrations capability of the approach in forecasting the training subset, whereas the predictive efficiency (prediction skills) is a fundamental step for model precision to forecast a validation set (the percent of gully points that do not use the training process) [39,87]. While changing training and validation points, the precision of the forecasting model is determined as the consistency of outputs of the model in respect to model precision. Here, predictive performance and goodness of fit of the model are subjected to assessment using both threshold-driven and non-threshold-dependent methods. ROC curve, as a threshold-driven method, was drawn for all datasets as well as afterwards goodness of fit (i.e., degree of fitting) and forecasting efficacy of the algorithms were studied, respectively [88].

The Kappa coefficient, Youden index, and efficiency (as threshold dependent performance) were calculated based on the components of the confusion matrix [89]. Furthermore, sensitivity (SST) and specificity (SPF) are common statistical indexes applied for validation of each model performance [90]. Comparison of the model observed data and results are demonstrated through a contingency matrix (Table 2). According to Table 2, the true negative (TN) and true positive (TP) are the numbers of pixels that are correctly and appropriately classified, whereas false negative (FN) and false positive (FP) are the numbers of pixels fallaciously classified.

$$\text{SST} = \frac{\text{TP}}{\text{TP} + \text{FN}'} \tag{5}$$

$$\text{PF} = \frac{\text{TN}}{\text{TN} + \text{FP}}.\tag{6}$$

Efficiency (E) is the ratio of gully points and non-gully cells which output model accurately divided:

$$\mathbf{E} = \frac{\mathbf{T}\mathbf{P} + \mathbf{T}\mathbf{N}}{\mathbf{T}\mathbf{P} + \mathbf{T}\mathbf{N} + \mathbf{F}\mathbf{P} + \mathbf{F}\mathbf{N}}.\tag{7}$$

The Kappa factor (K) describes the potential of the employed model to categorize the gully point cells [1], and can be expressed as the ratio of given consistency beyond that expected by chance:

$$\mathbf{K} = \frac{\mathbf{P}\_{\text{obs}} - \mathbf{P}\_{\text{exp}}}{1 - \mathbf{P}\_{\text{exp}}},\tag{8}$$

where Pobs represents the ratio of cells that are properly divided as gully occurrence or non-gully and Pexp denotes the ratio of pixels whose consistency is random [91]. Pobs and Pexp can be measured as below:

$$P\_{\text{obs}} = \text{TP} + \text{TN}\_{\text{\textdegree}} \tag{9}$$

$$\mathbf{P}\_{\rm exp} = [(\mathbf{TP} + \mathbf{FN})(\mathbf{TP} + \mathbf{FP}) + (\mathbf{FP} + \mathbf{TN})(\mathbf{FN} + \mathbf{TN})].\tag{10}$$

The performance of model given the Kappa factor may be categorized as ≤0 (weak), 0–0.2 (slight), 0.2–0.4 (fair), 0.4–0.6 (moderate), 0.6–0.8 (high), and 0.8–1 (almost perfect) [1,2].

**Table 2.** Contingency matrix applied for the Multivariate Adaptive Regression Splines (MARS) model assessment.


The ROC (receiver operating characteristics) curve, as a non-threshold-driven method, is widely used to quantify the measure classification [92–94]. The receiver operating characteristics is the area under the curve (AUC-ROC) that describes the performance of a model to precisely forecast non-incidence or an incidence [95,96]. The forecasting precision of the model according to the AUC value may be categorized as three classes of accuracy following the classification proposed by [97]: 0.7, 0.8, and 0.9. AUC thresholds were used for acceptable, superior, and substantial performance, respectively [22,37].

Since the admission of a forecasting model entails, for assessment of its precision, trifle variations of the input data (i.e., data susceptibility), a gully erosion sensitivity model was developed on 10 various samples of mapping measures. Therefore, robustness in forecasting models and their consistency were furthermore evaluated when the training and validation samples are altered (i.e., a replicate method) [8,22,41,96,98]. The model was employed to given data, and subsequently was tested by the validation datasets. As for the ROC curve, the calculated event map by the validation dataset were compared.

Through subtracting the maximum and minimum accuracy values according to each assessment measures precision of the model was calculated [22,96]:

$$R\_{\rm AUC-ROC} = \text{AUC} - \text{ROC}\_{\rm max} - \text{AUC} - \text{ROC}\_{\rm min} \tag{11}$$

where RAUC-ROC is model precision as per AUC-ROC measures, and AUC-ROCmax includes maximum precisions within all sets. As well, least accuracy values are represented as AUC-ROC min within whole datasets. In addition, in this research, ROC curves were applied to designate the optimum benchmark point of the model rate, via considering Youden's index (J) [99], with J relative to the maximum perpendicular space between the ROC and the former bisector as follows:

$$\mathbf{J} = \text{Maximum (sensitivity} + \text{specificity} - 1). \tag{12}$$

#### **3. Results**

#### *3.1. Gully Erosion Susceptibility Model*

First, Google Earth images, field surveys, and national reports were used to provide a gully-hedcut evaluation map consisting of 307 gully-hedcut points. Here, the MARS model was employed on balanced given sets (positives/negatives), and each one comprises all the positive (gully point) pixels and same randomly inferred negative (non-gully spots) cells, that was for two processes of integrating some repeats and samples, involving 90%/10% and 80%/20% with 10 replications. To assess the robustness of the model's data sensitivity, 5, 10 and 15 sample datasets (replicates) for 70%/30% sample size, were prepared through random selection of different data sets in the calibration and validation subsets. Figure 5 indicates the relative diffusion of the mean of gully erosion susceptibility categorizes for 10 groups in the sample data sets (70%/30%).

**Figure 5.** Sensitivity maps of gully erosion of the study area by the MARS model for 70%/30%, 10 replications.

The results of the relative distribution of the average of gully erosion susceptibility classes for other models are presented in Table 3. However, in the other combinations, the results were almost similar together. As well, the actuarial features of the probabilistic forecasting of the gully erosion of 10 sample data sets and replicates are shown in Table 4.


**Table 3.** Relative distributions of the gully susceptibility classes.

\* rep: Replicate.

**Table 4.** Actuarial features of the probability values inferred from the MARS model.


SD: Standard deviation.

#### *3.2. Evaluation of the Susceptibility in Gully Erosion*

The results of the goodness-of-fit are shown in Table 5. The results did not show considerable variation in the accuracy of the model, with altering the percentage of calibration to validation samples and number of model replications.


**Table 5.** Forecasting efficiency of the model given 10 data sets.

The MARS algorithm performed excellently both in predictive performance. It can be observed that the MARS model for combination of 80%/20% had the highest performance in terms of SST (0.88), SPF (0.83), E% (0.79), Kappa (0.58), Robustness (0.01) and AUC (0.84) compared to the other combinations. The combination of 70%/30% with five replicates only had the highest performance in terms of E% (0.79). Figure 6 illustrates the mean ROC curves for the MARS model through 10 replicates. The results of AUC as a threshold-driven method for other scenarios are as follows: (70%/30% with 5, 10 and 15 replicates: 0.80, 0.82, 0.83 respectively; 90%/10%: 0.83). The MARS model revealed from acceptable to excellent performances, with AUC values above the 0.77 and 0.8 thresholds. By adopting

the J index, 0.386 probability cut-off values were generated for the susceptibility gully erosion model (Figure 6). Based on these cut-off values, the probability of gully erosion occurrence for each cell was adapted into a binary (positive/negative) prediction to achieve the spatial distribution of cases correctly categorized (true positives and negatives (TP, TN)) and incorrectly classified (false positives and negatives (FP, FN)) for the MARS susceptibility model (Table 6). Figure 7 shows the distribution of true positive (TP), false positive (FP), true negative (TN), and false-negative (FN) cases within the study area. A larger true positive prediction is produced by the model, indicating that the conditions for gully erosion are widespread.

**Figure 6.** Average Receiver Operating Characteristic (ROC) curves for the MARS model through 10 replicates.

**Figure 7.** Contingency matrix applied for assessing the MARS model for all scenarios.


**Table 6.** Contingency matrix applied for assessment of the MARS model.

#### **4. Discussion**

The results of the model (based on all sample data sets) show different ranges of susceptibility values of gullying. The output map was divided into four classes of poor, high, and very high for each model by the Natural breaks classification approach [96,100,101]. As the regions of the area were categorized as high and very high, gully erosion susceptibility maps were complementary with the sections of the watershed with low slope and near to roads. For an average of gully susceptibility, more than 31% of the study area has a high (HGES) and very high sensitivity (VHGES). Approximately 23.63% of the research field was classified as moderate classes (MGES). A total of 44.72% of the cells in the study region classified into low susceptibility groups (LGES). On the other hand, the low density of gullies was observed in forest areas with high slopes. In the forest areas, roughness due to vegetation covering may lead to medium runoff factors on this area as well, hence, low degradation force of centralized flow [96]. Areas nearer to roads and streams, with sparse vegetation and higher drainage density than other areas, have more potential for gully occurrence. These findings are in line with [4]. The parts of the basin with low slope and near to roads classified as high and very high gully erosion susceptibility maps (HGES and VHGES), and high density of gully erosions occurred in these parts of the catchment. The distribution of gullies over lithological units by other studies observed that poorly sorted materials are favorable conditions for gully erosion development [102–104].

The MARS model precision for gully erosion susceptibility was assessed by both threshold-driven and non-threshold-driven methods as pointed out in the methodology section. To assess the robustness of the model's data sensitivity similar with references [8,22,35], 5, 10 and 15 sample data sets, (replicates) for 70%/30% sample size, were prepared through random selection of different data sets in the calibration and validation subsets. This method is in agreement with studies of Rotigliano et al. [36]. The MARS model reproduces non-linear relationships using several linear regressions. Hence, this allows MARS to generate models with a better fit to the training data set while maintaining high predictive power [5].

Given the accuracy, the MARS algorithm performed excellently in predictive performance. It can be observed that the MARS model for combination of 80%/20% had the highest performance in terms of SST (0.88), SPF (0.83), E% (0.79), Kappa (0.58), Robustness (0.01), and AUC (0.84) compared to the other combinations. Since the accuracy values are very similar for the all sample data sets, the MARS model was robust when the validation group changed [2]. As well, in the research of Rahmati et al. [29], three various classes of training samples were applied to accompany different machine learning models for predicting the susceptibility of gully erosion. Three different sample data sets (S1, S2, and S3), were randomly prepared to evaluate the robustness of the models. Their results illustrated accurate predictions. Additionally, it was found that performance of RF and RBF-SVM for modelling gully erosion occurrence is quite stable when the learning and validation samples are changed. Regarding forecasting efficiency, the results of AUC as a threshold-driven method for scenarios are as follows: (70%/30% with 5, 10 and 15 replicates: 0.80, 0.82, 0.83 respectively; 90%/10%: 0.83, 80%/20%: 0.84). The MARS model revealed from acceptable to excellent performances, with AUC values above the 0.77 and 0.8 thresholds [97]. This result demonstrated a strong agreement between the distribution of the existing gully erosion points and the final predictive susceptibility map. Additionally, the AUC values for all data sets are approximately similar and the modelling method can be considered as robust to changes in learning points. Our results similar and agree with studies of Gómez-Gutiérrez et al. [26], who also applied the MARS model to predict gully erosion occurrence, and obtained AUC in the range of 0.75–0.98. The other MARS application to gully erosion susceptibility evaluation was made

by Gómez-Gutiérrez et al. [27], who achieved a mean AUC of 0.826 and 0.859 in Spain and Sicily respectively. Accordingly, Conoscenti et al. [17] pointed out that even for the worst validation AUC value inferred from five datasets, MARS is much more than the best calibration AUC value calculated compared with the logistic regression (LR) model. In addition, Conoscenti et al. [102] evaluated gully erosion sensitivity in both surrounding farmed basins of Sicily (Italy) by using multivariable adaptive regression splines. Model assessment on the whole basins indicates the outstanding predictive performance of models. This finding supports our results. Based on cut-off values, the probability of gully erosion occurrence for each cell was adapted into a binary (positive/negative) prediction to achieve the spatial distribution of cases correctly categorized (true positives and negatives (TP, TN)) and incorrectly classified (false positives and negatives (FP, FN)) for the MARS susceptibility model (Table 6). A larger true positive prediction is produced by the model, indicating that the conditions for gully erosion are widespread. As Rahmati et al. [96] pointed out, random ordination of datasets is the main origin of uncertainty in spatial modelling. From the validation result, it is clear that the MARS model provided acceptable to excellent performance in predicting the probability of gully erosion occurrence based on independent and dependent assessment measures. In light of abovementioned results, it is obvious that the MARS model can be used as an efficient statistical model for the prediction gully erosion susceptibility map. This is in line with the other studies that applied this model to landslides and gully erosion susceptibility mapping [9,25–27,102]. This is a relevant issue to achieve sustainable land management where gully erosions must be restored when they have developed as a result of human mismanagement, and for this it is necessary to use nature-based solutions [6]. To achieve success in gully erosion control, the strategies must find a way to reduce the connectivity of the flows [7].

#### **5. Conclusions**

Recognition of a precise and calibrated model to alleviate errors in modelling gully erosion susceptibility and determining gully erosion susceptible areas is of great importance. The present study enriches the systematic assessment of the multivariate adaptive regression splines model (MARS) to model gully erosion susceptibility among others. The major concluding remarks might be written as below: the aforementioned MARS model not just confirmed superiority for either based on limits-independent and limits-driven approaches, at the same time led to precise forecasting while changing the sample dataset. Hence, it can be inferred that this model is superior to evaluate gully erosion susceptibility while research aims to generate an exact gully erosion susceptibility map (GESM) and to pave the way to offering information on the outstanding potential of predictors. Reconnaissance and gully alleviation controls are not cost-effective and at the same time are not time-effective, and hence, to develop comprehensive susceptibility forecasting system as per modelling seems to be a remarkable possibility. In the current research based on past studies [34] and multicollinearity tests, digital elevation model, aspect map, slope percent, curvature of profile, curvature of plan, land use (LU), soil texture, TWI, distance to streams, distance to roads, drainage density, annual rainfall, stream power index, relative slope position, lithological formation, K factor, melton ruggedness number, and topographic position index are significant factors that influence gullying in the study area. Additionally, in this study we tried to investigate gully erosion susceptibility through the MARS model and analyzed the performance and accuracy of this technique for zoning gully erosion. While changing training and validation spots, the precision of the forecasting model was determined as the consistency of outputs of the model in respect to model precision. Here, predictive performance and goodness of fit of the model was subjected to assessment using both threshold-driven and non-threshold-dependent methods. This validates the robustness as well as the effectiveness of this model. Zonation of the gully erosion susceptibility for that section of Gorganrood display regions with high hazard susceptibility as well as demonstrates a valid map, in which the result is useful to managers and stakeholders to recognize the most prone area gully erosion and dedicate inputs for soil protection actions in the best manner.

**Author Contributions:** N.J., A.K., Z.J., H.R.P. and C.C. designed the experiments, performed the experiments, analyzed the data, and contributed reagents/materials/analysis tools. N.J. wrote the paper, Writing—review and editing was done by A.K. and H.R.P.

**Funding:** This research received no external funding.

**Acknowledgments:** We are grateful to the Sari Agricultural Sciences and Natural Resources University for financial support (SANRU) and Department of Earth and Sea Sciences (DISTEM), University of Palermo.

**Conflicts of Interest:** The authors declare no conflict of interest.
