1. Introduction
As many researchers have pointed out, groundwater is one of the most important natural resources worldwide, with one third of the world’s population depending on it [1, 2, 3, 4]. Several areas in the world are subject to overexploitation of groundwater and undergo water shortages as a result of an imbalance between water supply and demand [5]. It is also well established that the demand for groundwater will increase substantially in the coming years, mainly due to population growth and economic development [6, 7, 8, 9, 10]. According to Curran and de Sherbinin [11], even though the supply of water is mainly controlled by climatic parameters, management and the practices followed significantly influence the availability of water. In the case of groundwater resources, inappropriate management may result not only in the deterioration of water resources but also in a decrease in groundwater quality [12]. Like the rest of the world, China faces increasing consumption of groundwater, which makes the application of accurate methods for assessing groundwater potential imperative [13, 14, 15, 16].
Groundwater spring potential mapping has been recognized as an investigation practice whose outcomes provide useful data inputs for groundwater management projects [17]. Identifying areas with a high probability of groundwater spring occurrence assists in developing appropriate groundwater exploitation and groundwater resources conservation programs [18, 19]. Over the past two decades, geographical information systems (GIS) and remote sensing (RS) techniques have been the main investigation tools for groundwater spring potential mapping [4, 17, 18, 20, 21, 22]. Successful examples of groundwater potential mapping studies involve bivariate and multivariate methods, specifically applications of frequency ratio (FR) [13, 17, 23], analytical hierarchy process (AHP) [20, 24, 25, 26, 27], weight of evidence (WofE) [18, 23, 28, 29], evidential belief function (EBF) [19, 30, 31, 32, 33], and logistic regression (LR) [28, 34, 35].
Similarly, machine learning methods have been introduced as an alternative option for groundwater potential mapping, mainly involving tree-based methods, such as classification and regression tree (CART) [36] and random forest (RF) [19, 26, 37], as well as artificial neural network (ANN) [18, 38, 39] and support vector machine (SVM) [40] methods. Other notable examples of machine learning methods that have been utilized in groundwater potential mapping assessments are naive Bayes (NB) [41] and K-nearest neighbor (KNN) [42].
Quite recently, new hybrid and ensemble methods have been applied in groundwater mapping studies, showing in most cases better performance than single predictive models [12]. Chen et al. [43] produced groundwater potential maps by integrating WofE with LR and functional tree (FT) models, the validation of which clearly highlighted the efficacy of the integrated models. The authors reported that the integrated models provided better results, overcoming the drawbacks of bivariate statistics and machine learning. Khosravi et al. [44] proposed five hybrid artificial intelligence methods, integrating an adaptive neuro-fuzzy inference system (ANFIS) with meta-heuristic algorithms. Their study showed that the novel hybrid models produced more accurate groundwater potential models. Kordestani et al. [33] proposed an ensemble method integrating EBF and boosted regression tree (BRT), reporting that the EBF–BRT model was capable of providing highly accurate results. The authors suggest that the resulting model compensates for the weak points of each method, while taking advantage of the methods' ability to analyze the relation of groundwater with each groundwater-related variable and with each class of that variable. In a similar study, Chen et al. [15] integrated an ANFIS model with teaching–learning based optimization (TLBO) and biogeography-based optimization (BBO). According to the authors, the two novel data mining methods could be useful in solving non-linear and high-dimensional problems and, overall, in groundwater management and exploration development projects.
In this context, the current study presents a novel hybrid integration approach of FR with artificial intelligence-based kernel logistic regression (KLR), alternating decision tree (ADTree), and RF models for groundwater spring potential mapping, with the Ningtiaota region, China, as a test site. A hybrid integration of FR, KLR, ADTree, and RF is a relatively new contribution that has seldom been used for modeling groundwater spring potential areas. It should also be mentioned that few studies have been conducted on groundwater spring potential mapping in China; this research therefore aims to fill this gap in the literature.
2. Study Area
The Ningtiaota region is located in the northern part of Shaanxi Province, China. The climate is dry throughout the year. The maximum and minimum temperatures are 38.9 °C and −29.0 °C, respectively, the average relative humidity is 56%, the average wind speed is 13.4 m/s, and the average annual rainfall is 434.1 mm. The study area is a portion of the Ningtiaota region, limited to the area where data were available. It covers 119.77 km² and is located within latitudes 38°57′30′′ to 39°7′57′′ N and longitudes 110°9′36′′ to 110°16′20′′ E (Figure 1). According to the Soil Map produced by the Institute of Soil Science, Chinese Academy of Sciences [45], the typical soil types that cover the study region are Calcari-Gypsiric Arenosols (ARc), Haplic Arenosols (ARh), Calcareous Red Clay (CMe), and Luvi-Calcic Kastanozems (KSk).
Topographically, altitudes vary from 1118 to 1364 m above sea level, and slope gradients vary from 0 to 37.88° based on a digital elevation model (DEM) with a 30 m regular grid. Approximately 75.38% of the area has slopes of less than 10°, whereas only 0.097% of the total study area has slopes greater than 30°. Areas with slopes of 10 to 20° and 20 to 30° account for 21.77% and 2.74%, respectively.
Geologically, the strata of the study area belong to the Ordos Basin stratigraphic subarea in the North China stratigraphic area. Based on the geological map of China (http://www.cgs.gov.cn), the strata in the area from old to new are the Yan’an formation (J2y), Zhiluo formation (J2z), Anding formation (J2a), Baode formation (N2b), Lishi formation (Q2l), Salawusu formation (Q3s), Malan formation (Q3m), Alluvium (Q4al), and Eolian deposit (Q4eol) (Table 1).
4. Data Used
A crucial step in the groundwater spring potential mapping process is to identify spring locations. Based on extensive field surveys conducted during 2006–2017, 66 springs were detected in the study area (Figure 1 and Figure 3a,b). An equal number of 66 nonspring locations were randomly selected from the spring-free area by applying the Create Random Points function found in the Data Management Tools of the ArcGIS platform [69]. The spring and nonspring locations were randomly divided into two subsets by using the Subset tool in the Geostatistical Analyst extension of the ArcGIS platform [69]. The first subset consisted of 46 spring and 46 nonspring locations (70% of the total number of spring and nonspring locations) and was used for training, whereas the second subset consisted of the remaining 30% (20 spring and 20 nonspring locations) and was used for validation.
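The 70/30 partition described above can be illustrated outside ArcGIS. The following is a minimal sketch in plain Python, not the Subset tool used in the study: the function name `split_locations`, the seeds, and the placeholder location tuples are assumptions introduced only for illustration.

```python
import random


def split_locations(points, train_fraction=0.7, seed=0):
    """Randomly split a list of locations into training and validation subsets."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = points[:]           # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder identifiers standing in for the 66 spring and 66 nonspring points.
springs = [("spring", i) for i in range(66)]
nonsprings = [("nonspring", i) for i in range(66)]

# Splitting each class separately keeps both subsets balanced.
train_s, valid_s = split_locations(springs, seed=1)
train_n, valid_n = split_locations(nonsprings, seed=2)

training = train_s + train_n
validation = valid_s + valid_n
print(len(train_s), len(valid_s), len(training), len(validation))
```

With 66 points per class, rounding 66 × 0.7 gives 46 training and 20 validation locations per class, matching the counts reported above.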
Generally, several spring explanatory factors may influence spring occurrence; however, there are no guidelines for their selection. Therefore, in the present study and based on experience gained from previous studies, 14 spring explanatory factors, including slope aspect, slope angle, plan curvature, profile curvature, elevation, stream power index (SPI), sediment transport index (STI), topographic wetness index (TWI), distance to streams, distance to roads, normalized difference vegetation index (NDVI), lithology, soil, and land use, were selected and prepared for further analysis within a GIS environment [43, 70, 71]. Eight geomorphometric factors, namely slope aspect, slope angle, plan curvature, profile curvature, elevation, SPI, STI, and TWI, were derived from the ASTER GDEM version 2 dataset (http://www.jspacesystems.or.jp/ersdac/GDEM/E/index.html) with a resolution of 30 m. These spring explanatory factors were reclassified into categories (Table 2) based on the outcomes of frequency analysis concerning spring occurrence and on the characteristics of the study area. The distance-to-streams and distance-to-roads maps were produced using topographic maps at 1:10,000 scale. The NDVI was calculated using a Landsat 8 OLI image (path/row 126/33) acquired on 4 November 2017 (available at http://www.gscloud.cn). A lithological map was extracted from the geological map at a scale of 1:10,000 and constructed with nine classes based on lithological similarities [43, 72]. The soil types were extracted from soil maps of the study area at 1:1,000,000 scale and were classified into four classes [43, 73]. In addition, the land use map was extracted from land use maps at 1:100,000 scale with six land use types, based on the supervised classification method and the maximum likelihood algorithm [19]. All the spring explanatory factors were finally converted to the same spatial resolution of 30 m × 30 m (Figure 4).
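For readers unfamiliar with the NDVI mentioned above, the index is the normalized difference between near-infrared and red reflectance, NDVI = (NIR − Red) / (NIR + Red); for Landsat 8 OLI these are bands 5 and 4. The sketch below is illustrative only: the function name `ndvi` and the reflectance values are hypothetical and are not taken from the scene used in the study.

```python
def ndvi(nir, red):
    """Return NDVI for one pixel, or None when both reflectances are zero."""
    total = nir + red
    if total == 0:
        return None                # undefined; avoid division by zero
    return (nir - red) / total


# Hypothetical (NIR, red) surface reflectance pairs, for illustration only.
pixels = {
    "dense vegetation": (0.50, 0.08),
    "bare sand": (0.30, 0.25),
    "water": (0.02, 0.05),
}

for name, (nir, red) in pixels.items():
    print(name, round(ndvi(nir, red), 3))
```

Values close to +1 indicate dense green vegetation, values near zero indicate bare surfaces, and negative values are typical of water.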
6. Discussion
As several studies report, the significance and predictive power of the spring-related factors used in groundwater spring potential assessments are controlled by the geological, morphological, hydrological, and climatic settings of the area [19, 22, 79, 80, 81]. According to Ozdemir [35], topographic features, such as elevation and slope, have a negative influence on groundwater spring potential, whereas TWI and drainage density have a positive influence. Similar studies report that topographic features, along with the characteristics of the soil cover, tectonic features (fault density and distance to faults), and hydrological features (drainage density), influence the rainfall-runoff rate and the infiltration rate, thus possibly affecting groundwater spring occurrence [19, 35]. Chen et al. [43] reported that lithology, elevation, and distance to streams had the greatest influence, whereas land use, NDVI, plan curvature, and profile curvature appeared to have the least influence.
In the present study, the implementation of the linear support vector machine (LSVM) revealed that lithology had the highest predictive power, followed by elevation, SPI, and soil cover. Concerning the lithology factor, lithological and structural differences lead to variations in the durability and permeability of rock and soil formations, and thus in the presence of springs [35]. Based on the FR analysis, groundwater springs are more likely to be found on southeast-facing slopes, in areas with slope angles between 15° and 25°, and at elevations lower than 1150 m. Concerning slope angles, the outcomes of the study are consistent with previous studies reporting that areas with slopes greater than 35° are unfavorable, since runoff increases with slope, resulting in reduced infiltration rates [82, 83]. Moreover, the most spring-probable areas are covered by Haplic Arenosols (ARh) soils, which are coarsely textured sandy soils, permeable to water, and Calcic Kastanozems (KSk) soils, which are characterized by rather restricted water transmission and higher proportions of clay particles. According to Srivastava and Bhattacharya [84], sandy soils and coarse sandy clays appear as potentially favorable storage bodies due to their light texture and excellent rate of infiltration, which is consistent with the findings of our study.
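The FR statements above rest on comparing the share of springs in a factor class with the share of the area that class occupies. A minimal sketch under the commonly used definition, FR = (springs in class / total springs) / (pixels in class / total pixels), is given here; the function name, class labels, and all counts are hypothetical and do not reproduce Table 2.

```python
def frequency_ratio(springs_in_class, pixels_in_class):
    """Compute the frequency ratio of each class of one explanatory factor."""
    total_springs = sum(springs_in_class.values())
    total_pixels = sum(pixels_in_class.values())
    ratios = {}
    for name, pixels in pixels_in_class.items():
        if pixels == 0:
            continue               # class absent from the map; FR is undefined
        spring_share = springs_in_class.get(name, 0) / total_springs
        area_share = pixels / total_pixels
        ratios[name] = spring_share / area_share
    return ratios


# Hypothetical counts for three classes of a single factor (46 training springs).
springs_in_class = {"class A": 20, "class B": 16, "class C": 10}
pixels_in_class = {"class A": 6000, "class B": 3000, "class C": 1000}

fr = frequency_ratio(springs_in_class, pixels_in_class)
for name, value in fr.items():
    print(name, round(value, 3))
```

An FR above 1 means springs are over-represented in a class relative to its area, and an FR below 1 means they are under-represented.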
Within the research area, sand, mudstone, and sandstone formations appear to be more likely to contain springs. Similar findings were reported by the authors in a previous study of the same area [43]. Mudstone layers, which have a very low infiltration capacity, form an impermeable layer, while sand and sandstone formations act as permeable layers that allow surface water to accumulate within their mass. The alternation of these layers permits the formation of groundwater springs, as observed in the research area.
An interesting point that should be mentioned is the high predictive value of the distance-to-roads factor. The road network is considered to influence the occurrence of groundwater springs, since its presence can cause local hydrological and erosion issues while indirectly affecting the groundwater table [85]. The presence of a road may also influence soil moisture and the infiltration rate as a result of the removal of geological formations and the disturbance of the surface during the construction phase [43, 85].
Concerning the validation and comparison of the three models (KLR, RF, and ADTree), the RF model provides slightly higher area under the curve (AUC) values, lower standard error (SE) values, and narrower confidence intervals (CI) than the other two methods. Several studies have indicated that RF models have higher accuracy than other models. According to Naghibi et al. [12], who applied support vector machine (SVM), RF, and genetic algorithm optimized RF (RFGA) methods to assess groundwater potential from spring locations, RF and optimized RF models outperformed SVM models. According to Golkarian et al. [86], this could be attributed to the way RF models work: they aggregate the outcomes of many decision trees in order to limit overfitting and to reduce both bias-related and variance-related error, thus producing more accurate predictions. However, other studies report that the performance of RF models can be degraded by noisy data and by categorical variables with different numbers of levels, in which case RF models are biased in favor of the variables with more levels [36]. In the present study, KLR gave more accurate results than the ADTree model. Similar studies on landslide susceptibility assessment that implemented KLR and ADTree methods found that KLR produced more balanced results for the training and validation datasets in terms of the statistical index, while the ADTree models showed significant variance [74]. Finally, although the presented models appear to have satisfactory predictive performance, it must be kept in mind that their results are influenced by the quality and quantity of the available input data and by the identification of nonspring areas. Concerning future work, the presented approach could be applied to an area with different geo-environmental settings, or the analysis could include dynamic variables that may vary over short timeframes, such as precipitation and temperature, so as to estimate the efficiency of the proposed models.
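The AUC, SE, and CI metrics used in the model comparison above can be sketched as follows. This section does not state which estimator was used for the standard error, so the widely used Hanley and McNeil approximation is assumed here; the function names and the predicted scores are hypothetical.

```python
import math


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the share of (spring, nonspring) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of the AUC (Hanley-McNeil approximation, assumed here)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (auc * (1 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(max(variance, 0.0))   # guard against tiny negative rounding


def confidence_interval(auc, se, z=1.96):
    """Approximate 95% interval, clipped to the valid AUC range [0, 1]."""
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical predicted spring probabilities for a few validation points.
spring_scores = [0.9, 0.8, 0.7, 0.6, 0.4]
nonspring_scores = [0.5, 0.3, 0.35, 0.2, 0.1]

auc = auc_from_scores(spring_scores, nonspring_scores)
se = hanley_mcneil_se(auc, len(spring_scores), len(nonspring_scores))
ci = confidence_interval(auc, se)
print(auc, se, ci)
```

A model with a higher AUC, a lower SE, and a narrower interval is preferred under these criteria, which is the basis on which the RF model is ranked first above.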
7. Conclusions
In the present study, three artificial intelligence methods (KLR, RF, and ADTree) were utilized to generate a groundwater spring potential map for the Ningtiaota region, which is located in the northern part of Shaanxi Province, China. A linear support vector machine was used for feature selection to determine the optimal set of factors, which included fourteen explanatory factors (elevation, slope, aspect, plan curvature, profile curvature, stream power index, sediment transport index, topographic wetness index, distance to streams, distance to roads, NDVI, lithology, soil, and land use). The analysis highlighted the higher predictive power of lithology, elevation, SPI, and soil cover; these four factors significantly influence the prediction accuracy. The comparison of the KLR, RF, and ADTree models revealed that the RF model had higher prediction accuracy than the other two models, based on higher AUC values, lower SE values, and narrower confidence intervals. The RF model’s ability to limit overfitting may be the reason for its higher predictive performance. Although the results obtained by tree-based artificial intelligence approaches can be influenced by the quality and quantity of the data, overall these approaches can be regarded as accurate and reliable investigation tools in groundwater spring potential assessments.