Machine Learning Models for Prediction of Soil Properties in the Riparian Forests

Zolfaghari Nia, Masoud; Moradi, Mostafa; Moradi, Gholamhosein; Taghizadeh-Mehrjardi, Ruhollah

doi:10.3390/land12010032


by Masoud Zolfaghari Nia ¹, Mostafa Moradi ¹,*, Gholamhosein Moradi ² and Ruhollah Taghizadeh-Mehrjardi ³,⁴,*

¹ Department of Forestry, Faculty of Natural Resources, Behbahan Khatam Alanbia University of Technology, Behbahan P.O. Box 63616-47189, Iran

² School of Natural Resources and Desert Studies, Yazd University, Yazd P.O. Box 89168-69511, Iran

³ Faculty of Agriculture and Natural Resources, Ardakan University, Yazd P.O. Box 89518-95491, Iran

⁴ Department of Geosciences, Soil Science and Geomorphology, University of Tübingen, 72076 Tübingen, Germany

* Authors to whom correspondence should be addressed.

Land 2023, 12(1), 32; https://doi.org/10.3390/land12010032

Submission received: 11 November 2022 / Revised: 9 December 2022 / Accepted: 19 December 2022 / Published: 22 December 2022

(This article belongs to the Special Issue Recent Progress in Carbon Cycling in Drylands)


Abstract

Spatial variability of soil properties is a critical factor for the planning, management, and exploitation of soil resources. Accurate digital soil mapping models therefore play a crucial role in producing maps of soil physicochemical properties. Soil spatial variability in forest stands is not well known in Iran. Meanwhile, riparian buffers provide important services such as maintaining high water quality, recycling nutrients, and buffering agricultural production. Accordingly, in this research, 103 soil samples were taken using the Latin hypercube sampling method in the Maroon riparian forest of Behbahan and in agricultural lands in the vicinity of the forest to evaluate the spatial variability of soil nitrogen, phosphorus, potassium, organic carbon, C:N ratio, pH, calcium carbonate, sand, silt, clay, and bulk density. Different machine learning models, including artificial neural networks, random forest, cubist regression tree, and k-nearest neighbor, were used to compare the estimation of soil properties. Three main sources of spatial information, namely remote sensing images, a digital elevation model, and climate parameters, were used as ancillary data. Our results indicated that the random forest model gave the best estimates of soil pH, nitrogen, potassium, and bulk density. In contrast, the cubist regression tree gave the best estimates of organic carbon, C:N ratio, phosphorus, and clay. Artificial neural networks gave the best estimates of calcium carbonate, sand, and silt contents. Our results revealed that geospatial information such as terrain parameters, climate parameters, and satellite images can be used effectively as ancillary data for the spatial mapping of soil physicochemical properties in riparian forests and agricultural lands. In conclusion, a specific machine learning model needs to be used for each soil property to produce highly accurate maps with low error.

Keywords:

random forest; artificial neural networks; cubist; ancillary data; soil properties

1. Introduction

Soil physicochemical properties are convenient indicators for evaluating soil processes and improvements in soil condition [1]. It is therefore essential to present this information as maps, which are a valuable resource for planning and land use evaluation at local and national scales, and there is a strong demand for soil maps with high spatial resolution [2]. Soil mapping also provides a soil information database that gives access to the spatial variation of each soil property [3]. In forest ecosystems, plant cover is affected more by soil properties than by any other factor. Because soil sampling is labor-intensive and costly, spatial mapping of soil nutrients is a fundamental tool for monitoring changes in ecosystems [3]. In addition, despite access to high-spatial-resolution satellite imagery, identifying vegetation condition still faces difficulties [4].

McBratney et al. [5] reviewed different approaches for digital soil mapping (DSM) and its capability to provide comprehensive maps of soil physicochemical and biological properties over large areas using machine learning (ML) methods, geographical information systems (GIS), and remote sensing (RS) data. The most critical soil factors in DSM are soil horizons, color, texture, classification, erosion, and drainage [6]. ML models are among the best methods for analyzing point data as well as for prediction, interpolation, and map production [7]. Ließ et al. [8] reported that the gradient boosting model (GBM) outperformed neural networks (NN), random forest (RF), and support vector machines (SVM) in predicting soil organic carbon (SOC) in a complex tropical mountain landscape in Ecuador. However, Were et al. [9] found SVM to be the best method for predicting SOC stocks in the Afromontane forest of Eastern Africa. Viscarra Rossel and Behrens [10] reported that the SVM approach yielded the smallest root mean square error (RMSE) values when predicting three soil properties: SOC, clay content, and pH.

Among different ecosystems, spatial heterogeneity in forest ecosystems is a consequence of tree activity through several mechanisms, including root penetration into the soil, litterfall, and dead wood [11,12]. Forest soil properties serve as indicators of soil quality and have been reported by many researchers [13,14,15]. Riparian areas are constantly changed by human activity and natural events, which warrants riparian monitoring [4]. The riparian buffer is essential for improving water quality, reducing riverbank erosion, recycling nutrients, providing habitat for fish and wildlife [16,17], and storing carbon [18]. The most abundant tree species in the riparian forests of arid and semi-arid ecosystems are Populus euphratica and Tamarix sp. [19]. Little information is available about DSM of these ecosystems; to the best of our knowledge, digital soil mapping has not yet been applied in riparian areas. Since riparian forests are negatively affected by human activity [15,20], increasing our knowledge about them would be an essential step toward restoring their natural potential.

New methods and processing software now make it possible to gain a more detailed understanding at lower cost, yet monitoring the riparian zone with RS data remains a challenge for managers [21]. Moreover, to our knowledge, soil spatial changes in riparian forests have not been studied. DSM can provide a basis for applying quantitative and statistical methods to characterize the spatial patterns of soil properties in riparian forests with less laboratory and field work. The main objective of the present study is to examine the potential of ML methods for DSM of the riparian buffer. A second aim is to test Artificial Neural Networks (ANN), a Decision Tree Model (DTM), Random Forest (RF), Regression Tree Cubist (RTC), and K-Nearest Neighbors (K-NN) for preparing the maps. We hypothesized that ML methods can generate maps of soil physicochemical properties in the riparian forest with reasonable accuracy and precision.

2. Materials and Methods

2.1. Study Site

The study site is the Maroon riparian forest of Behbahan, Khuzestan, Iran, located at 30°38′53″–30°39′30″ N and 50°09′30″–50°10′25″ E. The mean annual precipitation is 350 mm and the mean annual temperature is 24 °C. Based on the Emberger climate index, the area has a dry climate. The elevation ranges from 250 to 300 m a.s.l. The study site is composed of three stand types: pure Populus euphratica stands, pure Tamarix arceuthoides stands, and mixed Populus euphratica and Tamarix arceuthoides stands [15]. The soil type at the study site is Fluvent (USDA, 2014) [19].

2.2. Procedures

This study was conducted in several steps (Figure 1) according to the DSM framework: (1) soil sampling and analysis of soil properties, (2) acquisition of ancillary data from three main sources, (3) selection of the most important ancillary data using the Boruta algorithm, (4) intersection of the georeferenced sample points with the selected ancillary data and preparation of geodatabases, (5) training and testing of five ML models, and (6) application of the best ML model to the selected ancillary data to prepare soil maps.

2.3. Soil Data Sampling and Analysis

Sampling points were selected with the Latin hypercube sampling (LHS) method, a stratified random design that results in more efficient sampling [22]. A total of 103 sample points were selected in the riparian forest and the surrounding agricultural lands (Figure 2). Soil samples were taken at 20 cm depth with a soil auger.
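The idea behind LHS is easiest to see in code. Below is a minimal sketch of plain Latin hypercube sampling over a table of covariates. The paper does not state which LHS variant or covariates were used, so everything here is an illustrative assumption: the covariate table (rows = grid cells), the quantile mapping, and the nearest-cell snapping. The names `latin_hypercube` and `pick_grid_cells` are hypothetical. Each dimension is cut into `n_samples` equal strata, and exactly one point is drawn in every stratum.

```python
import numpy as np


def latin_hypercube(n_samples: int, n_dims: int, seed: int = 0) -> np.ndarray:
    """Points in [0, 1)^n_dims with exactly one point in each of the n_samples
    equal-width strata along every dimension."""
    rng = np.random.default_rng(seed)
    points = np.empty((n_samples, n_dims))
    for d in range(n_dims):
        strata = rng.permutation(n_samples)   # shuffle stratum order per dimension
        offsets = rng.random(n_samples)       # random position inside each stratum
        points[:, d] = (strata + offsets) / n_samples
    return points


def pick_grid_cells(points: np.ndarray, covariates: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map unit-cube points to covariate quantiles, then return
    the index of the nearest grid cell in standardized covariate space."""
    targets = np.column_stack(
        [np.quantile(covariates[:, d], points[:, d]) for d in range(points.shape[1])]
    )
    mu, sd = covariates.mean(axis=0), covariates.std(axis=0)
    z_cells = (covariates - mu) / sd
    z_targets = (targets - mu) / sd
    dist2 = ((z_targets[:, None, :] - z_cells[None, :, :]) ** 2).sum(axis=-1)
    return dist2.argmin(axis=1)  # note: two points may snap to the same cell


# Usage with a synthetic covariate table (5000 grid cells x 3 covariates)
rng = np.random.default_rng(1)
cells = rng.normal(size=(5000, 3))
design = latin_hypercube(103, cells.shape[1])
chosen = pick_grid_cells(design, cells)
```

Conditioned LHS variants optimize the selection directly over the candidate cells instead of snapping afterwards. This sketch keeps the simpler two-step form for readability.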

Soil samples from each point were transferred to the laboratory, air-dried, and passed through a 2 mm sieve before further analysis. Soil pH was measured in a 1:2.5 soil-to-deionized-water suspension [23]. Soil organic carbon was determined by chromic acid wet oxidation [24], and soil nitrogen by the Kjeldahl method, which converts nitrogen to ammonium during digestion with concentrated sulfuric acid [25]. Soil available phosphorus was determined with NaHCO₃ following the method of Olsen et al. [26]. Soil exchangeable potassium was measured using ammonium acetate extraction [27]. In addition, the hydrometer method was used to determine soil clay, silt, and sand contents by sedimentation [28].
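For readers less familiar with these lab methods, the arithmetic that turns raw readings into reported values can be sketched briefly. This is a hedged illustration, not the laboratory's actual protocol. The Kjeldahl formula assumes titration of the distilled ammonia with a standard acid of known normality. The hydrometer function assumes Bouyoucos-style readings taken at 40 s and 2 h, already blank- and temperature-corrected and expressed in grams of soil per litre. Neither the reading times nor the corrections are stated in the text above.

```python
def kjeldahl_total_n_percent(v_sample_ml, v_blank_ml, acid_normality, soil_mass_g):
    """Total N (%) from the acid titration of distilled ammonia.

    1 meq of N weighs 14.007 mg, so
    %N = (V_sample - V_blank) * N_acid * 14.007 / (soil_mass_g * 1000) * 100
       = (V_sample - V_blank) * N_acid * 1.4007 / soil_mass_g
    """
    return (v_sample_ml - v_blank_ml) * acid_normality * 1.4007 / soil_mass_g


def hydrometer_texture(reading_40s, reading_2h, soil_mass_g):
    """Sand, silt and clay (%) from corrected hydrometer readings in g/L.

    Assumes the 40 s reading measures silt + clay still in suspension and the
    2 h reading measures clay only (assumed Bouyoucos timing)."""
    silt_plus_clay = 100.0 * reading_40s / soil_mass_g
    clay = 100.0 * reading_2h / soil_mass_g
    silt = silt_plus_clay - clay
    sand = 100.0 - silt_plus_clay  # particles that settled within 40 s
    return sand, silt, clay


def c_to_n_ratio(organic_carbon_percent, total_n_percent):
    """C:N ratio from organic carbon and total nitrogen on the same (%) basis."""
    return organic_carbon_percent / total_n_percent


# Usage with made-up readings (not data from this study)
n_pct = kjeldahl_total_n_percent(v_sample_ml=0.9, v_blank_ml=0.1, acid_normality=0.01, soil_mass_g=1.0)
sand, silt, clay = hydrometer_texture(reading_40s=25.0, reading_2h=12.0, soil_mass_g=50.0)
```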

2.4. Ancillary Data

In the current study, three main sources of spatial information, namely remote sensing (RS) images, a digital elevation model (DEM), and climate parameters, were used as ancillary data. Specifically, images from three satellite sensors were used: MODIS (250 m resolution), Landsat-8 (30 m resolution), and Sentinel-2 (10, 20, and 60 m resolution). Satellite indices (e.g., the normalized difference vegetation index, NDVI) were calculated from these three RS sources. The formula and reference for each RS index are reported in Table 1. Day and night land surface temperatures were also obtained from MODIS products (1000 m resolution).

MODIS (or Moderate Resolution Imaging Spectroradiometer) is a key instrument aboard the Terra (originally known as EOS AM-1) and Aqua (originally known as EOS PM-1) satellites. Terra’s orbit around the Earth is timed so that it passes from north to south across the equator in the morning, while Aqua passes south to north over the equator in the afternoon. Terra MODIS views the entire Earth’s surface every 1 to 2 days, acquiring data in 36 spectral bands.

The Landsat-8 OLI (Operational Land Imager) is a 12-bit multispectral sensor with nine reflective bands from 0.43 to 2.29 μm: four visible, one near-infrared (NIR), two shortwave infrared (SWIR), one cirrus, and one panchromatic band [29]. In this study, the 30 m blue (0.45–0.51 μm), green (0.53–0.59 μm), red (0.64–0.67 μm), NIR (0.85–0.88 μm), SWIR-1 (1.57–1.65 μm), and SWIR-2 (2.11–2.29 μm) bands were used.
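Table 1, which gives the exact index formulas used in the study, is not reproduced here. The sketch below therefore assumes the standard published forms of three indices that appear among the ancillary data: NDVI, SAVI with soil factor L = 0.5, and EVI with the usual MODIS coefficients. It computes them from blue, red, and NIR reflectance, which correspond to Landsat-8 OLI bands 2, 4, and 5. The conversion helper assumes Landsat Collection 2 Level-2 surface-reflectance products (scale 0.0000275, offset −0.2). The paper does not say which product level was used.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index; soil_factor (L) = 0.5 is the common default."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def evi(nir, red, blue):
    """Enhanced vegetation index with the standard coefficients (G=2.5, C1=6, C2=7.5, L=1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def landsat_c2_l2_reflectance(dn):
    """Convert Landsat Collection 2 Level-2 surface-reflectance DNs to reflectance."""
    return np.asarray(dn, dtype=float) * 0.0000275 - 0.2


# Usage on tiny reflectance arrays standing in for OLI bands 2 (blue), 4 (red), 5 (NIR)
blue = np.array([0.05, 0.08])
red = np.array([0.10, 0.20])
nir = np.array([0.50, 0.25])
indices = {"NDVI": ndvi(nir, red), "SAVI": savi(nir, red), "EVI": evi(nir, red, blue)}
```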

Sentinel-2 is a mission developed by the European Space Agency (ESA) that comprises a pair of satellites: Sentinel-2A and Sentinel-2B, launched on 23 June 2015 and 7 March 2017, respectively. Each satellite has a ten-day revisit time, giving a combined five-day revisit cycle. Sentinel-2 acquires 13 spectral bands at different spatial resolutions (four bands at 10 m, six bands at 20 m, and three bands at 60 m) over a 290 km wide swath, making it appropriate for monitoring vegetation, soil properties, and aquatic environments.
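Because the Sentinel-2 bands (10, 20, 60 m) and the other layers come at different resolutions, all ancillary data were brought to a common 30 m grid by average (aggregation) or bilinear (disaggregation) resampling. The numpy sketch below shows both operations on a plain array. It assumes an integer resolution ratio (e.g., 10 m → 30 m is a factor of 3) and aligned grids. Real rasters with non-integer ratios or different projections would normally be handled by a GIS tool such as GDAL's `gdalwarp` with `-r average` or `-r bilinear`; the study does not say which software performed this step.

```python
import numpy as np


def aggregate_mean(grid, factor):
    """Average non-overlapping factor x factor blocks (e.g. 10 m -> 30 m with factor=3)."""
    h, w = grid.shape
    h2, w2 = h // factor, w // factor
    trimmed = grid[: h2 * factor, : w2 * factor]  # drop incomplete edge blocks
    return trimmed.reshape(h2, factor, w2, factor).mean(axis=(1, 3))


def disaggregate_bilinear(grid, factor):
    """Bilinear interpolation onto a grid `factor` times finer, aligned on pixel centres.
    Edge pixels are clamped to the nearest coarse value (np.interp behaviour)."""
    h, w = grid.shape
    # Fine-pixel centres expressed in coarse-pixel index coordinates
    rows = (np.arange(h * factor) + 0.5) / factor - 0.5
    cols = (np.arange(w * factor) + 0.5) / factor - 0.5
    # Separable bilinear: interpolate along each coarse row, then along each column
    tmp = np.array([np.interp(cols, np.arange(w), grid[i]) for i in range(h)])
    out = np.array([np.interp(rows, np.arange(h), tmp[:, j]) for j in range(tmp.shape[1])])
    return out.T


coarse = aggregate_mean(np.arange(36, dtype=float).reshape(6, 6), 3)
fine = disaggregate_bilinear(np.array([[0.0, 1.0], [0.0, 1.0]]), 2)
```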

Furthermore, we derived 19 terrain attributes (e.g., mid-slope position and topographic wetness index) from a preprocessed Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) with 30 × 30 m resolution using SAGA GIS [30]. The terrain attributes are listed in Table 2. In addition, mean annual temperature and precipitation were obtained from WorldClim (1000 m resolution) [31].
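As an example of how such terrain attributes are computed, the sketch below derives slope from a DEM array and evaluates the classic topographic wetness index, TWI = ln(a / tan β). Here a is the specific catchment area and β is the slope angle. SAGA GIS uses its own flow-routing and a modified catchment area, so this is only an illustration of the definition, not a reproduction of the study's layers. The catchment-area grid is assumed to be precomputed from a flow-accumulation step, which is not shown.

```python
import numpy as np


def slope_radians(dem, cell_size):
    """Slope angle from a DEM using central differences (np.gradient)."""
    dz_dy, dz_dx = np.gradient(dem, cell_size)  # derivatives along rows, then columns
    return np.arctan(np.hypot(dz_dx, dz_dy))


def topographic_wetness_index(specific_catchment_area, slope_rad, min_slope=1e-3):
    """TWI = ln(a / tan(beta)); a minimum slope avoids division by zero on flat cells."""
    tan_beta = np.tan(np.maximum(slope_rad, min_slope))
    return np.log(specific_catchment_area / tan_beta)


# Usage: a synthetic plane rising 0.1 m per metre eastward on a 30 m grid
cell = 30.0
dem = 0.1 * cell * np.arange(5)[None, :] * np.ones((4, 1))
beta = slope_radians(dem, cell)
twi = topographic_wetness_index(np.full(dem.shape, 30.0), beta)
```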

All ancillary data were aggregated (average resampling) or disaggregated (bilinear resampling) to a common 30 × 30 m grid (Table 2). Ancillary data can be highly correlated, which complicates both soil modeling and the interpretation of the models. It is therefore necessary to reduce the number of ancillary variables by removing redundant or uninformative ones, which yields a more robust prediction model and reduces the risk of over-fitting. To this end, we used the Boruta algorithm, a random-forest-based method that automatically performs variable (i.e., ancillary data) selection on a dataset, reducing the ancillary data while providing more accurate predictions [32]. In practice, the algorithm creates randomized copies of the ancillary variables (shadow variables), so that the extended dataset contains both the original and the shadow variables. The importance of each ancillary variable is then computed and compared with a threshold, defined as the highest importance obtained among the shadow variables. An ancillary variable is selected only if its importance is greater than that of the best shadow variable.
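The shadow-variable logic described above can be sketched with scikit-learn. This is a simplified, hedged approximation, not the reference Boruta algorithm. The Boruta R package uses Z-scored permutation importance and a two-sided statistical test with multiple-comparison correction. Here, impurity-based importances are used instead, and a variable is kept when it beats the best shadow in more than half of the iterations. The function name `boruta_like_selection` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def boruta_like_selection(X, y, n_iter=20, n_trees=100, seed=0):
    """Simplified Boruta-style screening (sketch, not the reference algorithm).

    Each iteration shuffles every column of X independently to build "shadow"
    variables, fits a random forest on [X, shadows], and records a hit for each
    original variable whose importance exceeds the best shadow importance."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for i in range(n_iter):
        shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(n_features)])
        forest = RandomForestRegressor(n_estimators=n_trees, random_state=seed + i)
        forest.fit(np.hstack([X, shadows]), y)
        importance = forest.feature_importances_
        threshold = importance[n_features:].max()  # best shadow variable
        hits += importance[:n_features] > threshold
    # Crude stand-in for Boruta's statistical test: majority of hits
    return np.flatnonzero(hits > n_iter / 2)


# Usage: two informative covariates (columns 0 and 1) and three pure-noise covariates
rng = np.random.default_rng(42)
X = rng.random((200, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.1 * rng.normal(size=200)
selected = boruta_like_selection(X, y, n_iter=10, n_trees=50)
```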

2.5. Machine Learning Models

A classification and regression tree (CART) is among the most widely used algorithms for predictive machine learning. Its tree-based structure can reveal and model hierarchical relationships between the ancillary data and soil properties [33]. Building the decision tree involves two stages: first, growing the tree by recursively splitting (branching) the data, and second, pruning the tree, which aims to minimize the prediction error [34].
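The two stages (grow, then prune) map directly onto scikit-learn's `DecisionTreeRegressor` with minimal cost-complexity pruning. The sketch assumes that pruning strength is chosen by 5-fold cross-validated mean squared error. The paper does not state its pruning criterion or software, and the function name `fit_pruned_cart` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor


def fit_pruned_cart(X, y, cv=5, max_candidates=20, seed=0):
    """Grow a full regression tree, then choose a cost-complexity pruning strength by CV."""
    # Stage 1: grow the tree until leaves are pure and compute its pruning path
    full_tree = DecisionTreeRegressor(random_state=seed).fit(X, y)
    path = full_tree.cost_complexity_pruning_path(X, y)
    alphas = np.unique(np.clip(path.ccp_alphas, 0.0, None))
    alphas = alphas[:: max(1, len(alphas) // max_candidates)]  # keep the CV cheap
    # Stage 2: prune; a larger ccp_alpha removes more branches
    cv_mse = [
        -cross_val_score(
            DecisionTreeRegressor(random_state=seed, ccp_alpha=a),
            X, y, cv=cv, scoring="neg_mean_squared_error",
        ).mean()
        for a in alphas
    ]
    best_alpha = alphas[int(np.argmin(cv_mse))]
    pruned_tree = DecisionTreeRegressor(random_state=seed, ccp_alpha=best_alpha).fit(X, y)
    return pruned_tree, full_tree


# Usage on synthetic data shaped like this study (103 samples, a few covariates)
rng = np.random.default_rng(0)
X = rng.random((103, 4))
y = np.where(X[:, 0] > 0.5, 2.0, 0.0) + X[:, 1] + 0.2 * rng.normal(size=103)
pruned, full = fit_pruned_cart(X, y)
```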

Random forest (RF) is an ensemble learning method based on decision trees. Each tree in the ensemble is fitted to a random bootstrap sample of the training data. To add further randomness, a random subset of the predictor variables is considered when creating each node-splitting rule within the individual trees. This ensemble process is intended to mitigate the effects of model overfitting. RF is not only simple and fast [35] but also a robust predictive model [36].
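The two sources of randomness described above, bootstrap rows and a random predictor subset at each split, can be made explicit. The sketch below builds a small forest from scikit-learn decision trees and averages their predictions. It is for illustration only. In practice `sklearn.ensemble.RandomForestRegressor` (or an R equivalent) does the same thing with more options. The class name `BaggedRandomTrees` and the one-third feature fraction are assumptions, not settings reported in the study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


class BaggedRandomTrees:
    """Minimal random-forest regressor: bootstrap samples + random feature subsets per split."""

    def __init__(self, n_trees=200, max_features=1 / 3, seed=0):
        self.n_trees = n_trees
        self.max_features = max_features  # fraction of predictors tried at each split
        self.seed = seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        n = len(y)
        self.trees_ = []
        for _ in range(self.n_trees):
            idx = rng.integers(0, n, size=n)  # bootstrap: n rows drawn with replacement
            tree = DecisionTreeRegressor(
                max_features=self.max_features,  # random predictor subset at each node
                random_state=int(rng.integers(2**31 - 1)),
            )
            self.trees_.append(tree.fit(X[idx], y[idx]))
        return self

    def predict(self, X):
        # Ensemble prediction = average of the individual tree predictions
        return np.mean([tree.predict(X) for tree in self.trees_], axis=0)


rng = np.random.default_rng(3)
X = rng.random((103, 6))
y = 2 * X[:, 0] + X[:, 1] ** 2 + 0.1 * rng.normal(size=103)
preds = BaggedRandomTrees(n_trees=100).fit(X, y).predict(X)
```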

Artificial neural networks (ANN) are an artificial intelligence technique designed to learn rules from examples by simulating the neural function of the human brain [37]. ANNs can detect patterns in data and can therefore be used to predict correlated variables such as soil properties [9,38]. A primary benefit of ANN techniques is their ability to model multiple outputs simultaneously. An ANN consists of interconnected units (perceptrons) arranged in an input layer (i.e., ancillary data), an output layer (i.e., soil properties), and hidden layers that transform the inputs into new features that the output layer can use.
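To make the layer structure concrete, here is a minimal one-hidden-layer network in numpy, trained by full-batch gradient descent on mean squared error. The output layer has one column per soil property, which illustrates the multiple-output benefit mentioned above. The architecture (tanh hidden units, linear outputs), learning rate, and epoch count are illustrative assumptions. The study's actual ANN configuration is not given here.

```python
import numpy as np


def train_mlp(X, Y, n_hidden=8, lr=0.05, epochs=2000, seed=0):
    """One-hidden-layer perceptron (tanh hidden units, linear outputs) trained by
    full-batch gradient descent on MSE. Y has one column per soil property."""
    rng = np.random.default_rng(seed)
    p, q = X.shape[1], Y.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(p), size=(p, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, q))
    b2 = np.zeros(q)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)   # hidden layer: new features from the ancillary inputs
        Y_hat = H @ W2 + b2        # output layer: predicted soil properties
        err = Y_hat - Y
        losses.append(float(np.mean(err ** 2)))
        # Backpropagation of the MSE gradient (computed before any weight update)
        g_out = 2.0 * err / err.size
        g_hidden = (g_out @ W2.T) * (1.0 - H ** 2)  # derivative of tanh
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hidden)
        b1 -= lr * g_hidden.sum(axis=0)
    return (W1, b1, W2, b2), losses


def predict_mlp(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2


# Usage: three standardized covariates, two target properties at once
rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3))
Y = np.column_stack([np.sin(X[:, 0]), 0.5 * X[:, 1] - 0.2 * X[:, 2]])
params, losses = train_mlp(X, Y)
Y_pred = predict_mlp(params, X)
```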

Regression tree cubist (RTC) is a prediction algorithm based on the model tree approach [39]. It produces a prediction model based on sets of rules [40] and has been widely applied to predict soil properties [41]. Like other regression tree models, RTC uses if-then rules to split the dataset, but it fits a linear regression model at each leaf node of the tree rather than a constant value.
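The core model-tree idea, if-then splits with a linear regression in each leaf, can be sketched by combining a shallow scikit-learn tree with per-leaf linear models. This is deliberately a simplification and is not Cubist. Cubist additionally simplifies rules, smooths predictions along the tree, and can use committees and nearest-neighbor corrections, none of which are shown. The class name `SimpleModelTree` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


class SimpleModelTree:
    """Shallow regression tree for the partition, one linear regression per leaf."""

    def __init__(self, max_leaf_nodes=4, min_samples_leaf=10):
        self.max_leaf_nodes = max_leaf_nodes
        self.min_samples_leaf = min_samples_leaf

    def fit(self, X, y):
        # The tree's if-then splits define the rules (partitions of covariate space)
        self.tree_ = DecisionTreeRegressor(
            max_leaf_nodes=self.max_leaf_nodes, min_samples_leaf=self.min_samples_leaf
        ).fit(X, y)
        leaves = self.tree_.apply(X)
        # Each leaf gets its own linear model instead of a constant mean
        self.models_ = {
            leaf: LinearRegression().fit(X[leaves == leaf], y[leaves == leaf])
            for leaf in np.unique(leaves)
        }
        return self

    def predict(self, X):
        leaves = self.tree_.apply(X)
        out = np.empty(len(X))
        for leaf, model in self.models_.items():
            mask = leaves == leaf
            if mask.any():
                out[mask] = model.predict(X[mask])
        return out


# Usage: a piecewise-linear response that a single straight line cannot capture
rng = np.random.default_rng(5)
X = rng.uniform(-1, 1, size=(300, 1))
y = np.where(X[:, 0] < 0, 2 * X[:, 0], -3 * X[:, 0] + 1) + 0.01 * rng.normal(size=300)
mse_tree = np.mean((SimpleModelTree().fit(X, y).predict(X) - y) ** 2)
mse_line = np.mean((LinearRegression().fit(X, y).predict(X) - y) ** 2)
```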

K-nearest neighbors (K-NN) is a nonparametric approach that has been used for vegetation [42] and soil mapping [43]. The technique is based on the principle of similarity and proximity of data: the value at a new location is estimated from the values of the k training points closest to it in the multivariate ancillary-data space.
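A compact numpy version of K-NN regression is shown below. It assumes covariates are standardized with the training mean and standard deviation so that no single ancillary variable dominates the Euclidean distance. It also assumes the prediction is the unweighted mean of the k neighbors. Distance weighting and the choice of k are common variations the paper does not specify. The function name `knn_regress` is hypothetical.

```python
import numpy as np


def knn_regress(X_train, y_train, X_new, k=5):
    """Predict each new location as the mean of its k nearest training points
    in standardized covariate space."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant covariates
    z_train = (X_train - mu) / sd
    z_new = (X_new - mu) / sd
    dist2 = ((z_new[:, None, :] - z_train[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(dist2, axis=1)[:, :k]  # indices of the k closest samples
    return y_train[nearest].mean(axis=1)


X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 10.0])
pred = knn_regress(X_train, y_train, np.array([[1.1]]), k=3)
```

With k = 3, the neighbors of 1.1 are the samples at 1, 2, and 0, so the prediction is their mean, 1.0.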

To validate the DTM, RF, ANN, RTC, and K-NN models, the root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²) were computed from the measured and predicted soil property values (Equations (1)–(3)). To evaluate each model, we randomly divided the data into two groups: training data (80% of the data) and test data (20% of the data). To obtain more robust estimates, this procedure was repeated 10 times.

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{j = 1}^{n} (y_{j} - \hat{y}_{j})^{2}}

(1)

\mathrm{MAE} = \frac{1}{n} \sum_{j = 1}^{n} \lvert y_{j} - \hat{y}_{j} \rvert

(2)

R^{2} = 1 - \frac{\sum_{j = 1}^{n} (y_{j} - \hat{y}_{j})^{2}}{\sum_{j = 1}^{n} (y_{j} - \bar{y})^{2}}, \quad \bar{y} = \frac{1}{n} \sum_{j = 1}^{n} y_{j}

(3)

where \hat{y}_{j}, y_{j}, and n are the estimated values, the measured values, and the number of measured values, respectively, and \bar{y} is the mean of the measured values.
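Equations (1)–(3) and the repeated 80/20 split translate directly into numpy. The sketch below implements the three metrics exactly as written. It then wraps them in a repeated hold-out loop that accepts any `fit_predict` function. The ordinary-least-squares `ols_fit_predict` is only a hypothetical placeholder for whichever ML model is being evaluated.

```python
import numpy as np


def rmse(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))           # Equation (1)


def mae(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))                    # Equation (2)


def r2(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))  # Equation (3)


def repeated_holdout(X, y, fit_predict, n_repeats=10, test_frac=0.2, seed=0):
    """Repeat a random train/test split and return one (RMSE, MAE, R2) row per repeat."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_frac * len(y)))
    rows = []
    for _ in range(n_repeats):
        perm = rng.permutation(len(y))
        test, train = perm[:n_test], perm[n_test:]
        y_hat = fit_predict(X[train], y[train], X[test])
        rows.append((rmse(y[test], y_hat), mae(y[test], y_hat), r2(y[test], y_hat)))
    return np.array(rows)


def ols_fit_predict(X_train, y_train, X_test):
    """Hypothetical placeholder model: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


rng = np.random.default_rng(0)
X = rng.random((103, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=103)
scores = repeated_holdout(X, y, ols_fit_predict)  # shape (10, 3)
```

Averaging the rows of `scores` gives the mean RMSE, MAE, and R² over the 10 repetitions.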

3. Results and Discussion

3.1. Summary of Soil Properties

The physicochemical soil properties summarized in Table 3 indicate that the mean nitrogen and organic carbon contents at the study site were 0.01% and 1.25%, respectively, whereas the mean phosphorus content was 89.8 ppm. The minimum (10.30 ppm) and maximum (214.33 ppm) phosphorus values occurred in the riparian buffer and agricultural land, respectively. The mean soil pH was 7.5 and the mean calcium carbonate content was 4%. The mean soil bulk density was 1.5 g/cm³, ranging from 1.28 g/cm³ in the agricultural lands to 1.85 g/cm³ in the riparian forests. The mean sand content was 53.02%, ranging from 20 to 96% across the study site. Although the mean soil organic carbon was 1.26%, its minimum and maximum values were 0.11% and 2.79%, respectively (Table 3).

3.2. Importance of Ancillary Data

The importance of the ancillary data in predicting soil properties was obtained from the ranking criterion of the Boruta algorithm. The results revealed wide variability depending on both the ancillary variable and the soil property (Figure 3). The best ancillary variables for predicting bulk density in the riparian area were LA.08 (Landsat-8 enhanced vegetation index), LA.04 (Landsat-8 near-infrared band), and LA.12 (Landsat-8 brightness index), in that order (Figure 3). Not surprisingly, these three ancillary variables were also the most important in explaining the spatial variability of clay content, because clay content and bulk density are strongly negatively correlated in the study area (r = −0.94). Interestingly, terrain attributes (e.g., TE.13: normalized height) were more successful in predicting sand and silt contents than the RS ancillary data, although MODIS data (MO.04: soil-adjusted vegetation index) and climate parameters (CL.02: annual mean precipitation) also contributed. These findings are similar to those of Taghizadeh-Mehrjardi et al. [44] and Mirzaeitalarposhti et al. [45], who found terrain attributes to be the most important predictors of soil texture fractions. Figure 3 further indicates that climate parameters (e.g., CL.02: annual mean precipitation) were the most important ancillary data for predicting soil calcium carbonate. Nevertheless, some Landsat-8, Sentinel-2, and MODIS variables also played important roles in predicting calcium carbonate. Taghizadeh-Mehrjardi et al. [46] also reported that RS data were particularly effective for predicting calcium carbonate in arid regions because it occurs at the soil surface, which makes it more visible in satellite imagery.

In contrast to calcium carbonate, terrain attributes (TE.14: slope position) were the best predictors of pH in the current research. Similarly, Mahmoudzadeh et al. [47] confirmed the importance of terrain attributes for estimating surface soil pH in semi-arid regions. Terrain attributes also explained much of the variability of organic carbon and the C:N ratio, although we had expected RS data to have the greatest influence. Previous studies [41,47,48] indicated that SOC variability can be described by terrain variables (e.g., slope, aspect, and curvature). Elevation may cause local changes in erosion, vegetation, and sunlight, which, together with the chemical and physical composition of the organic matter, are the main controllers of soil organic matter decomposition. For predicting the spatial distribution of soil nutrients (i.e., N, P, and K), climatic parameters, terrain attributes, and RS data played fairly similar roles, indicating the importance of combining different ancillary data in DSM. Our results are in line with Zeraatpisheh et al. [49], who showed that combining datasets generated spatial soil maps with lower uncertainty.

3.3. Machine Learning Performances

The performance of the five ML models in terms of R², MAE, and RMSE is presented in Table 4. RF provided the best estimates of soil N, K, C:N, pH, and BD, with the highest coefficients of determination (0.56, 0.49, 0.51, 0.46, and 0.64, respectively). RF has been used worldwide to predict soil parameters [50,51] and has produced highly accurate predictions of soil properties [52]. In this study, it gave the best predictions of soil bulk density and pH compared with the other ML models. Dangal et al. [53] and Silva et al. [54] also reported RF to be the best model for predicting soil properties. A possible reason for the high prediction accuracy of RF is that it is relatively insensitive to noise and, by averaging many decorrelated trees, keeps both model bias and variance low.

Furthermore, RTC provided the best estimates of soil organic carbon (R² = 0.56, RMSE = 0.54, MAE = 0.47), phosphorus (R² = 0.48, RMSE = 48.72, MAE = 41.66), and clay (R² = 0.62, RMSE = 8.20, MAE = 7.09). These results are consistent with those of Adhikari et al. [55] and Zeraatpisheh et al. [56], who successfully employed RTC to map soil properties. However, Taghizadeh-Mehrjardi et al. [57] reported RF as the best ML model for predicting SOC. These studies show that the relative performance of ML models varies significantly from study to study. Although the reasons are difficult to pin down, the differences could be due to differences in the extent of the study areas, topography, sampling density, or the quantity and quality of the environmental covariates used.

The ANN model achieved higher performance for estimating soil calcium carbonate than the other machine learning models (Table 4), which is in line with the findings of Taghizadeh-Mehrjardi et al. [46], who reported that ANN models performed best in predicting soil organic matter (SOM), calcium carbonate, and gypsum content in arid regions. ANN also gave the best estimates of sand and silt, with coefficients of determination of 0.66 and 0.57, respectively (Table 4). Similarly, Taghizadeh-Mehrjardi et al. [58] and Taghizadeh-Mehrjardi et al. [44] demonstrated the effectiveness of ANN models for predicting soil particle size fractions. The computational time of the ANN models was higher than that of other models such as RF and RTC; however, computational efficiency was not a serious issue in this study because of the limited size of the dataset.

Overall, ML plays an essential role in predicting soil physicochemical properties, and the results of the present study show that ML can be a good predictor of soil properties in riparian forests.

3.4. Spatial Pattern of Maps

As the RF, RTC, and ANN models gave the best goodness of fit, they were used to map the spatial distribution of soil properties across the study area (Figure 4). The highest soil nitrogen values (0.017%) were observed far from the river line, in the eastern part of the study site (Figure 4). These parts are shown in blue and mainly represent agricultural lands cultivated with Medicago sativa and Brassica napus. Red colors represent the minimum nitrogen values. The central-east and western parts of the study site had the lowest nitrogen values (0.007%), with the minimum occurring on bare lands without plant cover in the central east (Figure 4). The green line along the river path, especially in the western part of the study site where the river divides into two branches, is clearly visible. It contains dense Tamarix arceuthoides and mixed Tamarix arceuthoides and Populus euphratica stands and has less nitrogen than the agricultural lands far from the river line (Figure 4). The red and yellow areas around the river line are either bare lands or open Tamarix arceuthoides stands crossed by tracks used by local people. The riparian buffer is represented by two green lines in the western part of the study site, which are noticeable in the soil nitrogen map. Our results indicate the importance of plant cover for enhancing soil nitrogen [15], which is clearly visible in the generated map. In the present study, bare and vegetated lands differ in soil nitrogen, and even different plant types can be distinguished by their soil nitrogen. The soil nitrogen map revealed that, within the riparian buffer, nitrogen was higher under pure Populus euphratica stands than under Tamarix arceuthoides stands. This might be related to the faster decomposition of Populus euphratica leaves compared with Tamarix arceuthoides leaves, owing to the lower lignin content of Populus euphratica leaves compared with Tamarix sp. [59].

The phosphorus map indicated that lands along the river line contained high phosphorus values, shown in green and blue. Compared with the bare lands (red and orange), riparian forests had higher phosphorus values (Figure 4). Phosphorus values were highest in the agricultural lands, exceeding even those of the riparian forest (Figure 4). The minimum soil potassium occurred on bare lands without plant cover, in the western and eastern parts of the site, shown in red (Figure 4). Agricultural lands had the highest soil potassium values, shown in blue (Figure 4). In the western part of the study site, with open Tamarix arceuthoides stands, lower potassium values were observed (Figure 4). Thus, soil potassium and phosphorus were higher in the agricultural lands and also close to the river line in the riparian zone. Farmers add potassium and phosphorus to the lands annually, which might explain the elevated values of these elements in the agricultural lands. Riparian forests had higher soil potassium and phosphorus than open stands and bare lands, as the generated maps show. This indicates the role of riparian plant species in increasing these elements through litterfall [60], as well as the ability of ancillary data and machine learning to predict soil properties. A closer look at the phosphorus map revealed high phosphorus values in the river. This result indicates high-quality agricultural lands around the riparian zone [61] and phosphorus wash-off from agricultural lands into the river [62].

The soil organic carbon map revealed the minimum organic carbon in open Tamarix arceuthoides stands and bare lands (Figure 4). The lowest values occurred in the center of the study site, where open Tamarix stands are crossed by many local tracks (Figure 4). In contrast, the highest organic carbon values (blue) belonged to agricultural lands and Populus euphratica stands (Figure 4). The minimum C:N values appeared in Tamarix arceuthoides stands and the highest in Populus euphratica stands, while agricultural lands had intermediate C:N values (Figure 4). The low organic carbon in the bare lands and along the local tracks in the open Tamarix arceuthoides stands results from human disturbance: walking has eliminated the plant cover in these areas. Local people and their livestock have made many tracks through the open Tamarix arceuthoides stands, and their movement has caused plant loss and soil compaction [63]. These conditions reduce soil organic carbon. Our results show the value of prediction maps for detecting such changes and highlight the importance of disturbance in the riparian zone [64].

The C:N ratio indicates the decomposition rate of organic matter: a lower C:N ratio corresponds to a higher decomposition rate [65]. Accordingly, the decomposition rate was highest in the bare lands and lowest in the riparian forests. Nitrogen mineralization occurred in open stands and bare lands, whereas nitrogen remained mostly in organic form in dense stands and riparian forests.

The highest soil pH values were observed in the riparian forest along the river line, and the lowest in the bare lands in the western part of the study site (Figure 4). The lowest calcium carbonate values occurred along the river line, while the highest belonged to the agricultural lands (Figure 4). River floods leach calcium carbonate from the soil, which is why the minimum values were observed in riparian forests.

The soil particle-size maps showed higher sand contents near the river and higher clay contents in the agricultural lands. Soil texture is independent of plant species; thus, the effects of soil erosion, flooding, and leaching are well captured in digital soil maps of riparian zones. Fine-textured soils had lower bulk density than coarse-textured soils, as can be seen in the riparian zone. Along the river line, where the riparian forest exists, the soil was sandy, and bulk density was therefore higher. In contrast, the agricultural lands adjacent to the riparian forests have mostly fine-textured soils with higher clay and silt contents. Moving away from the river, sandy soils covered with Tamarix arceuthoides and Populus euphratica give way to agricultural lands. These pedological patterns are well reproduced in our prediction maps. The highest sand content was observed along the riverside (Figure 4), and the lowest in the agricultural lands in the eastern part of the study site (Figure 4). Clay showed a similarly clear trend: the minimum clay content occurred along the riverside, where riparian forests exist (Figure 4), and the maximum in the agricultural lands in the eastern part of the study site (Figure 4). The highest bulk density values were observed along the riverside, where riparian forests exist, and the lowest in the agricultural lands (Figure 4). This indicates the negative impact of tillage operations in agricultural lands on soil bulk density. Compared with the riparian forest, agricultural land receives intense tillage, causing blockage of soil pores and soil compaction [66].

4. Conclusions

Ancillary data such as digital elevation models, satellite images, and climate parameters can effectively support the spatial mapping of soil physicochemical properties in riparian forests and agricultural lands. Furthermore, the results revealed that machine learning models can be good predictors of soil properties in riparian forests. Nevertheless, our results indicated that a specific machine learning model should be selected for each soil property to produce highly accurate maps with less error. For example, random forest, the cubist regression tree, and artificial neural networks gave the best results in estimating soil pH, organic carbon, and calcium carbonate, respectively. These results confirm that no single machine learning model is best for mapping all soil properties, especially in complex landscapes such as riparian forests. Finally, the soil property maps established in this study can serve as input information for planning and land use evaluation of riparian forests.
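The per-property choice of model described above can be illustrated with the R² values reported in Table 4. The following is a minimal sketch, not the authors' procedure: it covers only pH, organic carbon, calcium carbonate, and clay, and it breaks ties on the lower RMSE, which is an assumed rule (needed for clay, where RF and Cubist both reach R² = 0.62). The names `TABLE4` and `best_model` are hypothetical.

```python
# R² and RMSE copied from Table 4 for a subset of soil properties.
TABLE4 = {
    "pH": {
        "R2": {"CART": 0.32, "RF": 0.46, "ANN": 0.43, "Cubist": 0.36, "KNN": 0.36},
        "RMSE": {"CART": 0.24, "RF": 0.22, "ANN": 0.22, "Cubist": 0.22, "KNN": 0.22},
    },
    "OC": {
        "R2": {"CART": 0.43, "RF": 0.49, "ANN": 0.55, "Cubist": 0.56, "KNN": 0.51},
        "RMSE": {"CART": 0.36, "RF": 0.58, "ANN": 0.56, "Cubist": 0.54, "KNN": 0.51},
    },
    "CaCO3": {
        "R2": {"CART": 0.46, "RF": 0.45, "ANN": 0.57, "Cubist": 0.48, "KNN": 0.52},
        "RMSE": {"CART": 1.24, "RF": 1.09, "ANN": 1.04, "Cubist": 1.09, "KNN": 1.06},
    },
    "Clay": {
        "R2": {"CART": 0.55, "RF": 0.62, "ANN": 0.56, "Cubist": 0.62, "KNN": 0.55},
        "RMSE": {"CART": 9.14, "RF": 8.60, "ANN": 8.30, "Cubist": 8.20, "KNN": 8.30},
    },
}


def best_model(scores):
    """Return the model with the highest R²; ties go to the lower RMSE (assumed rule)."""
    r2, rmse = scores["R2"], scores["RMSE"]
    # Sort key: larger R² first, then smaller RMSE (hence the negation).
    return max(r2, key=lambda model: (r2[model], -rmse[model]))


best = {prop: best_model(scores) for prop, scores in TABLE4.items()}
print(best)  # {'pH': 'RF', 'OC': 'Cubist', 'CaCO3': 'ANN', 'Clay': 'Cubist'}
```

With these values the sketch returns RF for pH, Cubist for organic carbon, and ANN for calcium carbonate, matching the models named above; for clay, the assumed RMSE tie-break favors Cubist.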

Author Contributions

Methodology and supervision, M.M.; software, R.T.-M.; writing—original draft preparation, M.Z.N.; editing, G.M. All authors have read and agreed to the published version of the manuscript.

Funding

Ruhollah Taghizadeh-Mehrjardi has been supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy—EXC number 2064/1—Project number 390727645, and collaborative research center SFB 1070 ‘ResourceCultures’—Project number 215859406.

Data Availability Statement

Not applicable.

Acknowledgments

The authors take this opportunity to thank Behbahan Khatam Alanbia University of Technology and its staff for their help during the work.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Muñoz-Rojas, M.; Erickson, T.E.; Martini, D.; Dixon, K.W.; Merritt, D.J. Soil Physicochemical and Microbiological Indicators of Short, Medium and Long Term Post-Fire Recovery in Semi-Arid Ecosystems. Ecol. Indic. 2016, 63, 14–22.
2. Visschers, R.; Finke, P.A.; de Gruijter, J.J. A Soil Sampling Program for the Netherlands. Geoderma 2007, 139, 60–72.
3. Lamsal, S.; Bliss, C.M.; Graetz, D.A. Geospatial Mapping of Soil Nitrate-Nitrogen Distribution Under a Mixed-Land Use System. Pedosphere 2009, 19, 434–445.
4. Klemas, V. Remote Sensing of Riparian and Wetland Buffers: An Overview. J. Coast. Res. 2014, 297, 869–880.
5. McBratney, A.B.; Mendonça Santos, M.L.; Minasny, B. On Digital Soil Mapping. Geoderma 2003, 117, 3–52.
6. Hengl, T.; Toomanian, N.; Reuter, H.I.; Malakouti, M.J. Methods to Interpolate Soil Categorical Variables from Profile Observations: Lessons from Iran. Geoderma 2007, 140, 417–427.
7. Gundogdu, I.B.; Esen, O. The importance of secondary variables for mapping of meteorological data. In Proceedings of the International Conference on Cartography and GIS, Nessebar, Bulgaria, 15–20 June 2010.
8. Ließ, M.; Schmidt, J.; Glaser, B. Improving the Spatial Prediction of Soil Organic Carbon Stocks in a Complex Tropical Mountain Landscape by Methodological Specifications in Machine Learning Approaches. PLoS ONE 2016, 11, e0153673.
9. Were, K.; Bui, D.T.; Dick, Ø.B.; Singh, B.R. A Comparative Assessment of Support Vector Regression, Artificial Neural Networks, and Random Forests for Predicting and Mapping Soil Organic Carbon Stocks across an Afromontane Landscape. Ecol. Indic. 2015, 52, 394–403.
10. Viscarra Rossel, R.A.; Behrens, T. Using Data Mining to Model and Interpret Soil Diffuse Reflectance Spectra. Geoderma 2010, 158, 46–54.
11. Hardoim, P.R.; van Overbeek, L.S.; Berg, G.; Pirttilä, A.M.; Compant, S.; Campisano, A.; Döring, M.; Sessitsch, A. The Hidden World within Plants: Ecological and Evolutionary Considerations for Defining Functioning of Microbial Endophytes. Microbiol. Mol. Biol. Rev. 2015, 79, 293–320.
12. Štursová, M.; Bárta, J.; Šantrůčková, H.; Baldrian, P. Small-Scale Spatial Heterogeneity of Ecosystem Properties, Microbial Community Composition and Microbial Activities in a Temperate Mountain Forest Soil. FEMS Microbiol. Ecol. 2016, 92, fiw185.
13. Ingole, S.P. A Review on Role of Physico-Chemical Properties in Soil Quality. Int. J. Chem. Stud. 2015, 3, 29–32.
14. Moradi Behbahani, S.; Moradi, M.; Basiri, R.; Mirzaei, J. Sand Mining Disturbances and Their Effects on the Diversity of Arbuscular Mycorrhizal Fungi in a Riparian Forest of Iran. J. Arid Land 2017, 9, 837–849.
15. Moradi, M.; Imani, F.; Naji, H.; Moradi Behbahani, S.; Ahmadi, M. Variation in Soil Carbon Stock and Nutrient Content in Sand Dunes after Afforestation by Prosopis Juliflora in the Khuzestan Province (Iran). iForest 2017, 10, 585–589.
16. Narumalani, S.; Zhou, Y.; Jensen, J.R. Application of Remote Sensing and Geographic Information Systems to the Delineation and Analysis of Riparian Buffer Zones. Aquat. Bot. 1997, 58, 393–409.
17. Alencar-Silva, T.; Maillard, P. Assessment of biophysical structure of riparian zones based on segmentation method, spatial knowledge and texture analysis. In ISPRS TC VII Symposium—100 Years ISPRS; Wagner, W., Szekely, B., Eds.; IAPRS: Vienna, Austria, 2010.
18. Forogh Nasab, F.; Moradi, M.; Moradi, G.; Taghizadeh-Mehrjardi, R. Erratum to: Topsoil Carbon Stock and Soil Physicochemical Properties in Riparian Forests and Agricultural Lands of Southwestern Iran. Eurasian Soil Sci. 2021, 54, 459.
19. Avazpoor, Z.; Moradi, M.; Basiri, R.; Mirzaei, J.; Taghizadeh-Mehrjardi, R.; Kerry, R. Soil Enzyme Activity Variations in Riparian Forests in Relation to Plant Species and Soil Depth. Arab. J. Geosci. 2019, 12, 708.
20. Tockner, K.; Stanford, J.A. Riverine Flood Plains: Present State and Future Trends. Environ. Conserv. 2002, 29, 308–330.
21. Rusnák, M.; Goga, T.; Michaleje, L.; Šulc Michalková, M.; Máčka, Z.; Bertalan, L.; Kidová, A. Remote Sensing of Riparian Ecosystems. Remote Sens. 2022, 14, 2645.
22. Minasny, B.; McBratney, A.B. Chapter 12 Latin Hypercube Sampling as a Tool for Digital Soil Mapping. In Developments in Soil Science; Elsevier: Amsterdam, The Netherlands, 2006; Volume 31, pp. 153–606. ISBN 9780444529589.
23. McLean, E.O. Soil pH and Lime Requirement. In Agronomy Monographs; Page, A.L., Ed.; American Society of Agronomy, Soil Science Society of America: Madison, WI, USA, 2015; pp. 199–224. ISBN 9780891189770.
24. Walkley, A.; Black, I.A. An examination of the Degtjareff method for determining soil organic matter, and a proposed modification of the chromic acid titration method. Soil Sci. 1934, 37, 29–38.
25. Klute, A.; Page, A.L. (Eds.) Methods of Soil Analysis, 2nd ed.; Agronomy; American Society of Agronomy, Soil Science Society of America: Madison, WI, USA, 1982; ISBN 9780891180883.
26. Olsen, S.R.; Cole, C.V.; Watanabe, F.S.; Dean, L.A. Estimation of available phosphorus in soils by extraction with sodium bicarbonate. USDA Circ. 1954, 939, 1–19.
27. Merwin, H.D.; Peech, M. Exchangeability of Soil Potassium in the Sand, Silt, and Clay Fractions as Influenced by the Nature of the Complementary Exchangeable Cation. Soil Sci. Soc. Am. J. 1951, 15, 125–128.
28. Prihar, S.S.; Hundal, S.S. Determination of Bulk Density of Soil Clod by Saturation. Geoderma 1971, 5, 283–286.
29. Loveland, T.R.; Irons, J.R. Landsat 8: The Plans, the Reality, and the Legacy. Remote Sens. Environ. 2016, 185, 1–6.
30. Conrad, O.; Bechtel, B.; Bock, M.; Dietrich, H.; Fischer, E.; Gerlitz, L.; Wehberg, J.; Wichmann, V.; Böhner, J. System for Automated Geoscientific Analyses (SAGA) v. 2.1.4. Geosci. Model Dev. 2015, 8, 1991–2007.
31. Hijmans, R.; Cameron, S.; Parra, J.; Jones, P.; Jarvis, A.; Richardson, K. Very high resolution interpolated climate surfaces for global land areas. Int. J. Climatol. A J. R. Meteorol. Soc. 2005, 25, 1965–1978.
32. Kursa, M.B.; Jankowski, A.; Rudnicki, W.R. Boruta—A System for Feature Selection. Fundam. Inform. 2010, 101, 271–285.
33. Bittencourt, H.R.; Clarke, R.T. Use of Classification and Regression Trees (CART) to Classify Remotely-Sensed Digital Images. In Proceedings of the IGARSS 2003, 2003 IEEE International Geoscience and Remote Sensing Symposium, Toulouse, France, 21–25 July 2003; Proceedings (IEEE Cat. No.03CH37477). IEEE: Toulouse, France, 2003; Volume 6, pp. 3751–3753.
34. Taghizadeh-Mehrjardi, R.; Sarmadian, F.; Omid, M.; Toomanian, N.; Rousta, M.J.; Rahimian, M.H. Incorporating soil taxonomic distance and decision tree for spatial prediction of soil classes in Ardakan, Yazd. J. Arid Biome 2013, 3, 27–39.
35. Cutler, D.R.; Edwards, T.C.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J. Random forests for classification in ecology. Ecology 2007, 88, 2783–2792.
36. Dramsch, J.S. 70 Years of Machine Learning in Geoscience in Review. In Advances in Geophysics; Elsevier: Amsterdam, The Netherlands, 2020; Volume 61, pp. 1–55. ISBN 9780128216699.
37. Behrens, T.; Förster, H.; Scholten, T.; Steinrücken, U.; Spies, E.; Goldschmitt, M. Digital Soil Mapping Using Artificial Neural Networks. Z. Pflanz. Bodenkd. 2005, 168, 21–33.
38. Kalambukattu, J.G.; Kumar, S.; Arya Raj, R. Digital Soil Mapping in a Himalayan Watershed Using Remote Sensing and Terrain Parameters Employing Artificial Neural Network Model. Environ. Earth Sci. 2018, 77, 203.
39. Quinlan, R. Learning with Continuous Classes. In Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, Hobart, Tasmania, 16–18 November 1992; pp. 343–348.
40. Appelhans, T.; Mwangomo, E.; Hardy, D.R.; Hemp, A.; Nauss, T. Evaluating Machine Learning Approaches for the Interpolation of Monthly Air Temperature at Mt. Kilimanjaro, Tanzania. Spat. Stat. 2015, 14, 91–113.
41. Emadi, M.; Taghizadeh-Mehrjardi, R.; Cherati, A.; Danesh, M.; Mosavi, A.; Scholten, T. Predicting and Mapping of Soil Organic Carbon Using Machine Learning Algorithms in Northern Iran. Remote Sens. 2020, 12, 2234.
42. Sun, H.; Wang, Q.; Wang, G.; Lin, H.; Luo, P.; Li, J.; Zeng, S.; Xu, X.; Ren, L. Optimizing KNN for Mapping Vegetation Cover of Arid and Semi-Arid Areas Using Landsat Images. Remote Sens. 2018, 10, 1248.
43. Mansuy, N.; Thiffault, E.; Paré, D.; Bernier, P.; Guindon, L.; Villemaire, P.; Poirier, V.; Beaudoin, A. Digital Mapping of Soil Properties in Canadian Managed Forests at 250 m of Resolution Using the K-Nearest Neighbor Method. Geoderma 2014, 235–236, 59–73.
44. Taghizadeh-Mehrjardi, R.; Emadi, M.; Cherati, A.; Heung, B.; Mosavi, A.; Scholten, T. Bio-Inspired Hybridization of Artificial Neural Networks: An Application for Mapping the Spatial Distribution of Soil Texture Fractions. Remote Sens. 2021, 13, 1025.
45. Mirzaeitalarposhti, R.; Shafizadeh-Moghadam, H.; Taghizadeh-Mehrjardi, R.; Demyan, M.S. Digital Soil Texture Mapping and Spatial Transferability of Machine Learning Models Using Sentinel-1, Sentinel-2, and Terrain-Derived Covariates. Remote Sens. 2022, 14, 5909.
46. Taghizadeh-Mehrjardi, R.; Khademi, H.; Khayamim, F.; Zeraatpisheh, M.; Heung, B.; Scholten, T. A Comparison of Model Averaging Techniques to Predict the Spatial Distribution of Soil Properties. Remote Sens. 2022, 14, 472.
47. Mahmoudzadeh, H.; Matinfar, H.R.; Kerry, R.; Eskandari, S.; Ebrahimi-Khusfi, Z.; Taghizadeh-Mehrjardi, R. New Hybrid Evolutionary Models for Spatial Prediction of Soil Properties in Kurdistan. Soil Use Manag. 2022, 38, 191–211.
48. Mahmoudzadeh, H.; Matinfar, H.R.; Taghizadeh-Mehrjardi, R.; Kerry, R. Spatial Prediction of Soil Organic Carbon Using Machine Learning Techniques in Western Iran. Geoderma Reg. 2020, 21, e00260.
49. Zeraatpisheh, M.; Garosi, Y.; Reza Owliaie, H.; Ayoubi, S.; Taghizadeh-Mehrjardi, R.; Scholten, T.; Xu, M. Improving the Spatial Prediction of Soil Organic Carbon Using Environmental Covariates Selection: A Comparison of a Group of Environmental Covariates. CATENA 2022, 208, 105723.
50. Zhao, X.; Yang, Y.; Shen, H.; Geng, X.; Fang, J. Global Soil–Climate–Biome Diagram: Linking Surface Soil Properties to Climate and Biota. Biogeosciences 2019, 16, 2857–2871.
51. Ding, J.; Li, F.; Yang, G.; Chen, L.; Zhang, B.; Liu, L.; Fang, K.; Qin, S.; Chen, Y.; Peng, Y.; et al. The Permafrost Carbon Inventory on the Tibetan Plateau: A New Evaluation Using Deep Sediment Cores. Glob. Chang. Biol. 2016, 22, 2688–2701.
52. Hengl, T.; Heuvelink, G.B.M.; Kempen, B.; Leenaars, J.G.B.; Walsh, M.G.; Shepherd, K.D.; Sila, A.; MacMillan, R.A.; Mendes de Jesus, J.; Tamene, L.; et al. Mapping Soil Properties of Africa at 250 m Resolution: Random Forests Significantly Improve Current Predictions. PLoS ONE 2015, 10, e0125814.
53. Dangal, S.R.S.; Sanderman, J.; Wills, S.; Ramirez-Lopez, L. Accurate and Precise Prediction of Soil Properties from a Large Mid-Infrared Spectral Library. Soil Syst. 2019, 3, 11.
54. Silva, S.H.G.; Teixeira, A.F.d.S.; Menezes, M.D.d.; Guilherme, L.R.G.; Moreira, F.M.d.S.; Curi, N. Multiple Linear Regression and Random Forest to Predict and Map Soil Properties Using Data from Portable X-ray Fluorescence Spectrometer (PXRF). Ciênc. Agrotechnol. 2017, 41, 648–664.
55. Adhikari, K.; Owens, P.R.; Libohova, Z.; Miller, D.M.; Wills, S.A.; Nemecek, J. Assessing Soil Organic Carbon Stock of Wisconsin, USA and Its Fate under Future Land Use and Climate Change. Sci. Total Environ. 2019, 667, 833–845.
56. Zeraatpisheh, M.; Ayoubi, S.; Jafari, A.; Tajik, S.; Finke, P. Digital Mapping of Soil Properties Using Multiple Machine Learning in a Semi-Arid Region, Central Iran. Geoderma 2019, 338, 445–452.
57. Taghizadeh-Mehrjardi, R.; Schmidt, K.; Amirian-Chakan, A.; Rentschler, T.; Zeraatpisheh, M.; Sarmadian, F.; Valavi, R.; Davatgar, N.; Behrens, T.; Scholten, T. Improving the Spatial Prediction of Soil Organic Carbon Content in Two Contrasting Climatic Regions by Stacking Machine Learning Models and Rescanning Covariate Space. Remote Sens. 2020, 12, 1095.
58. Taghizadeh-Mehrjardi, R.; Toomanian, N.; Khavaninzadeh, A.R.; Jafari, A.; Triantafilis, J. Predicting and Mapping of Soil Particle-Size Fractions with Adaptive Neuro-Fuzzy Inference and Ant Colony Optimization in Central Iran: Digital Mapping of Soil Texture. Eur. J. Soil Sci. 2016, 67, 707–725.
59. Moline, A.B.; Poff, N.L. Growth of an Invertebrate Shredder on Native (Populus) and Non-Native (Tamarix, Elaeagnus) Leaf Litter. Freshw. Biol. 2008, 53, 1012–1020.
60. Vendramini, J.M.B.; Silveira, M.L.A.; Dubeux, J.C.B., Jr.; Sollenberger, L.E. Environmental Impacts and Nutrient Recycling on Pastures Grazed by Cattle. R. Bras. Zootec. 2007, 36, 139–149.
61. Crooks, E.C.; Harris, I.M.; Patil, S.D. Influence of Land Use Land Cover on River Water Quality in Rural North Wales, UK. J. Am. Water Resour. Assoc. 2021, 57, 357–373.
62. Fones, G.R.; Bakir, A.; Gray, J.; Mattingley, L.; Measham, N.; Knight, P.; Bowes, M.J.; Greenwood, R.; Mills, G.A. Using High-Frequency Phosphorus Monitoring for Water Quality Management: A Case Study of the Upper River Itchen, UK. Environ. Monit. Assess. 2020, 192, 184.
63. Günal, H.; Korucu, T.; Birkas, M.; Özgöz, E.; Halbac-Cotoara-Zamfir, R. Threats to Sustainability of Soil Functions in Central and Southeast Europe. Sustainability 2015, 7, 2161–2188.
64. Yang, L.; Chen, S.; Li, Y.; Wang, Q.; Zhong, X.; Yang, Z.; Lin, C.; Yang, Y. Conversion of Natural Evergreen Broadleaved Forests Decreases Soil Organic Carbon but Increases the Relative Contribution of Microbial Residue in Subtropical China. Forests 2019, 10, 468.
65. Wasak, K.; Drewnik, M. Land Use Effects on Soil Organic Carbon Sequestration in Calcareous Leptosols in Former Pastureland—A Case Study from the Tatra Mountains (Poland). Solid Earth 2015, 6, 1103–1115.
66. Liu, Z.; Cao, S.; Sun, Z.; Wang, H.; Qu, S.; Lei, N.; He, J.; Dong, Q. Tillage Effects on Soil Properties and Crop Yield after Land Reclamation. Sci. Rep. 2021, 11, 4611.

Figure 1. Overview of the employed methods.

Figure 2. Study site location and spatial distribution of the sampling points (yellow dots) in the studied site, shown on a Sentinel-2 satellite image.

Figure 3. Role of the ancillary data in predicting soil physicochemical properties (darker colors indicate higher importance of an ancillary variable for the given soil property).

Figure 4. The spatial variability of soil properties in the studied site.

Table 1. Formulas used to calculate the remote sensing (RS) indices from satellite images.

RS Index	Formula
NDVI	(NIR − Red)/(NIR + Red)
EVI	2.5 × ((NIR − Red)/(NIR + 6 × Red − 7.5 × Blue + 1))
SAVI	((NIR − Red)/(NIR + Red + 0.5)) × (1 + 0.5)
NDMI	(NIR − SWIR1)/(NIR + SWIR1)
COSRI	((Blue + Green)/(Red + NIR)) × NDVI
LSWI	(NIR − SWIR)/(NIR + SWIR)
Brightness Index	((NIR)² + (Red)²)^0.5
Clay index	SWIR1/SWIR2
Salinity index	(Red − NIR)/(Green + NIR)
Carbonate index	Red/Green
Gypsum index	(SWIR1 − NIR)/(SWIR1 + NIR)

Blue = blue band; Green = green band; Red = red band; NIR = near-infrared band; SWIR, SWIR1, SWIR2 = shortwave infrared bands; NDVI = normalized difference vegetation index; EVI = enhanced vegetation index; SAVI = soil-adjusted vegetation index; NDMI = normalized difference moisture index; COSRI = combined spectral response index; LSWI = land surface water index.
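The Table 1 formulas are simple band arithmetic and can be applied pixel by pixel. The sketch below is a minimal illustration, assuming surface-reflectance inputs passed as plain floats (equally shaped NumPy arrays also work, because only element-wise arithmetic is used). The function and argument names are hypothetical. Because Table 1 writes LSWI with an unnumbered SWIR band, the sketch assumes SWIR1 there, which makes LSWI numerically identical to NDMI.

```python
def spectral_indices(blue, green, red, nir, swir1, swir2):
    """Compute the Table 1 indices from reflectance values (hypothetical helper).

    Inputs may be floats or equally shaped NumPy arrays; every operation below
    is element-wise arithmetic, so both cases behave the same way.
    """
    ndvi = (nir - red) / (nir + red)
    return {
        "NDVI": ndvi,
        "EVI": 2.5 * ((nir - red) / (nir + 6 * red - 7.5 * blue + 1)),
        # SAVI with the soil-brightness correction factor L = 0.5, as in Table 1.
        "SAVI": ((nir - red) / (nir + red + 0.5)) * (1 + 0.5),
        "NDMI": (nir - swir1) / (nir + swir1),
        "COSRI": ((blue + green) / (red + nir)) * ndvi,
        # Assumption: the unnumbered "SWIR" of the LSWI formula is taken as SWIR1.
        "LSWI": (nir - swir1) / (nir + swir1),
        "BrightnessIndex": (nir ** 2 + red ** 2) ** 0.5,
        "ClayIndex": swir1 / swir2,
        "SalinityIndex": (red - nir) / (green + nir),
        "CarbonateIndex": red / green,
        "GypsumIndex": (swir1 - nir) / (swir1 + nir),
    }


# Illustrative reflectance values for a single pixel (not measured data).
indices = spectral_indices(blue=0.05, green=0.08, red=0.06, nir=0.30, swir1=0.20, swir2=0.10)
print(round(indices["NDVI"], 3))
```

The sketch does not cover atmospheric correction, cloud masking, or resampling between sensors, which would normally precede these calculations.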

Table 2. Ancillary data used for estimating soil properties (for the RS index formulas, see Table 1).

Name	Code	Name	Code
MODIS	MO	Shortwave infrared 1 (1.61 µm)	SE.09
MODIS (0.62–0.67 µm)	MO.01	Shortwave infrared 2 (2.19 µm)	SE.10
MODIS (0.84–0.87 µm)	MO.02	Normalized difference vegetation index	SE.11
Normalized difference vegetation index	MO.03	Enhanced vegetation index	SE.12
Soil adjusted vegetation index	MO.04	Soil adjusted vegetation index	SE.13
Brightness Index	MO.05	Land surface water index	SE.14
Land surface temperature daytime	MO.06	Brightness Index	SE.15
Land surface temperature nighttime	MO.07	Clay index	SE.16
Landsat-8	LA	Salinity index	SE.17
Blue (0.45–0.51 µm)	LA.01	Carbonate index	SE.18
Green (0.53–0.59 µm)	LA.02	Gypsum index	SE.19
Red (0.64–0.67 µm)	LA.03	Terrain attributes	TE
Near infrared (0.85–0.88 µm)	LA.04	Aspect (°)	TE.01
Shortwave infrared 1 (1.57–1.65 µm)	LA.05	Catchment slope	TE.02
Shortwave infrared 2 (2.11–2.29 µm)	LA.06	Channel network base level	TE.03
Normalized difference vegetation index	LA.07	Vertical distance to channel network	TE.04
Enhanced vegetation index	LA.08	Elevation (m)	TE.05
Soil adjusted vegetation index	LA.09	Standardized height	TE.06
Normalized difference moisture index	LA.10	Flow accumulation	TE.07
Combined spectral response index	LA.11	General curvature (°)	TE.08
Brightness Index	LA.12	Slope length (m)	TE.09
Clay index	LA.13	Catchment area (m²)	TE.10
Salinity index	LA.14	MRVBF	TE.11
Carbonate index	LA.15	Vector terrain ruggedness	TE.12
Gypsum index	LA.16	Normalized height	TE.13
Sentinel-2	SE	Relative-slope position	TE.14
Blue (0.49 µm)	SE.01	Slope (°)	TE.15
Green (0.56 µm)	SE.02	Terrain surface	TE.16
Red (0.66 µm)	SE.03	Topographic wetness index	TE.17
Vegetation Red Edge (0.70 µm)	SE.04	Valley depth (m)	TE.18
Vegetation Red Edge (0.74 µm)	SE.05	Profile curvature (°)	TE.19
Vegetation Red Edge (0.78 µm)	SE.06	Climate parameters	CL
Near infrared (0.842 µm)	SE.07	Annual mean temperature (°C)	CL.01
Vegetation Red Edge (0.86 µm)	SE.08	Annual mean precipitation (mm)	CL.02

Table 3. Statistical summary of the soil physicochemical properties in the studied site.

Code	Soil Properties	Minimum	Maximum	Mean	Standard Deviation
N	Nitrogen (%)	0.00	0.02	0.01	0.00
P	Phosphorous (ppm)	10.30	214.33	89.24	52.23
K	Potassium (ppm)	2.71	23.57	10.48	4.61
C	Organic carbon (%)	0.11	2.79	1.26	0.63
C:N	C:N	9.73	240.57	110.50	52.18
pH	pH	6.96	7.91	7.50	0.23
CaCO₃	CaCO₃ (%)	1.25	6.79	3.98	1.24
Sand	Sand (%)	20.00	96.00	53.02	21.50
Silt	Silt (%)	0.00	64.00	31.02	13.61
Clay	Clay (%)	2.00	36.00	15.96	10.73
BD	Bulk density (g/cm³)	1.28	1.85	1.50	0.15

Table 4. Performance of the studied machine learning models in estimating soil physicochemical properties.

Property	Metric	CART	RF	ANN	Cubist	KNN
N	MAE	0.002	0.002	0.002	0.002	0.002
	RMSE	0.003	0.003	0.003	0.003	0.003
	R²	0.46	0.56	0.49	0.51	0.52
P	MAE	42.81	42.12	41.02	41.66	43.06
	RMSE	50.67	49.67	47.79	48.72	50.36
	R²	0.46	0.45	0.42	0.48	0.39
K	MAE	4.15	4.08	3.91	3.80	3.84
	RMSE	4.82	4.71	4.45	4.33	4.48
	R²	0.32	0.49	0.46	0.45	0.45
OC	MAE	0.56	0.50	0.48	0.47	0.48
	RMSE	0.36	0.58	0.56	0.54	0.51
	R²	0.43	0.49	0.55	0.56	0.51
C:N	MAE	42.23	42.16	42.48	42.79	41.52
	RMSE	51.17	47.85	48.32	50.08	48.27
	R²	0.46	0.51	0.48	0.48	0.39
pH	MAE	0.21	0.20	0.19	0.19	0.19
	RMSE	0.24	0.22	0.22	0.22	0.22
	R²	0.32	0.46	0.43	0.36	0.36
CaCO₃	MAE	1.04	0.94	0.87	0.95	0.90
	RMSE	1.24	1.09	1.04	1.09	1.06
	R²	0.46	0.45	0.57	0.48	0.52
Sand	MAE	10.34	7.03	6.45	7.82	6.86
	RMSE	13.64	9.79	8.45	10.11	9.10
	R²	0.27	0.56	0.66	0.47	0.50
Silt	MAE	11.43	9.75	9.28	10.72	9.44
	RMSE	13.38	11.79	11.26	12.37	11.35
	R²	0.45	0.46	0.57	0.47	0.54
Clay	MAE	7.39	7.18	7.05	7.09	6.70
	RMSE	9.14	8.60	8.30	8.20	8.30
	R²	0.55	0.62	0.56	0.62	0.55
BD	MAE	0.11	0.09	0.09	0.11	0.09
	RMSE	0.13	0.11	0.11	0.13	0.11
	R²	0.54	0.64	0.55	0.58	0.54


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zolfaghari Nia, M.; Moradi, M.; Moradi, G.; Taghizadeh-Mehrjardi, R. Machine Learning Models for Prediction of Soil Properties in the Riparian Forests. Land 2023, 12, 32. https://doi.org/10.3390/land12010032


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
