1. Introduction
Information on soil, a non-renewable resource, is of increasing importance due to growing utilization pressure as well as climate change. Both developments pose a risk to soils as they can lead to soil degradation such as erosion, compaction, contamination, and soil sealing [1, 2]. Soils are very diverse and, depending on the soil’s properties, the processes that enable the soil to resist degradation, or to fulfill soil functions, differ considerably. In order to maintain soil quality it is necessary to know where, and where not, certain practices are applicable, and to adjust land use planning appropriately. For this purpose, different methods to assess soil functions have been developed within the last two decades [3, 4] and have turned out to be invaluable tools for integrating soil-related issues into decision-making processes [5]. As a result of these developments, of which brief summaries can be found in [5] and [6], soil awareness has generally increased on the one hand, and on the other hand several approaches have been developed to convert available soil data into decision-relevant soil information [6, 7, 8, 9, 10, 11]. Assessing and mapping soil functions means differentiating soils according to their role in a functional context. What is actually evaluated is the degree to which a specific soil function is fulfilled by a specific soil on the basis of its characteristics, considering relevant soil processes and using meaningful soil parameters, pedotransfer functions, and standard algorithms [5, 12]. Nevertheless, methodological differences exist in terms of the selection and definition of soil functions, the soil parameters considered, and the algorithms used. An overview can be found, for instance, in [6]. Nowadays, soil function assessment is increasingly discussed in the context of the popular ecosystem service approach [13, 14, 15]. To that end, [6] provided a review of soil function assessment methods in order to quantify the contributions of soils to ecosystem services, which leads to the concept of soil-based ecosystem services.
The challenge experts face when processing soil information is that the original information is point-related, as it is based on soil surveys in the field. However, decision-relevant soil information usually needs to be presented in an area-related form. With the increasing popularity of readily available Geographic Information Systems and statistical software within soil science, this step of deriving area-related soil information from point-related soil information, in combination with area-related information on soil-forming factors such as topography, climate, or vegetation, has gained importance. It is the core of Digital Soil Mapping (DSM), which was defined by [16] as ‘the creation, and population of spatial soil information systems by the use of field and laboratory observational methods coupled with spatial and non-spatial soil inference systems’. In the case of soil functions, this transformation step can be done either before the assessment (with the individual soil parameters) or afterwards (with the assessed levels of function fulfillment).
Terrain parameters, i.e., variables that are derived from digital terrain models (DTM) and consequently represent topography, have long played an important role in the spatial modeling of processes associated with a wide range of research topics, such as geomorphodynamics, soil, natural hazards, or habitat modeling. With regard to soils in an Alpine environment, many authors emphasize the strong influence of topography on soil formation, and therefore terrain derivatives play an important role as explanatory variables in DSM in the Alps [17, 18, 19]. An overview of terrain parameters, their calculation, and their application can be found, for instance, in [20, 21]. In general, they can be divided into primary topographic attributes (for instance slope or catchment area) and compound topographic attributes (for instance the topographic wetness index, which relates the catchment area at a given point to the slope) [20]. While slope as a primary attribute is an example of a local terrain parameter, which characterizes the grid cell by its position with regard to its immediate surroundings, compound terrain parameters mostly belong to the group of regional terrain parameters, as they describe a grid cell in the broader context of the surrounding landscape. Landform classes, representing compound regional terrain attributes, can be derived from topographic attributes through segmentation algorithms, statistical learning, or knowledge-based rule sets and have also been used as predictors in spatial modeling (e.g., [17, 22, 23]). A number of landform classifications are described in [24] and compared with regard to their ability to recreate topographic position as mapped by soil surveyors. Amongst these classification algorithms is a support vector machine (SVM) classifier using local and regional terrain parameters, an example of a supervised statistical learning method performing non-linear classification.
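As an illustration of a compound topographic attribute, the following minimal sketch computes the topographic wetness index in its commonly used form, TWI = ln(a / tan β), where a is the specific catchment area and β the local slope angle. The function name, the clamp applied to very flat cells, and the example values are assumptions for illustration and are not taken from the study.

```python
import math

def topographic_wetness_index(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """TWI = ln(a / tan(beta)).

    specific_catchment_area: upslope area per unit contour width (m^2 per m).
    slope_deg: local slope angle in degrees.
    min_slope_deg: assumed clamp that avoids division by zero on flat cells.
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area / math.tan(beta))

# Hypothetical grid cells: a gently inclined footslope with a large
# contributing area and a steep upper slope with a small one.
gentle_footslope = topographic_wetness_index(500.0, 2.0)
steep_upper_slope = topographic_wetness_index(20.0, 30.0)
print(gentle_footslope > steep_upper_slope)
```

A large contributing area combined with a gentle slope yields a high index, which is why the attribute is regional rather than local: it depends on the terrain upslope of the cell and not only on its immediate neighbourhood.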
SVM classification has also been applied in the more general sense of DSM, i.e., to infer spatial soil information from point data, for instance by [19, 25, 26]. Other examples of statistical learning approaches used in regionalizing soil data include, amongst others, decision trees and their extension random forests [18, 27, 28, 29], artificial neural networks [26, 30], generalized linear models [31], and geostatistical methods such as regression kriging [32, 33]. More information on DSM and comparisons of techniques used to infer spatial soil information can be found in [16, 25, 34, 35].
In this study, we present the soil evaluation tool SEPP (Soil Evaluation for Planning Procedures) [36], which, based on a number of soil and location parameters, performs a soil function assessment (SFA) and assigns levels of fulfillment for a number of soil functions to a soil profile. The main aim of this study is to investigate how and to what extent topography controls soil characteristics, and hence the SEPP tool’s output. We therefore apply support vector machine (SVM) classification to the levels of soil function fulfillment as assessed by the SEPP tool for available soil pit information from the Oltradige/Überetsch region of the Autonomous Province Bolzano-South Tyrol, thereby evaluating the extent to which these levels can be reproduced based only on topographic information. Within this approach, a cross-validated feature selection is used to identify those terrain parameters which are best suited to recreate the results of the SEPP tool. A further intention is to provide the means for using readily available information on topography such as digital terrain models (DTM) to get a first impression of the degree to which soils at different locations can be expected to fulfill a range of soil functions.
3. Results
The level of fulfillment for each of the 14 soil functions was calculated for each of the 108 soil profile pits in the study area with the SEPP application. Figure 4 shows the distribution of the fulfillment levels for each soil function as computed by the SEPP tool, whereas Figure 5 presents the predicted distributions based on SVM classification. Table 4 gives an overview of the feature selection and validation results, presenting the terrain parameters which were selected for modeling the levels of soil function fulfillment. In many cases, these parameter combinations consisted only of local terrain parameters, but sometimes a local terrain parameter was complemented with landform classes. In addition to the median cross-validated accuracy of 100 model runs, the test accuracy which results from using the same data for model fitting and validation is provided in Table 4.
Often two parameters, or even just one, are sufficient, as there is no increase in cross-validated prediction accuracy by adding more predictors. This can be observed in Table 5, which compares the accuracies from Table 4 to those achieved with SVM classifiers using an increased number of predictor variables. This larger set consists of all unique predictor variables which appeared in the entire five-fold cross-validation of a feature selection procedure which chooses 10 predictors per selection run. This amounts to 25–30 variables per model.
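The difference between an accuracy obtained on the fitting data and a median cross-validated accuracy over repeated runs can be sketched as follows. This is only an illustration under stated assumptions: a one-nearest-neighbour rule stands in for the SVM classifier (fitting an SVM would require an external library), and the slope values and fulfillment levels are synthetic, not data from the study.

```python
import random
from statistics import median

def predict_1nn(train, x):
    """Label of the training sample closest to x (stand-in classifier, not an SVM)."""
    return min(train, key=lambda sample: abs(sample[0] - x))[1]

def accuracy(train, test):
    """Share of test samples whose predicted label equals the true label."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)

def cv_accuracy(samples, k, rng):
    """Mean accuracy over one shuffled k-fold cross-validation."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        scores.append(accuracy(train, test))
    return sum(scores) / k

rng = random.Random(0)
# Synthetic (slope in degrees, fulfillment level) pairs with overlapping classes.
samples = ([(rng.gauss(8, 5), 4) for _ in range(60)]
           + [(rng.gauss(20, 5), 5) for _ in range(40)])

# Fitting and validating on the same data: every sample is its own neighbour.
resubstitution = accuracy(samples, samples)
# Median of 100 repeated five-fold cross-validation runs.
median_cv = median(cv_accuracy(samples, 5, rng) for _ in range(100))
print(resubstitution, median_cv)
```

With a flexible classifier the accuracy on the fitting data is optimistic, here trivially perfect, so the cross-validated median is the figure to compare across predictor sets.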
3.1. Habitat for Drought-Tolerant Species
Figure 4 shows that of the 108 soil profile sites in the study area, 38 fall into fulfillment level class 4 and 32 into class 5 regarding the soil function of habitat for drought-tolerant species. The intermediate class 3 contains 21 soil profiles, whereas the classes with high fulfillment levels (1 and 2) are attributed to only 4 and 13 sites, respectively. As the predictor set contains neither land use nor soil type, the SVM classification essentially attempts to model the different classes of available field capacity. In the majority of the feature selection runs, a landform map based on a flatness threshold between 3 and 5, a spatial resolution of 10 m, and a search radius of 100 m was chosen as the first predictive feature. The landform flat is dominant amongst the profile sites with the lowest level of soil function fulfillment, which is accordingly connected to minimal curvature values around 0. The landform slope is most common for profiles at level 4, whereas spurs and hollows can present profile locations at fulfillment level 2 and, as expected, have increasingly negative minimum curvature values. A support vector classifier using these landforms and slope at a low DTM resolution as predictor variables results in a median cross-validated prediction accuracy of 50%, where the most common error is that a large number of sites are mistakenly classified as having fulfillment level 4. Nevertheless, the general implications of the feature selection are plausible, as flat areas can be expected to have higher field capacity values than sloping regions with negative curvature values.
3.2. Habitat for Moisture-Tolerant Species
None of the soil profile sites in the study area is awarded the best level of fulfillment (1) for its function as a habitat for moisture-tolerant species. As seen in Figure 4, the intermediate fulfillment level classes (2–4) are quite evenly distributed with 33, 23, and 35 members, respectively. The class with the lowest level consists of 17 soil profile sites. The soil parameters and profile site characteristics used for the evaluation of this soil function are similar to those used for the function as a habitat for drought-tolerant species. Consequently, a very similar landform classification is chosen in the feature selection procedure (Table 4), the only difference being a slightly tighter search window of 70 m. This feature is complemented by the local terrain parameter longitudinal curvature to achieve the best median cross-validated accuracy of 53.7%. The model predictions show that while the SVM classifier associates high levels of fulfillment with curvature values around zero, soil pits with the lowest level can be found at locations with negative longitudinal curvature. This trend is also visible in the landform distribution, where the landform flat and, to a lesser degree, footslopes are characteristic of soil pits which fulfill the function as a habitat for moisture-tolerant species to a high degree. This combination seems reasonable, given the potential hydrological situation in these positions.
3.3. Habitat for Soil Organisms
With the exception of the lowest soil fulfillment level class (5), which does not occur, the four other levels are distributed relatively evenly amongst the soil pits in the study area. Figure 6A shows the spatial distribution of the profile sites and their soil function fulfillment levels in the study area. The class with the rather high level of 2 is the most common with 34 members. The feature selection procedure identified three local terrain parameters as being most useful in separating the soil profile sites with different fulfillment levels, specifically representatives of the parameters slope, convexity, and cross-sectional curvature. Lower convexity values characterize those soil profile sites best suited for soil organisms. Similarly, high slope values are helpful in separating members of the intermediate level (3) from the remaining three classes, based on the general trend that the two best levels are more closely associated with lower slope angles than the levels 3 and 4. An analysis of the cross-sectional curvature values shows that the class of soil profiles with level 4 tends to have more members related to slightly positive curvature values when compared to the other classes. An SVM classifier trained with the three aforementioned terrain parameters leads to a median accuracy rate of 59.3%, which is relatively high compared to other soil functions which also have profile sites belonging to more than three different fulfillment levels. Figure 6B shows how this prediction turns out spatially for the study area.
3.4. Habitat for Crops
The soil pit locations within the study area exhibit a very low diversity with regard to the extent to which they fulfill the soil function as a habitat for crops. Only eight of the locations achieve the intermediate level 3, whereas the remaining soil pit sites are attributed with the poorer levels 4 (n = 62) and 5 (n = 38) by the SEPP tool. The feature selection procedure leads to a model which incorporates only the local terrain parameter slope based on the high resolution DTM and leads to a median cross-validated accuracy of 86.1%. The class of soil pits with fulfillment level 3 is not depicted in the model output, as the soil pits belonging to this class are misclassified as being part of the class with level 4, and, to a lesser degree, level 5.
Analysis of the distribution of the slope values of the different fulfillment levels based on the SEPP tool as well as predicted by an SVM classifier (Figure 7) shows that the SVM classifier applied a threshold of 15 degrees to separate the locations with the levels 4 and 5, which is a direct result of this exact slope threshold applied by the algorithm in SEPP which calculates the fulfillment level of the soil function as a habitat for crops. The consequent difference in slope of these classes apparently overrules possible effects of other terrain parameters. This leads to the non-representation of the class with fulfillment level 3 in the model output. This issue is a consequence of the specifics of the study area, which can be roughly divided into the valley floors reserved for agriculture and the forested steep slopes. This is further complicated by the dominance of the classes with low function fulfillment levels, which cannot be solely attributed to the slope threshold, but also to the generally rather shallow and skeleton-rich soils encountered.
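Why a classifier trained on slope alone recovers the cut value built into the assessment can be illustrated with a minimal sketch. It assumes synthetic slope values labeled with a hard 15 degree rule and uses an exhaustive search for the single best cut instead of an SVM; the function name and data are hypothetical, and the sketch isolates the slope rule from the other soil parameters that the SEPP assessment also uses.

```python
import random

def best_threshold(slopes, levels, low=4, high=5):
    """Cut value maximizing accuracy of the rule: slope > cut -> high, else low."""
    values = sorted(set(slopes))
    # Candidate cuts are the midpoints between neighbouring observed slopes.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def rule_accuracy(cut):
        hits = sum((high if s > cut else low) == lvl
                   for s, lvl in zip(slopes, levels))
        return hits / len(slopes)

    return max(candidates, key=rule_accuracy)

rng = random.Random(1)
# Synthetic slope angles in degrees, not measurements from the study area.
slopes = [rng.uniform(0, 40) for _ in range(100)]
# Labels follow a fixed 15 degree rule, mimicking the slope adjustment
# described for the habitat-for-crops function.
levels = [5 if s > 15 else 4 for s in slopes]

cut = best_threshold(slopes, levels)
print(round(cut, 1))
```

The recovered cut falls between the two observed slope values that bracket 15 degrees, so any classifier given slope will reproduce the rule, regardless of which other terrain parameters are offered.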
3.5. Retention of Precipitation
The fulfillment levels for the soil function of precipitation retention are relatively evenly distributed over the study area when calculated with minimum permeability, with only the best class having significantly fewer members than the other classes. However, when using average permeability coefficients the distribution shows a skew towards higher levels of fulfillment (Figure 4). Level 2 has the most members, constituting almost half of the soil profiles. The feature selection procedure for both calculation approaches shows that terrain parameters describing various forms of curvature are best suited to model the difference with regard to this soil function. Surprisingly, the specific curvatures differed for the two calculation methods.
For average precipitation retention, high resolution cross-sectional curvature was combined with profile curvature at medium (10 m) resolution to achieve a median cross-validated accuracy of 50.9%. The model output predicts four out of the five possible fulfillment level classes, with the intermediate level 3 missing. The predictions lead to an even larger dominance of level 2 than in the original data. While the 15 soil profile points with fulfillment level 1 which were attributed to level 2 by the SVM classifier seem acceptable, the 14 soil profiles with level 4 but classified as having level 2 are of more concern with regard to the predictive power of the model. The box plot analysis shows that for the model fitting, the SVM classifier concentrated on the outliers of the soil pits belonging to the classes with fulfillment levels 4 and 5 (Figure 7), which also leads to a low producer’s reliability for these classes.
For minimum precipitation retention, the best suited parameters were found to be high resolution planar curvature together with minimal curvature based on the low resolution DTM (50 m), i.e., a more regional scale topography. Combined in an SVM classifier, these predictor variables lead to a median cross-validated accuracy of just 41.6%. The confusion matrix shows misclassification between all classes, indicating that not only are the soil profiles distributed evenly amongst the different fulfillment levels, but this is also the case with regard to topography, leading to a low prediction accuracy.
3.6. Short-Term Retention of Heavy Precipitation
The capacity for short-term retention of heavy precipitation is assessed by SEPP as very high for 75 soil profiles in the study area, whereas the other fulfillment levels have relatively low membership numbers, distributed more or less evenly over the remaining classes. Feature selection identifies longitudinal curvature as helpful to separate the soil profiles with fulfillment level 4 from the rest of the profiles, as members of this class show higher, positive curvatures. This leads to a median accuracy of 73.1%, but also results in a large number of misclassifications to level 1 with only a limited number of soil profiles correctly attributed with fulfillment level 4. The other classes are not considered in the model output due to the dominance of level 1 on concave terrain and the assignment of convex areas such as ridges to fulfillment level 4.
3.7. Groundwater Recharge
The evaluation of the quantity and quality of groundwater recharge using the SEPP tool shows that almost 40% of the soil profiles exhibit the relatively high soil function fulfillment level 2, but also that 41 profiles belong to the classes of soil profiles with levels 4 and 5. Accordingly, the SVM algorithm employs cross-sectional and profile curvature to generally divide the soil profiles into the classes 2, 4, and 5. This classification leads to a median cross-validated accuracy of 47.5%. Higher and, in the majority, positive cross-sectional curvatures are attributed to soil profile classes with low function fulfillment levels. In contrast, curvature values surrounding zero are linked to level 2, a class which incorporates a large proportion of soil profiles originally attributed with the levels 1 and 3 by the SEPP tool. While this sort of misclassification may seem acceptable when seeking a general trend with regard to the quality of groundwater recharge, the still substantial number of level 4 and 5 soils predicted to have level 2 by the SVM classifier indicates a strong influence of other factors besides topography.
3.8. Nutrient Provision to Plants
With regard to fulfilling the soil function of providing plant nutrients, 90% of the soil profiles belong to one of the extreme classes with level 1 or 5, whereas the intermediate level 3 has only 11 members. Compared to the predictive performance for other soil functions, this bimodal distribution leads to a relatively high cross-validated accuracy of 71.3% when using the local terrain parameter minimal curvature at a high DTM resolution of 2.5 m as the sole predictor variable. It is however important to consider that compared to other soil functions, the SEPP tool provides only three possible levels of fulfillment for this soil function. Due to the predominance of the more extreme levels, the model output predicts membership to one of these two soil profile classes, with the majority of the members of the intermediate class being attributed with fulfillment level 1. In this case, the classifier links the soil profiles with low fulfillment levels to negative minimal curvature values, while soils with higher levels are generally characterized by minimal values not far below zero.
3.9. Carbon Storage
Of the 108 soil profiles evaluated in the study area with the SEPP tool, 51 were assessed as having the highest level of fulfillment with regard to the function of soil as carbon storage. The class of soils with the second most members is that with fulfillment level 4, followed by the intermediate level 3. Almost the same local terrain parameter was chosen by the feature selection procedure as for the function of providing nutrients to plants, with minimal curvature leading to a prediction accuracy of 61.1%. When interpreting the model result, which only leads to two classes being predicted for the study area, representing the soil function fulfillment levels 1 and 4, it is important to acknowledge that the SVM classifier is essentially modeling forest land use for those profile sites with level 1. Consequently, areas with less distinct curvature, i.e., values surrounding zero, are classified as having lower fulfillment levels, whereas the sloping regions surrounding the paleovalley, which are in fact mostly covered by forest, are attributed a high fulfillment level for carbon storage. The majority of the misclassifications are connected to level 4, which has a low producer’s reliability of 24%.
3.10. Retention of Heavy Metals
The distribution of the fulfillment levels for the soil function of retaining heavy metals can be characterized as bimodal. The more extreme levels 1 (high) and 5 (low) each have 42 members, whereas the remaining 24 profiles are distributed amongst the three intermediate classes, with level 3 being the largest class containing ten profile sites. For the model best suited to correctly predict fulfillment levels for as many soils as possible, a landform classification based on the 10 m DTM and a search window of 70 m surpassed the other predictor variables with a median classification accuracy of 63.9%. However, due to the dominance of two distinctly different classes over the remaining classes, the resulting model only predicts these two dominant fulfillment levels. Providing the model with further predictor variables did not improve the number of correctly predicted classes. In this model, the highest fulfillment level for the soil function of retaining heavy metals is linked to the landform flat, whereas all locations with a different automated landform class were attributed the lowest fulfillment level. The members of the three intermediate levels are almost equally divided among the dominant classes without a clear trend.
3.11. Transformation of Organic Contaminants
The class which represents the lowest function fulfillment level with regard to transforming organic contaminants is the most common with 39 member soil profiles. This class, which is characterized by low pH and/or low organic matter and clay content, can be predominantly found on ridges formed of rhyolite outcrops or steeper slopes. Accordingly, the feature selection procedure produces maximal curvature as an important terrain parameter and links soils with fulfillment level 5 to areas with higher, positive values of this parameter. Additionally, a landform classification based on a search window of 500 m, thus representing regional-scale topography, is implemented in the SVM classifier and identifies the landform slope as being closely correlated with a low fulfillment level for transforming organic contaminants. The landform class flat, on the other hand, is mainly linked to level 2, which is the class with the highest fulfillment level predicted by the SVM classifier (Figure 8).
Together, a model fed with these two explanatory variables leads to a median cross-validated correct classification rate of 46.3%. It predicts the levels 2, 3, and 5, with the majority of class 1 soils being misclassified as level 3, and all but two of the level 4 soils incorporated into the class with fulfillment level 5.
3.12. Filtration and Buffering of Organic Contaminants
The distribution of the fulfillment levels of this soil function over the study area is very one-sided, with all but four soil profiles evaluated as belonging to the class with the lowest level. As a consequence, modeling the fulfillment levels of this soil function is not very productive, as the classifiers will simply classify the four soils with level 4 as level 5, which still leads to the high correct classification rate of 96%.
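The arithmetic behind this rate can be made explicit with a short sketch of a majority-class baseline. It uses only the counts stated in the text (108 profiles, of which all but four have level 5 and the remaining four have level 4); the variable names are illustrative.

```python
from collections import Counter

# Level counts as described in the text: 104 profiles at level 5, 4 at level 4.
levels = [5] * 104 + [4] * 4

# A classifier that always predicts the most frequent level.
majority_level, majority_count = Counter(levels).most_common(1)[0]
baseline_accuracy = majority_count / len(levels)

print(f"always predicting level {majority_level}: {baseline_accuracy:.1%}")
```

A high correct classification rate on such an unbalanced distribution therefore says nothing about topographic control, since the same rate is reached without any predictor.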
3.13. Retention of Water-Soluble Contaminants
The most dominant soil function fulfillment level with regard to retaining water-soluble contaminants such as nitrate is level 1 with 47 soil profiles. Levels 2 and 3 are similar with 14 and 17 members, followed by a slight peak in membership for level 4 with 22 profile sites. As the climatic framework is the same for the study area, this soil function is assessed in the SEPP tool based on only soil texture. An SVM classifier applying high resolution slope and minimum curvature as explanatory variables leads to a median cross-validated accuracy of 53.7%. Due to its dominance in the SEPP tool output, the predicted class with level 1, which the model links to very low slope angles and minimum curvature values not far away from zero, incorporates a large number of soil profiles with different levels. While this leads to a very high user’s accuracy of 96%, it is also responsible for a producer’s reliability of only 60%. The model predicts all classes except the one with fulfillment level 2; however, levels 3 and 5 are projected for only two soils each, with level 3 producing a user’s accuracy of only 12%. The reason for this can be identified in the box plot analysis, which shows that for these two less populated classes the classifying algorithm concentrates on outliers in order to produce significant differences to the distribution of the relevant terrain parameters of the other classes.
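The two per-class rates quoted above are both ratios taken from a confusion matrix, one relative to the number of sites that actually have a level and one relative to the number of sites predicted to have it. The sketch below computes both for a small synthetic example. Because this section does not define which ratio is called user’s accuracy and which producer’s reliability, the code uses neutral names; the function name and the data are assumptions for illustration only.

```python
from collections import Counter

def per_class_rates(reference, predicted):
    """For each level return (share_recovered, share_reliable).

    share_recovered: correct predictions / sites that actually have the level.
    share_reliable:  correct predictions / sites predicted to have the level.
    None marks a ratio that is undefined because its denominator is zero.
    """
    pairs = Counter(zip(reference, predicted))
    ref_totals = Counter(reference)
    pred_totals = Counter(predicted)
    rates = {}
    for level in sorted(set(reference) | set(predicted)):
        correct = pairs[(level, level)]
        recovered = correct / ref_totals[level] if ref_totals[level] else None
        reliable = correct / pred_totals[level] if pred_totals[level] else None
        rates[level] = (recovered, reliable)
    return rates

# Synthetic example with a dominant level 1 that absorbs other levels.
reference = [1] * 10 + [2] * 5 + [3] * 5
predicted = [1] * 9 + [3] + [1] * 5 + [1, 1, 3, 3, 3]

rates = per_class_rates(reference, predicted)
for level, (recovered, reliable) in rates.items():
    print(level, recovered, reliable)
```

In the example, nine of ten level 1 sites are recovered, but only nine of the sixteen sites predicted as level 1 truly belong there, which is the same pattern of one high and one low rate described for the dominant class.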
3.14. Buffer for Acidic Substances
Soils with the highest level of fulfillment for this soil function are the most numerous in the study area, constituting 36% of all profile sites. With decreasing function fulfillment, the membership numbers also decrease, with only 9 soils attributed with the lowest level (5) by the SEPP tool. Planar curvature and slope, both computed with the high resolution (2.5 m) DTM and a local-scale window size of 12.5 m, result in a model with a median accuracy of 48.1%. A major drawback of this model is that, with regard to the result based on the majority vote of 100 model runs, it fails to reproduce any members of the classes with levels 2 and 5. While the small sample size may be an issue for level 5, 83% of the soils originally with level 2 are fitted into the predicted level 1, as these classes share a very similar terrain parameter distribution for both slope and curvature. Levels 3 and 4 are distinguishable by increasingly higher slope and also planar curvature values.
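How a class can vanish from a result that is based on a majority vote over repeated model runs can be shown with a minimal sketch. The three runs and four sites are made up for illustration, and ties are resolved in favour of the level seen first, which is an arbitrary assumption and not a documented property of the procedure used in the study.

```python
from collections import Counter

def majority_vote(run_predictions):
    """Combine runs into one prediction per site.

    run_predictions: list of runs, each a list with one predicted level per site.
    Ties go to the level that appears first among the votes (arbitrary choice).
    """
    votes_per_site = zip(*run_predictions)
    return [Counter(votes).most_common(1)[0][0] for votes in votes_per_site]

# Hypothetical predictions of three model runs for four profile sites.
runs = [
    [1, 1, 3, 4],
    [1, 2, 3, 4],
    [1, 1, 3, 3],
]
final = majority_vote(runs)
print(final)
```

Level 2 is predicted in one of the runs but is outvoted at that site, so it does not appear in the combined output even though individual models do produce it.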
4. Discussion
The presented study shows that the levels of fulfillment of most soil functions can generally be linked to topography; however, there are substantial differences with regard to the strength of this connection. As indicated in Table 4, the cross-validated accuracy of modeling soil function fulfillment varies from soil function to soil function, ranging from 41.6 to 86.1%. It must be kept in mind that the algorithms implemented in the SEPP tool are mostly expert knowledge-based and were not specifically intended for use in Alpine regions such as the study area. Furthermore, the study area has a long history of changing land use and a complex geologic setting, all of which influence the results of soil function assessments. In addition to the error which can thereby be attributed to the non-inclusion of environmental covariates not directly derived from DTMs, such as parent material or local climate, a number of issues were encountered.
For one, topography, mainly represented by the terrain parameter slope, plays a role in some of the SEPP algorithms, depending on the soil function and also the specific fulfillment levels, which leads to high correct classification rates particularly for these classes. This influence of topography on the output of the SEPP tool can be either direct or indirect. An example of direct influence is the soil’s function as a habitat for crops. Originally, the assessment is based on a wide range of physical, chemical and biological soil parameters such as aggregate structure, bulk density, alkaline cation exchange capacity, organic matter, and others. However, the result is later directly adjusted based on a slope threshold of 15 degrees, as locations with steeper slopes are generally considered less suited for agricultural production (excluding pastures and forestry). This results in the highest classification accuracy of 86.1%, as the SVM classifier detects this threshold, which leads to a model output that directly reflects this part of the evaluation procedure (Figure 7). This is also a consequence of the characteristics of the study area, which can be roughly divided into two topographically different regions: the valley floors reserved for agriculture and the forested steep slopes. Due to the generally rather shallow and skeleton-rich soils encountered, the classes with poor fulfillment levels (4 and 5) are dominant. This circumstance further boosts the above-mentioned dualism, as slope turns out to be the main difference between fulfillment levels 4 and 5. The indirect implementation of the factor slope in the evaluation algorithm of the SEPP tool can be exemplified by the function of carbon storage, where soil profiles with the land use forest are immediately awarded the best fulfillment level. As mentioned in the results for the function as a habitat for crops, forestry in the study area is more or less constrained to the steep slopes in the western part of the study area and the Mitterberg ridge which represents the eastern border. Consequently, curvature values which highlight non-flat areas but also strongly convex regions such as ridges also characterize sites which are used for forestry rather than agriculture, thus constructing the link between curvature and the function of soil to store carbon. When considering the above-mentioned direct and indirect implementations of slope, and the division of the study area based on slope and land use, it is necessary to keep in mind which other effects this may have. For instance, the less dominant influence of other topographic factors or landform classes on the calculation of function fulfillment levels may be overprinted by the effect of this slope threshold value or the link between slope and land use. Unfortunately, this cannot be mitigated by the addition of further terrain parameters to the model.
The analysis of the SVM classifier predicting the levels of function fulfillment for the average precipitation retention highlights the problem of outliers in the distribution (Figure 7). If the distributions of terrain parameters do not show obvious differences between the different levels, the classifier algorithm may revert to using outliers to characterize the different groups. Although these may indeed be values that best distinguish a certain subset of the data points and consequently lead to the best achievable correct classification rate, these values are unfortunately not representative of the data group in general. Nevertheless, such groupings may still result in valuable insights into some general trends in the data sets, even if they are exaggerated by the SVM classifier and lead to a small subset being overrepresented in the predicted fulfillment level membership. Another example of a soil function where certain groups in the model are in fact characterized by outliers is the function of retaining water-soluble contaminants.
An issue which is related to that of outliers can be found when investigating the model performance for soil functions which show a somewhat bimodal distribution with regard to their fulfillment levels as assessed with the SEPP tool. For instance, this is the case when comparing the distribution of the fulfillment levels of the soil function of providing nutrients to plants or the function of carbon storage in
Figure 4 and
Figure 5. The result is usually that this bimodality is reinforced in the SVM model, leading to the majority, if not all, of the soil profile sites being predicted as belonging to one of the two dominant classes and consequently limiting the number of fulfillment level classes in the model output. It is important to consider, especially for future research, that this bimodality, which can be found for a number of soil functions, may be linked to more dualistic characteristics of the study area, such as forest vs. agricultural land use, or silicate bedrock vs. limestone parent material, rather than topography.
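The reinforcement of a bimodal distribution described above can be checked with a simple comparison of class shares. The following is a minimal sketch, not part of the original workflow: the function name `class_coverage` and the example level lists are hypothetical, and fulfillment levels are assumed to be coded as the integers 1 to 5.

```python
from collections import Counter


def class_coverage(assessed, predicted):
    """Compare how fulfillment levels are distributed in the assessed
    (reference) data and in the model output.

    Returns the share of each level in both lists and the levels that
    occur in the assessment but are never predicted by the model.
    """
    if len(assessed) != len(predicted):
        raise ValueError("assessed and predicted must have the same length")
    if not assessed:
        raise ValueError("at least one site is required")

    n = len(assessed)
    assessed_counts = Counter(assessed)
    predicted_counts = Counter(predicted)
    return {
        "assessed_share": {lvl: c / n for lvl, c in sorted(assessed_counts.items())},
        "predicted_share": {lvl: c / n for lvl, c in sorted(predicted_counts.items())},
        "missing_levels": sorted(set(assessed_counts) - set(predicted_counts)),
    }


# Hypothetical example: levels 1 and 5 dominate the assessment,
# and the model output collapses onto these two classes.
assessed_levels = [1, 1, 1, 1, 2, 3, 5, 5, 5, 5]
predicted_levels = [1, 1, 1, 1, 1, 5, 5, 5, 5, 5]

coverage = class_coverage(assessed_levels, predicted_levels)
print(coverage["missing_levels"])
print(coverage["predicted_share"])
```

A non-empty list of missing levels indicates that the model output contains fewer fulfillment level classes than the assessment it was trained on.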
Another aspect which necessitates discussion, and which is also related to the distribution of the fulfillment levels amongst the individual classes, is the question of sampling and, consequently, balanced classes. The advantage of a fairly even distribution of the levels can be observed for the soil function as a habitat for soil organisms. This soil function shows a relatively high cross-validated accuracy, combined with a spatial prediction where all levels of fulfillment that are present in the SEPP output are also recreated in the model output (
Figure 6), which is not the case for most other soil functions. While a sampling scheme which incorporates soil pits from all different levels of soil function fulfillment would be preferred, leading to a balanced data set, this is not always feasible, especially when relying on soil pit data from surveys focused on other soil aspects. If this is the case, the use of a smaller number of classes, as is done by the SEPP tool in the case of the function of providing nutrients for plants, may be more appropriate when searching for general trends with regard to the influence of topography on the evaluation of soil functions. This is also the case when a certain function shows only a limited number of levels in the specific study area. A different approach could be to quantify the severity of misclassifications when evaluating the prediction accuracy. As the fulfillment levels are on an ordinal scale, the difference in levels between the prediction and the actual level could be considered for weighted accuracies in future evaluations.
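One possible form of such a weighted accuracy is sketched below. This is an illustration only and was not used in the study: the linear weighting, the function names, and the example data are assumptions, and a five-level scale coded as the integers 1 to 5 is assumed.

```python
def plain_accuracy(actual, predicted):
    """Correct classification rate: share of exactly matching levels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def ordinal_weighted_accuracy(actual, predicted, n_levels=5):
    """Accuracy with partial credit for near misses on an ordinal scale.

    Each prediction receives a credit of 1 - |actual - predicted| / (n_levels - 1),
    i.e. 1 for an exact match and 0 for the largest possible error.
    The linear weighting is an assumption; other schemes (e.g. quadratic)
    are equally conceivable.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    if n_levels < 2:
        raise ValueError("n_levels must be at least 2")

    max_distance = n_levels - 1
    credit = 0.0
    for a, p in zip(actual, predicted):
        credit += 1.0 - abs(a - p) / max_distance
    return credit / len(actual)


# Hypothetical fulfillment levels for six soil profile sites.
actual_levels = [1, 2, 3, 4, 5, 4]
near_misses = [1, 2, 3, 5, 4, 4]  # two errors, each one level off
far_misses = [1, 2, 3, 1, 1, 4]   # two errors, three and four levels off

print(plain_accuracy(actual_levels, near_misses),
      plain_accuracy(actual_levels, far_misses))
print(ordinal_weighted_accuracy(actual_levels, near_misses),
      ordinal_weighted_accuracy(actual_levels, far_misses))
```

Both hypothetical predictions have the same correct classification rate, but the weighted measure separates the prediction that misses by one level from the one that confuses the best and worst levels.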
SVM classification was preferred to other statistical models, for instance logistic regression or random forests, for its smooth predictions [
58] when applied as a spatial predictor. Additionally, SVM classification in the presented study is also used as an exploratory tool by applying a feature selection procedure to highlight those terrain parameters which are especially informative with regard to the fulfillment levels of a specific soil function. One issue encountered was that the accuracy values varied when a classification was performed repeatedly with different random seeds. Using the median cross-validated correct classification rate was therefore deemed useful for evaluating the results of feature selection. Furthermore, other accuracy measures such as the kappa index are based on the assumption of simple random sampling [
59], rather than on sampling guided by practicality, as is often the case, especially in steep and forested regions where, for instance, access roads are essential. Additionally, the use of the kappa index is not undisputed [
60]. The analysis of the feature selection results has also shown that such a feature selection procedure is also relevant for SVM classification. To test this assumption, predictor variable sets consisting of up to 30 explanatory variables most frequently chosen in an expanded feature selection process were used to train SVM classifiers to model the fulfillment levels of the various soil functions (
Table 5). While the test accuracy always increased with a larger predictor set, the median cross-validated accuracy, i.e., correct classification rate, was almost always lower than when using only the minimum set of explanatory variables as indicated by the feature selection procedure. This demonstrates that the issue of over-fitting should always be considered.
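The use of a median cross-validated correct classification rate over several random seeds can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: a simple nearest-centroid classifier stands in for the SVM so that the example needs only the standard library, the data are synthetic, and all function names, the number of folds, and the number of seeds are hypothetical.

```python
import random
from statistics import median


def kfold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and split them into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_fit_predict(train_X, train_y, test_X):
    """Stand-in classifier (NOT an SVM): assign each test point to the
    class whose training centroid is closest in Euclidean distance."""
    sums, counts = {}, {}
    for x, label in zip(train_X, train_y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        for j, value in enumerate(x):
            sums[label][j] += value
        counts[label] += 1
    centroids = {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

    def predict(x):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return [predict(x) for x in test_X]


def cv_accuracy(X, y, k, seed, fit_predict):
    """Correct classification rate of one k-fold cross-validation run."""
    folds = kfold_indices(len(X), k, random.Random(seed))
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in fold]
        )
        correct += sum(p == y[i] for p, i in zip(preds, fold))
    return correct / len(X)


def median_cv_accuracy(X, y, k=5, seeds=range(10),
                       fit_predict=nearest_centroid_fit_predict):
    """Repeat the cross-validation with different seeds and return the
    median rate together with the individual rates."""
    rates = [cv_accuracy(X, y, k, seed, fit_predict) for seed in seeds]
    return median(rates), rates


# Synthetic example: two overlapping fulfillment levels (4 and 5) described
# by two hypothetical terrain parameters (slope, curvature).
data_rng = random.Random(42)
X, y = [], []
for _ in range(30):
    X.append([data_rng.gauss(8.0, 6.0), data_rng.gauss(0.0, 1.0)])
    y.append(4)
    X.append([data_rng.gauss(20.0, 6.0), data_rng.gauss(0.0, 1.0)])
    y.append(5)

median_rate, rates = median_cv_accuracy(X, y, k=5, seeds=range(10))
print("rates per seed:", rates)
print("median rate:", median_rate)
```

Comparing the median rate obtained with a minimal predictor set against that obtained with an expanded set, using the same seeds, is one way to detect the over-fitting described above; the classifier argument `fit_predict` can be replaced by any function with the same interface.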
5. Conclusions
In this study, we presented the soil function assessment tool SEPP, and evaluated the extent to which local terrain parameters and landform classes can influence or recreate the level to which a soil fulfills certain soil functions. This was investigated by using support vector machine (SVM) classification and a feature selection procedure. For each of the 14 soil functions assessed by SEPP, the presented approach highlights those topographic attributes which are best suited for use as explanatory variables to model fulfillment levels. To a certain degree, every soil function fulfillment level can be linked to different aspects of topography, but the accuracy with which an SVM classifier can predict them varied from function to function. The parameters slope and minimal curvature were the most frequently chosen terrain attributes, one or both being part of the explanatory variables for eight of the evaluated soil functions. Landform classes constituted a part of the predictor variable set for three soil functions. There are numerous reasons for the wide range of cross-validated prediction accuracies. For instance, the terrain parameter slope directly plays a role as a threshold in the evaluation algorithms of the SEPP tool for some soil functions, and indirectly through its influence on land use in the study area. This results in high prediction accuracies for the soil’s functions as a habitat for crops, and carbon storage, respectively. Another issue is that when the levels of function fulfillment as assessed with the SEPP tool were rather equally distributed amongst the soil profiles in the study area, this led to all of these levels being predicted by the SVM classifier. An example of this is the soil’s function as a habitat for soil organisms. However, when one or two fulfillment levels dominated the data set, this modal or bimodal distribution tended to be exaggerated in the predictive model at the cost of the fulfillment levels with fewer members.
Furthermore, the plausibility of the results of such models should always be questioned, taking into account the possible influence of data outliers. The study also showed that feature selection procedures are also of value in SVM classification, especially when modeling is used as an exploratory data mining technique to increase understanding of underlying processes rather than simply to predict spatial distributions.
In the presented study, the regionalization of point data was performed at the level of soil function fulfillment; however, future research should also investigate the possibility of first predicting individual soil parameters and then performing the soil function assessment for each grid cell. Although the predictive power of models of soil function fulfillment based exclusively on terrain parameters is limited, the authors are of the opinion that the presented work is an important first step towards assessing the influence of topography on soil characteristics and, as a consequence, on the results of soil function assessments. By giving a first overview of the link between topography and the assessment of different soil functions, the study highlights those soil functions for which more detailed investigations into topographic influence are worthwhile. Additionally, the study shows that terrain parameters such as minimal curvature may be helpful as indicators of the degree to which soil functions are expected to be fulfilled by the soils in a given topographic setting.