1. Introduction
The spatial distribution of soils and lithology provides essential input information for different scientific and economic applications, including landscape reconstruction [1], digital soil mapping (DSM) and mineral exploration for agricultural [2] or mining applications [3]. Although soil must be considered a three-dimensional medium, a wide range of remote sensing sensors provide useful information for assessing the mineral composition and other physical and/or chemical properties of the uppermost parts of the soil over spatially contiguous areas [4,5,6]. The topsoil is generally the most relevant part of the soil with respect to food production, degradation and soil management [7]. Although the definition of topsoil varies between soil taxonomies [7,8,9,10], the uppermost part of the soil belongs to the topsoil. The topsoil thickness is related to local conditions of pedogenesis, erosion and deposition processes. Normally, topsoil is 10–30 cm thick [7,8]. In this study, we regard the soil surface properties as a proxy for topsoil and lithology. We hypothesize that the analysis of physical-chemical properties, the collection of field reference data and the remote sensing analysis of the upper surface strata yield valuable information about the topsoil and/or lithologic characteristics. Moreover, the topographic position and geomorphological processes also influence the topsoil characteristics and, hence, should be included in a comprehensive analysis of the spatial distribution of topsoils.
The reflectance that a multi- or hyper-spectral sensor receives from the mineral composition of a surface is influenced by soil organic matter, moisture content, as well as texture and surface roughness [11]. Backscatter signals from Synthetic Aperture Radar (SAR) sensors of different wavelengths depend on the surface roughness and are sensitive to the dielectric properties of soils [12,13,14]. Soil mapping using remote sensing data shows limitations due to the complex physical and chemical nature of soils. Remotely derived datasets can characterize the surface (optical remote sensing systems) or the uppermost part of soils (SAR systems) [5,15]. Since soils are complex three-dimensional structures, the surface characteristics may not represent the underlying layers of soil. The remote sensing signal may also be a product of several different soil surface properties. This effect increases as the spatial resolution of the datasets decreases. Very high-resolution sensors, like WorldView-2 and GeoEye-1, provide a high spatial differentiation. On the other hand, lower spatial resolution sensors, like the Landsat series or ASTER, provide a better spectral coverage, especially in the mid-infrared region, which is important for mineral mapping purposes [5,16]. Vegetation cover is another important factor to consider. Even sparse vegetation cover may influence the identification of soil attributes using remote sensing methods [17,18]. Spectral indices from multi- or hyper-spectral remote sensing images are effective tools for the classification and evaluation of photosynthetic vegetation activity. Vegetation indices (VI), like the Normalized Difference Vegetation Index (NDVI), utilize the difference between absorption and reflection in the red (0.625–0.74 µm) and near-infrared (NIR; 0.74–1 µm) wavelengths [19]. Dead material in grasslands blurs VIs, making it hard to distinguish between dead material and some other land cover types [20]. This is especially a problem in arid and semiarid regions, due to relatively long dry periods. A strategy to resolve these problems consists of long-term monitoring via remote sensing and collection of ground information [21,22].
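As a minimal sketch of the index named above, the NDVI is the normalized difference of the near-infrared and red reflectance, (NIR - red) / (NIR + red). The function name `ndvi` and the reflectance values below are hypothetical and chosen only for illustration; they are not measurements from this study.

```python
def ndvi(red, nir):
    """Return the NDVI, (NIR - red) / (NIR + red), for one pixel.

    Returns None when both reflectances are zero, because the index
    is undefined in that case.
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


# Hypothetical (red, NIR) surface reflectances, for illustration only.
pixels = {
    "bare soil": (0.30, 0.35),
    "dry grass": (0.25, 0.32),
    "green shrub": (0.05, 0.45),
}

for name, (red, nir) in pixels.items():
    print(f"{name}: NDVI = {ndvi(red, nir):.2f}")
```

With these illustrative values, dry grass and bare soil yield similarly low index values, whereas green vegetation stands out clearly, which is the difficulty with dead material described above.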
A wide range of studies have demonstrated the applicability of remote sensing data for topsoil mapping; some of them are described in the following. Landsat 5 TM imagery was used to detect basalt outcrops for supporting soil mapping, applying reflectance values, band ratios and indices [23]. Landsat 7 ETM+ data were used to determine surface soil properties with the help of laboratory-analyzed surface soil samples [24]. The ASTER multispectral bands and derived indices and ratios were often utilized for lithological mapping [25,26,27,28]. ASTER data were also used to identify mineral components in tropical soils using reflectance spectroscopy signatures from soil samples [29].
Various studies include additional variables, especially in geostatistical approaches to the spatial soil distribution [5]. Topographical features, in particular, provide information on the terrain and, hence, on soil formation processes [30]. Mulder et al. [31] used ASTER data and derivatives, as well as elevation as a topographical proxy for DSM. Hahn and Gloaguen [32] compared different input variable combinations of ASTER-derived land use, geology, topographical parameters and others to estimate soil distribution using support vector machines (SVM). Rossel and Chen [33] used Landsat data and derivatives, topographical derivatives, climate parameters, as well as soil, geological and radiometric maps and spectrometry results from soil samples to determine the surface soil properties for Australia. Selige et al. [34] found that soil organic matter and soil texture of topsoil correlate with the spectral properties of a hyperspectral sensor. They were also able to model the distribution of sand, clay, organic carbon (Corg) and nitrogen. SAR backscatter intensity information from X-, C- and L-band sensors proved to be sensitive to soil moisture differences, surface roughness and, to some extent, also to soil texture [13,14,35,36,37,38,39,40,41]. Hengl et al. [42] applied an automated random forest approach to map soil properties of Africa with DEM-based landform parameters and MODIS data at a spatial resolution of 250 m for the Africa Soil Information Service (AfSIS) project. A comprehensive overview of remote sensing in soil mapping is provided by Mulder et al. [5] and, with a special focus on Africa, by Dewitte et al. [6].
The lithologies and the soils of the Lake Manyara basin have complex genetic origins. The Proterozoic gneissic basement, tectonic and volcanic processes, as well as the (paleo-)hydrological processes and the sedimentation of the paleolake Manyara influence soil formation. This results in a small-scale distribution and fuzzy transitions of today’s soils, topsoils and outcropping lithology, which cannot be depicted by the available soil map for the region with a scale of 1:2,000,000 [43]. Consequently, the categorization of soils is a complex process due to their three-dimensional nature. Hence, remotely-sensed surface features yield auxiliary information on topsoil characteristics and their distribution. Combined with topographic information, the analysis results in valuable information that also allows a rough identification of soil types.
The aim of this study is to map the distribution of the topsoil and surface substrate characteristics using multispectral, topographical and SAR input data. The laboratory analysis of surface samples provides soil properties used to categorize and characterize the topsoils and surface substrates. In order to improve the topsoil classification, we followed a multiscale approach using: (i) image object segments from a high-resolution WorldView-2 scene; (ii) low-resolution ASTER multispectral data and indices; (iii) X- and C-band SAR backscatter; as well as (iv) topographical derivatives. We compare and discuss the final mapping results with soil catenae covering characteristic transects of the study area.
2. Study Area
The study area is located within the East African Rift System of northern Tanzania, in the surroundings of the Makuyuni village. The area is drained towards the west by the Makuyuni River, which discharges into the endorheic Lake Manyara Basin (Figure 1). The precipitation calculations from the daily Rainfall Estimate Product 3B42 (V7) of the Tropical Rainfall Measuring Mission (TRMM) show a bimodal rainfall pattern for the years 2000–2013 [44]. For this period, the average annual precipitation of 651 mm is mainly caused by two wet seasons. One occurs between November and January and the second between March and May [45]. This results in a sparsely-vegetated semiarid environment dominated by bushy grassland. The study area is also characterized by a variety of degradation processes due to long dry periods and short, but intensive rainfall events, as well as contributing anthropogenic factors, like overgrazing [46].
The lithology of the study area is very complex, because different lithological units interleave here. The underlying basement of the Masai Plateau is formed of Proterozoic intermediate quartzite and gneisses and is exposed by tectonic faults [47]. Explosive volcanism, especially from the volcano Essimingor, and faulting associated with the rifting of the basin produced alkaline lavas, like alkali basalt, phonolite, nephelinite and tuffs. The volcano Ol Doinyo Lengai (90 km north of the study area) exhibits carbonate volcanism, and its carbonate tephra deposits are widespread [47,48,49]. Lacustrine and fluvially deposited sediments can be found 140 m above today’s level of Lake Manyara. The so-called Manyara Beds crop out where the Makuyuni River and gully system incise into the lacustrine and terrestrial deposits. The lower member of the Manyara Beds is of lacustrine origin and is composed mainly of mudstones, siltstones, diatomites, marls and tuff that have been deposited in a reducing environment. These sediments have an age of approximately 1.03–0.633 Ma. A tephra layer, which was dated to 0.633 Ma, marks the transition to the younger upper member of the Manyara Beds [50,51,52].
5. Results and Discussion
The comparison of different input feature groups shows that all additional input features increase the overall accuracy of the classification (Table 6). The classification of only the spectral bands of WorldView-2 with an RBF-kernel reaches an accuracy of 62.9%. By incorporating more features from the ASTER data, SAR scenes and topographic indices, an overall accuracy of 70.4% was achieved. By conducting the classification with the parameters selected by RFE (Table 7), the highest accuracy of 71.9% was reached. The application of a linear kernel instead of an RBF-kernel led to lower accuracies.
Table 6.
Overall accuracies for different input feature groups. RBF, radial basis function.
Input Feature Groups | No. of Input Features | C | g | Overall Accuracy |
---|---|---|---|---|
WorldView-2 | 8 | 1896.0 | 0.0625 | 62.9% |
WorldView-2 + WV2 derivatives | 8 + 11 | 185,363 | 0.00006 | 63.7% |
WorldView-2 and SAR scenes | 8 + 5 | 16 | 0.0625 | 65.2% |
WorldView-2 and ASTER bands / indices | 8 + 33 | 181.02 | 0.0055 | 64.4% |
WorldView-2 and topographic parameters | 8 + 16 | 65,536.0 | 0.00005 | 67.4% |
All available parameters | 73 | 2.82 | 0.0883 | 70.4% |
Selection from RFE (linear kernel) | 24 | 1.41 | - | 66.6% |
Selection from RFE (RBF-kernel) | 38 | 4.00 | 0.1250 | 71.9% |
Table 7.
Relevance ranking of RFE selected input features. SD, standard deviation.
RFE Rank | Input Feature | RFE Rank | Input Feature | RFE Rank | Input Feature |
---|---|---|---|---|---|
1 | Geomorphons | 2 | MRRTF | 3 | WV2—Band 3 |
4 | Ferric Iron (Fe3+) Index | 5 | WV2—Band 1 | 6 | Calcite Index |
7 | AlOH Group Index | 8 | WV2—SD Band 3 | 9 | Ferrous Iron 1 Index |
10 | Ferric Oxide Index | 11 | RBD8 | 12 | WV2—Band 8 |
13 | WV2—Band 4 | 14 | WV2—SD Band 1 | 15 | Alteration/Laterite Index |
16 | ASTER SWIR Band 6 | 17 | Opaque Index | 18 | Clay 2 Index |
19 | MgOH 2 Index | 20 | WV2—SD Band 4 | 21 | WV2—Band 2 |
22 | Kaolinite Index | 23 | Terrain Ruggedness Index | 24 | TSX1 HH intensity |
25 | Envisat ASAR (1 October 2011 VV) | 26 | WV2 NDVI | 27 | ASTER SWIR Band 4 |
28 | Morphometric Protection Index | 29 | WV2—Band 7 | 30 | ASTER SWIR Band 3 |
31 | WV2—SD Band 8 | 32 | WV2—Brightness | 33 | Envisat ASAR (2 August 2011 VV) |
34 | Elevation (height a.s.l.) | 35 | Topographic Wetness Index | 36 | Texture (homogeneity) |
An RFE was performed for the dataset with all 73 input features. The RFE shows that with seven input features, an accuracy exceeding 60% can be attained (Figure 5). The classification accuracy for the SVM with an RBF-kernel peaks with a selection of 36 input features and then remains relatively stable until the maximum number of input features is reached. The so-called Hughes phenomenon, which describes the decrease in classification accuracy when additional input features are added to an already large dataset, cannot be observed with the RBF-kernel [100]. Yet, a small decrease can be noted for the linear kernel (Figure 5). The 36 input features from the RFE selection represent all input feature groups (Table 7). Out of the first seven input features, two are topographic indices. The MRRTF results in high values for flat elevated areas [65], and the geomorphons (geomorphologic phonotypes) classify the topography into landscape elements [63]. Both features describe the position of the target classes in the study area. WV2 contributes two further input features, the spectral Bands 3 and 1. The ASTER Calcite Index and the Ferric Iron (Fe3+) Index may explain the distribution of the two target classes with high CaCO3 content (Classes 2 and 3) and the topsoil class with iron oxide properties (Class 8). The AlOH Group Index may support the discrimination of clay minerals [56].
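The elimination loop behind such a ranking and accuracy curve can be sketched generically as follows. This section does not state how the study derived feature importance or evaluated each subset, so `importance_fn` and `score_fn` are hypothetical callbacks standing in for fitting and validating a classifier; the feature names and weights are toy values, not results from the study.

```python
def recursive_feature_elimination(features, importance_fn, score_fn):
    """Backward elimination: drop the least important feature each round.

    importance_fn(subset) -> {feature: weight}, recomputed on every subset.
    score_fn(subset)      -> accuracy obtained with that subset.
    Returns (ranking, curve): ranking[0] is the last surviving feature
    (rank 1); curve holds (number_of_features, score) per subset size.
    """
    remaining = list(features)
    eliminated = []
    curve = []
    while remaining:
        curve.append((len(remaining), score_fn(remaining)))
        weights = importance_fn(remaining)
        weakest = min(remaining, key=lambda f: weights[f])
        remaining.remove(weakest)
        eliminated.append(weakest)
    return eliminated[::-1], curve


# Toy stand-ins (hypothetical values, NOT results from the study).
TOY_WEIGHT = {
    "feature_a": 0.9,
    "feature_b": 0.8,
    "feature_c": 0.7,
    "feature_d": 0.2,
    "feature_e": 0.1,
}


def toy_importance(subset):
    # A real run would refit a classifier on `subset` and read its weights.
    return {f: TOY_WEIGHT[f] for f in subset}


def toy_score(subset):
    # A real run would cross-validate a classifier trained on `subset`.
    return min(1.0, sum(TOY_WEIGHT[f] for f in subset) / 2.0)


ranking, curve = recursive_feature_elimination(TOY_WEIGHT, toy_importance, toy_score)
best_score = max(score for _, score in curve)
# Smallest subset that still reaches the best score.
smallest_best = min(n for n, score in curve if score == best_score)
print(ranking)
print(curve)
print(smallest_best)
```

Reading the curve from the largest subset to the smallest shows where the score starts to drop, which is how a peak such as the one reported for the RBF-kernel would be located.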
Figure 5.
Accuracy curves from RFE for a linear and an RBF-kernel.
The confusion matrix of the RFE-selected input feature dataset reveals that the most frequently confused classes, concerning the user’s and the producer’s accuracy, are Class 2 “carbonate-rich substrates” and Class 3 “calcaric topsoil” (Table 8). Both classes have high carbonate content, and their topographic positions overlap. The difference between both classes is related to the amount of CaCO3 concretions, which is much higher in the lacustrine deposits. If we were to merge both classes, the overall accuracy would reach 79%. However, the visual validation shows a reasonable distribution for both classes. Class 3 also overlaps with Class 4 “dark topsoil”. Class 4 is associated mainly with colluvial and fluvial deposits and shows low CaCO3 content. The transition to Class 3 is gradual. The low producer’s accuracy of Class 5 “tuff outcrop” can be explained by the relatively small area of these outcrops. The producer’s accuracy of this particular class is higher (75%) when only applying the WV2-related input parameters, but the medium-resolution information of the ASTER- and DEM-derived features seems to impair the correct identification.
Table 8.
Confusion matrix for RFE classification with RBF-kernel (C = 4, g = 0.125).
Reference Class (rows) / Classified Data (columns) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Producer’s Accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100% |
2 | 0 | 8 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 62% |
3 | 0 | 5 | 12 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 60% |
4 | 0 | 0 | 2 | 19 | 0 | 0 | 0 | 0 | 0 | 1 | 86% |
5 | 0 | 1 | 0 | 0 | 5 | 1 | 2 | 1 | 0 | 0 | 50% |
6 | 0 | 0 | 0 | 0 | 0 | 9 | 3 | 0 | 0 | 0 | 75% |
7 | 0 | 1 | 1 | 0 | 0 | 0 | 12 | 0 | 0 | 0 | 86% |
8 | 0 | 0 | 1 | 2 | 0 | 0 | 1 | 10 | 2 | 0 | 63% |
9 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 6 | 0 | 60% |
10 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 83% |
User’s Accuracy | 100% | 53% | 48% | 73% | 100% | 90% | 67% | 91% | 75% | 91% | Overall Accuracy 97/135 = 71.9% |
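The accuracy measures in Table 8 follow directly from the matrix counts, as the short sketch below illustrates. The function names are hypothetical; the counts are copied from Table 8, with rows as reference classes and columns as classified classes. The last step merges Classes 2 and 3 to reproduce the overall accuracy of about 79% mentioned in the text.

```python
# Counts from Table 8: rows = reference class, columns = classified class.
MATRIX = [
    [6, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 8, 5, 0, 0, 0, 0, 0, 0, 0],
    [0, 5, 12, 3, 0, 0, 0, 0, 0, 0],
    [0, 0, 2, 19, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 5, 1, 2, 1, 0, 0],
    [0, 0, 0, 0, 0, 9, 3, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 12, 0, 0, 0],
    [0, 0, 1, 2, 0, 0, 1, 10, 2, 0],
    [0, 0, 2, 2, 0, 0, 0, 0, 6, 0],
    [0, 0, 2, 0, 0, 0, 0, 0, 0, 10],
]


def overall_accuracy(m):
    """Correctly classified samples (diagonal) divided by all samples."""
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total


def producers_accuracy(m):
    """Per reference class: diagonal count divided by the row total."""
    return [m[i][i] / sum(m[i]) for i in range(len(m))]


def users_accuracy(m):
    """Per classified class: diagonal count divided by the column total."""
    return [m[j][j] / sum(row[j] for row in m) for j in range(len(m))]


def merge_classes(m, a, b):
    """Merge class b into class a (0-based indices); returns a smaller matrix."""
    rows = [list(row) for row in m]
    rows[a] = [x + y for x, y in zip(rows[a], rows[b])]  # add row b to row a
    for row in rows:
        row[a] += row[b]                                  # add column b to column a
    keep = [i for i in range(len(m)) if i != b]
    return [[rows[i][j] for j in keep] for i in keep]


print(f"Overall accuracy: {overall_accuracy(MATRIX):.1%}")
print("Producer's accuracy:", [f"{v:.0%}" for v in producers_accuracy(MATRIX)])
print("User's accuracy:", [f"{v:.0%}" for v in users_accuracy(MATRIX)])

merged = merge_classes(MATRIX, 1, 2)  # Classes 2 and 3 are at indices 1 and 2
print(f"Overall accuracy with Classes 2 and 3 merged: {overall_accuracy(merged):.1%}")
```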
“Carbonate-rich substrates” mainly represent the lacustrine lower member of the Manyara Beds, which is exposed predominantly at foot-slope and mid-slope positions of the Makuyuni River system, as well as in associated gully systems (Figure 6). The class “calcaric topsoil” indicates soils that show an enrichment of CaCO3 due to inputs from carbonatic volcanic ash deposits or development processes upon the “carbonate-rich substrates”. In some cases, CaCO3-rich soils developed on secondary translocated carbonates or consist of eroded soils exposing CaCO3 concretions. The latter were identified during fieldwork in areas with steeper slopes or large specific catchment areas. “Tuff outcrops” (Class 5) were recognized at a stratigraphic position above the lower member of the Manyara Beds, which coincides with the results of fieldwork and reviewed scientific literature [47]. The outcrops are too small to be displayed in the map (Figure 6). The class “reddish topsoil” is identified with satisfactory accuracy. This class is located mainly on stable flat ridge tops and is used agriculturally. Consequently, topsoils are disturbed and reworked by ploughing activity, bringing leached CaCO3 back to the surface (Table 5). This distinguishes the class from Class 7 “silica-rich topsoil”. These soils are not disturbed, and consequently, silica is enriched at the surface due to selective erosion processes. “Silica-rich topsoils” and “reddish topsoils”, which developed on the Proterozoic intermediate quartzite and gneisses of the Masai Plateau, occur especially in the south of the study area. However, these areas were also subject to carbonatic volcanic ash deposits. The topsoils with iron oxide properties (Class 8) occur in association with mafic ridges (phonolite, nephelinite) or along the slopes of the Essimingor volcano (Figure 6). Class 9 “Mafic-dominated cover beds” was identified well. Like Class 8, Class 9 can be found at the volcano slopes and on the mafic ridges. Since the cover beds are densely vegetated by shrubs, only small, sparsely vegetated areas were used for the classification. The “mafic river beds” are often covered by vegetation and water. Nevertheless, the mafic material at point bars in the Makuyuni River was traced with high accuracies.
Figure 6.
Final classification of topsoil distribution in the study area.
Out of 24 soil profile analyses conducted in the study area, we identified seven main soil types (see Figure 7). In the following, we show that the topsoil classes can be related to or associated with specific WRB soil types according to the applied catena approach. Vertisols are found in flat areas and in depressions characterized by high clay contents and representing formerly wet positions, related to a high biomass production. They are associated with “dark topsoil” (Class 4). Vertisols occur in association with Vertic Cambisols (Clayic) (Soil Profile 1; Figure 7b) that also relate to the pedo-lithological Class 4 “dark topsoil”. In the study area, Calcisols occur with lacustrine “carbonate-rich substrates” (Class 2) and “calcaric topsoils” (Class 3), which are characterized by eroded Luvisols exposing CaCO3 concretions.
Andosols are located on flat and stable ridge positions with low erosion potential. These soils developed from parent material of volcanic origin, such as volcanic ash, tuff and pumice. They show high mineral proportions indicating fertile soils suitable for crop production. In our analysis, Andosols co-exist with “reddish topsoils” (Class 6). Cambisols are widely distributed in the study area and occur mainly on relatively flat mid-slope positions. Along the Makuyuni River terraces, they are distinguished as Cambisols (Colluvic) (Soil Profiles 15–17; Figure 7). On flat ridge positions, they develop as Andic Cambisols (Soil Profiles 6, 8 and 9; Figure 7). Rhodic Cambisols (Soil Profile 20; Figure 7c) are particularly located on intensively-used agricultural fields and correlate with “reddish topsoils” (Class 6), showing a dark reddish brown 5 YR 3/4 Munsell® color for the first 15 cm of soil depth. Cambisols and Luvisols are associated with each other and correlate with “silica-rich topsoils” (Class 7) and “reddish topsoils” (Class 6). The Haplic Ferralsol (Soil Profile 14; Figure 7d) correlates with “silica-rich topsoil” (Class 7). These soils developed on a weathered felsic basement.
Figure 7.
Section of the classification with soil profile transects. (a) Map with soil profile transects; (b) Soil profile transect 1 (SSW–NNE orientation); (c) Soil profile transect 2 (NE–SW orientation); (d) Soil profile transect 3 (W–E orientation).
The resulting map provides a very detailed distribution of topsoils and surface substrates for the study area, which surpasses other spatial soil information available for this region, like the official soil map by De Pauw [43], the 250 m Africa Soil Information Service (AfSIS) product [42] or the products from the Soil and Terrain Database (SOTER) program [101]. Furthermore, the comparison with the soil profile catenae shows that the detailed topsoil information can be related to specific WRB-based soil types with little additional fieldwork and/or expert knowledge. Nevertheless, providing detailed information on topsoils and surface substrates in comparison to other DSM studies [31,32,42] remains the main intention of the paper.