**The Recent Sediment Record as an Indicator of Past Soil-Erosion Dynamics. Study in Dehesa Areas in the Province of Cáceres (Spain)**

#### **María Teresa de Tena Rey \*, Agustín Domínguez Álvarez and Lorenzo García-Moruno**

Department of Graphical Expression, University of Extremadura, 06800 Mérida, Spain;

adomguez@unex.es (A.D.Á.); lgmoruno@unex.es (L.G.-M.)

**\*** Correspondence: mtdetena@unex.es; Tel.: +34-924-28-93-00

Received: 14 September 2019; Accepted: 28 October 2019; Published: 2 November 2019

**Abstract:** The work presented is a study of the recent sediment deposits in a pilot basin in dehesa areas in the province of Cáceres (Spain) through analysis of the sediment record, radiocarbon dating and correlation with historic data to assess the factors that conditioned the deposit in these areas over time. It is a qualitative study based on the important role of sediments as recorders of history, given that sediment facies and their architecture provide one of the best records of past processes and environmental factors. For the study, sediment profile surveys were used to determine the configuration and characteristics of the infill and its chronology. The sediment model of the facies studied is associated with a context of slope water erosion that led to the infill of the watercourse areas, mainly sand and fine gravel, where alterations in the normal rate were detected due to the insertion of a thicker level of materials (soil stoniness) that was able to be dated. The sediment and chronological results obtained can be used to determine the historical events in the area that could have affected the erosion and deposit processes in the basin for the estimated period, from the late 18th to the early 19th century. During this period, pastureland that maintained the ecological balance of the dehesa, with a balanced, stable displacement of soil particles, was converted to cropland, in most cases resulting in soil with a limited profile, overuse and the consequent loss of structure and texture, making it more vulnerable to erosion. Greater remobilisation would have carried thicker material to the watercourses than the material deposited as a result of limited ploughing. This study provides data for the dehesa areas studied with regard to their hydrogeomorphological dynamics, from which past environmental impacts due to tillage can be inferred.

**Keywords:** sediment; land use; erosion crises; environmental impact

#### **1. Introduction**

Sediment studies are a source of information for inferring, through sediment sequence analysis, the conditions that have governed sedimentation processes in an area over time. Sediment facies and their architecture provide one of the best archives of the evolution and transformation of terrestrial systems, because they preserve the record of past processes and environmental factors that have occurred naturally or, in recent times, through human intervention [1,2] including soil erosion. Historical reconstruction of the effects humans have on their surrounding environment shows that the period of greatest impact was the last few centuries, and in recent decades the role of sediments as recorders of history has become increasingly more valued [3–5]. The sediment record can indicate the nature and level of the environmental impact of past events, because although we can use landscapes to deduce the relationship between humans and the environment [6], the traces of these actions can be blurred or erased. The sediment record, however, does not suffer the same fate. By characterising the sediment record, we can reconstruct the erosion processes that occurred in the source area in the past.

Starting from this premise, and applying it to dehesas (agrosilvopastoral areas), the study focused on recent sediment infill in a pilot basin in the province of Cáceres established by the Physical Geography department of the University of Extremadura to study the current hydrogeomorphological processes operating in dehesas in this region, including soil erosion, hydrological balance and rain interception by the holm oak [7–11] and sediment volume measurement and characterisation [12].

The objective of the work was not limited to a contemporary perspective, in which it is difficult know at what stage of the dynamic the system is in. It is a historical study of accumulated sediment, using the interpretation of the sequence to determine the circumstances that have conditioned the deposition processes in the basin over time and the possible influence of humans through varied land-use management practices.

Dehesas make up more than half the useful agrarian area in the west–southwest provinces of Spain and occur in 53% of municipalities in Extremadura. These ecosystems combine forestry, agriculture and livestock raising, and are a paradigm of sustainable development. Rational harvesting of a dehesa's natural resources can optimise production yield and ensure the ecological stability of the system [13]. Any study of this regionally important, extensive ecosystem is relevant, in particular those that provide a greater understanding of its functioning and limitations, to avoid clearly unsustainable practices. In this context, it is particularly important to study sediment deposits as a historic record of the events that have shaped degradation processes in the source area, because soil erosion is one of the issues that the dehesas are now facing.

The small basins of the streams that drain the peneplain dehesas collect sediment from the slopes, and the deposits from past erosion processes are, therefore, stored in the watercourses. The watercourse areas in the study basin and the adjacent areas form paleovalleys that have been filled in by materials from the surrounding areas. These materials can be associated with historic erosion crises [7]. Sediment production on slopes is determined to a large extent by how the land is used. Degradation of dehesas in semi-arid areas is triggered by continued harvesting and overgrazing [14]. Knowledge of how these processes occur today is essential to interpret past events in the sediment record. Other experimental studies have addressed aspects such as management of agricultural land, land management and soil erosion and the impact of land abandonment [15–17]. Changes in erosion rates due to different land management practices in the past must be recorded in some way in the deposits that occurred under the conditions at the time.

Under this initial premise, the study comprises an analysis and interpretation of the sediment sequences. After the entire deposit is studied, any alterations must be detected and dated for correlation with the historical and documentary data available. This is, therefore, an interpretive work, based on the data obtained from the sediment and chronological study.

#### **2. Study Area**

The study area is in the southeast of the province of Cáceres, in the municipality of Trujillo, on the Trujillo peneplain created by erosion of slate and granite material from the Central Iberian Hercynican basement [18].

Upper Precambrian-Lower Cambrian successions occupy extensive outcrops of detrital sediment series made up primarily of slate and greywacke, ranging in altitude from 400–500 m (Figure 1).

The area presents a topography of gentle forms, with slight variations because the current hydrographic network is firmly established in the peneplain. The most significant feature is the absence of any high relief due to the predominance of Precambrian materials with medium to low resistance and a long history of erosion. The small pilot basin is at the head of the Magasca river, a tributary of the Almonte, part of the network of the Tajo. It has an area of approximately 35.4 ha and includes the upper basin of the 813 m-long Guadalperalón stream, whose downstream limit is at the measuring station used to study the erosion and hydrological processes operating in these areas.

**Figure 1.** Study area location. Source: authors' own figure.

It can be considered a typical dehesa, with similar features to many others in the region in terms of topography, rock substratum, soil, vegetation and current and past uses, all of which can be extrapolated to other areas of Extremadura [19].

Following the Food and Agriculture Organization (FAO) nomenclature, the soil in the area is dystric cambisol, with significant slate outcrops. In slope areas it is degraded to leptosols. In most of the area, the soil is less than 30 cm thick, with deeper soils occurring at the bottom of the watercourses [20], on alluvial deposits.

In these conditions, the most rational land use from a conservation perspective is livestock grazing in combination with controlled cultivation. While this would maintain the balance of the structure and low potential of the area, tillage erosion increases with the number of tillage operations [21], and the lack of an alternative livelihood or opportunity to obtain maximum return in the short term may have led to intensive cropping or overgrazing of the land in the past.

#### **3. Methodology**

The procedure used to recreate the conditions of the deposits in the study area was based on sediment and geochronological studies and correlation with historic and documentary data. This is a qualitative study, based on the important role of sediments as recorders of history.

After the deposit areas in the basin had been identified and defined, the study was conducted by surveying small sediment columns throughout the available sections in the basin. The levels and structures appearing in the sequence were examined, measured and described. The initial premise was that the conditions occurring during the sedimentation process must be reflected in some way in the sediment sequence.

The laboratory work comprised textural analysis of the sediment samples from each level and radiocarbon dating of charcoal samples collected in these levels.

#### *3.1. Sediment Study*

The deposits accumulated throughout the watercourse areas occupy an area of around 18,800m2, representing approximately 5% of the total area of the basin [22], and are strongly incised by gullies. In the middle and lower section of the bed, where the incisions reach the substratum, sediment

thickness can be measured directly. These areas provide the sections where the sediment profiles can be surveyed, because the entire sediment infill can be easily observed to describe the vertical and lateral variations it has experienced. In the upper section of the bed, the material observed is almost homogeneous and the sediment infill thickness is limited. Using geophysical methods and applying vertical electric soundings (VES) when the conditions of the deposit outcrop prevented the use of other methods, we were able to study the sediment sequence indirectly and correlate the levels detected in the gully areas [22].

The representation of the sediment profiles shows the type of material, the thickness of the deposit, and the recognisable sections or levels.

Seven columns were surveyed across an area of approximately 140 m, starting from the first profile (P1) 180 m upstream from the final part of the study area (location of the measuring station) and continuing downstream (Figure 2).

**Figure 2.** Location map of column survey areas. Source: authors' own work.

No distance was set between the profiles. The survey points were chosen in the field after identifying the sections where it would be easiest to measure and describe the levels, following the evolution of the sediment sequence. The final column (P7) was surveyed in the area next to the measuring station. The strong incision by gullies down to the rock bed permitted a complete survey of the sediment columns. Because of the continuity and quality of the sections in the study area, it was possible to establish correlations and identify variations observed in the deposit sequences. The levels identified in the deposit sequences were shown in a correlation diagram.

The sediment was texturally characterised by taking 12 samples of approximately 1 kg across the three levels differentiated in the surveyed columns. The textural analysis was performed at the Extremadura Regional Government Agricultural Laboratory. The content of coarse elements (expressed in %) was determined and the fine elements (sand, silt and clay) were used to define the texture class.

#### *3.2. Sample Dating*

Radiocarbon (or carbon-14) dating was used to date the sediment deposits. The material chosen was charcoal, because it was likely to occur in the deposits. Seven samples were collected, corresponding to the lower area, intermediate area (location of larger grain pebbles) and upper area in each profile to determine the age of each level. Dating by the conventional method was unsuccessful because of the small amount of carbon in the samples.

Accelerator mass spectrometry (AMS), which requires 10 mg carbon, was attempted. This method dated sample No. 1 (M-1), collected at the intermediate level of largest grain size (Figure 3) and identified as Ua-22602, but obtained no results for the other samples because they contained insufficient carbon.

**Figure 3.** Profile (**A**) and detail of level (**B**) where sample M-I was collected. Source: authors' own work.

It was possible to date only one sample, collected at a depth of 50 cm, in the level with the elements of largest grain size detected in the sediment sequence. No results were obtained for the samples taken in the base and the upper part of the deposit. However, because the dated sample was found at a level that can be followed throughout much of the basin (guide level), it was possible to correlate the age obtained throughout this level and relate its occurrence to the processes that shaped its formation.

#### **4. Results**

The overall deposit sequence occurring in the study basin was determined through a complete survey of the sediment infill, by direct observation throughout each profile. The sediment profiles were indicated by drawing all the levels distinguished and the thickness of each one, as shown in the example of profile 1 (Figure 4).

The data for the thickness of the levels differentiated in each profile are shown in Table 1. The lower thickness of the deposit closer to the measuring station can be seen. Level 2 is absent in the final profile surveyed.

With regard to texture, the predominant sediment in the overall sequence comprises around 62% coarse elements, mainly fine gravel. The texture of the fine fraction (sand, silt and clay) present in most of the samples analysed is sandy loam.

These results indicate that the infill materials in the watercourse areas are mainly sand and fine gravel with a silty matrix and scattered pebbles. A level of mainly medium-coarse gravel of 1.6 to 3.2 cm occurs in the mid-upper level of the deposit, and the sediment below and above this level has similar characteristics. The sequence was divided into three sections and their thickness was measured and described in each column surveyed. The upper level, named level 1, is located above the section with a predominance of medium gravel, named level 2. Level 3 is located in the lower part of the sequence, above the slate bed.

**Figure 4.** Sediment profile 1 (**A**). Detail of the medium-coarse gravel level (**B**). Source: authors' own work.


**Table 1.** Thickness of each level in the sediment profiles surveyed in the deposit areas.

The upper part of level 1 culminates in a level of fine materials enriched with organic matter, the base of the herbaceous layer.

Contact between level 2 and the upper and lower levels is mainly irregular and at times poorly defined, but at some points it is well defined.

The correlation of the sediment columns is shown in the drawing in Figure 5, which shows the decreased thickness of the deposit further downstream. In the last profile surveyed it was not possible to distinguish the larger grain size level.

**Figure 5.** Correlation of the sediment profiles surveyed in the study area. Different images are shown corresponding to points of the deposit area where the profiles have been carried out (P1-P7). Source: authors' own work.

#### *4.1. Overall Deposit Sequence*

The sediments that make up the watercourse areas are terrigenous, with a predominance of sand and fine gravel in which facies of sand, clay and silt were identified. These materials tend to accumulate, creating a deposit with a thickness ranging from a few centimetres in some areas to 1.80 m in others. The silt and clay in the upper part of the profile are dark, with abundant edaphic features.

A level with a higher percentage of coarse elements with medium-coarse gravel particles is inserted in the mid-upper part of the profile. In situ measuring revealed sizes of 16 to 32 mm. These elements are mainly angular slate and quartz pebbles and scattered boulders. This level can be followed throughout most of the sediment infill of the bed, primarily in the lower section, where the greater thicknesses were recorded. The direct data from the survey of the columns indicated that it is located at a depth of 14 cm to 50 cm, occurring at lower depths in the thicker profiles. The minimum thickness measured in this level was 10 cm and the maximum was 50 cm, although the level was not identifiable when the deposit was less than 30–40 cm thick.

The sediment model that makes up these facies is associated with a context of slope erosion that resulted in the dense fraction and the sand. The soil corresponding to the fine fraction was displaced by rainwater.

Rainfall and runoff provide the initial energy necessary for the process of dislodging and transporting soil particles, and the topographical features determine the energy of the water current for transporting the particles. Similarly, soil composition and structure (particle size, cohesion, etc.) affect the capacity of the rainfall and runoff energy to carry soil particles. Therefore, human activity, through cropping, is determinant in altering soil erosion susceptibility. The flow erodibility coefficient depends on the type of soil, and changes in this parameter produce considerable changes in sediment production [23].

The sediment of the sequence studied represents the material corresponding to soil erosion in the area where the basin is located; i.e., soil of the Cáceres peneplain occurring on slate, with shallow depths of 25 to 50 cm. In the slope erosion process, a greater transport capacity was registered for rainfall, seen in the level of larger grain pebbles identified in the sequence. This could be due to the higher amount of stones in the soil profile caused by more intense tillage than in an earlier period. Soil redistribution by mechanical displacement during tillage has been recognised as a process of intense soil degradation (mechanical erosion, also known as tillage erosion). Empirical models describing the mechanisms of mechanical soil redistribution have shown that most agricultural tools used in very diverse farming conditions generate very high rates of soil remobilisation [24–29].

The deposit analysed can be interpreted as the effect of soil redistribution caused by past agricultural practices, entailing a modification or interaction in the water-erosion processes that altered the normal rhythm of the deposit of the sediment sequence in the study area, displacing larger grain size material.

#### *4.2. Radiocarbon Dating Results*

The results of dating sample M-1, identified as Ua-22606, provided a conventional radiocarbon age of 180 ± 30 BP, corresponding to a calibrated result (2 sigma, 95% probability) of Cal 1650 to 1950. The calibration graph is shown in Figure 6.

**Figure 6.** Calibration curve for sample Ua-22606.

This calibrated age range indicates that the materials deposited above the level where the sample is located are later than 1650 and have a 56% probability of being later than 1720–1820.

Because the results were obtained from the sample taken at the bottom of the level of coarse pebbles (indicative of a change in the conditions of the deposit), a connection can be made to events in the environment that are documented, occurred over time, and could have influenced the variation in the conditions of the deposit. Although carbon 14 does not allow a specific date to be assigned to an occurrence, it helps to identify a possible period for investigation.

If the dated sample had been taken at a level with continuity in the rhythm, it would be possible only to indicate the interval in which the deposit might have occurred. In a deposit where the sequence shows no alteration or variation, it is difficult to determine the time frame if there is no known external episode that it can be referenced to. Unfortunately, no other examples were found for dating to determine the age of the sediment at the lowest or the uppermost part of the deposit and extend the chronology of all the materials. However, dating provided an important reference to study the recent evolution of these spaces.

#### **5. Discussion**

The sediment study shows that the predominant sediments in the overall sequence are sand and fine gravel with a silty matrix, with the insertion of a level with a high percentage of coarse elements (close to 80%) of medium-sized gravel. After this level was measured in each column, it was shown to have a variable thickness (maximum 50 cm, minimum 14 cm). It is located at a depth of 50 to 20 cm, decreasing in thickness and depth as the deposit becomes less thick, and is not identifiable when the profile has a limited thickness. Therefore, it is not a lens without lateral continuity, because it is distinguishable and continues in each column surveyed. Interpretation of vertical electric soundings [22] also shows these variations in deposit grain size at other parts of the basin where the sequence was not observed directly.

At the level with the largest grain size identified, remains of carbonised organic matter collected at a depth of 0.5 m show a conventional radiocarbon age of 180 ± 30 BP. This age corresponds to a calibrated result (2 sigma, 95% probability) Cal of 1650–1950, with a higher probability in the interval 1720–1820. It is therefore likely that the higher grain size level is later than this time interval.

The sediment model that makes up these facies is associated with a context of slope water erosion that caused the infill in the watercourse areas. The infill is terrigenous, with a predominance of sand and fine gravel in which the level of coarser pebbles must be associated with a variation in the normal rhythm of sedimentation, resulting in displacement of larger grain elements. This greater displacement could be associated with agricultural tasks, because tillage has considerable impact on water erosion processes and greatly increases soil particle mobilisation [29].

Studies of gully erosion in an experimental basin (Parapuños) near the study area reported a similar level of material of fluvio-colluvial origin, with a percentage of coarse fragments inserted in the infill, indicating a variation in the soil erodibility coefficient comparable to that of the study area. After field observation followed by grain size analysis [30] reported a level of coarse material (>2 mm) in all profiles. This layer was located at different depths of each profile and at least 40% of the sample material was coarse. However, in most of the profiles, the coarse material it contained was more than 70% of the total sample. This level mainly comprised fragments with a diameter of 2 to 6 cm, but in some cases as much as 20 cm.

Obtaining similar results in areas close to the study basin was a very significant finding, because they have common source areas in which the same model of sediment production must have occurred.

After analysing all the data, it was necessary to return to the study objective of assessing the deposit conditions in the basin to determine whether the sedimentation process was natural or human-accelerated.

Using the sedimentation and chronological results obtained, it was possible to determine the historical events that might have influenced the erosion and deposit processes in the basin. The data for the land-use changes in the dehesa in the estimated period were studied, focusing on any modifications caused by human activity. The documentary sources for the estimated period do not refer to any exceptional rainfall events associated with crises or epidemics that could have caused anomalous transportation of materials or high erosion rates.

In the late 18th and early 19th centuries, historical events led to changes in land use in the region. This period was the culmination of a policy that began in 1770 to reallocate land for crops. A Royal Decree of 1793 declared that extensive tracts of land previously available for transhumance were to be used for grazing and cropping [31].

The land reform, initiated by Godoy, was preceded by strong complaints about practitioners of transhumance, who were blamed for the lack of cropland, the decline of agriculture and the debilitation of the region [32].

Pastureland that kept the dehesa in ecological balance was converted to cropland, in most cases resulting in soil with a limited profile, overuse and the consequent loss of structure and texture, making the land more vulnerable to erosion. The rock fragments characteristic of the soil were more exposed to displacement processes after the land was tilled [21,28]. Soil disintegration and redistribution due to agricultural tasks have been recognised as a process of intense soil degradation [29,33]. The accumulated results of this process are evidenced in recent deposits in the watercourse areas.

During this phase of extensive, widespread ploughing, a considerable amount of materials would have reached the watercourses and gradually filled in their beds. However, the infill would have had little stability, because the beds were exposed to active processes from the concentration of surface runoff [21] that created deep gullies along the watercourses.

To a certain extent, the introduction of the agricultural reform, followed by the decline of long-distance transhumance, marked a turning point in land use in the region. The reform changed agriculture by increasing the area under crops after the granting of wasteland and uncropped land, although some of the land was cultivated only for a short time or left untouched because of its poor quality or the high costs of farming it.

It is likely that the dehesa where the Guadalperalón and Parapuños basins are located was no exception to these land-use changes, which must have had a repercussion or impact in the short to medium term. The results of analysing the sediments corresponding to this historical period in a typical dehesa such as the pilot hydrographic basin studied should be considered in this light.

#### **6. Conclusions**

Using the data provided in this study, we can assess the possible contribution of tillage erosion in the past to the evolution of the landscape. Studies of this kind, which are lacking for Extremadura, can be used to reconstruct many aspects of past actions in these spaces with regard to their hydrogeomorphological dynamics, and have the added value of completing the findings of other studies, thus helping to increase knowledge about recent processes in these areas of high regional significance.

The sediment study of the watercourse areas indicates an alteration in the sediment sequence (level of coarse elements of larger grain size) attributable to changes in the deposit conditions. In this qualitative study, the materials in the level where the grain size changed were dated, providing an age range in which there were no data about the chronology. This is important for identifying the historical context in which the deposit occurred; i.e., the late 18th century.

The continuity of this level of coarser pebbles detected in the deposit sequence (continuing through a large part of the bed) and in other watercourses nearby indicates that the origin of the change was a process associated with the entire source area rather than an isolated process. It must have continued over a sufficient period to make it significant in the overall sequence, interpreted as slope erosion. After the change in grain size occurred, there was a further period of stabilisation.

Although it is difficult to assess the scope of the historical change in land use and determine whether it can be considered an environmental crisis, the almost constant appearance of this level of larger grain size materials in the infill sequence of a nearby experimental basin with similar environmental features [30] supports the idea of a repercussion on the environment caused by historic events that occurred during the period studied. However, more detailed studies of the agricultural history are needed to verify the agricultural transformations in the study areas.

With regard to the limitations of the study, the pilot basin was chosen as the experimental area despite its limited scope because its uses and physical and geological features make it a typical dehesa. It is also located in an extensively studied seminatural ecosystem that forms part of the drainage network degrading the Cáceres peneplain. Because of this, it can be considered representative of the evolution of the area it belongs to and optimum for possible data extrapolation.

It should be noted that although it was possible to date only one sample from the sediment level where the alteration in the rhythm of the sequence occurred, the dating can be considered valid because the sample is located at an extensive guide level that continues along much of the deposit. The carbon dating indicates the interval from which sedimentation of this level occurred and identifies the period, allowing the deposition to be related to documented historical events. Although the exact period is not defined, the range is important because it provides sedimentary evidence to investigate the cause of the larger grain-size level. If no alteration in the rhythm of deposition had been detected in the sequence, it would not have been possible to relate the deposit of the material to external processes that took place in the area during the period when the deposit occurred.

Despite these limitations, the sedimentary evidence and radiocarbon dating revealed increased erosion of coarse material in the study basin that could have undergone the same historical transformations occurring in similar areas. This concurs with the historical sources cited, probably in the late 18th century and early 19th century, due to human intervention in the region after changes in land use. The results shed light on the environmental implications of past human activity, such as tillage operations, and have future applications for conserving this dominant ecosystem in a region of high economic and environmental importance, in which humans play a significant role in its regeneration and conservation.

**Author Contributions:** Formal analysis, M.T.T.R., A.D.Á. and L.G.-M.; Funding acquisition, L.G.-M.; Investigation, M.T.T.R., A.D.Á. and L.G.-M.; Methodology, M.T.T.R.; Project Administration, M.T.T.R.; Supervision, M.T.T.R., A.D.Á. and L.G.-M.; Writing—original draft, M.T.T.R.; Writing—review and editing, M.T.T.R., A.D.Á. and L.G.-M.

**Funding:** This article has been funded by the Government of Extremadura (Ref. GR18176).

**Acknowledgments:** Publication of this article has been possible thanks to the funding of the Government of Extremadura and the European Regional Development Fund (ERDF), reference GR18176, granted to the research team INNOVA (Diseño, Sostenibilidad y Valor Añadido) of the University of Extremadura.

**Conflicts of Interest:** The authors declare no conflict of interest

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Experimental Setup for Splash Erosion Monitoring—Study of Silty Loam Splash Characteristics**

**David Zumr 1,\*, Danilo Vítor Mützenberg 1, Martin Neumann 1, Jakub Jeˇrábek 1, Tomáš Laburda 1, Petr Kavka 1, Lisbeth Lolk Johannsen 2, Nives Zambon 2, Andreas Klik 2, Peter Strauss <sup>3</sup> and Tomáš Dostál <sup>1</sup>**


Received: 21 November 2019; Accepted: 22 December 2019; Published: 24 December 2019 -

**Abstract:** An experimental laboratory setup was developed and evaluated in order to investigate detachment of soil particles by raindrop splash impact. The soil under investigation was a silty loam Cambisol, which is typical for agricultural fields in Central Europe. The setup consisted of a rainfall simulator and soil samples packed into splash cups (a plastic cylinder with a surface area of 78.5 cm2) positioned in the center of sediment collectors with an outer diameter of 45 cm. A laboratory rainfall simulator was used to simulate rainfall with a prescribed intensity and kinetic energy. Photographs of the soil's surface before and after the experiments were taken to create digital models of relief and to calculate changes in surface roughness and the rate of soil compaction. The corresponding amount of splashed soil ranged between 10 and 1500 g m−<sup>2</sup> h<sup>−</sup>1. We observed a linear relationship between the rainfall kinetic energy and the amount of the detached soil particles. The threshold kinetic energy necessary to initiate the detachment process was 354 J m−<sup>2</sup> h<sup>−</sup>1. No significant relationship between rainfall kinetic energy and splashed sediment particle-size distribution was observed. The splash erosion process exhibited high variability within each repetition, suggesting a sensitivity of the process to the actual soil surface microtopography.

**Keywords:** splash erosion; rainfall simulator; splash cup; soil loss; soil detachment; disdrometer; rainfall kinetic energy

#### **1. Introduction**

The initial stage of the erosion process (splash erosion) occurs when raindrops with high kinetic energy hit bare soil, breaking down aggregates and detaching soil particles. Such particles are translocated a short distance from the raindrop's impact and they then settle on the soil's surface and block the interaggregate pores reducing the topsoil's infiltration capacity and accelerating the formation of surface runoff. Hence, understanding the relationships between various rainfall characteristics and splash erosion is important to be able to predict the dominant runoff mechanisms of unprotected soils and to determine the rainfall kinetic energy threshold for erosion initiation.

Various monitoring techniques have been developed over the years to measure the degree of soil detachment in relation to the kinetic energy of raindrops. Besides splash cups (which are used in this study), splash boards or tracers have also been used in previous studies (as reviewed by Fernández-Raga et al. [1]). When monitoring splash erosion, there are several considerations to take into account when developing study design:

(A) collection mechanism:


(B) sample preparation:


Each approach has its advantages and disadvantages. One needs to estimate the contributing area of the surrounding soil (Figure 1b), otherwise it is not possible to calculate the detached soil amount per specific area. This problem occurs (to some extent) in the Morgan setup [2] because some soil particles are transported within the sampling area (Figure 1a). Therefore, the optimum sampling area is a tradeoff between underestimating the splash, representativeness of the collected sample, as well as ease of sample handling [3].

**Figure 1.** Two methods to collect detached soil particles: (**a**) collector is around the soil sample; (**b**) collector is surrounded by the sampled soil.

Wei et al. [8] showed that the splash erosion rate is dependent on soil sample water saturation and texture. Sandy soil was not sensitive to degree of saturation whereas samples with increasing clay content were, and similar results were reported by Khaledi Darvishan et al. [9]. To the contrary, Watung et al. [10] did not observe any significant difference in soil splash under variable saturated conditions for tropical soil (Oxisol). Utilizing disturbed soil samples allows for the control of soil sample conditions. In situ measurement on undisturbed samples does not allow for the control of soil conditions; on the other hand the measurement may be more representative for a given location since soil structure and soil surface characteristics are preserved. Therefore, there are a lot of factors to consider when designing an experimental setup.

The splash-collecting device needs to have sufficient dimensions to trap most of the detached particles. As Leogun et al. [11] and Marzen et al. [12] showed, the amount of detached soil decreases exponentially with increasing distance from the soil sample. The transport splash distance increases with decreasing soil particle size. Fu et al. [13] added that the splashed distance is not only related to particle or aggregate sizes, but also to rainfall kinetic energy.

Transport distances ranged between 10 cm and 20 cm for aggregates with diameters between 0.5 mm and 1 mm and up to 35 cm for soil aggregates with diameters from 0.05 mm to 0.5 mm in a study by Legout et al. [11]. Most of the splashed particles were observed within 20 cm from the soil sample in Fu et al. [14] and within 35 cm in [12].

The most common problem with splash erosion experiments is that it is difficult to compare results across studies due to each author's differing experimental setup [15]. Many different splash cup setups with various sizes and trapping principles have been used to measure splash erosion. Pioneering studies were performed using single soil fractions and cylindrically shaped cups [16–18]. Kinnell [6] added a thin external ring around the soil sample to exclude surface runoff which may occur during ponding conditions. Scholten et al. [7] developed splash cups in which sample saturation was controlled, their setup maintained nearly constant water content of the sample, and allowed for the simultaneous draining of rainfall. Two types of splash cups (cups and funnels) were compared by Fernández-Raga et al. [4]. The funnels collected systematically more particles because they prevent particle transport back to the surrounding soil (backsplash). The most frequently used splash cup design is inspired by Morgan (1981) (for example, [9,19–22]). In all studies reported above, the sediment loss was determined by weighing the dry sample prior to and after the measurement period. The Morgan setup is also suitable for use in both indoor and outdoor conditions with disturbed or undisturbed soil samples.

In general, the splash erosion process is very complex and the reported results usually exhibit high variability. Angulo-Martinez et al. [23] evaluated the effects of rainfall characteristics, rainfall erosivity index and soil type with a linear mixed-effects model. The rainfall erosivity index explained 55% of the data variability but soil type did not have a statistically significant influence on erosion. Up to 74% of the variability within a single soil type was attributed to random effects. The role of slope (and upward/downward splash) and rainfall intensity were investigated. It was reported that slope altered the splashed particle-size distribution and the role of slope for total splashed material varied for various rainfall intensities [22]. The rainfall itself is also a very important factor, and authors emphasize the need of accurate drop-shape estimation in order to obtain adequate kinetic energy of the rainfall. Rainfall changes the surface microtopography [24] which may have further effects on water infiltration, surface water retention and surface runoff [25]. A common method for the analysis of surface relief changes is close-range photogrammetry [26]. It has been shown that especially loose soils are prone to a fast decrease in microrelief roughness, leading to accelerated soil erosion [27,28].

Rainfall kinetic energy (KE) is often estimated based on measured rainfall intensity (KE-I relationship) due to the lack of a direct rainfall kinetic energy measurement [29]. Lobo and Monilla [30] tested several KE-I relationships for various geographical locations and concluded that parameters of the KE-I relationships are site specific. Meshesa et al. [31] tested parametric relationships between rainfall intensity and rainfall kinetic energy using artificial rainfall, noting that artificial rainfall exhibits raindrops with different sizes than natural rainfall. Therefore, KE-I relationships derived under natural conditions should not be applied to rainfall simulators.

The relationship between the rainfall kinetic energy and the amount of splash erosion on a bare surface varies for different soil types and tillage practices. Most of the studies of splash erosion on real soils come from arid or semi-arid climates, such as the Mediterranean region, Loess Plateau of China or southern states of the USA [1]. In Central Europe, soil erosion processes have been studied extensively, but splash erosion has not usually been considered or evaluated. Rainfall in Central Europe often does not generate overland flow (due to low intensity and/or short duration), but soil detachment and resulting soil surface changes take place from the impact of first drops with sufficient kinetic energy [32]. The lack of knowledge of splash erosion rates on agriculturally cultivated Cambisols is the main motivation for the presented research.

In this study we present a splash erosion experimental setup which utilizes techniques from previously published works. We utilize the Morgan design and provide an open-source, easy to manufacture splash cup. The objective of these experiments is to determine the impact of rainfall kinetic energy on splash detachment for a typical agricultural soil in Central Europe. An associated aim is to evaluate the particle-size distribution of the eroded material and to analyze the effects of the rainfall kinetic energy on soil consolidation.

#### **2. Materials and Methods**

#### *2.1. Splash Erosion Collection Device*

The monitoring setup is designed for both indoor and outdoor measurements and is similar to the system proposed by Morgan [2].

The monitoring device includes (Figure 2): (1) the sediment collector, (2) the cylindric splash cup, (3) the photogrammetry reference targets, (4) the LED illumination ring, (5) the outlet for the sediment collection and (6) the splash cup holder. The sizes and proportions of the device components and the angle between the rim of the splash cup and the collector were adjusted to capture the maximum amount of splashed soil while allowing unchanged raindrop impact on the soil sample.

**Figure 2.** Splash erosion device and its parts and dimensions (units in centimeters).

The splash cup (2 in Figure 2) consists of a polypropylene cylinder with a height of 60 mm, inner diameter of 100 mm and a surface area of 7854 mm2. The cylinder's wall is 2.7 mm thick and the upper rim of the splash cup is sharpened. The bottom of the cup is perforated in order to allow for draining of percolated water. A fine mesh is placed in the bottom of the cup to prevent soil loss during manipulation and water percolation. Each tested soil is filled to one centimeter below the rim of the splash cup to prevent runoff in case of ponding water on the sample's surface. The sediment collector's purpose is to capture eroded (splashed) soil particles. Its walls are angled in order to funnel the water with soil particles to the outlet (5 in Figure 2) which leads to a storage bucket under the sediment collector. In the center, there is a holder (6 in Figure 2) which is used to support the splash cup above the sediment collector's base so that the soil sample is protected from the backsplash of soil particles that accumulate inside the collector. The holder does not have a bottom, therefore, water percolating through the soil sample can freely drip out. This water is not usually collected. The collector is made from a commercially available polypropylene bucket, and all the firmly attached components (the holder and the outlet) are butt welded to the collector. The CAD drawings of the splash cup's components as well as mounting procedure are freely available here: rain.fsv.cvut.cz/splashcup.

After every experiment, the detached particles still attached to the collector's walls or settled on the collector's bottom were washed into the outlet and added to the remaining eroded particles. The suspension of the collected sediment and rainfall water was then filtered, oven dried and weighed.

The splash erosion device was then prepared for photogrammetrical analysis of soil sample surface changes due to rainfall impact. Typically, 10 to 15 referenced photographs from different angles were taken for the successful reconstruction of a digital surface model. Therefore, around the splash cup there is a white ring with the photogrammetry reference targets (3 in Figure 2). An LED illumination ring (4 in Figure 3) was attached to the sediment collector to provide adequate illumination to ensure that there were no shadows on the surface. The specification of the LED light strip was: chromaticity 4250 K, power 12 W/m, 60 LEDs/m, luminous flux 1050 lm/m.

**Figure 3.** Sketch of the 10 fixed positions and the rainfall intensity distribution below the nozzles. The positions of four nozzles used in the experiment are depicted with a spray symbol, and the X marks denote additional positions where the rainfall intensity was also measured.

#### *2.2. Soil Sample Preparation and Analysis*

The soil used for testing of the splash collection system was taken from a topsoil horizon (upper 10 cm) of a cultivated field at the experimental site of Bykovice in Central Bohemia, Czech Republic. The soil type is classified as a Cambisol, and the texture corresponds to silty loam according to the World Reference Base (WRB) classification [33] (12.7% sand, 76.6% silt, 10.7% clay), CaCO3 content is <0.92%, pH 6.9, and total organic carbon 1.7%.

Soil was collected in April 2017 during seedbed conditions. The soil was transported to the laboratory, stripped of large organic residues (stems, roots), large clods and stones, and then air dried.

Collected soil was sieved to remove particles and aggregates larger than 10 mm before filling the splash cup. A piece of permeable geotextile was placed inside the cup to prevent the soil from passing through the splash cup's perforations. The splash cup was then loosely packed with the same amount of prepared soil to reach a similar bulk density to that of seedbed conditions (0.83 g cm−3). The soil sample was not compacted; we only distributed soil aggregates equally along the sample surface and removed any remaining organic residues. Then, the filled splash cups were placed inside the sediment collectors so that the splash cup's surface was level.

After each rainfall simulation the eroded soil particles were carefully washed out from the sediment collector, and the suspension of rain water and eroded sediment was transferred to the laboratory. The obtained sample was filtered on a paper filter with a mesh size of 5 mm, oven dried (at 40 ◦C) and weighed.

The dried soil was further analyzed using a laser diffraction particle-size analyzer (Mastersizer 3000, Malvern Panalytical Ltd., UK) to determine soil texture. We mixed the splashed material from all the repetitions to obtain enough soil for this analysis. Each soil sample was dispersed in distilled water and placed into an ultrasonic bath for 320 s to disaggregate the soil. Then the sample was analyzed by the laser diffractometer. Measurements were repeated 25 times for every sample. The procedure is described in detail by Kubínová [34].

#### *2.3. Rainfall Simulation*

A laboratory Norton Ladder type rainfall simulator was used to generate rainfall. The simulator had an experimental area of 0.9 × 4 m. The rainfall was produced by eight oscillating nozzles, type Veejet 80100, which were mounted in two parallel sections at 2.6 m above the soil samples. Tap water was used, water pressure was set to 32 kPa and rainfall intensity was controlled by the nozzle oscillating frequency. The average raindrop diameter generated by the simulator was 2.3 mm according to monitoring with disdrometers [35].

In this experiment we took advantage of the fact that rainfall intensity spatially varies over the experimental plot. Eleven positions with the rainfall intensity between 20 and 70 mm h−<sup>1</sup> were chosen for further testing. Figure 3 shows the spatial distribution of rainfall intensity. The pattern is based on the intensity measurements of the splash cup positions and inverse distance weighted interpolation. The rainfall kinetic energy was measured with the Laser Precipitation Monitor (LPM) by Thies Clima® and the KE-I relationship for the given rainfall simulator was established in advance of the splash erosion simulations. The rainfall simulation lasted 15 min, then the soil samples were collected and the splashed amount was analyzed. The whole procedure was repeated five times (totaling 55 samples analyzed).

#### **3. Results and Discussion**

The measured relationship between rainfall intensity and rainfall kinetic energy is shown in Figure 4. The observed trend of the KE-I is linear with the slope of 18.69. Compared to the published relationships for natural rainfall (e.g., [36–39]) the KE of the simulated rainfall is lower by approximately 35%. Similar results show that underestimation of the simulated rainfall kinetic energy were also obtained by Petr ˚u and Kalibová [40]. The KE-I relationship is strongly dependent on the rainfall simulator design, nozzle types and water pressure (e.g., simulators generating larger raindrops than the natural rainfall overestimate KE) [31].

**Figure 4.** Measured relationship between simulated rainfall intensity and kinetic energy.

The measured soil splash rate ranged between 10 and 2012 g m−<sup>2</sup> h−<sup>1</sup> for rainfall kinetic energy between 380 and 1450 J m2 h−1. The threshold kinetic energy needed to initiate the detachment process was identified by extrapolation to be 354 J m−<sup>2</sup> h−1. The recorded mass of the detached particles exhibits large variability across the five replicates at each position (Figure 5). The variability was higher for the positions where higher soil erosion was recorded (positions with higher rainfall intensity and kinetic energy). Similar variability during comparable experiments was reported in literature [23]. The variability could be explained by very complex soil erosion behavior which is influenced by size distribution and arrangement of the soil particles and aggregates on the sample's surface (random roughness).

**Figure 5.** Relationships between rainfall intensity (**A**) and rainfall kinetic energy (**B**) and splash erosion rate. The bars stand for standard deviation, the marker denotes the mean value.

The overall relationship between the mass of the eroded particles and the rainfall kinetic energy has a linear trend (Figure 5). The coefficient of determination is 0.7 which is higher than or in a similar range as presented in comparable studies (e.g., [6,15,41]). Table 1 summarizes selected splash erosion experiments from the literature to show how the experiments vary in setup. They show similar, most often linear, relationships between the amount of detached soil particles and various rainfall characteristics. The slope of the linear trendline differs in each study because of differing experimental setups (rainfall duration, sample preparation) and soil properties [6,15,42]. Bisal [17] and Mazurak and Mosher [18] already experimentally showed that the splashed amount is linearly dependent on drop size and velocity. Surprisingly, in contrast to the more recently published studies, Bisal [17] did not find a significant relationship between rainfall intensity and the amount of sand splashed as long as no ponding occurred on the sand's surface.


**Table 1.** Comparison of the results with the published splash erosion studies.

The observed splash–KE trendline is strongly influenced by duration of the rainfall experiment. Splash erosion varies over time, especially in the case of structured soils with developed aggregates that are initially broken down into smaller fractions by raindrops. It has been observed that the splash increases with decreasing aggregate size [45,46] and increasing event duration [47,48]. The fact that splash erosion is strongly dependent on surface microtopography is another reason why it is difficult to compare results across studies. Artificial samples filled with smooth, fine-grained sand produce different erosion than natural soils with higher surface roughness and particle cohesion. This is, as noted above, due to the smaller size of individual particles, but is also due to microrelief variation. The effect of surface roughness on splash erosion is not a straightforward process [49]. Some authors found decreasing erosion with increasing roughness [50], and some the opposite trend [49]. The changes in surface microrelief in one sample (applied kinetic energy of 1150 J m−<sup>2</sup> h<sup>−</sup>1, recorded average soil surface consolidation of 0.75 mm) is shown in Figure 6. The soil surface consolidated due to rainfall and splash erosion. Figure 7 shows the linear relationship between rainfall kinetic energy and soil consolidation even though the measured soil settling is very heterogeneous and the coefficient of determination is low.

**Figure 6.** Soil sample surface before and after simulated rainfall. The lower photos represent digital surface models (DSM) in which green areas have higher elevation than the red areas. The soil surface consolidated in the average by 0.75 mm.

**Figure 7.** Relationship between rainfall kinetic energy and soil surface settling.

Analysis of the splashed material particle-size distribution (PSD) did not show a significant relationship between rainfall kinetic energy (KE) and detached sediment texture (Figure 8). The Pearson's correlation coefficient of KE versus clay content was 0.36 (*p*-value 0.39, for a = 0.05), KE versus silt −0.22 (*p*-value 0.61) and KE versus sand −0.24 (*p*-value 0.56). The PSD of the detached sediment is not significantly different from the texture of the original soil sample (see the horizontal dashed lines on Figure 7). Therefore, all particle fractions are detached uniformly with no preference toward fine or coarse fractions, no matter the kinetic energy applied. It is important to note that the splashed sediment is usually detached in the form of aggregates and therefore the aggregate size distribution should be evaluated. We have not done this analysis as we were not able to collect the undisturbed splashed soil aggregates. For example, Fu et al. [13] show that especially the fine particle and aggregate (<0.053 mm) ratios change with variable rainfall KE. The KE per mm of rainfall is the same for all the measured points shown in Figure 8, which is due to the design of the rainfall simulator. The results may change if different types of the rainfall simulators (with various drop size distribution or drop velocities) are applied.

**Figure 8.** Relationship between rainfall kinetic energy and textural classes (clay, silt, sand) of detached sediment. The dashed lines represent the texture of the Bykovice soil. No significant difference between the soil sample and splashed sediment was observed.

Fernández-Raga et al. [15] demonstrated that splash erosion estimation is strongly dependent on the splash collection setup. Poesen and Torri [3] found that the area of the splash cup is the main influencer defining the amount splashed and therefore coefficients for different splash cup areas should be used. Even the trendline between the amount splashed and the rainfall kinetic energy differs based on the methodology of collection. The experimental design proposed in this study, a modified version of Morgan's splash cup, proved to be reliable, practical and easy to handle. As the splash cups are compact, light, and robust, they can be easily mounted to any support mechanism either in the laboratory and in the terrain. Due to the materials used, the device is durable and can be used for several seasons under field conditions. The soil sample can be packed separately from the collection tray and fixed to its position just before the rainfall experiment.

#### **4. Conclusions**

A splash cup methodology was presented and used to analyze splash erosion of a silty loam agricultural topsoil with simulated rainfall across various kinetic energies.

The splash cup, which consists of commercially available components, proved to be a versatile and practical tool for the monitoring of splash erosion. The design follows the dimensions proposed by Morgan, therefore, the splash cup can be used for comparison with other studies in which Morgan's device was employed. Even though the soil particle detachment process is very sensitive to factors other than experimental design, standardization or harmonization of splash cup designs would be a beneficial step forward in the complex research area of the splash erosion process. Therefore, we provide a detailed description of the splash erosion setup, including technical drawings, assembly manual and description of sample preparation and collection on the website rain.fsv.cvut.cz/splashcup.

The results of the presented splash erosion experiment show similar results to previously published studies. The relationship between rainfall kinetic energy and splashed soil amount is linear and there is a kinetic energy threshold to initiate erosion. Even under controlled experimental conditions, when the soil samples were prepared the same way and rainfall characteristics remained constant during the experiment, the eroded soil amount varies across each replicate. This reinforces that splash erosion is a very complex process and the resulting erosion is sensitive to small changes in soil properties and soil surface relief which is problematic and remains an open question that needs to be studied further.

**Author Contributions:** Conceptualization, D.Z. and T.D.; Formal analysis, D.V.M., J.J., M.N., T.L., L.L.J. and N.Z.; Methodology, D.Z. and P.K.; Project administration, T.D. and A.K.; Supervision, A.K., P.S. and T.D.; Visualization, D.V.M., T.L. and J.J.; Writing—original draft, D.Z.; Writing—review & editing, All. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was performed within the project "Kinetic energy of rainfall as a driving force of soil detachment and transport". Financial support was provided through the Czech Science Foundation (GACR): GF17-33751L and the Austrian Science Fund (FWF): I 3049-N29.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Using Machine Learning-Based Algorithms to Analyze Erosion Rates of a Watershed in Northern Taiwan**

#### **Kieu Anh Nguyen 1, Walter Chen 1,\*, Bor-Shiun Lin <sup>2</sup> and Uma Seeboonruang 3,\***


Received: 14 February 2020; Accepted: 4 March 2020; Published: 6 March 2020

**Abstract:** This study continues a previous study with further analysis of watershed-scale erosion pin measurements. Three machine learning (ML) algorithms—Support Vector Machine (SVM), Adaptive Neuro-Fuzzy Inference System (ANFIS), and Artificial Neural Network (ANN)—were used to analyze depth of erosion of a watershed (Shihmen reservoir) in northern Taiwan. In addition to three previously used statistical indexes (Mean Absolute Error, Root Mean Square of Error, and R-squared), Nash–Sutcliffe Efficiency (NSE) was calculated to compare the predictive performances of the three models. To see if there was a statistical difference between the three models, the Wilcoxon signed-rank test was used. The research utilized 14 environmental attributes as the input predictors of the ML algorithms. They are distance to river, distance to road, type of slope, sub-watershed, slope direction, elevation, slope class, rainfall, epoch, lithology, and the amount of organic content, clay, sand, and silt in the soil. Additionally, measurements of a total of 550 erosion pins installed on 55 slopes were used as the target variable of the model prediction. The dataset was divided into a training set (70%) and a testing set (30%) using the stratified random sampling with sub-watershed as the stratification variable. The results showed that the ANFIS model outperforms the other two algorithms in predicting the erosion rates of the study area. The average RMSE of the test data is 2.05 mm/yr for ANFIS, compared to 2.36 mm/yr and 2.61 mm/yr for ANN and SVM, respectively. Finally, the results of this study (ANN, ANFIS, and SVM) were compared with the previous study (Random Forest, Decision Tree, and multiple regression). It was found that Random Forest remains the best predictive model, and ANFIS is the second-best among the six ML algorithms.

**Keywords:** Erosion rate; ANFIS; ANN; SVM; Shihmen Reservoir watershed

#### **1. Background and Introduction**

Soil erosion is of major concern to agriculture and has had a detrimental long-term effect on both soil productivity and the sustainability of agriculture in particular. Soil erosion can lead to water pollution, increased flooding, and sedimentation, which damage the environment [1]. This has influenced the introduction of erosion control practices and policies as a necessity in almost every country of the world and under virtually every type of land use. Soil erosion causes both on-site and off-site consequences [2]. The on-site effects, such as soil losses from a field and depleted organic matter or nutrients of the soil, are particularly relevant on agricultural land. Off-site, downstream

sedimentation decreases the capacity of rivers and reservoirs, increasing the threat from flooding. Many hydroelectricity and irrigation projects have been ruined as a consequence of soil erosion [3].

In Taiwan, 74% of the land area is slopes. Moreover, the average annual rainfall amount is 2500 mm, mainly occurred from May to August. As a result of climate change, and the concentration of rainfall duration and increasing rainfall intensity, year-round water availability has become a critical issue in Taiwan. Most of the soil erosion in Taiwan has been the result of unsuitable agricultural behaviors and overuse of slope lands. For example, about 65% of the hillslope lands are for crop production [4]. Coupled with the abundant rainfall, severe soil erosion occurs. Brought into service in 1964, the Shihmen Reservoir ranks third among all reservoirs in the country in terms of storage capacity. However, typhoons cause serious sediment problems and soil erosion [5]. Therefore, a thorough evaluation of the causative factors (predictors) of soil erosion in the watershed is important to the people of Taiwan.

Lo [6] applied the Agricultural Non-Point Source Pollution (AGNPS) model, which simulates water erosion and the transport of sediments to predict sedimentation in the watershed of the Shihmen reservoir. The results showed that the depth of sedimentation in the watershed averages around 2.5 mm/yr. Soil erosion in the watershed was also evaluated using USLE on different Digital Elevation Models (DEMs), and the average annual soil erosion rate varied greatly depending on the DEM [7]. Chiu et al. [8] used 137Cs radionuclide collected from 60 hillslope sampling sites of the basin of the Shihmen reservoir to determine the soil erosion rate and found it to be one or two orders of magnitude lower than predicted by USLE. Recently, Liu et al. [9] improved the analysis of the Shihmen reservoir watershed using the slope unit method in addition to the common grid cell approach. The results were verified with erosion pin measurements.

Erosion pins are a conventional method of measuring soil erosion. Erosion pins have been employed to measure ground-lowering rates and compare erosion rates of rill and interrill areas, plots of different sizes, slope areas, and many other regions in need of study. For example, Sirvent et al. [10] used erosion pins to measure the ground lowering in two plots every six months. The result demonstrated that erosion rates in the rill areas were 25%–50% higher than those in the interrill areas. Edeso et al. [11] used 29 erosion pins to evaluate soil erosion of different plots in northern Spain and showed that soil erosion in all plots was increasing over time. In Taiwan, Lin et al. [12] concluded that the soil erosion depth measured by erosion pins sharply increased when the accumulated rainfall exceeded 200 mm in a rainfall event.

An Artificial Neural Network (ANN) is an artificial intelligence model that was used to process information and was inspired by the human brain [13]. An Adaptive Neuro-Fuzzy Inference System (ANFIS) is an artificial intelligence model that combines neural networks with fuzzy inference systems [14]. The unique structure of ANFIS incorporates the ability of fuzzy inference systems (FIS) to improve the precision of prediction. Finally, SVM (Support Vector Machine) is a machine learning (ML) method that is very suited to intricate classification problems. In recent years, these artificial intelligence techniques have been applied to many different fields [15]. For example, Quej et al. [16] used ANN, ANFIS, and SVM to predict the daily global solar radiation in the Yucatan Peninsula, Mexico. Model performance was evaluated by the Mean Absolute Error (MAE), Root Mean Squared of Error (RMSE), and R-squared (R2). The result indicated that SVM has a better performance than the other techniques. Zhou et al. [17] compared the accuracy of four models including ANN, SVM, WANN (Wavelet preprocessed ANN), and WSVM (Wavelet preprocessed SVM) to predict groundwater depth. The results showed that the models are ranked as follows: WSVM > WANN > SVM > ANN. Angelaki et al. [18] also used ANFIS, ANN, and SVM to predict the cumulative infiltration of soil. The results of model performance based on RMSE and Correlation Coefficient (CC) suggest that ANFIS works better than both SVM and ANN.

In the field of soil erosion modeling, there have been many studies carried out to estimate soil erosion using models such as USLE [9,19], SWAT [20,21], WEPP [22,23], WaTEM/SEDEM [24,25], and EUROSEM [26]. However, there has been no application of ML algorithms to estimate soil erosion except for in our previous work [27], in which we used Decision Tree (DT), Random Forest (RF), and Multiple Regression (MR) to create ML models to predict the soil erosion depths (rates). Therefore, this study aims to extend the investigation to the application of other ML techniques. Our main goal is to further our understanding of soil erosion rates in the study area.

#### **2. Dataset and Research Method**

The research design involves the use of site-specific data collected in the study area. They are described in the sections below.

#### *2.1. Area of Study*

The Shihmen reservoir dam is situated in the northern part of Taiwan on the banks of the Tahan River. Its watershed has an area of approximately 759.53 km2 (Figure 1), and elevation rises towards the south. Hills extend over most areas, and for more than 60% of the watershed, the slope is more than 55% [28]. The yearly average temperature is 19 °C and average humidity, 82%. The typical rainy season is from May to October and the dry season from November to April. The average annual precipitation is approximately 2500 mm/yr [28].

**Figure 1.** The study area is Northern Taiwan's Shihmen reservoir.

#### *2.2. Data Preparation*

We collected various data about the locations of the installed erosion pins as well as the measurements of the erosion pins themselves, as described in the following sections.

#### 2.2.1. Predictors

A total of 14 environmental factors were utilized as the predictors (independent factors or input variables) in the model, namely distance to river, distance to road, type of slope, sub-watershed, slope direction, elevation, slope class, rainfall, epoch, lithology, and the amount of organic content, clay, sand, and silt in the soil. These factors have been gathered from different sources and become a geospatial database, as described in Nguyen et al. [27].

#### 2.2.2. Target

The specification and installation of the erosion pins were described in Lin et al. [12]. Within the boundary of the Shihmen reservoir watershed, a total of 550 pins were installed on 55 slopes (10 pins per slope). The measurement data were collected from 8 September 2008 to 10 October 2011. The annual erosion depths were averaged at each slope, and the value ranges from 2.17 to 13.03 mm/yr. The metal rods (pins) used in this analysis were mounted on slopes without any signs of a landslide, collapse, or gully erosion. Therefore, our findings cannot be generalized beyond sheet and rill erosion.

#### *2.3. Model Configuration*

In this study, ML algorithms were used to predict the erosion rates of sheet erosion and rill erosion. The overall framework of the study consisted of three main parts (Figure 2), as summarized below.

**Figure 2.** The research framework of this study.

First, the entire dataset, which includes 14 predictors and one target, was divided into a training set (70% or 38 samples) and a test set (30% or 17 samples) as is commonly done in the literature [29–32]. This step was repeated three times (i.e., Grouping #1, Grouping #2, and Grouping #3) to reduce the data variability to sampling. Because stratified random sampling has been shown to produce better outcomes than the simple random sampling [27], only stratified random sampling was used in this study to ensure the proper representation of the population. In stratified random sampling, the dataset was divided into several strata, and each stratum was sampled proportionately (70/30). Sub-watershed

was selected as the stratification variable for this study, because it had been shown that the erosion depths in different watersheds were statistically different [33]. Hence, using stratified random sampling with sub-watershed as the stratification variable can provide a better estimate of the sample statistics and therefore create a better ML model.

Second, the ANN, ANFIS, and SVM models were built using the 70% training data, and the resulting models were tested using the test data.

Third, the performance metrics of R2, NSE (Nash–Sutcliffe Efficiency), RMSE, and MAE were calculated on the training and test data. The Wilcoxon signed-rank test was conducted to determine if a statistically significant difference exists between the three models. Finally, the results of ANN, ANFIS, and SVM were tabulated and compared with results from our previous research [27].

#### *2.4. Machine Learning Algorithms*

Three widely used and potentially applicable ML algorithms are used in this study. They are described below.

#### 2.4.1. Artificial Neural Network

An Artificial Neural Network (ANN) mimics how the human brain processes information. The purpose of the ANN model is to predict a target outcome by using input data through a back-propagation learning algorithm [34]. A typical ANN model has a multi-layer feed-forward structure that is connected by nodes with three main layers, namely the input layer, the hidden layer(s), and the output layer. The ANN determines the weight for each node and builds its results through training. In this study, the ANN model was created using the 'nntool' in the MATLAB 2016 software. A three layers feed-forward back-propagation network type was used. It consists of an input layer (14 neurons representing 14 environmental factors), one hidden layer (29 neurons), and one output layer (erosion rate), as shown in Figure 3. The number of neurons of the hidden layer is determined based on the following equation [35]:

$$N = 2\mathbf{x} + \mathbf{1} \tag{1}$$

where N is the number of hidden nodes, and x is the number of input nodes.

**Figure 3.** The model of ANN used in this study.

#### 2.4.2. Adaptive Neuro-Fuzzy Inference System

An Adaptive Neuro-Fuzzy Inference System (ANFIS) is a combination of ANN and fuzzy logic, which utilizes the strengths of both techniques. Jang [14] introduced the concept of ANFIS in 1993. ANFIS contains five layers connected by directed links. These five layers are the Fuzzification layer, the Product layer, the Normalized layer, the Defuzzification layer, and the Output layer. The main purpose of ANFIS is to define the optimum parameter values of an equivalent fuzzy inference system by applying a learning algorithm. ANFIS can be constructed using two different methods, namely Genfis 1 (grid partitioning) and Genfis 2 (subtractive clustering). Genfis 1 has a limitation of six input variables. Since we have 14 factors, we used Genfis 2 in our analysis instead of Genfis 1. In this research, the ANFIS model was constructed using the "anfisedit" tool in the MATLAB 2016 software following the steps of loading data, generating FIS (sub clustering), and testing.

#### 2.4.3. Support Vector Machine

A Support Vector Machine (SVM) is a supervised learning model developed by Schölkopf et al. [36] that can be used for regression and classification. The SVM model algorithm creates a line or a hyperplane that divides a dataset into two classes. The distance from the hyperplane to the nearest data points on both sides is defined as the margin. The purpose is to select a hyperplane with the greatest possible margin, thus giving a greater chance of new data being classified correctly into these two classes. In this study, the SVM model was implemented using custom codes and the 'fitrsvm' package on the MATLAB 2016 software.

#### *2.5. Evaluation Criteria of Model Performance*

Model evaluation is an indispensable part of developing a useful model. It supports the discovery and selection of a good model that can be used in the future. There are several statistical indices commonly used to estimate and calculate the performance and the validity of the ML algorithms. Here, the statistical parameters employed to measure the errors between the predicted and the observed values are R2, NSE, RMSE, and MAE [37–39].

The R2 value indicates the consistency with which the predicted values versus the measured values following a regression line [27]. It ranges from zero to one. If the value is equal to one, the predictive model is considered "perfect." The definition of R2 is as follows:

$$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\sum \left(Y - Y\_1\right)^2}{\sum \left(Y - \overline{Y}\right)^2} \tag{2}$$

where SST represents the total sum of squares, SSE is the error sum of squares, Y is the prediction of the model, Y1 is the prediction of the regression line, and *Y* is the average of predicted values (Figure 4).

It is worth noting that although R<sup>2</sup> has been widely used for model evaluation, the statistics are highly sensitive to extreme values and are insensitive to additive and proportional differences between model and measured data [40]. More importantly, R2 is calculated against the regression line (Figure 4), not the 1:1 line. A high R2 only means a good fit to the regression line, not necessarily a set of good predictions concerning the observations (see the distinction made between the regression line and the 1:1 line in Figure 4). Therefore, we retain R<sup>2</sup> in this study only for completeness. Instead of relying on R2, we computed RMSE, MAE, and NSE to compare the model performance. They are more appropriate than R2.

RMSE and MAE are statistical parameters that are 'dimensioned.' They express the average errors of the model in the unit of the output variable (Equations 3 and 4). The RMSE is of particular importance, because it is one of the most commonly reported parameters in the climatic and environmental literature [41]. Smaller values of RMSE and MAE suggest nearer approximation of observed values

by the models. The RSME and MAE are widely used basic metrics for assessing the performance of predictive models [42]. They are defined as follows:

$$RMSE = \sqrt{\frac{\sum \left(Y - Y\_2\right)^2}{n}} \tag{3}$$

$$MAE = \frac{1}{n} \sum |Y - Y\_2| \tag{4}$$

where Y2 is the predicted value of the 1:1 line, and n is the number of samples.

**Figure 4.** Illustration of the symbols used in the Equations (2)–(5).

In addition to R2, RMSE, and MAE, which all have been used in our previous study [27], an additional parameter (NSE) is included in this study. As shown in Equation (5), NSE is a normalized statistical parameter that defines the relative magnitude of the residual variance compared to the measured data variance [43]. It shows how well the plot of the predicted values versus the observed values fits the 1:1 line. The value of NSE ranges from −∞ to one, with NSE = 1 being the optimal value. If the value of NSE is between zero and one, the model is said to have acceptable performance (the higher, the better). On the other hand, if the value of NSE is smaller than 0.0, it will indicate that the average of observed values is better than the predicted value, and the model performance is not acceptable [43].

$$NSE = 1 - \frac{\sum \left(Y - \mathbf{Y\_2}\right)^2}{\sum \left(X - \overline{X}\right)^2} \tag{5}$$

where X is the observed value and *X* is the average of observed values.

#### *2.6. Wilcoxon Signed-Rank Test*

In addition to using NSE, RMSE, and MAE to evaluate the effectiveness of the models, it is necessary to examine if the differences in NSE, RMSE, and MAE are statistically significant. In this study, we used the Wilcoxon signed-rank test to compare the errors generated by different predictive models because the Wilcoxon signed-rank test is non-parametric (distribution-free). The steps of the Wilcoxon signed-rank test are as follows: Let *Y* denote the observed value, *M*<sup>1</sup> denote the predicted value of model 1, and *M*<sup>2</sup> denote the predictive value of model 2. The absolute error of each prediction is measured (Equations (6) and (7)) by:

$$E\_1 = |\mathcal{Y} - M\_1|\tag{6}$$

$$E\_2 = |\mathbf{Y} - \mathbf{M\_2}|\tag{7}$$

To determine whether one model predicts more accurately than the others do, we perform the one-tailed hypothesis test. The null hypothesis (*H*0) is that "Two models have the same predictive error"(*E*<sup>1</sup> = *E*2). The alternative hypothesis (*Ha*) is that "The first model has a smaller error than the second model"(*E*<sup>1</sup> < *E*2)**.** In the Wilcoxon signed-rank test, we chose a significance level of 0.05. Then, the decision to whether to reject the null hypothesis or not is based on the resulting *p*-value. If the *p*-value is greater than 0.05, it will fail to reject the null hypothesis. Otherwise, if the *p*-value is less than 0.05, the null hypothesis will be rejected at a confidence level of 95%. In this case, the Wilcoxon signed-rank test is run using the R command wilcox.test.

#### **3. Results and Discussions**

We conducted all analyses using the dataset described in Section 2. Our findings are presented in the following sections.

#### *3.1. Evaluation of Predictive Models*

This study used quantitative criteria to assess the performance of predictive models. Table 1 shows the computed R2, NSE, RMSE, and MAE statistics for ANFIS, ANN, and SVM under three different stratified random samplings. Among the four parameters, R2 and NSE are statistics used to evaluate the predictive performance of the models under consideration. The higher the R2 and NSE, the better the models simulate the results. However, as shown in Figure 4, R<sup>2</sup> is a measure of the goodness-of-fit of the regression line, not the 1:1 line. Therefore, a high R2 does not necessarily mean a good performance of the model for predicting the soil erosion rate. In this regard, NSE is a much more suitable index to use because it is based on the differences between the predicted and observed values (1:1 line). Therefore, in the following, we will restrict our discussion to NSE only and ignore R2.

On the other hand, MAE and RMSE are evaluation metrics commonly used to gauge the performance of the models that output continuous numbers. Both RMSE and MAE produce average errors of the models in units of the model output variables. However, it is worth noting that RMSE and MSE could be calculated against the 1:1 line or the regression line [27], and different results could be obtained. In this study, we computed both RMSE and MSE based on the 1:1 line to reflect the differences between the predicted and observed values.

As can be seen from Table 1, the average NSE of the training data for the ANN, ANFIS, and SVM models is 0.62, 1.00, and 0.51, respectively. Theoretically, the value of NSE ranges from −∞ to one, and the higher the NSE, the better the model performs. Thus, it can be inferred from these numbers that ANFIS is the best model with a perfect NSE value of 1.00, and it outperforms the other two models substantially. However, the average NSE deteriorates significantly when the test data are used to judge the true performance of the models. They are 0.32, 0.49, and 0.17 for the ANN, ANFIS, and SVM models, respectively. The ANFIS model remains the best performing model, while the ANN model still beats the SVM. It is worth noting that a negative NSE was obtained for grouping #2 of SVM. This means that the model did not contribute to the improvement of the prediction, and the average of the observed values is better than the predicted value [44].

According to Table 1, there is also a significant difference in the RMSE values across the three models. The average RMSE of the training data for the ANN, ANFIS, and SVM models is 1.23, 0.01, and 1.43 mm/yr, respectively. Again, the ANFIS model outperforms the other two models by a substantial margin. The prediction of the ANFIS model was so accurate that there was almost no error in calculating the differences between the predicted and the observed values. However, the model performance falls rapidly when test data were used. The average RMSE of the test data for the

ANN, ANFIS, and SVM models increases to 2.36 mm/yr, 2.05 mm/yr, and 2.61 mm/yr, respectively. The disparity in model performance between the training data and the test data indicates that an over-fitting problem has occurred in the training of the ANFIS model. The other two models do not show signs of overtraining, but their predictive performances are inferior to that of the ANFIS model.

Lastly, if we compare the MAE results of the three models, a largely similar conclusion to the observation made above will be reached. The average MAE of the training data for the ANN, ANFIS, and SVM models is 0.75 mm/yr, 0.01 mm/yr, and 0.99 mm/yr, respectively. The ANFIS model performs much better than the ANN and SVM models. Moreover, the average MAE of the test data for the ANN, ANFIS, and SVM models increases to 1.85, 1.67, and 2.14 mm/yr, respectively. The errors of the test data are bigger than the errors in the training data. This again shows that the ANFIS model has been over-fitted. Overall, a general picture emerging from the analysis of NSE, RMSE, and MAE is that ANFIS is the best model, followed by ANN and SVM. They are ranked as follows, from best to worst: ANFIS, ANN, SVM.


**Table 1.** The performance metrics of the ANFIS, ANN, and SVM models.

#### *3.2. Visual Comparison of Models*

The data in Table 1 provide convincing evidence that ANFIS is the best performing model. We further plotted the predicted values versus the observed values with the regression line and the 1:1 line in the scatter plots of Figure 5. The left-side figures show the training datasets, while the right-side figures are the test datasets. Three different sampling results (groupings) are presented in Figure 5. The first row is grouping #1 (5a and 5b). The second row is grouping #2 (5c and 5d). Finally, the bottom-most row is grouping #3 (5e and 5f). Figure 5 shows that during the training stage, ANFIS successfully tunes its parameters to minimize the errors associated with its predictions. Therefore, the regression line of ANFIS (red) almost coincides with the 1:1 line, and it resulted in an average NSE of 1.00. Coincidentally, the regression line of ANN (blue) is also very close to the 1:1 line in all three cases. However, the data points (blue) are much more scattered around the 1:1 line than those of ANFIS (red) are. Therefore, ANN has a much lower average NSE of 0.62.

**Figure 5.** X-Y plots of predicted values vs. observed values: (**a**) Grouping #1—Training data; (**b**) Grouping #1—Test data; (**c**) Grouping #2—Training data; (**d**) Grouping #2—Test data; (**e**) Grouping #3—Training data; (**f**) Grouping #3—Test data.

As for the test data on the right-hand side of Figure 5, all three models showed substantially more scatter than that of the training data, which indicates substantially bigger errors between the predictions and the observations. The regression line of ANFIS is no longer the closest to the 1:1 line. However, the ANFIS model still has the highest average NSE (0.49) and the lowest average RMSE (2.05 mm/yr). It appears that ANFIS gained its performance advantage due to its hybrid learning approach of combining the ANN and the Fuzzy Inference System.

#### *3.3. Results of Wilcoxon Signed-Rank Test*

This study was undertaken to determine the best ML model using NSE, RMSE, and MAE. To determine if the differences between different model predictions were statistically significant, we used the Wilcoxon signed-rank test to compare the errors between different models. With the absolute error of each model being EANFIS, EANN, and ESVM, respectively, the results of the Wilcoxon signed-rank test on the training data for each combination of the three models are summarized in Table 2. The results of the test data were shown in Table 3.

It can be seen from Table 2 that for the training data, the *p*-values between ANFIS and ANN are all very small and less than the threshold value of 0.05. The same can be said for the *p*-values between ANFIS and SVM. Taken together, the data presented here provide evidence that ANFIS was a statistically better model than both ANN and SVM when the models were trained with the training data. By contrast, the *p*-values between ANN and SVM are not small enough to reject the null hypothesis. Therefore, it is inconclusive whether a statistical difference exists in the predictive performance between the two models.

In terms of the test data, however, no statistically significant difference exists between the three models. As shown in Table 3, the *p*-values are all higher than the threshold value of 0.05. Therefore, we cannot reject the null hypothesis in favor of the alternative hypothesis.


**Table 2.** The results of the Wilcoxon signed-rank test (training data).


**Table 3.** The results of the Wilcoxon signed-rank test (test data).

#### *3.4. Comparison with Other ML Algorithms*

In our previous research, DT, RF, and MR were used to predict the depths of erosion at the Shihmen reservoir watershed study site [27]. The result showed that RF outperforms other ML algorithms. In this study, we further compared the performance of ANN, ANFIS, and SVM to predict soil erosion depths (rates). The results of the above six models are shown together in Figure 6.

**Figure 6.** Radar plots of six ML models: (**a**) Training data; (**b**) Test data.

Figure 6 shows the comparison between six ML models in terms of the average RMSE (green points) and the average MAE (red points). The sub-Figure 6(a) is based on the training data, and 6(b) is based on the test data. If the points are closer to the center of the radar web, the model will have a better predictive performance. It can be seen from Figure 6a that in terms of RMSE starting from the best to the worst, the six models can be ranked as follows: ANFIS (0.01 mm/yr) < RF (0.93 mm/yr) < ANN (1.23 mm/yr) < MR (1.25 mm/yr) < SVM (1.43 mm/yr) < DT (1.73 mm/yr). It is obvious that ANFIS is the best model for training data. However, because models can perform very well during training but perform poorly in the testing against new data (over-fitted), it is the results of test data that distinguish a good model from a poor one. To evaluate model performance, we should focus on the metrics of test data. It can be seen from Figure 6 that although ANFIS is the best model among the three ML models compared in this study, it is still not quite as good as RF in the previous study. If we rank the six ML models again using the average RMSE of the test data, we will obtain a different rank (starting from the best to the worst): RF (1.75 mm/yr) < ANFIS (2.05 mm/yr) < ANN (2.36 mm/yr) < DT (2.45 mm/yr) < SVM (2.61 mm/yr) < MR (3.47 mm/yr). In other words, RF replaces ANFIS and becomes the favored choice. It is also possible to draw similar conclusions from the MAE values, also in Figure 6.

The results of test data are critical and are emphasized because evaluating the predictive performance of models by training data could lead to over-optimistic and overfitting models. As a result, although ANFIS performs better in training, RF is still considered the best model for predicting the soil erosion rate in the study area. Hannan et al. [45] and Barenboim et al. [46] also compared the performance of RF and ANFIS in their studies. Both studies indicated that the ANFIS and RF models were effective; however, Hannan et al. [45] preferred RF to ANFIS and ANN, and Barenboim et al. [46] recommended both RF and ANFIS.

Figure 7 shows the interpolated distribution of erosion rates (mm/yr) in the study watershed using the Inverse Distance Weighting (IDW) method. Figure 7a was obtained from erosion pin measurements, whereas Figure 7b,c were predicted by ANFIS and RF, respectively. It can be seen from the figures that the same distribution pattern is observed. The erosion rate is the highest on the east side of the study area, the lowest on the west side, and the north and south sides have an in-between erosion rate.

**Figure 7.** Distribution of erosion rates in the study area: (**a**) Observed; (**b**) Predicted by ANFIS; (**c**) Predicted by RF.

#### **4. Conclusions**

In this study, the ANN, ANFIS, and SVM algorithms were used to create predictive models of soil erosion rates in the study area of the Shihmen reservoir. The soil erosion rates were measured by 550 erosion pins installed on 55 slopes, and the results of the measurements reflect the sheet and rill erosion that took place within the study area. After dividing the dataset by a 70/30 ratio into training and test datasets using stratified random sampling, ANN, ANFIS, and SVM were used to generate respective models based on the 14 types of factors included in the training data. Then the models were applied to the test data, and the discrepancies from the real measurements were evaluated by R2, NSE, RMSE, and MAE.

Without making an ex-ante choice of soil erosion model, the ex-post outcomes of ML models were quite satisfactory. The average RMSE of the training data ranges from mere 0.01 to 1.43 mm/yr. Among the three models, the performance of ANFIS is considerably higher than those of ANN and SVM, as indicated by its RMSE of 0.01 mm/yr. However, the performance of all three models degraded when they are applied to the test data. Results showed that the average RMSE of the test data varies from 2.05 to 2.61 mm/yr, with ANFIS still the best among the three models. To examine if the difference in prediction is statistically significant, the Wilcoxon signed-rank test was used to conduct pairwise comparisons of the three models. The results indicate that the ANFIS model is better than both the ANN and SVM models for the training data. However, no statistically significant difference exists between the three models when the models are applied to test data.

Moreover, the advantage of ANFIS disappeared when it was compared with the ML models (DT, RF, and MR) developed in our previous study. Although the average RMSE of ANFIS on training data is still unmatched, the average RMSE of ANFIS on test data was worse than that of RF. This shows that ANFIS may have been over-trained, and RF is still considered the best model for predicting the soil erosion rate in the study area.

In this and previous studies, we have made a substantial effort and progress in applying ML algorithms to the prediction of soil erosion rates without resorting to any soil erosion models. Although the effort was made, there is still no shortage of ML algorithms that promise better results than what has been obtained to date. It remains to be seen if ML algorithms are truly viable alternatives to traditional soil erosion models. Future research will have to address this issue in more detail.

Finally, because of the easy installation and wide availability of erosion pins, we believe that the approach presented here is generally applicable to other regions of the world. It would be desirable to obtain such measurements and carry out similar analyses for comparison.

**Author Contributions:** Conceptualization, W.C.; Data curation, K.A.N. and B.-S.L.; Formal analysis, K.A.N.; Funding acquisition, W.C.; Investigation, B.-S.L.; Methodology, W.C.; Project administration, W.C.; Software, K.A.N.; Supervision, W.C. and U.S.; Writing—original draft, K.A.N.; Writing—review and editing, W.C., B.-S.L. and U.S. All authors have read and agreed to the published version of the manuscript

**Funding:** This study was partially supported by the National Taipei University of Technology-King Mongkut's Institute of Technology Ladkrabang Joint Research Program (grant number NTUT-KMITL-108-01) and the Ministry of Science and Technology (Taiwan) Research Project (grant number MOST 108-2621-M-027-001).

**Acknowledgments:** We thank Kent Thomas for proofreading an early draft of this manuscript. We also thank the anonymous reviewers for their careful reading of our manuscript and insightful suggestions to improve the paper.

**Conflicts of Interest:** The authors declare no conflict of interests.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Evaluation of the SEdiment Delivery Distributed (SEDD) Model in the Shihmen Reservoir Watershed**

#### **Kent Thomas 1, Walter Chen 1,\*, Bor-Shiun Lin <sup>2</sup> and Uma Seeboonruang 3,\***


Received: 3 June 2020; Accepted: 30 July 2020; Published: 2 August 2020

**Abstract:** The sediment delivery ratio (SDR) connects the weight of sediments eroded and transported from slopes of a watershed to the weight that eventually enters streams and rivers ending at the watershed outlet. For watershed management agencies, the estimation of annual sediment yield (SY) and the sediment delivery has been a top priority due to the influence that sedimentation has on the holding capacity of reservoirs and the annual economic cost of sediment-related disasters. This study establishes the SEdiment Delivery Distributed (SEDD) model for the Shihmen Reservoir watershed using watershed-wide SDRw and determines the geospatial distribution of individual SDRi and SY in its sub-watersheds. Furthermore, this research considers the statistical and geospatial distribution of SDRi across the two discretizations of sub-watersheds in the study area. It shows the probability density function (PDF) of the SDRi. The watershed-specific coefficient (β) of SDRi is 0.00515 for the Shihmen Reservoir watershed using the recursive method. The SY mean of the entire watershed was determined to be 42.08 t/ha/year. Moreover, maps of the mean SY by 25 and 93 sub-watersheds were proposed for watershed prioritization for future research and remedial works. The outcomes of this study can ameliorate future watershed remediation planning and sediment control by the implementation of geospatial SDRw/SDRi and the inclusion of the sub-watershed prioritization in decision-making. Finally, it is essential to note that the sediment yield modeling can be improved by increased on-site validation and the use of aerial photogrammetry to deliver more updated data to better understand the field situations.

**Keywords:** sediment delivery distributed model; sediment yield; SEDD; sediment delivery ratio; β coefficient; Shihmen Reservoir watershed

#### **1. Introduction**

Land degradation has been confirmed as a threat to agricultural productivity worldwide [1]. The gradual dissipation of quality topsoil transported as sediments along hill slopes and deposited into rivers and reservoirs has impacted multiple agricultural areas across the world. Sediment delivery by soil erosion and landslides has also led to the decreased longevity and failure of reservoirs and other water infrastructure. Moreover, due to the importance of reservoirs and dams to the water security of large populations in the developed world, the impact of sedimentation has been highlighted as a major issue in need of research worldwide over the past half a century and into the future. These processes influence many of the United Nations' (UN) Sustainable Development Goals (SDGs) [1–3]. Heavy sustained sedimentation leads to a severe influence on industries and economic growth for

agriculture-based communities (SDG 8, 9, 10), the sustainability of cities and communities (SDG 11), and the availability of clean water downstream (SDG 6). These processes also pollute rivers, streams, and the sea by the transport of chemicals with sediments damaging marine flora and fauna and influencing the sustainability of marine environments (SDG 14).

Southeast Asia presents an even more advanced dilemma as the environmental and geomorphological characteristics across the region displays a strong influence on the rate of soil erosion, and the region's population and economies are already suffering from the impact [1]. After Typhoon Gloria impacted Taiwan, the Shihmen reservoir watershed was one of the many watersheds heavily inundated by sediments from soil erosion and landslides. This event was an example of typhoon events affecting Taiwan due to its tropical monsoon climate. The impact of these events creates frequent and high magnitude sediment disasters and long-term sediment inundation. The effect of extreme meteorological events on Shihmen and other reservoirs and the threat to urban and agricultural water supply quality throughout the nation has encouraged the Government of Taiwan to facilitate soil conservation research and countermeasures within the watershed through the introduction of soil conservation policies and programs. Taiwan's geographical location places it in a precarious position that geomorphologically has a high frequency of highly erodible soil materials and, additionally, its tropical monsoon climate brings frequent extreme rainfall or typhoon events which have led to massive sediment loads, dissociated by mass movements, and soil erosion, to be transported from slopes of watersheds into rivers and streams, affecting water supply in many communities. These events increase the siltation of reservoirs, such as the Shihmen Reservoir in Northern Taiwan, which diminishes the country's capacity to sustainably manage its water supply and poses a threat to the long-term sustainability of the dam and the populations it serves [4,5].

Therefore, sediment modeling and monitoring in Taiwan is of paramount importance to improve watershed conservation and sustainable water management by understanding how sediments are dissociated, translocated, and transported down the slopes of watersheds, entering streams as massive sediment loads. These sediments deposit along the waterway with detrimental impacts to river flow and along slopes, increasing future sediment disaster hazards and clogging reservoirs and dams, affecting water supplies to nearby populated districts and farms.

#### *1.1. Sediment Delivery Ratio (SDR)*

Researchers have established that the sediment yield (SY) and siltation found in rivers and dams are not equivalent to the gross sediments that have dissociated from the slopes through land degradation processes. The sediment delivery ratio (SDR) was established as the relationship between the SY and the gross erosion (A, in some research denoted as Soil Loss, SL). The SDR acts as a reducer to equate the maximum amount of soil erosion to the sediments delivered to the outlet. SDR is a continuous variable with a range from zero to one. In the initial stages of soil erosion research, SDR was evaluated for an entire watershed and a single outlet, such as a reservoir, denoted as SDRw. However, as computational power has improved and research methods have become more developed, SDR is discretized to sub-watersheds, micro watersheds, and even more minute hydrological units within a watershed. This has been denoted as a geospatial SDRi and is defined as the fraction of the gross soil loss (SL) from the hydrological unit, i, that reaches a flow pathway [6].

Many methods for determining the SDR have been developed over the past six decades. Long-term monitoring of SY in test plots and different regions were the starting point in SDR research and established multiple geophysical and hydrological factors that affect SDR. Wu et al. [7] stated that there are three prominent methods of estimating SDR, which are: (1) the ratio between soil erosion and sediment yield, (2) empirical formula relationships between SDR and its contributing factors, and (3) spatial modeling and numerical modeling using hydraulic and hydrological theories. These include models for the calculation and classification of SDR based on regional and local observed data.

Li [8] determined from the comparison of the USLE results and the outlet sediments at the Shihmen Reservoir that the SDR was 0.49 from the annual siltation amount of 71.2 t/ha/year (between 2011 and 2015), which includes sheet and rill erosion, gully erosion, and landslides. There have been other studies of the SDR in the Shihmen Reservoir watershed but these have mainly focused on the post-landslide erosion SDR or event-based SDR. Tsai et al. [9] defined the SDR of the landslide areas in the watershed from 0.42 to 0.78 for eroded areas and zero for areas where aggradation occurred. Moreover, Tsai et al. [10] employed the model of Bathurst et al. [11] of the sediment delivery of landslides. They determined that the sediment delivery of landslides in the Shihmen Reservoir watershed ranged from 0.75 to 0.89 (1986–1998), 0.48 to 0.64 (1998–2004), and 0.40 to 0.58 for typhoon Aere (2004). Lin et al. [12] explored the impact of different factors (such as curve number and Manning's) on the SDR in the Shihmen Reservoir watershed.

SWCB [13] previously employed DEMs of difference (DoD) method to determine a correlation between SDR and the entire watershed (SDRw) in the Shihmen Reservoir watershed and concluded the relationship to be *SDR* = 41.23*W*−0.017 *Area* where WArea is the watershed area, and the SDRw was 0.368. Chen et al. [14] used the Grid-based Sediment Production and Transport Model (GSPTM) to determine SDRw to be 0.299 (rounded to 0.30) for a simulation of the impact of the Typhoon Morakot on the Yufeng and Xiuluan sub-watersheds, and Chiu et al. [15] extended the GSPTM to identify unstable stream reaches. Additionally, Liu [16] determined that the total soil loss (also commonly-termed gross soil erosion or the total amount of soil erosion) was 90.6 t/ha/year by converting measured average erosion depth of erosion pins [17,18] to the amount of soil erosion. This study emphasizes the SDR of recent years and assumes SDRw = 0.49.

#### *1.2. Spatial Discretization*

This research compares the implementation of the SEDD with grid–cell unit analysis across the watershed and two discretization levels of sub-watersheds (25 and 93 SW). Spatial discretization of watersheds in hydrological models has been a critical issue for many researchers within the remote sensing, physical modeling, hydrology, and mathematical computations fields [19]. Wood et al. [20] elaborated that watersheds are areas made of infinite minute points of related hydrological processes (for example, evaporation, infiltration, and runoff). Watersheds can be further divided into sub-watersheds or sub-basins that are continuous assemblages or spatial objects that drain towards a specific point, contains similar vegetation layers and/or linked slope, or elevation characteristics, and are commonly used in the study of landslides, soil erosion, hydrology, and other natural processes and are specific areas within a watershed. There are many methods of defining watersheds, sub-watersheds, and even more minute forms of discretization, such as hydrological units depending on the characteristics of the areas [19–22].

Contextually, at the time of the development of many soil erosion models and concepts from the 1970s to the 1990s, computational cost, time, and power were limitations that hindered the discretization of watersheds to more elements [19]. Therefore, many older models were developed with sub-watershed as the smallest hydrological unit. This has continued even to the 2000s with the SWAT (Soil and Water Assessment Tool) model, and research has supported the object area discretization in soil erosion, LULC (Land Use and Land Cover), and sedimentation models and has criticized the use of areal discretization. SWAT is a comprehensive soil erosion model that is popular in the field of hydrology. In this study, we utilized the first stage in the SWAT analysis, the discretization of a study area into sub-basins/sub-watersheds, to develop 25-subwatershed divisions. The two inputs used in this analysis was the surface topography and the land use type [23]. Previously in the Shihmen Reservoir watershed, grid cells and slope units have also been employed in soil erosion research [16].

Researchers implementing the SEdiment Distributed Delivery (SEDD) model have mainly focused on the hydrological or morphological unit [6,24–29] but Di Stefano and Ferro [30] utilized grid cells in the SEDD model.

#### *1.3. Study Objectives*

Geomorphological changes, climate change, and land degradation processes are highly influential to soil erosion studies. Therefore, it is important to highlight the time scale. The prioritization of watersheds or sub-watersheds for soil erosion, landslide, and conservation works has been a long-standing topic in geomorphology, hydrology, disaster management, and environmental planning research. In the past, the gross soil loss (often derived by the USLE model) and its contributing factors have been the primary focus for prioritization of soil conservation planning. This study explored the sediment yield value to the prioritization of sub-watersheds for remediation and conservation.

This research incorporated and considered the implementation of the SEDD model in the Shihmen Reservoir watershed and the research objectives, methods, and assumptions are as follows:


This study is a theoretical and statistical analysis of the SDR distribution based upon the SEDD model using real data from the Shihmen Reservoir watershed.

#### **2. Study Area**

The Shihmen Reservoir watershed (elevation map shown in Figure 1a) is a 759.53 km<sup>2</sup> watershed found in Northern Taiwan between latitudes 24.426◦ N and 24.861◦ N, and longitudes 121.192◦ E and 121.479◦ E. Local authorities use the reservoir to regulate 23% of the water supply for the nearby communities, industries, and agriculture. The effective storage volume of the reservoir is 207 million m3. Still, it has been affected by heavy sedimentation, and some of the additional check dams established to diminish sedimentation have been affected or damaged in past years. The established subdivision of the watershed by local government is into five sub-watersheds, namely Ku-Chu, Yu-Feng, San-Kuang, Pai-Shih, Tai-Kang sub-watersheds. However, in this study, the entire watershed is delineated into 25 sub-watersheds (Figure 1b) and 93 sub-watersheds (Figure 1c), as explained in Section 3.1. The watershed is traversed by the Tahan River and its irregular pattern of tributaries that spread across the steep, high slopes of the sub-watersheds (angle 20◦ to 85◦ with an elevation between 220 m and 3527 m). The subtropical monsoon climate and the annual rainfall between 2200 mm and 2800 mm in the watershed encourages a densely vegetated state, which is mostly natural and artificial coniferous and broadleaved trees, and the trend in rainfall pattern has increased over the past decades to create concerns of soil erosion (Figure 1d) [9,31].

**Figure 1.** Maps of the Shihmen Reservoir watershed: (**a**) elevation, (**b**) 25 sub-watersheds, and (**c**) 93 sub-watersheds. (**d**) An illustrative picture of soil erosion.

#### **3. Methodology**

The SEDD model developed by Ferro and Minacapilli [24] analyzes the spatially distributed SY by determining the parameters of SDR and SL found using any soil loss model. The SDR is found by a relationship between the travel time (*t*p,i) and a watershed-specific coefficient, β. β cannot be determined analytically but is determined iteratively, by balancing the relationship between a known SY, SL, and SDR. The SY is evaluated for the entire watershed and the two discretizations, 25 and 93 sub-watershed models, to compare the impact of aggregation on the model outputs.

The flow chart of the SEDD model implemented in this study is distributed among four stages as shown in Figure 2:

Stage 1 (1.1):


Stage 2:


Back to stage 1.2:

6. Determine β from the SDR, SL, and the travel time found.

Stage 3:

7. Return to the DEM and discretize the study area into sub-watersheds/hydrological units or any other applicable unit of analysis.

Stage 4:


**Figure 2.** Flow chart of the SEDD model in this study.

SDR evaluation for SEDD modeling requires the development of three (3) main parameters: the slope, hydraulic length, and the β coefficient. The "flow distance" module of the ArcGIS software was employed to develop the horizontal flow paths and required the use of the "flow accumulation" and "flow direction" modules. The slope map and other component datasets of the USLE/SEDD model were developed in the ArcGIS platform.

This study relied upon the R programming language. Specifically, the "raster," "e1071," and "LaplacesDemon" libraries were employed for statistical analysis, conversion of data from raster to point datasets, and other analysis, while the "ggplot2" libraries of R were employed for data visualization [32–35].

The USLE model in this study was developed in the ArcGIS model builder framework updating the model originally developed by Jhan [36], Yang [37], Li [8], and Liu [38].

The USLE model was developed by Wischmeier and Smith [39,40] and is designed to predict the average annual SL by sheet and rill erosion. It has been continuously improved upon and localized by many researchers over the past decades. The USLE model can be denoted as follows:

$$A\_m = R\_m K\_m LSCP \tag{1}$$

where *Am* is the computed soil loss per unit area (t/ha/year), *Rm* is the rainfall and runoff factor (MJ-mm ha−<sup>1</sup> h−<sup>1</sup> year<sup>−</sup>1), *Km* is the soil erodibility factor (t-h MJ−<sup>1</sup> mm<sup>−</sup>1), *L* is the slope-length factor (dimensionless), *S* is the slope-steepness factor (dimensionless), *C* is the cover and management factor, and *P* is the support practice factor.

#### *3.1. Sediment Distributed Delivery Model (SEDD)*

The SEDD utilizes the calculation of the flow length for SDR developed by Ferro and Minacapilli [24], which includes considerations for slope and the roughness/runoff coefficient. SY, the sediment yield by the annual scale, for a basin is discretized into *Ni* morphological units and is measured in tonnes (*t*):

$$\sum\_{i=1}^{N\_i} SY\_i = \sum\_{i=1}^{N\_i} A\_{\text{mi}} SDR\_i SI\_i \tag{2}$$

where *Ami* is the computed soil loss per unit area (t/ha/year) for the morphological unit *i*, *SUi* is the area of the morphological unit *i* (ha), and *Ni* is the number of morphological units over the watershed area (dimensionless).

Ferro and Minacapilli [24] concluded that the *SDRw* can be physically defined using watershed morphological data without the consideration of the soil loss model. *SDRi* in Equation (2) and the relationship between the *SDRw* and *SDRi* was defined as follows (Equations (3) and (4)):

$$SDR\_i = \varepsilon^{-\beta t\_{p,i}} = \exp\left[-\beta \frac{l\_{p,i}}{\sqrt{\xi\_{p,i}}}\right] = \exp\left[-\beta \sum\_{j=1}^{N\_p} \frac{\lambda\_{i,j}}{\sqrt{\xi\_{i,j}}}\right] \tag{3}$$

$$SDR\_w = \frac{\sum\_{i=1}^{N\_i} \exp\left[-\beta \frac{l\_{p,i}}{\sqrt{S}\_{p,i}}\right] A\_{mi} SLI\_i}{\sum\_{i=1}^{N\_i} A\_{mi} SLI\_i} \tag{4}$$

where *tp*,*<sup>i</sup>* is the travel time from *i*th morphological unit to the nearest stream reach (m), β is the roughness and runoff coefficient for the watershed (m<sup>−</sup>1), *sp*,*<sup>i</sup>* is the slope of the hydraulic flow path (m/m), *lp*,*<sup>i</sup>* is the length of the hydraulic flow path (m), λ*i*,*<sup>j</sup>* and *Si*,*<sup>j</sup>* are the length and slope of each morphological unit *i* localized along the hydraulic path *j*, and *Np* is the number of morphological units localized along the hydraulic path *j* [6,24,26–29].

In this study, the parent watershed (the Shihmen Reservoir watershed) was discretized into two discrete sets of children sub-watersheds, specifically 25 and 93 sub-watersheds. The 25 sub-watersheds were delineated using the SWAT model with the input of the DEM and river course of the Shihmen Reservoir watershed. The output was 25 hydrologically connected sub-watersheds, as shown in Figure 1b. The 93 sub-watersheds were delineated using the ArcGIS Watershed module utilizing an input of the flow direction and a stream map derived from flow accumulation. The output was 93 hydrologically connected sub-watersheds, as shown in Figure 1c.

The SEDD model was implemented as 10 m grid data, with over seven million data points and aggregated into sub-watersheds. This study derived the grid-based SEDD model and aggregated the dataset into sub-watersheds to examine the impact this can have on the output SY data. A grid–cell-based analysis is more computationally expensive, and larger watersheds can be quite time consuming or hardware-expensive but may provide alternative results. On the other hand, the hydrological unit analysis is less time consuming to produce, has a lower computational cost, and can garner results that are aggregated by specific areas. However, there can be discrepancies due to oversimplification, some vast.

#### *3.2. Sediment Yield Determination*

Flow direction (FD) algorithms estimate the flow of a specified material from a source cell (S1) continuously to the next neighboring cells (N1) until a stopping point or limitation point (river or stream in Figure 3) has been reached, such as a stream, river, or dam. The SDR model considers the hydraulic flow of sediments through each morphological/hydrological unit to the trunk river or its tributaries (in this scenario, the Tahan river, and its tributaries) using the single flow direction algorithm, deterministic D8. The distance traveled by particles has been termed the "flow path length" and has varying definitions and methods of calculation. The flow distance module in ArcGIS was used to define the horizontal flow distance to the river using the D8 flow direction.

**Figure 3.** A flow path showing the single flow D8 algorithm where S = source and N = Node (re-drawn from [41]).

The relative time taken to reach the sink from the source has been termed the "travel time." Moreover, Jain and Kothyari [27] state that the travel time from a cell *i* is equivalent to the total travel time through each of the *Np* cells along its flow path until terminating at a channel. The flow path length and the slope in conjunction with a watershed-wide β coefficient are used to derive, firstly, the travel time and, secondly, the SDR within the SEDD model.

The SY was first derived using a watershed-wide grid of the USLE and SDRi, and then compared against the discretization of the 25 sub-watersheds model and the 93 sub-watersheds model using the mean, median, and mode for each sub-watershed.

#### **4. Results**

The results of SEDD modeling in the Shihmen Reservoir watershed and its sub-watersheds are shown in the following sections.

#### *4.1. The* β *Coe*ffi*cient*

Using GIS, we could calculate the flow lengths, slope, and travel time needed in Equations (3) and (4) for the Shihmen Reservoir watershed. At the same time, soil loss could be computed by USLE, and the SDRw is known to be 0.49 previously determined from the annual siltation of the reservoir [8]. Therefore, the only unknown is Equation (4) was the β coefficient. By assuming an initial value of β, we can solve for the β coefficient iteratively to satisfy the Equation (4). We found the β coefficient to be 0.00515. This is the first result of applying the SEDD model to the Shihmen Reservoir watershed using the iterative method, and a number like this has never been obtained before. The determined β coefficient is watershed specific.

#### *4.2. Grid-Based Travel Time, SDR, SL, and SY*

The following Figures 4–7 depict the travel time, sediment delivery ratio (SDR), soil loss (SL), and sediment yield (SY) determined using the SEDD model.

The probability density functions (PDF) of the grid–cell-based SDR, SL, and SY of the Shihmen Reservoir watershed have asymmetric, right-skewed distributions (which will be shown later in Section 4.3 with two representative sub-watersheds). This study determined in the Shihmen Reservoir watershed for SDR: the mean value of the distribution was 0.474 (not the same as SDRw), the median was 0.459, and the mode was 0.317. For the SL of the entire watershed, the mean was estimated as 87.07 t/ha/year, and the median was 71.74 t/ha/year. Lastly, the mean SY for the entire watershed was 42.08 t/ha/year, and the median was 29.80 t/ha/year.

**Figure 4.** Travel Time of the Shihmen Reservoir watershed.

**Figure 5.** Sediment Delivery Ratio of the Shihmen Reservoir watershed.

**Figure 6.** Soil Erosion/Soil Loss Distribution of the Shihmen Reservoir watershed.

**Figure 7.** Sediment Yield of the Shihmen Reservoir watershed.

#### *4.3. Comparison by Sub-Watershed Discretization on SDR, SL, and SY*

This study considered the distribution of SDR, SL, and SY in the Shihmen Reservoir watershed discretized into 93 sub-watersheds. From Figure 8, it is evident that both the median and the mean of SDR by 93 SW had very similar distributions with a central tendency centered around 0.500.

**Figure 8.** Histograms of the distributions of means and medians: (**a**) sediment delivery ratio (SDR) and (**b**) soil loss of the Shihmen Reservoir sub-watersheds.

The range of mean SL by sub-watershed was between 15.60 t/ha/year (SW #8) and 151.21 t/ha/year (SW #45) and the range of the median of SL was 1.15 t/ha/year (SW #5) and 136.36 t/ha/year (SW #69). The evaluation of the median of the SL for 93 SW showed that the median tended to be between 60 and 90 t/ha/year, and over 20 sub-watersheds had a median between 60–70 t/ha/year. In contrast, the mean SL by sub-watersheds were more widely spread and had a number of peaks between 60 and 120 t/ha/year with over 20 sub-watersheds valued at 80 to 100 t/ha/year.

The range of the mean SY by sub-watershed in the 93 sub-watershed discretization was between 10.28 t/ha/year (SW#1) and 92.97 t/ha/year (SW#73) while the range of the median was 0.38 t/ha/year (SW#8) and 88.11 t/ha/year (SW #73). For the 25 SW, the mean SY was between 10.52 t/ha/year (SW #3) and 73.92 t/ha/year (SW #8) and the range of the median was between 5.50 t/ha/year (SW #3) and 49.43 t/ha/year (SW #16).

To investigate the mean, median, and mode values by sub-watershed discretization (25 SW and 93 SW) and the correlation of each aspect of sediment yield analysis (SDR, SL, and SY), this study created the boxplots displayed in Figure 9a–e. This study additionally developed the PDF plot of each sub-watershed to investigate similarities and differences between the sub-watersheds—a sample of these plots from the 93 SW are displayed in Figure 10. The variance in the shapes of the probability density functions of SDR, SL, and SY was evident.

#### **5. Discussion**

Past studies have shown the distribution of soil erosion of the Shihmen Reservoir watershed, and this paper estimates the geospatially distributed sediment delivery ratio (SDRi) under a beta determined by the recursive method of the SEdiment Delivery Distributed (SEDD) model and a known watershed-wide SDRw.

#### *5.1. The Importance of SDR*

The SDR plays an essential role in soil erosion research as an additional parameter when considering sediment yield, slope conservation/remediation, and sediment control projects for engineers and decision-makers. An SDR of almost 100% denotes an area where the dislodged sediments have a near-perfect chance to reach a nearby river or stream. In comparison, an SDR of 0% within a highly erodible soil may mean that the soil has a high likelihood of depositing before reaching the river or stream. The SDR consideration creates a twofold issue for engineers: the classification of sediment loads entering the stream and the classification of the deposition quantities on the slopes that may lead to future sediment related disasters.

The traditional method of conveying the SDR across a region has historically been studies of small watersheds or empirical equations. The SDR values obtained from these methods were extrapolated to the entire watershed and utilized to calculate the SY from the gross erosion model's output SL. With the advancement of GIS, it is now possible to use morphological units or grid cells to compute the SDR for each location in the watershed. However, unlike SL or SY, it is essential to note that the individual SDR (SDRi) cannot be averaged directly. The average of the SDRi is not the SDRw because of the definition of SDR. To calculate the SDRw, individual SY has to be computed from the SDRi first. SDRw is then the ratio of the sum of SY divided by the total soil loss. In this study, we calculated the mean of SDRi purely for the sake of statistical analysis.

The determined SDR can be used to determine SY, which is then used as a basis for decision-making, countermeasure designs, and other uses. Therefore, the accuracy of these evaluations is critical to the management of a watershed, such as the Shihmen Reservoir watershed, where significant soil erosion risk is apparent.

#### *5.2. The* β *Coe*ffi*cient in the Model*

The SDR calculation within the SEDD model utilizes one unique parameter requiring expert opinion or field experiments, β. The β coefficient in the model is calculated using three methodologies, the recursive method, the trial-and-error method, and the field experiments method, using Cs137 [6,42– 44]. The β coefficient is a watershed-wide constant, but in some cases, this parameter is evaluated for each spatial discretization unit (for sub-watersheds). Additionally, some researchers have subjectively set a value using known published values of β [45]. The "recursive approach" or "recursive fitting approach" as used in this study defines β using known values of SDRw. When β is determined, SDRi can be determined from Equation (3). Sometimes, SDRw is obtained from empirical equations. For example, Vanoni's method [46] was used to estimate β as a SEDD parameter [6]. Burguet et al. [44] evaluated β over two olive orchard catchments using different annual C factors and R factors, determining the median β from multiple events and improving the assessment of the β parameter for their two study catchments. Other studies have utilized an additional factor, the vegetation or land-cover parameter, α [27,47], but Lopez-Vicente and Navas [29] concluded that the SDR in the SEDD model has limited sensitivity to the land use type. The slope and length of the flow paths have more influence on the model.

Additionally, Lai [48] explored the use of the SEDD in the Shihmen Reservoir watershed but determined the β coefficient by setting it to values between 0 and 200 using the Jain and Kothyari's variant of the original model which introduces the coefficient ai (also denoted as ki) to consider different land uses based on expert opinions from a table introduced by Haan [49]. However, this study utilizes the original model developed by Ferro and Minacapilli [24] because the use of the other model introduces high spatial variance in the ai coefficient [27] and increases uncertainty to the SDR model. This is because the expert opinions have to be utilized to compare Haan's table to land use types not considered originally or are geographically dissimilar. This study determines the β = 0.00515 using the known SDRw value, while Lai [48] determined β = 8.5. For these reasons, this study has been distinguished as a more accurate implementation of the SEDD model.

The SDRw (0.49) used in this study was derived by Li [8], which was calculated from reservoir sedimentation, and the soil loss included sheet and rill erosion, gully erosion, and landslides sediments. This recursive method is discussed thoroughly by Porto and Walling [42] and is employed in regions lacking data to obtain reliable values for β. The recursive method of the SEDD model in our study was developed within the model builder of the ArcGIS software. It extended the works of the model originally developed by Jhan et al. [50] and Chen et al. [51].

#### *5.3. SEDD Model Using the Grid–Cell-Based Analysis*

The SEDD model determines the sediment yield at the outlet from each hydrological unit or grid cell unit. First, the USLE equation is applied to the study area, and the gross erosion derived is then reduced by the SDR to the equivalent sediment yield at the outlet. The SDR equation within the SEDD model uses physical and hydrological parameters (slope and flow distance, respectively) to determine the sediment delivery to channels and eventually to the reservoir outlet.

The correlation between the SY, SL, and the ratio SDR is evident in the previously shown Figures 5–7. A visual comparison between Figures 4 and 5 can easily distinguish that the SDR values are inversely related to the travel time. As the travel time or distance to a river channel increases, the probability of eroded particles entering the stream decreases and consequently SDR decreases.

In Figure 5, the SDR of the watershed ranged from 0 to 1, and the areas where sediment delivery has a higher likelihood to reach channels or the outlet is focused mainly around the rivers and streams distributed throughout the watershed. The soil loss evaluation by USLE has been previously discussed in Chen et al. [51] and Liu et al. [16]. This study builds upon the model of Liu et al. [16] by increasing the number of rainfall stations in the analysis of Rm to 41 stations in and around the watershed [38]. Significantly, the SL is 0–40 t/ha/year at the lower elevation in the northern areas of the watershed that surround the reservoir. In contrast, in the southern reaches of the watershed of higher elevations, the SL increases to 80 t/ha/year or more and often exceeding 100 t/ha/year. The delivery to the channel is limited by the SDR, as shown in Figure 5. Therefore, the results of SY in these areas follow the pattern of the SDR with increases around the river channels. The influence of the SDR on SY is evident from visual inspection of the maps.

#### *5.4. Mean, Median, and Mode of 25*/*93 Sub-Watersheds*

This study determined the SDR and SY of the SEDD model. It analyzed the probability density function of the SDR, SL, and SY for a discretization of the watershed into 25 sub-watersheds and 93 sub-watersheds. The distributions of SDR, SL, and SY for the watershed and sub-watersheds were found to be all non-normal distributions.

For each sub-watershed, we calculated its mean, median, and mode of SDR, SL, and SY from its distributions. The results were then aggregated into Figure 9 using boxplots. Additionally shown in Figure 9a–c are the mean and median of the entire watershed (horizontal dash lines). Comparing the 25 sub-watersheds with the 93 sub-watersheds, we can see that the range (between the ends of whiskers) of 25 SW is always smaller than that of the 93 SW, no matter what the distribution type is (SDR, SL, or SY). This means that with finer discretization, there is more variation, and it is more likely to obtain extreme values. Interestingly, the interquartile ranges (IQR) of the means of SDR, SL, and SY of 25 SW and 93 SW are about the same. It can also be noted that although the mean values of SDR and SL of 93 SW have broader ranges than those of the 25 SW, the resulting mean values of SY for both the 25 and 93 SW have about the same range. Figure 9d shows the distribution of the watershed level

SDR, SL, and SY. Compared with Figure 9b,c, the ranges of the watershed SL and SY are larger than those of both sub-watershed discretizations.

Figure 9e shows the skewness of the SDR, SL, and SY datasets of the 25 SW and 93 SW in comparison with the skewness of the grid–cell-based analysis of the entire watershed. The difference in the skewness of the SDR between all three discretizations (Shihmen as a whole, 25 SW, and 93 SW) is small. Their range (SDR) is much smaller than that of the SL and SY. Additionally, the skewness of the SL and SY are similarly distributed. The lower quartile range is more dominant in the 93 SW, while the 25 SW shows a generally even distribution between the upper and low quartiles. The skewness of the entire watershed SDR was 0.153, for SL 8.359 and for sediment yield 9.134. The upper whiskers of the 93 SW for SL and SY are significantly longer than the lower whiskers. Sixty-two of the 93 sub-watersheds (66.7%) for SDR were positively skewed (right-skewed), meaning the majority of SDR were below the 0.500 while for the 25 sub-watersheds, 17 were positively skewed (68.8%). For the SL, 24 of 25 sub-watersheds (96%) were positively skewed (right-skewed), and 81 of 93 sub-watersheds (87%) were, too. All of the 25 sub-watersheds (100%) and all 93 sub-watersheds were positively skewed for the PDF of SY. The similar positive relationship between the skewness of SL and SY shows that the SL is more influential on the SY than the SDR is. Both example cases in Figure 10 support this. SW #16 has a skewness of 0.625 (SDR), 4.568 (SL), and 5.463 (SY), and SW #93 has a skewness 0.064 (SDR), 0.357 (SL), and 0.827 (SY).

Applying Equation (4) to each of the 25 or 93 sub-watersheds, we also calculated the sub-watershed SDRw, as shown in Figure 11 (not averaging of SDRi). It can be seen that the discretization of 93 sub-watershed shows a broader range of SDRw values than the 25 sub-watersheds. These SDRw values are essential if a sub-watershed level study is to assess detailed SY features for each sub-watershed. The SDRw can also serve as one of the criteria to select the most critical sub-watersheds to monitor.

**Figure 11.** Sub-watershed SDRw of (**a**) 25 sub-watersheds and (**b**) 93 sub-watersheds.

#### *5.5. Sub-Watershed Prioritization*

Soil Conservation and countermeasures against sedimentation of river waterways are a costly endeavor and involve significant budgetary concerns for nations such as Taiwan. Climate change and its effects, such as the increase of global land surface temperatures and changing precipitation

patterns, pose increased concerns for water security for large population centers such as Taipei City and Taoyuan. These budgetary concerns reinforce the need for sustainable watershed management and evidence-based decision making for targeted soil conversation and soil erosion mitigation programs. Therefore, sub-watershed prioritization can improve the effectiveness of these interventions while controlling budgets. In this study, the three measurements of central tendency (mean, median, and mode) of SDR, SL, and SY were explored. The SEDD model introduces a physically-based methodology for determining the SDR that is useful in understanding what probability of soil will enter the stream or outlet while modeling changing rainfall or land cover levels.

The prioritization by mean SY for 25 and 93 sub-watersheds is shown in Figure 12, each with five levels of prioritization from 0 to 20 t/ha/year (low) to 80 to 100 t/ha/year (high) derived from the ranges previously discussed. It is evident that there are striking similarities between both of these prioritization schemes. Generally speaking, the SY at the outlet is low, but in the eastern and southern extremes of the watershed, there are specific sub-watersheds with medium to high sediment yield values.

**Figure 12.** Sub-watershed prioritization by sediment yield of (**a**) 25 sub-watersheds and (**b**) 93 sub-watersheds.

#### **6. Conclusions**

This study applied the SEdiment Delivery Distributed (SEDD) model to the Shihmen Reservoir watershed in Taiwan. The main merit of the study lies in the first attempt to derive the sediment yield by soil erosion for the entire Shihmen Reservoir watershed and its sub-watersheds. By using the recursive method, this study determined the SEDD β coefficient to be 0.00515 and predicted the spatial distributions (maps) of travel time, SDRi, soil erosion (soil loss), and sediment yield of the Shihmen Reservoir watershed. The resulting average of SY for the Shihmen Reservoir watershed was 42.08 t/ha/year and by sub-watershed 10.52–73.92 t/ha/year for 25 sub-watersheds and 10.28–92.97 t/ha/year for 93 sub-watersheds.

This study also recommended the use of mean SY by sub-watershed in sub-watershed prioritization for future soil conservation decision-making. The model presented takes advantage of all currently available data for the Shihmen reservoir watershed. However, it is essential to note that the sediment

yield modeling can be improved by increased on-site validation and the use of aerial photogrammetry to deliver more updated data to better understand the field situations. Considering the implications of climate change, there is also a great need for further research on the sediment yield by soil erosion and other land degradation sources and the impacts of changing precipitation regimes. Using geospatial models of sediment yield as guidance can help the local government to better implement engineering and ecological solutions for soil conservation to achieve sustainable land management (SLM), thereby reducing sediment yield risks and creating a well-balanced solution for all stakeholders.

**Author Contributions:** Conceptualization, W.C.; Data curation, K.T.; Formal analysis, K.T.; Funding acquisition, W.C. and U.S.; Investigation, B.-S.L. and U.S.; Methodology, W.C.; Project administration, W.C.; Resources, B.-S.L.; Software, K.T.; Supervision, W.C.; Visualization, K.T.; Writing—original draft, K.T. and W.C.; Writing—review & editing, W.C., B.-S.L. and U.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was partially supported by the National Taipei University of Technology-King Mongkut's Institute of Technology Ladkrabang Joint Research Program (Grant Numbers NTUT-KMITL-106-01, NTUT-KMITL-107-02, and NTUT-KMITL-108-01) and the Ministry of Science and Technology (Taiwan) Research Project (Grant Number MOST 108-2621-M-027-001).

**Acknowledgments:** We thank the anonymous reviewers for their careful reading of our manuscript and insightful suggestions to improve the paper.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

#### *Article*
