*Article* **Comparison of Vegetation Types for Prevention of Erosion and Shallow Slope Failure on Steep Slopes in the Southeastern USA**

**Homayra Asima †, Victoria Niedzinski †,‡, Frances C. O'Donnell \* and Jack Montgomery**

Department of Civil and Environmental Engineering, Auburn University, Auburn, AL 36849, USA

**\*** Correspondence: fco0002@auburn.edu; Tel.: +1-334-844-7168

† These authors contributed equally to this work.

‡ Current Address: School of Earth and Climate Sciences, University of Maine, Orono, ME 04469, USA.

**Abstract:** Shallow slope failures due to erosion are common occurrences along roadways. The use of deep-rooted vegetative covers is a potential solution to stabilize newly constructed slopes or repair shallow landslides. This study compared species that may provide slope stabilization for sites in the Piedmont region of the southeastern USA. Six species were tested on experimental plots under natural rainfall conditions, and vegetation health and establishment were monitored. Two methods were used to measure surface erosion, measurement of total suspended solids in collected runoff and erosion pins. While measurement uncertainty was high for both methods, differences were evident between species in the spatial distribution of surface erosion that was related to the quality of vegetation establishment. For three species that established well, soil cores were collected to measure root biomass at depths up to 40 cm. Vetiver grass (*Vetiveria zizaniodies*) had substantially higher mean root biomass (3.75 kg/m3) than juniper shrubs (*Juniperus chinensis*; 0.45 kg/m3) and fescue grass (*Lolium arundinaceum*; 1.28 kg/m3), with the most pronounced difference in the deepest soil layers. Seeding with turf grass such as fescue is a common practice for erosion control in the region but replacing this with vetiver on steep slopes may help prevent shallow landslides due to the additional root reinforcement. Additional work is needed to measure the magnitude of the strength gain.

**Keywords:** erosion; vegetative covers; root biomass; erosion pins; vetiver grass

#### **1. Introduction**

Shallow slope failures due to erosion are common occurrences along roadway slopes in regions where high intensity rainfall is prevalent [1,2]. These instabilities are usually relatively small in size, but the consequences can cause major economic and social disruption [3,4]. Sediment from erosion can also have significant environmental impacts [5]. Many factors can impact roadside erosion, such as rainfall characteristics, slope gradient, rutting caused by lawn maintenance equipment, roadside construction, soil type, and the presence and type of vegetation [6].

The establishment of vegetation on newly constructed slopes can prevent erosion and increase slope stability [2,7–10]. Well-developed aboveground vegetation prevents surface erosion by intercepting rainfall and wind, increasing surface roughness, binding loose soil particles, and creating a physical barrier to sediment movement [11–14]. There is a nonlinear relationship between precipitation and sediment yield, with precipitation driving erosion while also increasing vegetation growth up to the point where vegetation is no longer water-limited [15]. Plant root systems provide belowground support and can prevent shallow slope failures by increasing soil strength through reinforcement [16–19]. Roots are strong in resisting tension forces while soil is strong in resisting compression forces [20], so root-permeated soil creates a mixed material that can withstand both forces [21]. Roots perpendicular to the soil surface reinforce the soil mass on the sheared surface while roots growing parallel increase in-plane tensile strength [22,23]. Additionally, plants remove

**Citation:** Asima, H.; Niedzinski, V.; O'Donnell, F.C.; Montgomery, J. Comparison of Vegetation Types for Prevention of Erosion and Shallow Slope Failure on Steep Slopes in the Southeastern USA. *Land* **2022**, *11*, 1739. https://doi.org/10.3390/ land11101739

Academic Editors: Chiara Piccini and Rosa Francaviglia

Received: 31 August 2022 Accepted: 6 October 2022 Published: 8 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

water from the soil through transpiration, which prevents slope failures by reducing the unit weight of the soil and increasing apparent cohesion from matric suction [3,24–26].

Plant species and functional types differ in the traits that determine their suitability for use in vegetative covers on steep slopes. Recent reviews of vegetation traits and their effects on slope stability have been published by [8] and [10]. Rapid growth after planting and abundant, evenly distributed aboveground biomass are key to preventing surface erosion [6,27]. Generally, herbaceous vegetation performs better than woody species during the establishment phase [28]. Some vegetation types have specific characteristics, such as grasses that can grow in hedgerows [29,30] and ferns that form rhizome mats [31], which make them particularly effective in binding soil and forming physical barriers to erosion. High root length density [32] and fine-root content [33] are important to soil strength, and these are typically associated with herbaceous vegetation. However, woody vegetation tends to have deeper roots to prevent slope failures [18], though deep rooted grass species do exist [34,35]. These traits must be weighed against practical concerns, such as suitability for local conditions, ease of planting, and maintenance requirements [6,27,36–38].

In the southeastern USA., turf grasses grown from seed are the most common vegetative erosion control [6]. These perform well on relatively flat terrain, where deep root structure is not needed for stabilization [39]. However, turf grasses require mowing. On steeper terrain, mowing can cause ruts that increase erosion (Figure 1) and may expand into shallow slope failures during rain events [4]. There is interest among transportation management agencies in finding alternatives to turf grass that would provide deeper soil stabilization while still establishing quickly and preventing surface erosion. The goal of this study is to evaluate the field performance of several candidate species in experimental plots in the Piedmont region of Alabama. This area is especially prone to shallow slope failures and erosion along slopes due to mowing activities [4,5] (Figure 1a).

**Figure 1.** Eroded slopes along (**a**) US-280 near Waverly, Alabama (photo from Google StreetView) and (**b**) Alabama Highway 69 near Tuscaloosa, Alabama (photo provided by Jacob Hodnett, ALDOT).

Previous experimental plot studies of erosion and slope stability have found substantial differences between vegetation types. In a study of very steep (42.5◦) slopes in central China, a mix of grass and shrubs reduced runoff and surface erosion, and had deeper roots than grass alone [2]. Plots with any type of planted vegetation performed better than those that were allowed to revegetate naturally. Studies in a semi-arid region of Spain found differences in erosion rate between species that were mainly driven by quality of establishment [40,41]. A previous study in the southeastern USA found a plot planted with a mix of native grass seed had a lower sediment yield than an exotic seed mix [42], though the difference was not statistically significant. A study by [43] examined combining biochemical surface treatments with vegetation and found that using

seeded biochemical treatments on slopes was effective for both short- and long-term stabilization against erosion. Recent work by [44] highlighted the potential for vetiver grass to support resilient transportation systems by mitigating slope failures and improving stormwater quality, but this study also highlighted the relative lack of use of vetiver in the United States.

To determine the efficacy of a species for slope stabilization, both prevention of surface erosion and stabilization of deeper soil layers must be assessed. Previous studies have measured sediment yield by collecting runoff and measuring total suspended solids (TSS) in the collected runoff [2,42]. While this method is well-established, constructing runoff collection infrastructure is costly and adequate replication for statistical analysis is difficult to achieve. The method is also prone to missing data [42], especially during periods of high intensity rainfall when most erosion occurs. Methods that directly measure changes in the elevation of the vegetated surface are an alternative that have the advantage of characterizing spatial patterns in surface erosion. Some techniques that are common for bare soil surfaces, such as total station surveys and lidar scanning, do not work well on vegetated surfaces [45,46]. Erosion pins offer a simple, low-cost alternative that provides point-based measurements of erosion or deposition through a manual measurement of surface height relative to the fixed reference point of the pin head. In this study, both runoff collection and erosion pins are used to assess surface erosion. For deeper soil stabilization, the measurement of root biomass and morphological characteristics in soil core samples is an accepted and established method [2,47].

This study has the following objectives:


This study addresses processes on small (<50 m) constructed slopes in humid environments over short time scales (<2 years) post-construction. Recent research on vegetationsediment interaction has emphasized the importance of orographic effects on precipitation that occur over large elevation gradients [48] and ecogeomorphic coevolution of landforms that occurs over centennial scales in arid and semi-arid environments [49], and these are outside the scope of the current research. Based on erosion control strategies that were successful in previous studies [40,41], we focus on planted vegetation rather than allowing for vegetation to establish naturally after disturbance. Therefore, the variation in species prevalence associated with slope, aspect, soil type, and other factors is not considered.

#### **2. Materials and Methods**

The materials and methods for this study are summarized in the flow chart shown in Figure 2.

**Figure 2.** Flow chart illustrating the materials and methods for this study.

#### *2.1. Study Site*

This study focuses on the Piedmont region of the southeastern USA, and particularly the state of Alabama. Shallow slope failures and erosion on highway cut slopes are a common occurrence in this region due to the prevalence of high-intensity rainfall [50] and hilly terrain. The Piedmont region is one of the fastest growing areas of the United States in terms of population and land-use change [51] and there is a need to identify sustainable solutions to managing soil erosion in this region. This area has a humid subtropical climate with mean annual rainfall of 132 cm and mean annual temperature of 17.9 ◦C. Most rainfall occurs as localized convective thunderstorms occurring in the summer and as widespread frontal precipitation, including tropical storms that occur in the fall through spring. Class A annual pan evaporation is 122 cm [52]. While vegetation growth is generally not waterlimited, droughts do occur, particularly in the fall.

The experimental plots were established on a roadside slope along the National Center for Asphalt Technology (NCAT) Test Track in Opelika, AL, USA (32.595390◦, −85.296363◦). The plots have a 25–30◦ slope. Particle size analysis of study area soil was performed with the Integral Suspension Pressure method [53] using a Pario device (Meter Environment, Pullman, WA). The surface soil layer (0–25 cm) is clay loam (29% sand, 40% silt and 31% clay) and deeper layers (>25 cm) are silt loam (23% sand, 65% silt and 12% clay). The plots are on a north-facing slope, so conditions are slightly cooler with less radiation than the local average. Daily precipitation data were collected onsite by NCAT using an automated weather station.

#### *2.2. Experimental Plot Design*

The experimental plot design was based on [2] and [42]. The plots were prepared, built and planted in May 2020. Existing vegetation was treated with Round-Up herbicide (Bayer, Germany) and removed with a small excavator. Each plot consisted of a 1.5 m × 3 m wooden frame built from pressure-treated lumber (Figure 3). The outlet of each plot was tapered to a 45 cm exit to create a total surface area of 5.23 m2. The four corners of all the frames had rebar installed to keep the plots stable on the slope and maintain the shape of the 45 cm outlet. The planks at the end of all the frames were wrapped with plastic sheeting and the ground between them had sheeting shingled under the earth to create a smooth path. A trench was dug 2 m above the frames, creating a berm directly above the plot to divert water coming from the slope above the plots. An erosion fence was erected at

(**a**) (**b**)

the top of the frames and below the berm to prevent sediment movement from the slope above the plots.

**Figure 3.** Completed experimental plot after installation and planting with plots 1–5 shown left to right (**a**) and plot dimensions (**b**).

#### *2.3. Vegetation Planting and Maintenance*

Prior to planting, the area was tilled with a mechanical rototiller. Vegetation and seeds were planted according to the providers' instructions. Seeding straw was spread over plots where plants were grown from seed and in bare areas of plots with potted plants. Plots were watered regularly for three weeks after planting. Six vegetation types were selected for testing based on previous literature on vegetative covers for erosion control:


Initially, each species, except switchgrass, was planted in one randomly assigned plot. There were problems with the establishment of maidenhair fern and hairy vetch. One was replaced with switchgrass and the other with a mix of juniper and fescue grass after the first year of the study. The species used and planting dates are summarized in Table 1. The vetiver grass was trimmed from a height of 2 m to a height of 1 m in April 2021. Weeds were a persistent problem. A broad-spectrum herbicide (Spectracide, United Industries, St. Louis, MO, USA) was applied and weeds were manually removed from all plots in July 2020 and late June and early July of 2021.

**Species Plot Planting Date Termination Date** Juniper <sup>1</sup> 1 May 2020 August 2022 Vetiver 2 May 2020 August 2022 Fescue 3 May 2020 August 2022 Maidenhair Fern 4 May 2020 July 2021 Hairy Vetch 5 May 2020 April 2021 Juniper <sup>2</sup> and Fescue 4 July 2021 August 2022

Switchgrass 5 April 2021 August 2022

**Table 1.** Species planting dates in experimental plots. Plot numbers are described in Figure 3.

<sup>1</sup> *Juniperus chinensi*; <sup>2</sup> *Juniperus horizontalis*.

#### *2.4. Runoff and Erosion*

Two methods were used to estimate erosion: runoff collection with TSS measurements and erosion pins. The runoff collection method was applied from June 2020 to March 2021, and erosion pins were applied from August 2021 to April 2022. Previous studies have traditionally only used one of these methods [2,16,62], but we found them to be complimentary to obtain both the spatial distribution of erosion and deposition and the total settlement yield.

#### 2.4.1. Runoff Collection with TSS

A hole was dug at the base of each plot, and a 68-L plastic bin was placed in the hole. The bins were positioned with the plastic sheeting at the base of each plot flowing into the bins. Bins were partially covered with lids to minimize water loss due to evaporation. In each bin, a U20L HOBO water level logger (Onset, Bourne, MA, USA) was suspended from the lid with wire and submerged in water. The loggers were deployed on October 7 2020 and were set to record pressure and temperature every 15 minutes. Prior to this date, water depths were measured manually once per week. An additional logger was placed outside of the bins to record atmospheric pressure. Measured pressures were converted to change in water level at 15-min intervals using software HOBOware V3.7 analysis software (Onset, Bourne, MA, USA). Change in water level was converted to change in volume using the dimensions of the bin.

Once per week, a well-mixed sample of water from each bin was collected. TSS in each sample was measured by filtration following US Environmental Protection Agency method 160.2 [63]. After sampling, the bins were emptied and cleaned and filled with a known volume of clean water such that the water was above the measurement threshold of the water level logger. Assuming TSS of the clean water is negligible, sediment yield (*SY*) in g for each one-week collection period was determined by multiplying the measured TSS (g/L) by the total volume in the bin (L) at the end of the week. The volume was divided by the surface area of the plot (5.23 m2) to give runoff depth (mm, after unit conversion).

#### 2.4.2. Erosion Pins

Erosion pins estimate erosion and deposition at point locations by indicating the change in height of the land surface relative to a fixed reference [64]. EasyFlex 8-inch (20 cm) nylon anchoring spikes (Dimex, Marietta, OH, USA) were used as erosion pins and were installed perpendicular to the ground. Three erosion pins were installed in each plot on 12 April 2021. Pins were installed 0.3 m inside the left boundary of the plots. This distance minimizes edge effects while making it possible to measure the pin height without requiring foot traffic in the plot. The pins were placed 0.6, 1.5, and 2.4 m away from the upper boundary of the plot. These pins are considered upslope, midslope, and downslope, respectively, for analysis. After the first erosion pins produced reasonable data, four additional pins were added to each plot to increase statistical power. Three were added inside the right boundary of the plot in the same configuration as the previous pins. One additional upslope pin was added in the middle of the plot 0.3 m from the top boundary.

A ruler with 1 mm gradations was used to measure the visible height of the erosion pins above the ground. Values that are greater than the baseline value indicate erosion is occurring at the point, while values less than baseline indicate that deposition is occurring. The erosion pins were monitored and measured biweekly or after any large rain event from 12 April 2021 to 15 April 2022. Due to soil disturbance from installation of the erosion pins, data were not collected during a one-month stabilization period after installation [65]. The first measurement after this period was used as the baseline height for the study. Thus, the first set of pins was analyzed from 25 May 2021 to 15 April 2022 and the second set of pins was analyzed from 12 October 2021 to 15 April 2022.

Some studies have suggested that the absolute value of change in erosion pin height is a better indicator of erosion when multiple pins are considered, because it differentiates plots with both erosion and deposition [65]. In this study, we are interested in overall sediment yield from the plots, so actual change in pin height is used for analysis. Linear regression analysis with either time in days or cumulative rainfall based on daily measurements is the independent (*X*) variable and change in erosion pin height is the dependent (*Y*) variable. Time and cumulative rainfall are strongly correlated, so they were considered in separate single-variable regression models rather than a multivariate model. The measurement on 12 October 2021 was used as the baseline because all pins were installed and stabilized by that date. A statistically significant positive slope indicates erosion is occurring at the pin location while a negative slope indicates deposition. One-way ANOVA was used to determine if change in erosion pin height from 12 October 2021 to 15 April 2022 was significantly different among plots. An unpaired *t*-test was used to determine if the mean change in erosion pin height over the same time period was different for upslope and downslope pins. A significance level of 10% was used for statistical analyses due to the inherently high variability in erosion data. All statistical analyses were performed in Microsoft Excel V16.

#### *2.5. Root Biomass Analysis*

Samples for root biomass testing were collected from the three plots that were planted at the beginning of the study and had established well: fescue, vetiver, and juniper. Samples were collected in February 2022 after nearly two years of growth. Sampling was carried out when the soil was moist, for best results [66]. A fixed-volume soil core sampler (AMS, American Falls, ID, USA) was used to collect the samples. Cores were collected with a slide hammer until the point of resistance, which was reached at 40 cm depth. One upslope and one downslope sample was collected 0.6 m and 1.8 m from the upper plot boundary.

The soil cores were cut into 5 cm sections to determine the distribution of root biomass with depth. Due to dry soil near the bottom of the soil profile, the core could not be sectioned below 30 cm depth, so 30–40 cm depth is analyzed together. Methods from [67] were used to determine root biomass. The samples were dried at 110 ◦C for 24 h and weighed before and after drying to determine moisture content. After drying, samples were soaked in tap water for 30 min to break down soil aggregates. Roots were collected by washing the samples through a 2 mm (#10) sieve followed by a 600 μm (#30) sieve under running tap water. The roots collected on the sieves were dried at 60◦C for 24 h. Roots were weighed after drying to determine dry root biomass in each sample (g) and converted to a soil root biomass (g/m3) by dividing by the volume of the core section.

#### **3. Results**

#### *3.1. Vegetation Growth and Establishment*

After the initial planting, all five of the plant species were able to grow and establish successfully. However, native weeds and plants in the surrounding area began to encroach and quickly took over the plots, and weeding was required. After weeding, the two plots containing the hairy vetch and maidenhair fern were overwhelmed by the disturbance and showed little new growth. The plots were overtaken by fescue grass from the adjacent plot, as well as white clover (*Trifolium repens*) and ground-ivy (*Glechoma hederacea*). Fescue grass grew well if it had direct sun. The small amount of shading caused by the silt fence resulted in poor establishment at the top of the plot. The height of the fence was reduced after the first year of the study.

During the first year of the study, vetiver and juniper were the most successful at general establishment. The area surrounding the junipers was overgrown by similar weeds as the other plots, but this did not impact the growth of the juniper. A mix of juniper and fescue grass was planted at the start of the second year (Table 1) as a potential strategy to improve the weed resistance of the area surrounding the juniper. However, weeds were still an issue. By September 2020, the vetiver was nearly at its full height (2 m) and the hedgerows developed an almost impenetrable layer that was resistant to weeds. In November 2020, the vetiver reverted to a dormant stage but remained healthy and regrew the next year. The switchgrass that was planted in the second year of the study did not grow well as it was planted off season due to issues with obtaining seeds. Thus, switchgrass is excluded from further analysis. Juniper, vetiver, and fescue are compared in the subsequent analyses due to their good establishment and consistent growth throughout the study (Figure 4).

**Figure 4.** Final condition of the three plots where vegetation established well, photographed in April 2022. Shown from left to right are juniper, vetiver, and fescue.

#### *3.2. Runoff and Erosion*

#### 3.2.1. Runoff Collection Method

The time series of runoff for the three plots is shown in Figure 5 for two representative rainy periods, one during the first year after planting and one almost one year after planting. The juniper had the lowest runoff volumes while vetiver and fescue had similar

values during most of each rain event. While subtle differences between plots in slope or underlying soils cannot be ruled out, this difference was consistent across rain events.

**Figure 5.** Daily precipitation and hourly change in runoff volume from three erosion plots during rainy periods: (**a**) six months after planting; and (**b**) ten months after planting.

Based on the runoff collection and TSS method, the plot with fescue grass had the highest sediment yield over the first nine months of the study while the plot with the juniper shrubs had the lowest (Figure 6). The initial spike in sediment yield in June was collected one month after planting and shows that the grass provided the least amount of initial surface-soil stabilization. All three species showed similar spikes in sediment during rain events at the end of November and the end of March.

**Figure 6.** Cumulative sediment yield during the first year of the study measured using the runoff collection with TSS method.

#### 3.2.2. Erosion Pins

As previously discussed, a single-variable regression model was used to assess changes in erosion pin height during the study period. The linear regression models with time and cumulative rainfall as the independent variables produced similar results in terms of which subsets of the data had the best model performance. However, R2 values were consistently higher for models with time as the dependent variable, so this is considered for analysis. Linear regression between time and erosion pin height showed spatial variability in erosion and deposition among the species tested. For juniper (Figure 7) the slopes for upslope, midslope and downslope positions are 0.018, 0.006 and −0.02 respectively. This indicates erosion from the top pins, almost no erosion at the middle pins, and deposition at the bottom pins. The juniper showed more growth at the base of the plot, which may have slowed water flow leading to deposition. For fescue grass (Figure 8), the slopes for upslope, midslope and downslope positions are 0.011, −0.007 and 0.013, respectively. There is deposition and erosion evident at the midslope and downslope, respectively. While the slope was positive for the upslope pins, the high variability in the data for this area makes it difficult to draw conclusions about the dynamics. For vetiver (Figure 9), the slopes for upslope, midslope and downslope positions are 0.001, 0.011 and 0.018, respectively. The vetiver established uniformly across the plot but was not present near the plot outlet because it was not possible to plant slips in this small area. For a uniform surface, the flow velocity is expected to be highest at the bottom of the plot because of the accumulation of rainfall over the plot. This could be why the vetiver shows little to no erosion at the upslope and midslope and erosion at the downslope. The denser growth of the vetiver may also change the overland flow patterns and velocities [68], but these flow patterns were not measured in this study.

**Figure 7.** Linear regression for the plot with juniper between time in days and erosion pin height relative to the measurement on 12 October 2021, when all erosion pins were installed and stabilized. Solid blue lines and markers show the trajectory of measurements for each erosion pin and dashed red lines show the regression line. Regression slope is given in each plot with \* indicating a regression *p*-value less than 0.1. Regression lines are calculated separately for upslope (**top**), midslope (**middle**), and downslope (**bottom**).

**Figure 8.** Linear regression for the plot with vetiver between time in days and erosion pin height relative to the measurement on 12 October 2021, when all erosion pins were installed and stabilized. Solid blue lines and markers show the trajectory of measurements for each erosion pin and dashed red lines show the regression line. Regression slope is given in each plot with \* indicating a regression *p*-value less than 0.1. Regression lines are calculated separately for upslope (**top**), midslope (**middle**), and downslope (**bottom**).

The regression analysis summary (Table 2) shows that the *R*<sup>2</sup> value for every species is less than 0.5, indicating that the independent variable (time) is explaining only a small amount of the variation in the dependent variable (erosion pin height). Other potential sources of variation include spatial variability in erosion patterns and measurement errors. The species differed in which positions showed significant erosion or deposition. However, where erosion or deposition was occurring, the rates were similar, as indicated by overlapping 90% confidence intervals of the slope values. An exception to this is between the vetiver and juniper. These do not overlap at the upslope or downslope locations, though it should be noted that the *R*<sup>2</sup> and *p* values for vetiver upslope indicate that time was not a significant predictor of change in erosion pin height (Table 2).

One-way ANOVA analysis of the effect of species on change in erosion pin height did not show a significant effect (*F* = 0.55, *p* = 0.58), indicating similar mean change among plots (Figure 10a). Thus, the differences between plots were primarily in the spatial distribution of erosion and deposition due to differences in the uniformity of vegetation establishment. The influence of slope position indicated a clear pattern of positive change in pin height for upslope pins, indicating erosion negative values, and deposition in downslope pins (Figure 9b). A *t-*test demonstrated a significant difference between upslope and downslope pins (*t* = 1.48, *p* = 0.08).

**Figure 9.** Linear regression for the plot with fescue grass between time in days and erosion pin height relative to the measurement on 12 October 2021, when all erosion pins were installed and stabilized. Solid blue lines and markers show the trajectory of measurements for each erosion pin and dashed red lines show the regression line. Regression slope is given in each plot with \* indicating a regression *p*-value less than 0.1. Regression lines are calculated separately for upslope (top), midslope (**middle**), and downslope (**bottom**).



<sup>1</sup> Significant erosion (*p* < 0.10); <sup>2</sup> Significant deposition (*p* < 0.10).

**Figure 10.** Boxplots showing the change in erosion pin height between 12 October 2021 and 15 April 2022, as grouped by (**a**) species planted on the experimental plot (data from all slope positions included); and (**b**) slope position of the erosion pin (data from all plots included).

#### *3.3. Root Biomass*

Vetiver had the highest overall root biomass in the top 40 cm of soil (3.75 kg/m3), followed by fescue (1.28 kg/m3) and juniper (0.45 kg/m3). In the upper layers of the soil, the amount of root biomass is very similar among the species (Figure 11). However, the root biomass of vetiver increases with depth and shows higher amounts of root biomass than the other species in the deeper soil layers. Total root biomass was substantially higher for vetiver while juniper was lower than the other species (Table 3). Root biomass was generally higher in the upslope core, though this was most pronounced for juniper. It should be noted that roots could not be identified by species, so some of the roots sampled from the juniper and fescue plots are likely from weeds. Very few weeds were present on the vetiver plots.

**Figure 11.** Root biomass by layer for three species in the (**a**) upslope core; and (**b**) downslope core.


**Table 3.** Root biomass by species in the top 40 cm of soil for the upslope core, the downslope core, and the mean of the two cores.

Vetiver produced roots with a larger diameter than both the fescue and juniper (Figure 12). Root tensile strength, which is assessed per unit area, is inversely proportional to diameter [20,69], so a dense, fibrous root system with many small diameter roots is better for slope stability [18,70,71]. Given the similar biomass abundance in upper soil layers and prevalence of small diameter roots, fescue may be a better choice than juniper if only surface stabilization is needed. However, the root biomass analysis (Figure 11) demonstrated that vetiver is clearly better for deeper slope stabilization due to the greater abundance of deep root biomass. Additional work is needed to directly measure the impact of these roots on the strength of the slopes.

#### *3.4. Results Summary*

This study addressed four research objectives. The first objective was to select vegetation types that may provide erosion control and slope stabilization for priority sites and compare them to the current management practice of planting turf grass from seed. Five vegetation types were tested—deep-rooted (vetiver) grass, woody shrubs, perennial legume, fern, and native prairie grass—and were compared to fescue grass grown from seed. The second objective was to determine which species establish and grow well on moderately steep roadside slopes in the Piedmont region of the southeast USA. Of the species tested, vetiver grass and juniper shrubs grew well (Figure 4). The third objective was to compare surface erosion rates from the experimental plots. Juniper and vetiver both had slightly lower sediment yield than fescue grass when sediment yield from the whole plot was considered (Figure 6). Erosion pins indicated that there was more spatial variability in erosion and deposition within the juniper and fescue plots due to uneven vegetation establishment (Figures 7–9). The final objective was to estimate the contribution of each

species to increased slope stability by measuring root biomass and diameter in soil cores. Vetiver grass had substantially higher root biomass than the other species, particularly in deeper soil layers (Figure 11).

#### **4. Discussion**

Vetiver grass and juniper shrubs both established well on the experimental slopes and present possible alternatives to seeding with turf grasses such as fescue on highpriority sites. The failure of the perennial legume (hairy vetch) and maidenhair fern demonstrate the importance of weed resistance in species used for erosion control and slope stabilization in the study region. Vetiver showed the best weed resistance of the species tested while juniper may be suitable if a weed resistant species is planted in the interspace between plants.

The plot containing juniper had the lowest runoff amounts and sediment yield based on the runoff collection with TSS method (Figures 5 and 6). The erosion pin data suggest that this is because deposition is occurring at the base of the plot, as measured by the downslope pins (Figures 7–9), where aboveground vegetation establishment was strong (Figure 4). This suggests that the vegetation near the bottom of the plot was slowing runoff and allowing for deposition. If runoff was ponding as it slowed, it could increase infiltration [14], causing the lower runoff volumes observed. The highest rates of erosion were observed on the upslope pins in the juniper plot (Figure 10a), but the plot had the lowest total sediment yield (Figure 6). This demonstrates the high potential of juniper to create a barrier to sediment movement with good establishment.

In the other plots, with more even vegetation establishment, erosion and deposition rates were more consistent across the plot. This agrees with previous studies that emphasized the importance of good vegetation establishment in preventing erosion [41,42]. This study used two methods to compare surface erosion: runoff collection with TSS, which measures total sediment yield, and erosion pins, which measure the spatial distribution of erosion and deposition. The methods proved to be complimentary, as information on the spatial pattern of erosion was helpful in relating vegetation establishment to observed runoff and sediment yield. We did not directly quantify the effects of vegetation density on overland flow velocities or patterns and this remains an important topic for future research [68,72–75].

Despite the issues with establishment, the overall erosion rates observed in this study were within the limit of 2–7 mm/yr. This is much lower than a previous study conducted on a completely bare steep slope which had erosion rates of 20 mm/yr [62]. This study, which was conducted over a 10-year period, also recommended longer monitoring duration than was possible in the current study for erosion pins. Overall, the changes observed in the erosion pin heights were very small relative to the measurement precision (± 1 mm) of a manual ruler. Longer monitoring until higher erosion or deposition values are observed may allow for more robust statistical analysis. Another difference with studies on bare slopes was in the pattern of erosion. On bare slopes, higher erosion rates are typically observed near the bottom of the slope, because that is where sheet flow velocities are highest [76]. The vetiver plot showed this same pattern. The pattern of higher erosion rates in the upslope pins observed for this study in the juniper and fescue (Figure 10b) was also found in a large study of vegetated streambanks slopes using erosion pins [77].

Slope stability also depends on root biomass, diameter distribution, and architecture [8]. Vetiver grass added substantially more belowground biomass than the other species tested (Figure 11). In general, fine roots (roots < 3 mm in diameter) are considered more important to soil stabilization than coarse roots [18]. Most of the biomass sampled from the plots, including vetiver, would be considered fine roots. While juniper and fescue had a slightly higher abundance of small diameter roots in upper soil layers, the deeper biomass of vetiver is key for increasing soil strength [20,24]. A previous study of vetiver grass in Brazil found similarly high levels of root biomass [78]. Based on the outcomes of this study, vetiver grass is a promising alternative to the current practice of seeding with

turf grass that should be explored by managers. This study is one of the first to examine the use of vetiver grass in Alabama and the Piedmont region of the southern USA. and so this strong establishment and growth should be encouraging to those considering using vetiver. Future research should also measure additional factors that can influence slope failures, such as soil moisture and soil organic matter content [79–81]. Given the weak predictive power of daily rainfall as a predictor of change in erosion pin height, future studies should also measure sub-daily rainfall for the calculation of rainfall intensity as this is a better predictor of erosion than rainfall depth [82].

The performance of a vegetative cover must be weighed against practical concerns, such as ease and cost of planting and maintenance requirements. Vetiver is a non-native species that must be planted from slips that have been sterilized so they will not produce seeds. This is more costly and labor intensive than planting from seed. Future work should consider other native grasses, such as switchgrass, that are deep-rooted and can be planted from seed. However, to grow vegetations from seed, seeds need to be planted during the season recommended and do require some care, such as watering and reseeding of bare patches. Future studies could also consider applying seeded biochemical solutions such as those used by [43] to combine temporary biochemical surface stabilization with the benefits of native grasses.

#### **5. Conclusions**

This study compared several vegetation types for erosion control and slope stabilization on roadside slopes in the Piedmont region of Alabama. The focus was on deep-rooted species that could be an alternative to planting turf grass from seed, the prevailing slope management practice in the region. The test plots were established on a relatively steep cut slope (25–30◦) at the NCAT Test Track in Opelika, AL, USA. The response of the plots was monitored for over two years using erosion pins and TSS measurements.

Vetiver grass and juniper shrubs established well. Juniper and vetiver both had slightly lower sediment yield than fescue grass when sediment yield from the whole plot was considered. Erosion pins indicated that there was more spatial variability in erosion and deposition within the juniper and fescue plots, likely due to uneven vegetation establishment. Selecting a species with strong and even establishment is important to preventing surface erosion. Vetiver grass had more abundant and deeper root biomass than the other species in the study, suggesting it will be best for slope stabilization. This was the first study to test vetiver grass in the Piedmont region of the southeast and demonstrates that it is a promising option for slope stabilization. Future work is needed to directly measure the impact of vetiver roots on slope stability and to investigate effects of vegetation density and planting arrangement.

**Supplementary Materials:** The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/land11101739/s1, Table S1: Water Level Logger Dataset; Table S2: Runoff Collection and TSS Dataset; Table S3: Erosion Pin Dataset; Table S4: Root Analysis Dataset.

**Author Contributions:** Conceptualization, F.C.O. and J.M.; methodology, H.A., V.N., F.C.O. and J.M.; formal analysis, H.A. and V.N.; investigation, H.A. and V.N.; resources, F.C.O. and J.M.; data curation, H.A. and V.N.; writing—original draft preparation, H.A., V.N. and F.C.O.; writing—review and editing, H.A., V.N. and J.M.; visualization, H.A. and V.N.; supervision, F.C.O.; project administration, F.C.O. and J.M.; funding acquisition, F.C.O. and J.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** Financial support for this work was provided by the Auburn University Highway Research Center.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available in the Supplementary Materials as Tables S1–S4.

**Acknowledgments:** The authors thank M. Kiernan, J.R. Ellis, H. Xiao, Z. Hilliard-Shepherd, J. Anderson, C. Barrie, L. Rahimikhameneh and V. Aguilar for assistance with field and lab work. Michael Perez, Guy Savage, and the staff of the Auburn University Erosion and Sediment Control Test Facility assisted with site selection, preparation, and maintenance. Jason Nelson and the National Center for Asphalt Technology provided site access and coordination.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Siyu Tang, Chong Du \* and Tangzhe Nie**

School of Hydraulic and Electric Power, Heilongjiang University, Harbin 150080, China; 2201873@s.hlju.edu.cn (S.T.); 2019036@hlju.edu.cn (T.N.)

**\*** Correspondence: duchong@hlju.edu.cn

**Abstract:** Sentinel-2A multi-spectral remote sensing image data underwent high-efficiency differential processing to extract spectral information, which was then matched to soil organic matter (SOM) laboratory test values from field samples. From this, multiple-linear stepwise regression (MLSR) and partial least square (PLSR) models were established based on a differential algorithm for surface SOM modeling. The original spectra were subjected to basic transformations with first- and secondderivative processing. MLSR and PLSR models were established based on these methods and the measured values, respectively. The results show that Sentinel-2A remote sensing imagery and SOM content correlated in some bands. The correlation between the spectral value and SOM content was significantly improved after mathematical transformation, especially square-root transformation. After differential processing, the multi-band model had better predictive ability (based on fitting accuracy) than single-band and unprocessed multi-band models. The MLSR and PLSR models of SOM had good prediction functionality. The reciprocal logarithm first-order differential MLSR regression model had the best prediction and inversion results (i.e., most consistent with the realworld data). The MLSR model is more stable and reliable for monitoring SOM content, and provides a feasible method and reference for SOM content-mapping of the study area.

**Keywords:** soil organic matter; Sentinel-2A; remote sensing; differential algorithm; multispectral modeling; PLSR

#### **1. Introduction**

Soil organic matter (SOM), as an extensive component of soil, is an important indicator to measure the fertility, status and degradation degree of cultivated soil [1]. It plays a role in increasing moisture retention and, consequently, the drought tolerance of crops [2,3]. SOM also constitutes a huge organic carbon pool in terrestrial ecosystems [4,5]. It is of great practical significance to estimate the soil organic pool by mastering the spatial distribution information of large-scale soil organic matter content instantaneously [6]. Estimating soil organic matter pools has a significant impact on ecology and sustainable land use in the long term. Precision agriculture and long-term regional land development are aided by timely monitoring of SOM data [7]. Traditional biochemical analysis methods are timeconsuming and labor-intensive, in addition to being ineffective and unsuitable for gathering information such as soil organic matter content over a large expanse [8,9].

Soil-reflection spectroscopy has successfully enabled rapid and cost-effective SOM estimation, assisting in fulfilling regional to global soil evaluation and monitoring requirements [10]. The majority of research has concentrated on soil spectral studies in controlled indoor environments. There are concerns with test parameters, including light-source power, light-source distance, and irradiation angle during the test. The majority of soil samples used in the tests are stabilized soils. As a result, the spectral data is more unclear, making it harder to share the results of known inverse models of soil characteristics. The advantages of remote sensing technology include a vast coverage area for ground object

**Citation:** Tang, S.; Du, C.; Nie, T. Inversion Estimation of Soil Organic Matter in Songnen Plain Based on Multispectral Analysis. *Land* **2022**, *11*, 608. https://doi.org/10.3390/ land11050608

Academic Editors: Chiara Piccini and Rosa Francaviglia

Received: 31 March 2022 Accepted: 19 April 2022 Published: 21 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

information, as well as periodicity, currency, precision, and reliability [11]. The quantitative description of soil organic matter by remote sensing technology has always been a research hotspot of many scholars. According to research, the spectral characteristics of SOM are primarily reflected in the absorption of incident light energy by organic matter, and soil reflectance decreases as organic matter content increases [12,13].

There are still image factors, such as mixed pixels, water content, and spectral resolution in remote-sensing data, that must be taken into consideration for soil organic matter analysis. Therefore, the theoretical underpinning of using existing remote-sensing data for SOM mapping involves extracting adequate information from soil spectral data and generating a soil spectral index [14]. Researchers have made some progress in related fields. The application of traditional spectral index inversion theory was used to improve the estimation model's accuracy. Spectral indices could quantify the interrelationships between the SOM's characteristic bands utilizing spectral indices, enhancing weak band connections and reducing model complexity. Preprocessing transformations used to remove imagespecific reflectance include soil moisture and particle size to transform soil spectral data, remove signal noise, and highlight features for quantitative model estimation [15]. The derivative algorithm is one of the common preprocessing transformations that reduces spectrum interference by eliminating baseline drift and improving spectral resolution, resulting in increased separation of overlapping peaks and less spectral interference [16]. Although a large number of studies have been carried out in related fields using differential spectral technology [17], relatively few studies have explored the predictive ability in monitoring soil nutrient content.

Previous studies have demonstrated that spectral preprocessing is an important component of multivariate modeling analysis and would improve the predictive performance of models [18–21]. A prediction model based on soil spectral information can effectively and rapidly estimate soil physical and chemical parameters. MLSR (multiple linear stepwise regression) has been developed on the basis of multiple linear regression. Considering the advantage of avoiding collinearity, MLSR has been used to develop models that estimate soil properties [22,23]. The regression equation was introduced using stepwise regression based on the effect, significance or contribution rate of global independent variables on the dependent variable. A linear regression model generates predictions about the dependent variable by eliminating independent variables that are not important to the dependent variable. PLS (partial least squares) is a widely used linear multivariate regression method in the field of soil spectroscopy [18,24,25]. PLS was more accurate than principal component regression or multiple linear regression in predictions of soil salinization using soil conductance in the semi-arid region of Brazil [26]. In PLS, the correlation between principal components is relatively insignificant, while the correlation with the dependent variable is the largest. At the same time, PLS can overcome the strong interpretation of independent variables by principal component analysis. It can effectively extract the comprehensive variables with substantial explanatory power to the system and improve the estimation ability of the model. Therefore, mathematical models are conducive to relate reflectance spectra to SOM content to predict soil nutrients [27,28].

Sentinel-2A is a high-resolution multispectral imaging satellite that covers 13 spectral bands. The bands vary in wavelength from 433 to 2280 nm, including ten bands in the visible near-infrared spectrum and three in the short infrared band. Sentinel-2A has an imaging bandwidth of 290 km, a spectral resolution of 15–180 nm and spatial resolutions including 10, 20 and 60 m. Compared with Landsat TM and other remote-sensing images, Sentinel-2A remote-sensing imaging has higher spectral and spatial resolutions and a shorter revisit cycle (the cycle is five days); it is primarily utilized in global ecological environment monitoring [29]. Morteza Sadeghi investigated soil moisture approaches using Sentinel-2 and Landsat-8 satellites and discovered that Sentinel-2 was more appropriate for the task [30]. Sentinel-2 is suitable for monitoring and mapping soil organic matter, but not soil texture (clay, silt, and sand content) [31]. Qi Gao presented two methodologies for the

retrieval of soil moisture from remotely sensed SAR images, with a spatial resolution of 100 m [32]. However, few studies have used Sentinel-2A imagery to monitor soil nutrients.

Given the enormous range of remote-sensing imagery available due to periodic updates, a comprehensive grasp of image characteristics is critical when selecting which image to employ. Different researchers come to different conclusions in terms of the reflectance band and estimation model used to calculate organic matter content [33,34]. In order to comprehensively understand the prediction ability and feasibility of differential spectroscopy in soil nutrient content, the trial used differential processing of the high-resolution Sentinel-2A spectral data based on mathematical transformations to develop models to predict soil organic matter content in the study area.

On the basis of the above, the research aims to construct a SOM evaluation model based on spectral indices and compare the prediction accuracy of different methods in the study region, using Sentinel-2A remote-sensing images as the data source and measured test data of SOM content as analysis data. The objective of this research is: (1) to investigate the use of Sentinel-2A remote-sensing images as a reference for estimating soil organic matter; (2) to analyze the correlation of mathematical transformation (reciprocal, reciprocal logarithm, square root, and square and cubic transformation) with the first- and secondorder differential of reflectance and SOM; (3) to construct single-band and multi-band MLSR and PLSR inversion models and evaluate the spectral indices and model performance in SOM estimation.

#### **2. Materials and Methods**

#### *2.1. Experimental Site*

The study area, Daqing, is located in the southwest of Heilongjiang Province in northeast China (45◦46 –46◦55 N and 124◦19 –125◦12 E). The region is located in the middle of the Songnen Plain, a Mesozoic subsidence area with a flat, slightly undulating, small ground slope. The landform in the region gradually declines from north to south and is generally plain, with a relative height difference of 10 to 35 m. It connects with the Suihua area in the east, faces Jilin Province (Songhua River) in the south, and borders the city of Qiqihar in the west and north. Winters are cold and snowy, whereas spring and autumn monsoons are more humid. The frost-free season lasts only a few weeks each year. With mean annual precipitation of 427.5 mm, a mean annual temperature of 4.2 ◦C, and a mean evaporation amount of 1635 mm; rain and heat are in the same season, which is beneficial for crop and forage grass growth. The cultivated land area of Daqing accounts for roughly 20% of the whole area and consists of an established farming industry (Figure 1).

**Figure 1.** Distribution area diagram of the soil sampling area.

#### *2.2. Soil Sampling*

In this study, soil sampling was conducted in Daqing in July 2021, and 19 soil samples were randomly obtained. Five surface soil (soil depth of 20 cm) samples were collected and mixed within a 1 m radius of a specific sampling location, and approximately 500 g of soil per sampling site from the mixed models was used for chemical analysis. The actual longitudes and latitudes of the samples were recorded using a global positioning system (GPS) at the time of field sampling in order to obtain the reflectivity of the sampling point in the remote-sensing image. In the laboratory, all samples were air-dried and ground to pass through a 2-mm sieve to remove impurities such as gravel and animal and plant residues.

Chemical analysis was used to determine the SOM content. The concentrations of all soil samples from each sample point were measured through the potassium dichromate method [35]. The SOM content varied from 13.42 to 22.04 g kg−1. The coefficient of variation (CV) was 0.15, indicating that SOM showed medium variability across all samples (i.e., 0.1 CV 1.0). To ensure the rationality of model establishment and validation, the data were randomly divided into 14 prediction sites and 5 validation sites. (Table 1).

**Table 1.** Descriptive statistics of SOM (g/kg).


Notes: SD, standard deviation; CV, coefficient of variation.

#### *2.3. Remote-Sensing Image Processing*

Based on region size, we selected six Sentinel-2A images in July 2021 for the experiment. Ten visible bands (B2, B3, B4, B5, B6, B7, B8, B8A, B11, B12), which are near-infrared and short-wave infrared bands with a resolution of 10 m, were selected from the images (Table 2). The preprocessing of remote-sensing images mainly included radiometric correction, atmospheric correction, geometric correction, image mosaic, and image clipping. Sentinel-2A remote-sensing images were processed with Sen2cor software for radiometric and atmospheric correction, an ESA plug-in dedicated to the creation of L2A level data that is used to reduce radiometric inaccuracies caused by atmospheric influence and to invert the true surface reflectance of objects. Compared with the typical atmospheric correction software (SMAC and 6S), Sen2cor operation is more straightforward, without human input parameters. The geometric correction of Sentinel-2A remote-sensing imaging was completed with ENVI software, and the geometric correction error was less than 1 pixel. The boundary of the research area was classified in ArcGIS software, and remote-sensing images were clipped and arranged as a mosaic. Remote-sensing images of the original research region were obtained by clipping remote-sensing image data of six scenes with vector boundary data from the research area.

To properly manipulate the data from the sample points, a vector map of the administrative divisions of Daqing was obtained using the BIGEMAP map loader. Furthermore, ArcMap and ArcGIS software were used to complete the longitude and latitude distribution map of sampling points by clicking add data and the directory window. Then ArcMap and ENVI were used in combination to extract Sentinel-2A images for each sampling point corresponding to a DN (digital number) value of each band.


**Table 2.** The relevant parameters of Sentinel-2A.

#### *2.4. Statistical Modeling*

#### 2.4.1. Differential Algorithm

Differential spectral technology, which is a common spectrum processing approach, can effectively dig spectral effective information and provide better resolution than the original spectral reflectance. It also improves the correlation between spectral data and soil parameters, allowing for better monitoring of progress in soil nutrient content research and improved prediction accuracy. The reference formula is as follows [36].

The first derivative can be described as:

$$FDR\_{\left(\lambda\right)} = \left[ R\_{\left(\lambda\_{i+1}\right)} - R\_{\left(\lambda\_{i}\right)} \right] / \left[ \lambda\_{i+1} - \lambda\_{i} \right] \tag{1}$$

The second derivative (*SDR*) can be described as:

$$SDR\_{\left(\lambda\right)} = \left[\mathbb{R}'\_{\left(\lambda\_{i+1}\right)} - \mathbb{R}'\_{\left(\lambda\_{i}\right)}\right] / \left[\lambda\_{i+1} - \lambda\_{i}\right] \tag{2}$$

where *λ<sup>i</sup>* is the wavelength of the *i*-th band, *R*(*λi*+1), *R*(*λi*) are the reflectance at bands *λi*+1, *λi*, and *R* (*λi*+1), *R* (*λi*) are the first derivative at bands *λi*+1, *λi*, respectively.

#### 2.4.2. Multiple Linear Stepwise Regression

MLSR is mainly a comparative analysis of all independent variables according to influence or contribution size to all dependent variables through the F test [23]. Variables significant by the sum of squares are selected for the regression equation. Only one variable is introduced for each step. When a variable is introduced, the partial regression sum of squares of each variable is then tested. If the introduced variable is found to be insignificant, it is removed from the partial regression equation. If more than two variables are introduced in successive steps, it is determined whether or not any existing variables can be removed. Further, when no independent variables can be eliminated, a new independent variable with significant influence is selected for evaluation. This process is repeated until none of the introduced variables can be removed. The original independent variable is also tested, and the gradual regression equation ends.

The formula of the gradual regression equation is:

$$SOM = a\_0 + \sum\_{i=1}^{n} a\_i \mathbb{R}\_{\lambda\_i} \tag{3}$$

where *a*0, *a*<sup>1</sup> = 1, *n* is the regression coefficient, *i* is the number of bands used for modeling, *λ<sup>i</sup>* is the wavelength of the ith modeling band, and *Rλ<sup>i</sup>* is the reflectance value at wavelength *λi*.

#### 2.4.3. Partial Least Squares Regression

PLSR adopts the idea of extracting principal components from principal component analysis, which can simplify the data structure [25]. There are *p* dependent variables and *m* independent variables considered. The basic practice is to extract the first component *xi* in the independent variable set and the first component *ui* in the dependent variable set, requiring maximum correlation between *xi* and *ui*. The regression of the dependent variable with *xi* is then established, and the algorithm is terminated until the equation reaches satisfactory accuracy. Otherwise, the extraction of the second pair component continues to achieve satisfactory accuracy. If the *n* components are finally extracted from the independent variable set from the independent variable set, the partial least squares regression will establish the regression equation between the dependent variable and *x*1, *x*<sup>2</sup> , ... , *xn*. This represents the regression equation between the dependent variable and the original independent variable: the partial least squares regression equation. In PLS calibration, significant wavelengths can be assessed on the basis of variable important in projection (VIP), If the VIP score of a specific wavelength exceeds 1, then the wavelength is considered important [37,38].

#### *2.5. Construction of Spectral Indexes*

SOM exhibits unique spectral response properties in visible and near-infrared bands, and the soil spectral reflectivity and SOM content are generally significantly negatively correlated [39,40]. The increase and decrease of SOM content can be reflected from the soil reflection spectrum to a certain extent. The determination of soil spectral reflectance becomes a novel approach to assessing SOM content due to the particular response relationship. Furthermore, the soil spectrum and SOM content show a nonlinear variation caused by the interaction of soil structure and the spectral measurement environment, making the absorption belt and reflection belt of the spectral curve not visible. On the other hand, low-order (first-order, second-order) differential transformation of the spectrum is less sensitive to noise, eliminating some of the background and noise influence and improving the correlation between spectral data and organic matter content.

Therefore, the spectral data are processed by conventional mathematical transformation and differential processing to increase sensitivity to the SOM content of the spectral index. The original spectrum was subjected to six different types of traditional mathematical transformations and respective first and second derivatives. Spectral characteristic indicators were screened using the Pearson correlation analysis method. SOM content measured in the laboratory is the dependent variable of the function, with the characteristic spectral index as the independent variable. The model was constructed between SOM content and the transformed spectral data of the reflection spectrum. The correlation between SOM content and the reflectance of remotely sensed images was analyzed in SPSS. The correlation coefficient was calculated by Formula (4):

$$r = \sum\_{i=1}^{n} (\mathbf{x}\_i - \overline{\mathbf{x}})(y\_i - \overline{\mathbf{y}}) / \sqrt{\sum\_{i=1}^{n} (\mathbf{x}\_i - \overline{\mathbf{x}})^2} \sqrt{\sum\_{i=1}^{n} (y\_i - \overline{\mathbf{y}})^2} \tag{4}$$

where *r* is the correlation coefficient of SOM and reflectence, and *xi* and *x* are the measured value and mean value of reflectivity, respectively; *yi* and *y* are the measured value and mean value of SOM content, respectively. When *r* > 0, reflectivity is positively correlated with SOM, and when *r* < 0, reflectivity is negatively correlated with SOM. The closer r is to 1, the more stable the model is and the better the fit is [41].

The prediction of SOM model stability is measured by the determination coefficient *R*2; the larger the *R*2, the more stable the model; the accuracy is tested by RMSE. The smaller the RMSE, the higher the model accuracy [42,43].

The calculation formula is shown in (5) and (6):

$$R^2 = \sum\_{i=1}^{u} \left( y\_i - \mathfrak{z}\_i \right)^2 / \sum\_{i=1}^{n} \left( y\_i - \overline{y} \right)^2 \tag{5}$$

$$RMSE = 1/n \sum\_{i=1}^{n} \left( y\_i - \mathcal{y}\_i \right)^2 \tag{6}$$

where *y*ˆ*<sup>i</sup>* indicates the values estimated by the model; *yi* indicates the measured values; *y* indicates the average of the measured values; and *n* is the number of observations of the variable to be modelled.

#### *2.6. Flow Chart*

Figure 2 shows the flowchart of the research to estimate the model between SOM content and spectral index with differential transformations.

**Figure 2.** Conducting SOM prediction models.

#### **3. Results and Analysis**

*3.1. Differential Analysis of the Multispectral Data*

The first-order differential and second-order differential are processed by IDL software, with a remote-sensing third band image as an example (Figure 3). The image can better express the real situation of the object, and the first-order differential image better distinguishes the water body from the soil. Raw remote-sensing images contain much information, including noise, which can be excluded by differential image processing of remote-sensing images. However, the meaning of the information in the differential processing image cannot be seen directly, so it needs to be further analyzed based on actual data.

**Figure 3.** Derivative processing of remote image. (**a**) Original remote-sensing image; (**b**) First derivative processing of remote-sensing image; (**c**) Second derivative processing of remote-sensing image.

#### *3.2. Correlation between SOM Content and Spectral Metrics*

The remote-sensing estimation and inversion of site parameters are based on the relationship between remote-sensing data and site parameters. The correlation between multispectral reflectance data and measured SOM data was analyzed in SPSS to clarify the relationship and to find the spectral information sensitive to SOM content. All original remote-sensing bands exhibited a high degree of correlation (Table 3). According to the first-derivative image data, B3 and B4 have a higher correlation to SOM (Table 4), whereas the overall correlation was relatively low in the second-derivative image data (Table 5). In general, many bands have a high correlation with the original remote-sensing data, and the correlation value of the band is relatively large.


**Table 3.** The correlation between original image data and SOM.

Notes: \* and \*\* represent the confidence level at 0.05 and 0.001, respectively.

Through the correlation table of each dataset and SOM content, the correlation of each band of original image data is more significant than 0.6, with B5 and B6 reaching 0.8. However, the first-derivative image data was less-associated with SOM content, with only 0.7 in the B3 band. No sensitive band exists with the SOM after the second-derivative image data. The correlation was significantly reduced compared to the original data in the differential image. According to the aforementioned relationship, the differential processing single-band model results in significant spectral information loss, and the relationship between multispectral data and SOM analysis is not ideal.


**Table 4.** The correlation between first-derivative image data and SOM.

Notes: \* and \*\* represent the confidence level at 0.05 and 0.001, respectively.


**Table 5.** The correlation between second-derivative image data and SOM.

Notes: \* and \*\* represent the confidence level at 0.05 and 0.001, respectively.

#### *3.3. Single-Band Inversion Model*

The spectral reflectance of the different variation processing was correlated with SOM content to determine the sensitive bands according to the magnitude of the correlation coefficient using SPSS software (Figure 4). SOM exhibited significant spectral response properties in the visible and near-infrared wavelengths and was negatively correlated with spectral reflectance in Sentinel-2A remote-sensing images. The correlation coefficient of the original spectral reflectance peaked at around 740 nm (*r* = −0.854, *p* < 0.001). The fifth and sixth bands had remarkable correlation coefficients for the transformed converted spectral index. The correlation between the spectral index of the first-order differential (*R* ) and SOM content was significant in the third waveband, with the weakest correlation coefficient (*r* = −0.770, *p* < 0.05). Differential processing significantly reduces the correlation compared to the other forms of the spectral index, which were linked considerably with SOM (*|r|* > 0.800, *p* < 0.001). Square root processing (*R*1/2) showed the most-significant correlation occurring at 705 nm (*r* = −0.858, *p* < 0.001).

**Figure 4.** Correlation statistics of organic matter content and reflectivity (*R* represents reflectance spectra, 1/*R* represents inverted transformation, e*<sup>R</sup>* is exponential transformation, log*R*−<sup>1</sup> is logarithm the reciprocal of reflectance, *R*1/2 is square-root transformation, *R*<sup>2</sup> is square transformation, *R*<sup>3</sup> is cubic transformation).

The bands with a correlation *r* > 0.5 with SOM were employed as independent variables to develop separate SOM prediction models, while the measured SOM contents were used as the dependent variable (Table 6). The correlation coefficient of calibration determination (*R*<sup>2</sup> *<sup>c</sup>* ) and root mean square error of calibration (RMSEC) were used as an assessment indicator. The prediction coefficient of determination (*R*<sup>2</sup> *<sup>p</sup>*) and the root mean square error of the validation (RMSEP) set were used to assess accuracy of the final model. The single-band model based on the original spectrum's sensitive band reflectance and SOM produced satisfactory accuracy. In addition, the model with square root processing (*R*1/2) had the best modeling efficiency (*R*<sup>2</sup> *<sup>c</sup>* = 0.74, RMSEC = 1.50), but in validation sets performed poorly (*R*<sup>2</sup> *<sup>p</sup>* = 0.69, RMSEP = 1.31). The single-band model based on the *R* and 1/*R* processed spectra with SOM had inferior modeling (*R*<sup>2</sup> *<sup>c</sup>* = 0.60 and 0.61, RMSEP = 1.86 and 1.75, respectively). The other five models (1/*R*, log *R*−1, *R*1/2, *R*2, *R*3) all showed stronger modeling (*R*<sup>2</sup> *<sup>c</sup>* > 0.68, RMSEP < 1.66). Compared to the original reflectivity, the *R* transformation showed the best prediction, with an increase in *R*<sup>2</sup> of 0.03 (*R*<sup>2</sup> *<sup>p</sup>* = 0.82), but showed extreme uncertainty (RMSECP = 3.72). The modeling of the organic matter single-band model was enhanced, but not dramatically, by spectral processing. Preprocessing of the original spectrum was used in the best suitable model utilizing the single-band (*R*<sup>2</sup> *<sup>p</sup>* = 0.79, RMSEP = 2.18).

**Table 6.** Single-band inversion model of soil organic matter content based on spectral index.


Notes: \* and \*\* represent the confidence level at 0.05 and 0.001, respectively.

#### *3.4. Performance of Multiple Linear Stepwise Regression*

MLSR models were utilized to evaluate the correlation between SOM content and spectral index, referring to the results of the single-band correlation analysis. Table 7 showed the sensitive band combinations used in the regression analysis. A variable variance significance level of 0.05 was set as the criterion for variable selection and exclusion. The maximum variance invasion factor (VIF) of each spectral band was less than 10, indicating no multicollinearity between bands. By comparing model accuracy, six better models were selected.

The band reflectance in the basic mathematical transformation was excluded as an opt-in variable to conduct the MLSR model, except for the raw spectra. However, the modeling is well-based on the differential treatment of mathematical transformations (Figure 5). The multi-band model demonstrated better predictive results than the singleband model in terms of accuracy and stability. Predictions of raw spectral data under MLSR models outperformed all single-band models (*R*<sup>2</sup> *<sup>c</sup>* = 0.78, RMSEC = 1.32) in the calibration set. The raw and first-order differential processing (*R* ) of the spectral indices resulted in a significantly improved model, but the effect of validation was poor and unstable (*R*<sup>2</sup> *<sup>p</sup>* = 0.16 and 0.55, respectively). The MLSR models based on (1/*R*) and (1/*R*)" showed better performance than the original spectrum. The inverse first-order differential (1/*R*) verification set, on the other hand, was poor at predicting and hence was not taken into account (*R*<sup>2</sup> *<sup>p</sup>* = 0.37). The MLSR model constructed by (log*R*<sup>−</sup>1) and (log*R*<sup>−</sup>1)" were slightly less effective than the original spectrum modeling in terms of modeling performance

(*R*<sup>2</sup> *<sup>c</sup>* = 0.71 and 0.69, RMSEC = 1.51 and 1.71, respectively), but the accuracy and stability of validation sets were significantly improved (*R*<sup>2</sup> *<sup>p</sup>* = 0.93 and 0.76). Overall, the second-order differential (1/*R*)" model performed the best (*R*<sup>2</sup> *<sup>c</sup>* = 0.91, RMSEPC = 1.54). However, the validation model performed a little worse (*R*<sup>2</sup> *<sup>P</sup>* = 0.84, RMSEP = 1.23), but was considered to be unsuccessful at prediction. The validation of the model based on (log*R*<sup>−</sup>1) performed incredibly well, with the *R*<sup>2</sup> improving by 0.09 and the RMSE reducing by 0.43 compared to the (1/*R*)" transformation, even though it did not get the best match and model stability (*R*<sup>2</sup> *<sup>c</sup>* = 0.71, RMSEC = 1.51, *R*<sup>2</sup> *<sup>p</sup>* = 0.93, RMSEP = 1.11). The pre-processing method of the reciprocal logarithm first-order differential spectrum was used in the best suitable model utilizing the MLSR.


**Table 7.** Multi-band inversion model of soil organic matter content based on spectral index.

Notes: \* and \*\* represent the confidence level at 0.05 and 0.001, respectively.

#### *3.5. Performance of Partial Least Square Regression*

PLSR was established for the eighteen spectral transformations using the Unscrambler software. The results showed that the number of factors obtained by PLSR analysis varies considerably. We performed a full cross-validation before establishing a predictive model. Cross-validation is a method of predicting how well a model will fit the hypothesis validation set. The number of PLSR factors based on the original reflectance was three, increasing to nine after reciprocal transformation. The correlation coefficient of cross-validation determination (*R*<sup>2</sup> *cv*) and root mean square error of cross-validation (RMSECV) were used as assessment metrics to optimize various spectrum post approaches. *R*<sup>2</sup> *<sup>p</sup>* and RMSEP were used to assess the final effect [22,44]. PLSR regression models based on mathematical transformation with differentiation gave disappointing outcomes.

Among the basic processing, the *R*<sup>2</sup> model showed the most prediction accuracy in cross-validation (*R*<sup>2</sup> *cv* = 69, RMSECV = 1.62) and in independent validation (RMSEP = 11.93), in which the prediction stability was not reliable (Table 8). Similarly, the PLSR model based on 1/*R* method also performed well in cross-validation (*R*<sup>2</sup> *cv* = 0.68, RMSECV = 1.65) but had poor accuracy in independent validation (*R*<sup>2</sup> *<sup>p</sup>* = 0.51). Prediction was worst using the *R*<sup>3</sup> method (*R*<sup>2</sup> *cv* = 0.50, RMSECV = 2.06). A PLSR model based on log*R*−<sup>1</sup> yielded the best prediction results (*R*<sup>2</sup> *cv* = 0.66, RMSECV = 1.70, *R*<sup>2</sup> *<sup>p</sup>* = 0.79, RMSEP = 1.55).

For the first-order derivative processing method, the accuracy and stability were reduced to varying degrees compared to the basic processing (Table 9). The prediction when using the (1/*R*) method was best in cross-validation (*R*<sup>2</sup> *cv* = 0.65, RMSECV = 1.8) and in independent validation (though with poor prediction stability) (*R*<sup>2</sup> *<sup>p</sup>* = 0.67, RMSEP = 0.84). In terms of the prediction results of the six calibration models, the (log*R*<sup>−</sup>1) model validations performed the best (*R*<sup>2</sup> *cv* = 0.62, RMSECV = 1.80, *R*<sup>2</sup> *<sup>p</sup>* = 0.90, RMSEP = 0.51). Furthermore, PLSR regression models based on second-order derivative processing against SOM content had no practical significance (*R*<sup>2</sup> *cv* < 0.60). The model based on (log*R*−1)" performed best among the processed sets (*R*<sup>2</sup> *cv* = 0.57, RMSECV = 1.92, *R*<sup>2</sup> *<sup>p</sup>* = 0.83, RMSEP = 1.28), suggesting that the PLSR model was not suitable for estimating the organic matter content of the region (Table 10).

**Figure 5.** Model validation. (**a**) Original multi-band measured value; (**b**) First derivative of the multiband measured value; (**c**) First derivative of reciprocal multi-band measured value; (**d**) Second derivative of reciprocal multi-band measured value; (**e**) First derivative of the reciprocal logarithm multi-band measured value; (**f**) Second derivative of the reciprocal logarithm multi-band measured value.


**Table 8.** Results of partial least squares regression analysis of the original spectral data and soil organic matter content.

Notes: \* and \*\* represent the confidence level at 0.05 and 0.001, respectively.

**Table 9.** Results of partial least squares regression analysis of first-derivate transformation and soil organic matter content.


Notes: \* and \*\* represent the confidence level at 0.05 and 0.001, respectively.

**Table 10.** Results of partial least squares regression analysis of second-derivate transformation and soil organic matter content.


Notes: \* represents the confidence level at 0.05.

The PLSR model under the reciprocal logarithm first-order differential spectrum transformation showed the best correlation with SOM among the three methods. The information above indicates that 1/*R* transformation performed satisfactorily in data representing spectral features and quantitative inversion models. The results of multi spectral model conducted by PLS show that the prediction capability of PLSR modeling is high using different spectral pretreatment methods.

#### *3.6. Spatial Pattern Analysis of Soil Organic Matter Content*

The accuracy of the above single-band, MLSR and PLSR inversion models was investigated. The models created by MLSR and PLSR both had satisfactory prediction performance. The accuracy using PLSR is higher than that of MLSR, with excellent prediction. According to the validation sample detection model, the accuracy and stability of the MLSR model are relatively stable. This shows that MLSR regression is more stable and meets the application needs in predicting the SOM content in Daqing. Reciprocal logarithm first-order differential by MLSR regression model is the optimal model. The SOM content-inversion model based on the Sentinel-2A image spectral index was selected

to invert and map the SOM content in the study area to obtain the SOM spatial distribution in Daqing (Figure 6).

**Figure 6.** Inversion result of model.

#### **4. Discussion**

The inversion results are in line with the actual spatial distribution of SOM. The SOM content in the study area was generally low and uneven, with large spatial differences. The content was gradually distributed from northeast to southwest, and the SOM content in the northeast was generally higher than that in the southwest. According to the soil field survey, surface runoff accumulates in low-lying areas to form intermittent and permanent puddles due to concentrated precipitation. Poor drainage accelerates the process of salt accumulation, and the complex micro-topography causes uneven evaporation of soil water. The salt above the micro-topography is aggravated by strong evaporation and rainfall and irrigation water containing a certain amount of salt flow from high places to places in areas with poor soil permeability. Soil salinization occurs in low-lying areas after fraction evaporation. In addition, the Daqing area is located in a seasonally frozen soil area, and the freezing and thawing of the soil promotes the accumulation of salt. These aspects combined can lead to serious soil salinization. The high salt content of the soil is not conducive to the survival of vegetation, and the small amount of vegetation means that the content of humus is low and is not conducive to the survival of general decomposers [45]. Higher salinity in soil masks the spectral signature of SOM, resulting in a low inversion value of SOM content in the southwest of the study area [46,47].

The above results showed that Sentinel-2A remote-sensing images had a good correlation with SOM content in visible and near-infrared bands. The effect of multi-band modeling is better than that of single-band modeling. It is found that there is a sensitive band in the correlation between the first-order differential data and SOM content. The original remote-sensing data had the highest correlation in Band 6 (*r* = −0.730). In singleband modeling, the correlation between square root transformation and SOM is best, but the modeling is not as good as the original spectrum. The results of differential processing in the PLSR model are not satisfactory, probably because the differentially processed

image eliminates some of the information of the original image. The correlation between reflectance and SOM is improved after basic transformation and differential processing in MLSR models. The single-band model only uses a very small amount of information, while for remote sensing, the data-rich multi-spectral band can only express extremely limited SOM information in a single band, which can easily cause the loss of some key information. The MLSR and PLSR models achieve convincing results, and the MLSR model has better predictive ability (based on fitting accuracy). The PLSR model, based on second-order differential processing, is ideal and relevant in practice. Reciprocal transformation will improve model prediction in MLSR and PLSR regression. Based on the reciprocal logarithm first-order derivative MLSR regression model, the inversion of SOM in the study area was carried out. The inversion result is in accordance with the actual situation, which is suitable for spectral inversion of soil organic matter content in a certain geographical area.

Compared to direct contact, remote sensing has advantages in the estimation of SOM content, such as predicting soil fertility without direct contact with the object of study, forecasting crop yields from visible and near-infrared bands, accessing the information on the surface of the earth more efficiently and affordably, and updating soil databases in many fields. The previous study found that for most places, the forecast accuracy based on high-resolution satellite was satisfactory for SOM content prediction, and mapped the soil organic matter more precisely than the airborne sensors [32]. The univariate model only considers a single variable to participate in the modeling [48], and the current research is mainly aimed at soil and crops. In this study, the single-band model built using the original spectra worked best—the mathematical transformed form reduced the model prediction—indicating the applicability of Sentinel-2A for predictions. The accuracy of SOM estimation using the MLSR model established by simple mathematical transformation and derivative transformation is higher than that of univariate model. Differential transformation is more beneficial to extract sensitive features in the soil spectrum [49]. The multivariable model integrates the features of multiple sensitive bands, alleviates the "multicollinearity" to a large extent, and improves the applicability and stability of the model. In previous studies, there have been inconsistencies between PLSR models and MLSR predictions [22,23,50,51]. The MLSR model possesses better prediction results than the PLSR model in this study, and high soil salinity in the study area may be a significant factor. In future work, we intend to develop the method further, for example, by expanding the number of soil samples, diversifying the soil types, and taking into account soil moisture and microorganisms. The multispectral examination of SOM by Sentinel-2A has received little attention and has not been thoroughly investigated. Future spectral modeling of the SOM could be integrated with different spectrum indices (e.g., salinity indices) to screen high-precision spectral parameters. Furthermore, indoor light sources can be used to generate reflectance spectra and provide a complete set of measurement data to eliminate the weather impact of spectral acquisition.

#### **5. Conclusions**

To maximize the correlation between spectral metrics and soil organic matter content, MLSR and PLSR modeling were applied to establish the SOM content model based on Sentinel-2A remote-sensing images. The effective and predictive capacities of different models, which combined basic transformation with differential processing, were validated. Sentinel-2A remote-sensing images had a good correlation with SOM content in visible and near-infrared bands. MLSR and PLSR models of SOM in the study area were established based on different processing and measured values, respectively. The correlation between SOM content and spectral data was improved by multi-spectral modeling after differential processing. However, the correlation between SOM content and reflectance was reduced after first-order differential, indicating that spectral information was partially lost due to differential treatment, and the relationship between spectral data and SOM was not ideal. Multi-band modeling made superior predictions compared to single-band. SOM content could be well-estimated using MLSR models. The MLSR model is more accurate

and stable than PLSR, verified by the calibration and validation samples. The accuracy of the modeling results is high and can meet research requirements. These findings give a theoretical foundation and technological support for utilizing spectroscopy to estimate soil organic matter concentration, and indicates this method can substitute traditional experimental methods for measuring organic matter, thus enabling a larger scale of longterm monitoring of changes in soil organic matter content. In this study, Sentinel-2A images made it possible to retrieve surface soil organic matter with a high spatial and temporal resolution. For soil ecosystem observations, these prediction models will need to be assessed, optimized, and used more broadly.

**Author Contributions:** Conceptualization, C.D.; methodology, C.D. and S.T.; software, S.T.; validation, T.N.; writing—original draft, S.T.; writing—review and editing, T.N. and S.T.; funding acquisition, C.D. and T.N. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the basic scientific research business expenses of provincial colleges and universities in Heilongjiang Province of China (grant number 2018-KYYWF-1570).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data in this study is available on request from the corresponding author. As this data is the result of a project related to the remote sensing interpretation of soil organic matter, it is not allowed to be made public unless specifically requested.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

