**Applications of GIS and Remote Sensing in Soil Environment Monitoring**

Editors

**Antonio Ganga Blaz Repe ˇ Mario Elia**

*Editors* Antonio Ganga Department of Architecture, Design and Urban Planning University of Sassari Sassari Italy

Blaz Repe ˇ Geography Department University of Ljubljana Ljubljana Slovenia

Mario Elia Department of Soil, Plant and Food Sciences (DISSPA) University of Bari Bari Italy

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Sustainability* (ISSN 2071-1050) (available at: www.mdpi.com/journal/sustainability/special issues/ AGRSISEM).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

Lastname, A.A.; Lastname, B.B. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-9479-8 (Hbk) ISBN 978-3-0365-9478-1 (PDF) doi.org/10.3390/books978-3-0365-9478-1**

Cover image courtesy of Antonio Ganga

© 2023 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license. The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons Attribution-NonCommercial-NoDerivs (CC BY-NC-ND) license.

## **Contents**


#### **Yingxuan Ma and Nigara Tashpolat**


## **About the Editors**

#### **Antonio Ganga**

Antonio Ganga works as a researcher at the Department of Architecture, Design and Urban Planning at the University of Sassari (ITA). He possesses a Master's degree in Landscape Architecture and a PhD in Pedology, both from the University of Sassari. He deals with spatial analysis in environmental data management. Additional research areas include investigating intervention techniques aimed at improving degraded soils, exploring the potential for urban wastewater and industrial by-products as soil amendments, and conducting integrated ethnopedological surveys to better understand local knowledge of soils. The author previously served as a Visiting Professor at the Faculdade de Ciencia Agron ˆ omicas, Universidade Estadual Paulista (UNESP) ˆ Campus de Botucatu in Brazil, and currently maintains collaborative ties with the institution. He ˆ served as a Guest Speaker at the Faculty of Agronomy, Horticulture, and Bioengineering within the Department of Soil Science and Microbiology at the University of Life Sciences in Poznan, Poland. Moreover, His is a Topical Advisory Panel Member and Guest Editor for the journal *Sustainability*, specifically for the Special Issue entitled "Applications of GIS and Remote Sensing in Soil Environment Monitoring". He is a member of the Directive board of AMFM GIS Italia, (Automated Mapping/Facilities Management/Geographic Information Systems). He is also a member of the Scientific Committee of Asita, which is the Italian Federation of Scientific Associations for Territorial and Environmental Information.

#### **Blaz Repe ˇ**

Blaz Repe is an associate professor at the Geography Department, Faculty of Arts, University ˇ of Ljubljana, and, since 2007, he has been teaching GIS, digital cartography, Soil geography, Biogeography, Geographical decision support, Physical geography and Applicative physical geography. In 2007, he defended his PhD thesis in the field of soil geography, GIS and digital soil mapping (Geographical soil map and its use in geography). He is the author of many papers, several textbooks for universities and high schools as well as various teaching materials for all levels of education. In 2017, with his fellow soil scientists, he published a Springer monographic publication, Soils of Slovenia, and, in 2018, he translated the 2015 version of WRB soil classification into the Slovenian language. He was a guest lecturer in Slovakia, Croatia, Bosnia and Herzegovina as well as Poland. His knowledge was broadened while visiting the universities in Bratislava (Slovakia) and Freiburg (Germany). Since 2009, he has been a member of Steering Committee of Slovene Soil Science Society. At the moment, and among other things, he is dealing with soil classification, digital soil mapping and invasive alien plant species.

#### **Mario Elia**

Dr. Mario Elia, graduated in Forestry science and is a fire scientist of the Department of Soil, Plant and Food Sciences (DISSPA) at the University of Bari, Italy. His research interests lie in the field of Fire Ecology and Forest Fire Management, focusing mainly in the areas of fire ecological modeling, GIS, remote sensing and ecosystem conservation at multiple scales. He has collected experience in modelling data through research projects and scientific collaborations worldwide. Dr. Elia experienced managing projects, leading research studies, writing projects and papers, developing board studies and collaborating to national and international congresses and seminars. He has worked as visiting researcher at the Toledo University (Oh, USA), Oregon State University (Or, USA) and University of Lisbon (POR).

## **Preface**

Effective land resource management requires the monitoring of environmental characteristics, for which soil data, both temporal and spatial, are of immense importance. Remote sensing and GIS applications facilitate the efficient management of these data, thus enabling the development of predictive and valid models to control soil erosion and land degradation. Further research is needed to improve the study of soil dynamics in various environmental biosystems, providing a greater understanding of erosion processes and the effectiveness of conservation techniques, in line with contemporary and sustainable practices. Furthermore, recent research on soil dynamics offers a promising opportunity to improve our understanding of the link between soil erosion and natural disasters such as landslides, floods, slope instability, biodiversity loss and climate change.

This publication brings together recent studies that contribute to improving techniques and knowledge in these contexts, especially with regard to soil management and monitoring through the use of original techniques, which are essential for the future efficient management of this limited resource.

> **Antonio Ganga, Blaz Repe, and Mario Elia ˇ** *Editors*

### *Editorial* **Applications of GIS and Remote Sensing in Soil Environment Monitoring**

**Antonio Ganga 1,\*, Mario Elia <sup>2</sup> and Blaž Repe <sup>3</sup>**


Monitoring plays an essential role in the efficient and sustainable management of the environment. Accurate and rapid procedures and data enable the activation and implementation of public policies and initiatives with which to address emergencies and medium-term environmental depletion processes. Among these, soil consumption is one of the most important issues. Soil sealing threatens the protection of the environment and the security of food production [1,2] in a world where only 10–12% of natural soils are still available for agriculture [3]. Another important threat to the soil as a resource is the increase in the erosion process. Several studies expect soil erosion to increase in the 21st century due to global climate change and land use [4,5]. At this point, measures to mitigate the effects of soil erosion are currently on the agenda of international institutions such as the Food and Agriculture Organization [6] and the European Union [7].

On the other hand, the availability of remote sensing data with greater temporal and spatial resolution has increased recently [8]. This wealth of data, integrated with field observations, enables increasingly efficient monitoring processes. Therefore, it is essential to implement and perfect more accurate and efficient methods and models.

With this in mind, this Special Issue aims to collect various contributions dealing with ways to improve models for managing data in the environmental monitoring process, focusing on soil issues. In this volume, various aspects of environmental monitoring have been addressed in a logical framework. In some cases, the focus was on hazard detection and mapping [9–13]; in others, the focus was on the detection and description of soil properties and their contribution to land use determination [14–16]. The papers addressed the issues at different scales: regional scale [10,13,14,17]; watershed scale [9,12]; and field scale [11,17]. The GIS approach is indeed useful for implementing an analytical model at multiple scales, even for estimating soil erosion [18]. Finally, the published review [19] addressed the monitoring of salinization, another problem affecting more than 100 nations [20,21], and gave a general overview of the problem in China, one of the largest food producers with a critical problem in terms of self-sufficiency [22].

**Author Contributions:** Conceptualization, A.G., B.R. and M.E.; writing—original draft preparation, A.G., B.R. and M.E.; writing—review and editing, A.G., B.R. and M.E. All authors have read and agreed to the published version of the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Citation:** Ganga, A.; Elia, M.; Repe, B. Applications of GIS and Remote Sensing in Soil Environment Monitoring. *Sustainability* **2023**, *15*, 13705. https://doi.org/10.3390/ su151813705

Received: 7 September 2023 Accepted: 12 September 2023 Published: 14 September 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **A Model between Cohesion and Its Inter-Controlled Factors of Fine-Grained Sediments in Beichuan Debris Flow, Sichuan Province, China**

**Qinjun Wang 1,2,3,4,\*, Jingjing Xie 2,3, Jingyi Yang 2,3, Peng Liu 2,3, Dingkun Chang 2,3 and Wentao Xu 2,3**


**Abstract:** Cohesion is the attraction between adjacent particles within the same material, which is the main inter-controlled factor of fine-grained sediment stability, and thus plays an important role in debris flow hazard early warning. However, there is no quantitative model of cohesion and its inter-controlled factors, including effective internal friction angle, permeability coefficient and density. Therefore, establishing a quantitative model of cohesion and its inter-controlled factors is of considerable significance in debris flow hazard early warning. Taking Beichuan county in southwestern China as the study area, we carried out a series of experiments on cohesion and its inter-controlled factors. Using the value of cohesion as the dependent variable and values of normalized density, normalized logarithm of permeability coefficient and normalized effective internal friction angle as the independent variables, we established a quantitative model of cohesion and its inter-controlled factors by the least-squares multivariate statistical method. Fitting of the model showed that its determination coefficient (R2) was 0.61, indicating that the corresponding correlation coefficient (R) was 0.78. Furthermore, t-tests of the model showed that except for the *p* value of density, which was 0.05, those of other factors were less than 0.01, indicating that cohesion was significantly correlated to its inter-controlled factors, providing a scientific basis for debris flow hazard early warning.

**Keywords:** debris flow; fine sediments; cohesion; Beichuan

### **1. Introduction**

Debris flow is gravity sediment flow with a large amount of soils and stones caused by rainstorms or snow/ice melting. It often causes houses to collapse and results in damaged roads, electricity lines and other facilities, thus posing a serious threat to the safety of local people's lives and property [1]. For example, on August 20, 2019, catastrophic debris flow in Sichuan affected 446,000 people, leading to a direct economic loss of 15.89 billion yuan [2]. Therefore, rapid debris flow early warning plays an important role in ensuring the safety of mountainous people.

There are numerical debris flow hazard simulation methods, which are essential for the development of a hazard early warning system [3–6], such as the full three-dimensional (3D) smoothed particle hydrodynamics (SPH) method, the modified MPS method, the coupled moving particle simulation–finite element method and the liquid–gas-like phase transition model in sand flow under microgravity. In such models, quantitative debris flow parameters are important to the debris flow hazard simulations.

With particle size (represented by diameter) less than 2 mm, fine-grained sediments are Quaternary sediments and the main materials that flow in water during debris flow.

**Citation:** Wang, Q.; Xie, J.; Yang, J.; Liu, P.; Chang, D.; Xu, W. A Model between Cohesion and Its Inter-Controlled Factors of Fine-Grained Sediments in Beichuan Debris Flow, Sichuan Province, China. *Sustainability* **2022**, *14*, 12832. https://doi.org/10.3390/ su141912832

Academic Editors: Antonio Ganga, Blaž Repe and Mario Elia

Received: 15 August 2022 Accepted: 30 September 2022 Published: 8 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

3

Their stability is closely related to the debris flow initial water volume [7–15]. Therefore, debris flow early warning needs to quickly detect their stability. It is mainly controlled by external and internal factors. External factors include water sources, such as rainfall, rainfall intensity, runoff and topographic conditions, e.g., slope, surface coverage and structure. Internal factors include cohesion, permeability coefficient and effective internal friction angle [16–18].

Cohesion is the attraction within a material, such as electrostatic attraction, van der Waals force, cementation and valence bonds. In the case of effective stress, cohesion is obtained by reducing the friction from the total shear strength, which is the inter-controlled factor of the debris flow stability. Its value mainly reflects the strain capacity of soil to resist external stress, and is related to the effective internal friction angle (the internal friction between soil particles, mainly including the surface friction of soil particles and the bonding force between them), the permeability coefficient (the unit flow under the unit hydraulic gradient, indicating the difficulty of fluid passing through the pore skeleton), density (mass per unit volume) and moisture (the ratio of the weight of water contained in the soil to the weight of dry soil) [19–27].

As there is no quantitative model of cohesion and its inter-controlled factors in debris flow, carrying out cohesion research and discovering its inter-controlled factors to establish a quantitative model is of great significance in debris flow hazard early warning.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The study area is mainly located in Beichuan county, with geographical coordinates of 104◦23 –104◦31.7 E, 31◦48.5 –31◦53.5 N, covering an area of about 140 km<sup>2</sup> (Figure 1). Since the Wenchuan earthquake in 2008, six heavy debris flow disasters have occurred, resulting in nearly 10,000 people left homeless, more than half of the buildings buried and a large area of farmland flooded [28].

**Figure 1.** Location map of the study area.

#### *2.2. Materials and Equipment*

To establish a model of cohesion and its inter-controlled factors of fine-grained sediments in the study area, we collected multi-resource materials: remote sensing images, digital elevation model (DEM), soil and its parameters such as cohesion, permeability coefficient, density and particle size using the corresponding equipment listed in Table 1.


**Table 1.** Materials and equipment.

#### *2.3. Methods*

A technical flowchart of this research is shown in Figure 2, which mainly includes the steps of data acquisition, parameter measurement experiment and model establishment.

**Figure 2.** Technical flowchart.

#### 2.3.1. Data Acquisition

(1) Background data

GF-2 satellite remote sensing images with a spatial resolution of 0.8 m and DEM with a spatial resolution of 30 m were acquired first. Then, a fine-grained sediment map was extracted from the GF imagery to determine the locations of sampling sites.

(2) Sample collection

From 19 to 25 March 2021, with cloudy weather and 11–15 ◦C temperature, 200 samples (600 mL for each sample) were collected from 11 sampling sites using ring knives according to the guide GB/T 36197-2018 [29], whose locations are shown in Figure 1.

(3) Database establishment

A database composed of remote sensing images, DEM, fine-grained sediments map, soil locations, pictures and descriptions was established according to the principles of GB/T 30319-2013 [30].

#### 2.3.2. Parameter Measurement Experiment

Experiments were carried out according to the specification SL237-1999 [31]. Measured parameters include particle size, cohesion and its inter-controlled factors, such as effective internal friction angle, permeability coefficient, density and moisture. Detailed information about the experiments can be found in Reference [28].

(1) Particle size measurement experiment

Microtrac S3500 in Section 2.2 was used to measure particle size; the main steps include setting the sample number and parameters on the instrument, particle size automatic measurement, saving data and cleaning the pipeline.

A histogram of fine-grained sediments' particle size is shown in Figure 3. We can see that the minimum soil particle size in the study area is 0.45 um and the maximum is about 90 um, most of which is distributed in the range of 10–20 um. According to the standard for the engineering classification of soil (GB/T 50145-2007) [32], they are classified as silt loams.

**Figure 3.** Histogram of fine-grained sediments' particle size.

(2) Cohesion measurement experiment

A ZJ strain-controlled direct shear instrument in Section 2.2 was used to measure the soil cohesion and effective internal friction angle. The main steps of the experiment include sample preparation, adding shear normal stress σ of 50, 100, 200 and 300 kpa to obtain shear strength τ, then calculating cohesion and effective internal friction angle by showing the 4 pairs of data in the coordinate system, in which σ is on the horizontal axis and τ is on the vertical axis.

Histograms of fine-grained sediments' cohesion and effective internal friction angle are shown in Figure 4. We can see that cohesion is distributed from 13.95 to 39.55 kPa with the main range of 17.15–26.75 kPa, and the effective internal friction angle is distributed from 16.16◦ to 23.68◦, with the main range of 18.98–21.80◦.

(3) Cohesion inter-controlled factor measurement experiments

A series of measurement experiments were carried out to acquire cohesion intercontrolled factors, such as permeability coefficient, density and moisture.

Firstly, the TST-55 permeameter in Section 2.2 was used to carry out the permeability coefficient experiment, whose main steps include sample preparation, flowing water through the sample and recording related parameters such as initial water head, starting

**Figure 4.** Histograms of fine-grained sediments' cohesion (**a**) and effective internal friction angle (**b**) in Beichuan.

A histogram of the fine-grained sediments' permeability coefficient is shown in Figure 5. We can see that the permeability coefficient is distributed from 0.47 to 2.85 m/d, with the main range of 1.15–2.17 m/d.

Secondly, the MDJ-300A solid densitometer in Section 2.2 is used to measure density; the main steps include sample preparation, weighing the sample and its bag, and then calculating density.

A histogram of fine-grained sediments' density is shown in Figure 6. We can see that density is distributed from 1.34 to 1.73 g/mL, with the main range of 1.34–1.54 g/mL.

**Figure 5.** Histogram of fine-grained sediments' permeability coefficient in Beichuan.

Finally, an electric heating constant temperature drying oven in Section 2.2 was used to measure moisture. The main steps include sample preparation and weighting, drying the sample for more than 8 hours, weighting dried samples and then calculating moisture.

A histogram of fine-grained sediments' moisture is shown in Figure 7. We can see that moisture is distributed from 4.01% to 30.69%, with the main range of 4.01–12.02%.

2.3.3. Model

(1) Data standardization

To eliminate the impacts of magnitude dimensions and orders for different units, data standardization was carried out using Equation (1).

$$\mathbf{z\_{i\bar{j}}} = \left(\mathbf{x\_{i\bar{j}}} - \overline{\mathbf{x\_i}}\right) / \mathbf{s\_i} \tag{1}$$

where zij is the standardized value, *x*ij is the measured value, *xi* is the mean value and si is the standard deviation.

(2) Close factors to cohesion selection

As shown in Table 2, in order to determine the factors close to cohesion, the correlation coefficients between cohesion and effective internal friction angle, permeability coefficient, density and moisture were calculated.

**Table 2.** Coefficients between cohesion correlated to its factors.


p: permeability coefficient; ln: natural logarithm.

From Table 2, we can see that with the correlation coefficients of −0.66, −0.58, 0.36 and 0.32, the effective internal friction angle, logarithm of permeability coefficient, density and moisture, respectively, are related to cohesion. The correlation coefficients of the former two are negative, indicating that cohesion decreases with the increase in effective internal friction angle and permeability coefficient, while the others are positive, indicating that cohesion increases with the increase in density and moisture.

Figure 8 shows the fitting relationship between cohesion and its inter-controlled factors. From which, we can see that the fitting determination coefficient (R2) between cohesion and the permeability coefficient is 0.31, which is less than 0.34 of the fitting determination coefficient between cohesion and the logarithm of permeability coefficient, logarithmic transformation on the permeability coefficient should be made before regression. We also applied logarithms on the effective internal friction angle and density, and then calculated determination coefficients between cohesion with them. The results showed that their logarithmic determination coefficients were 0.44 and 0.12, respectively, which are no more than those of 0.44 and 0.13 in their linear regression format.

**Figure 8.** Fitting relationship between cohesion and its inter-controlled factors.

According to the principles of the statistical significance test, when the correlation coefficient (R) is more than 0.3 and the *p* value of the *t*-test is no more than 0.05, the factor passes the *t*-test. However, the t-test on the correlation between cohesion and its intercontrolled factors (Table 3) showed that the *p* value of moisture was 0.45, which is much more than 0.05, indicating that moisture did not pass the *t*-test, and it was then deleted from the cohesion inter-controlled factors.


**Table 3.** The *t*-test of the correlation between cohesion and its inter-controlled factors.

p: permeability coefficient; ln: natural logarithm.

Therefore, close factors to cohesion are effective internal friction angle, logarithm of permeability coefficient and density.

(3) Model establishment

After selecting the close factors to cohesion, a model between them is established using the least-squares multivariate statistical method.

Firstly, a model between cohesion and each inter-controlled factor is established by the correlation fitting method, such as scatter plot analysis in Microsoft Excel. Then, a transformation on each factor is carried out according to the model between cohesion and each inter-controlled factor so that the cohesion and each inter-controlled factor are linearly correlated. Finally, a model of cohesion and its inter-controlled factors is established by the least-squares multivariate statistical method, whose principles are as follows.

By minimizing the summation of squared errors, the least-squares multivariate statistical method finds the best matching function.

$$y = f(\mathbf{x}, w)$$

In order to determine *w*, the function can be solved as

$$L(y\_\prime f(\mathbf{x}\_\prime w)) = \sum\_{i=1}^n |y\_i - f(\mathbf{x}\_{i\prime} w\_i)|^2 \tag{2}$$

where *wi* (*i* = 1, 2, . . . , *n*) can be calculated by minimizing the function.

#### **3. Results and Discussion**

*3.1. Results*

Using the experimental data, the model of cohesion and standardized inter-controlled factors is established as

$$y = 22.91 + 0.62\mathbf{x}\_1 - 1.57\mathbf{x}\_2 - 2.48\mathbf{x}\_3$$

where *y* is cohesion, *x*<sup>1</sup> is the normalized density, *x*<sup>2</sup> is the normalized logarithm of permeability coefficient ln(p) (p is the permeability coefficient) and *x*<sup>3</sup> is the normalized effective internal friction angle.

The fitting correlation of the model is shown in Figure 9.

From Figure 9, we can see that the model's determination coefficient (R2) is 0.61, indicating that the corresponding correlation coefficient (R) is 0.78. In the model, the absolute value of each variable coefficient indicates its closeness to cohesion. Because the absolute coefficients of normalized density, normalized logarithm of permeability coefficient and

normalized effective internal friction angle are 0.62, 1.57 and 2.48, respectively, the closest parameter to cohesion is the effective internal friction angle, followed by the logarithm of permeability coefficient and density in the study area.

**Figure 9.** Fitting correlation between the measured and the predicted cohesion.

The results of the *t*-tests on each regression coefficient are shown in Table 4. We can see that except for the *p* value of density, which is 0.05, others are less than 0.01, indicating that cohesion has a significant correlation with its inter-controlled factors.

#### **Table 4.** The *t*-tests on the model.


p: permeability coefficient; ln: natural logarithm.

#### *3.2. Discussion*

Cohesion is negatively correlated to effective internal friction angle and logarithm of permeability coefficient and positively correlated to density; the reasons for this are analyzed as follows.

(1) The fixed shear strength of a sample makes a negative correlation between cohesion and the effective internal friction angle.

The effective internal friction angle is determined by the friction resistance and linkage between soil particles. Larger factors that lead to the increase in the soil friction angle—such as a coarser surface, more edges and corners and greater spaces between soil particles—result in the weakening of the gravity between soil particles and the reduction in cohesion [23,33].

In a sample, the correlation of cohesion and the effective internal friction angle can be expressed by

$$
\pi\_f = \sigma t \mathfrak{g} \mathfrak{g} + \mathfrak{c} \tag{3}
$$

where *c* is the cohesion (kPa), *τ<sup>f</sup>* is the shear strength (kPa), *φ* is the effective internal friction angle (◦), *σ* is the normal pressure (kPa) and *σtgφ* is internal friction.

Therefore, when shear strength *τ<sup>f</sup>* and normal pressure *σ* are fixed, the greater the effective internal friction angle is and the smaller the cohesion becomes, thus leading to a negative correlation between them.

(2) Attraction controlled by distance between soil particles makes a negative correlation between cohesion and the permeability coefficient, and a positive correlation between cohesion and density.

The value of the permeability coefficient mainly reflects the number, size and connectivity of soil pores. The greater the permeability coefficient is, the larger the distance between soil particles becomes, leading to smaller attraction and the weakening of cohesion, and thus leading to a negative correlation between cohesion and the permeability coefficient [23].

The greater the soil density is, the smaller the distance between soil particles is and the greater the mutual attraction between particles becomes, leading to greater soil cohesion. Therefore, there is a positive correlation between cohesion and density [34].

(3) The correlation between cohesion and moisture is not significant in the study area.

Many studies showed that soil moisture has some influence on cohesion, but their correlation varies with the content of water and soil components: (1) with the increase in water content, soil cohesion increases first and then decreases [35], and (2) soil cohesion differs with the soil components. For example, it is higher in red soil, mid-range in kaolin and the lowest in sandy soil [36].

Therefore, the correlation between moisture and cohesion is complex, as cohesion does not show a consistent trend with the change in moisture, thus leading to a small correlation coefficient between them. Furthermore, the soil type in the study area is silt loam, in which the correlation between cohesion and moisture is not as significant as in red soil or kaolin in other places.

#### **4. Conclusions**

We designed a series of soil experiments to establish a model of cohesion and its inter-controlled factors. The main conclusions are as follows.

(1) The cohesion inter-controlled factors of fine-grained sediments in Beichuan debris flow were discovered.

The selection of close factors to cohesion showed that with the correlation coefficients of −0.66, −0.58 and 0.36, effective internal friction angle, logarithm of permeability coefficient and density, respectively, were related to cohesion. Therefore, the cohesion inter-controlled factors of fine-grained sediments in Beichuan include effective internal friction angle, logarithm of permeability coefficient and density, as analyzed in Section 3.2.

(2) A model of cohesion and its inter-controlled factors in Beichuan debris flow was established.

A model of cohesion and its inter-controlled factors (effective internal friction angle, logarithm of permeability coefficient and density) in Beichuan debris flow was established by the least-squares multivariate statistical method. The results show that the absolute coefficients of normalized density, normalized logarithm of permeability coefficient and normalized effective internal friction angle were 0.62, 1.57 and 2.48, respectively, indicating that the closest parameter to cohesion is the effective internal friction angle, followed by the logarithm of permeability coefficient and density in the study area. The results of *t*-tests on each regression coefficient showed that except for the *p* value of density, which was 0.05, those of other factors were less than 0.01, indicating that cohesion had a significant correlation with its inter-controlled factors.

(3) The quantitative model of cohesion and its inter-controlled factors provides a scientific basis for debris flow hazard early warning.

Fine sediments with particle sizes less than 2 mm are easily transported by water, especially during high-intensity rainfall or rapid snow melt. Thus, these materials play an important role in the debris flow early warning system. Debris flow early warning needs to quickly detect the stability of these fine-grained sediments, being one of the factors controlling disaster scales.

Cohesion reflects the strain capacity of soil to resist external stress, and is closely related to soil stability. Although cohesion varies with water content, we can quickly obtain its value by the quantitative model presented in this article when the fine-grained sediments on the surface are in their natural state. Then, dangerous areas can be identified by the early warning system according to the value of soil stability estimated by cohesion. In other words, less stability indicates higher danger, and thus provides a scientific basis for debris flow early warning.

**Author Contributions:** Conceptualization, Q.W. and J.X.; methodology, Q.W., J.X., J.Y. and P.L.; validation, Q.W., J.X. and P.L.; formal analysis, Q.W. and J.X.; investigation, Q.W., J.X., P.L., D.C. and W.X.; data curation, J.X., J.Y. and P.L.; writing—original draft preparation, Q.W., J.X. and J.Y.; writing—review and editing, Q.W.; visualization, D.C.; supervision, W.X.; project administration, Q.W.; funding acquisition, Q.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded in part by the National Natural Science Foundation of China (grant number 42071312), the Innovative Research Program of the International Research Center of Big Data for Sustainable Development Goals (grant number CBAS2022IRP03), the Hainan Hundred Special Project (grant number 31, JTT [2018]), the National Key R&D Program (grant number 2021YFB3900503), the Special Project of Strategic Leading Science and Technology of the Chinese Academy of Sciences (grant number XDA19090139), the Second Tibetan Plateau Scientific Expedition and Research (STEP) (grant number 2019QZKK0806) and the Hainan Provincial Department of Science and Technology (grant number ZDKJ2019006).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** We express our acknowledgments to Chuanxin Li and Shuo Gao in China University of Geosciences (Beijing) for their hardware supporting. We also express our acknowledgments to reviewers and editors for their good comments and suggestions to make the article better. Finally, we express our acknowledgments to Yu Chen, Bihong Fu and Pilong Shi in the Aerospace Information Research Institute, Chinese Academy of Sciences for their hardware supporting.

**Conflicts of Interest:** The authors declare that they have no conflict of interest.

#### **References**


## *Article* **Mapping Soil Properties at a Regional Scale: Assessing Deterministic vs. Geostatistical Interpolation Methods at Different Soil Depths**

**Jesús Barrena-González, Joaquín Francisco Lavado Contador \* and Manuel Pulido Fernández**

Instituto Universitario de Investigación para el Desarrollo Territorial Sostenible (INTERRA), Universidad de Extremadura, 10071 Cáceres, Spain

**\*** Correspondence: frlavado@unex.es

**Abstract:** To determine which interpolation technique is the most suitable for each case study is an essential task for a correct soil mapping, particularly in studies performed at a regional scale. So, our main goal was to identify the most accurate method for mapping 12 soil variables at three different depth intervals: 0–5, 5–10 and >10 cm. For doing that, we have compared nine interpolation methods (deterministic and geostatistical), drawing soil maps of the Spanish region of Extremadura (41,635 km<sup>2</sup> in size) from more than 400 sampling sites in total (e.g., more than 500 for pH for the depth of 0–5 cm). We used the coefficient of determination (*R*2), the mean error (*ME*) and the root mean square error (*RMSE*) as statistical parameters to assess the accuracy of each interpolation method. The results indicated that the most accurate method varied depending on the property and depth of study. In soil properties such as clay, EBK (Empirical Bayesian Kriging) was the most accurate for 0–5 cm layer (*R*<sup>2</sup> = 0.767 and *RMSE* = 3.318). However, for 5–10 cm in depth, it was the IDW (Inverse Distance Weighted) method with *R*<sup>2</sup> and *RMSE* values of 0.689 and 5.131, respectively. In other properties such as pH, the CRS (Completely Regularized Spline) method was the best for 0–5 cm in depth (*R*<sup>2</sup> = 0.834 and *RMSE* = 0.333), while EBK was the best for predicting values below 10 cm (*R*<sup>2</sup> = 0.825 and *RMSE* = 0.399). According to our findings, we concluded that it is necessary to choose the most accurate interpolation method for a proper soil mapping.

**Keywords:** soil mapping; interpolation methods; different depths; regional scale

#### **1. Introduction**

The interpolation methods have been a breakthrough in soil science, expanding the interest of study areas and saving time and money on complex field works. Many of them have been developed particularly for mapping soil properties [1–4] such as texture [2], nitrogen [5], phosphorus [6] and organic matter content [7] through different techniques [8–10]. They can be divided into two main groups: deterministic and geostatistical. Deterministic methods include Inverse Distance Weighting (IDW), Radial Basis Function (RBF), Global Polynomial Interpolation (GPI), Local Polynomial Interpolation (LPI) or Splines. On the other hand, kriging and its variants such as ordinary, simple, empirical, universal, etc., are included in the group of geostatistical techniques. In recent years, however, so-called hybrid techniques have been used, which consist of the fusion of a linear model and a non-linear interpolation model (e.g., Regression Kriging (RK), or Geographically Weighted Regression (GWR)) [11,12].

Given the variety of available interpolation methods, choosing one is a complex and critical task that entails variations in study results, as also do the nature and context of the data (e.g., experimental design, sampling density or topographic characteristics) [13]. Therefore, working on the identification of the most appropriate interpolation method in each case is essential for a proper soil mapping.

**Citation:** Barrena-González, J.; Lavado Contador, J.F.; Pulido Fernández, M. Mapping Soil Properties at a Regional Scale: Assessing Deterministic vs. Geostatistical Interpolation Methods at Different Soil Depths. *Sustainability* **2022**, *14*, 10049. https://doi.org/ 10.3390/su141610049

Academic Editors: Antonio Ganga, Blaž Repe and Mario Elia

Received: 19 July 2022 Accepted: 11 August 2022 Published: 13 August 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Currently, there is much controversy about which interpolation method is ideal for different properties and case studies. In this regard, the working spatial scale is one of the most important issues when it is used in one method or another. At the regional scale, many studies applied interpolation techniques and obtained different results depending on the variables interpolated [14–19]. Shen et al. [12], for example, found that ordinary kriging provided the best results for predicting total soil phosphorus content, while Chen et al. [20] identified IDW as the best method for predicting and monitoring soil moisture at regional scale. In other cases in which soil carbon [21] or soil moisture content [22] are predicted, hybrid methods such as RK and GWR were preferred.

The application of these techniques has not been used only to surface soil properties but also to map them at different depth intervals [23–25]. The most common depth interval is 0–10 cm, even when in some environments with shallow soils the adequate interval should be 0–5 cm in depth [26,27]. For those works that used different depth intervals, a large variety of methods have been used so far [28]. However, geostatistical techniques are the most widely used [2,29,30].

In addition, there is not yet a clear consensus about which technique is the most appropriate to reflect properties at different depths, frequently depending on individual cases. In some studies, for example, IDW seems to be an appropriate method for mapping variables such as P, K or SOM in the subsoil [31,32], while in others it is the kriging technique that works the best for topsoil [33]. Conversely, in some cases RBF reported better results than IDW or kriging techniques [34].

In regions such as Extremadura (SW Spain), shallow soils occupy 70% of the total area. This characteristic implies that soil mapping at different depth intervals is considered necessary. Nevertheless, so far in this region, there are no works focused on mapping soil properties at different depths. Beyond the soil mapping developed by the European Soil Data Centre (ESDAC) [35–37], there are no studies that focus on mapping soil properties. Some of these studies were focused on mapping quality indices at fixed 0–30 cm intervals, mainly using geostatistical techniques [38]. Some studies compared different interpolation techniques but focused on the quantification of soil losses or on some specific properties without considering different depths and at restricted spatial scales [39,40]. Other studies at a regional scale produced maps of soil aridity [41], bioclimatic indices [42] or sensitivity to land degradation [43].

The scarcity of investigations attending the mapping of soil properties at regional scales is evident. This highlights the need to provide accurate mapping techniques for soil properties. Therefore, the main objective of this work was to identify the most suitable interpolation method for the studied variables at different depths.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The study has been carried out in the Autonomous Region of Extremadura, located in the southwest of Spain (Figure 1). Extremadura has a surface area of 41,635 km2, which represents 8.25% of the total area of Spain. The orography of Extremadura is characterised by the presence of two large river basins (the Tagus and Guadiana rivers) and three parallel mountain ranges (Sistema Central, Montes de Toledo and Sierra Morena), which interrupt the dominance of the extensive peneplains. Precambrian and Paleozoic slates and granites are the dominant rocks on which our soils are developed. Cambisol, Leptosol and Regosol soil types occupy 70% of the total surface area, which is mainly used for livestock rearing. On the other hand, Fluvisol or Luvisol soil types predominate in the river valleys, mainly dedicated to agricultural activities [35]. The climate is Mediterranean, with mild winters and hot and dry summers. Yearly total average rainfall is below 600 mm, except in the mountain ranges where values over 1000 mm are reached. The average temperature in the region is around 16–17 ◦C with lower values in the vicinity of the mountain ranges.

**Figure 1.** Location map of the study area.

#### *2.2. Dataset Characteristics and Variables Selection*

Various data sources contributed to the dataset used in this study. For all sources, data available for any of the three studied soil depths (0–5, 5–10 and >10 cm, respectively) were considered, disregarding depth intervals that do not comply with those proposed in the study. The >10 cm class refers to soils mostly not exceeding 35 or 40 cm depth. This region is dominated by Cambisol and especially Leptosol soils, most of which are less than 40 cm in depth. Data have been structured as a point soil-sample database, where most of the records come from different research projects developed by the Geo-Environmental Research Group (GIGA) of the University of Extremadura over the last 20 years, to whom authors belong. Some data from the book "*Estudio de los Suelos de la Tierra de Barros*" [36] have also been used. Additionally, other data come from the Spanish Soil Properties Database, created by the Centre for Energy, Environmental and Technological Research (CIEMAT), from the Soil Catalogue of Extremadura in soil profiles carried out at the end of the 1990s, as well from the Soil Grids platform [37]. In one of the data sources used, it was not possible to verify which analysis method was used. However, in the other sources, the method was found to be the same. The total number of sampling points varied depending on the soil property and the depth where soil samples were taken (Figure 2, Table 1). For instance, more than 500 sampling points are available for some properties, such as pH at 0–5 cm depth and 94 for >10 cm depth phosphorus.

**Figure 2.** Sampling points available for pH at 0–5, 5–10 and >10 cm, respectively.

**Table 1.** Methods used for the analysis of the studied properties and distribution of the number of samples by property and soil depth interval.


\* *n* corresponds to the total number of samples for each property.

#### *2.3. Data Preprocessing and Statistical Analysis*

The first step of the analysis consisted of filtering the outliers from the original dataset. In order to do this the cluster and outlier analysis tool (Anselin Local Moran's I) integrated in ArcGIS 10.5 software was applied [44,45]. This analysis provides a simple and adequate identification of statistically significant outliers, with 95% confidence in relation to neighbouring data.

Once the outliers were identified and removed from the original dataset, the remaining data were randomly partitioned into training and validation datasets, amounting, respectively, to 80% and 20% of the data available for each one of the interpolation methods selected. This sequence of work was carried out for the 12 soil properties and three depths considered.

Statistical parameters such as mean, minimum, maximum and coefficient of variation were calculated to characterise the filtered dataset. In order to reflect the differences observed between some statistics among interpolation methods, bar and whisker plots were used. All the statistical analyses were carried out using Statistica v. 6.0 (StatSoft, Tulsa, OK, USA) [46] and Microsoft Office Excel software packages.

#### *2.4. Interpolation Methods*

Nine interpolation methods were considered as spatially predictive techniques for each of the variables selected in the study. Six deterministic and three geostatistical methods were chosen, all of them integrated into the Geostatistical Analyst extension of ArcGIS 10.5 (ESRI, Redlands, CA, USA) [47].

The deterministic methods were:


Contrariwise, the geostatistical techniques assume spatial autocorrelation as part of the interpolation process. The geostatistical methods used in this study were:


It should be noted that in each of the interpolation methods used, the parameters of the model were modified in order to obtain the lowest mean square error as the output result.

#### *2.5. Assessment of the Interpolation Methods Reliability*

From the cross-validation, the mean error (*ME*), the coefficient of determination (*R*2) and the root mean square error (*RMSE*) were used to assess the accuracy of the interpolation methods. In order to consider the most reliable method, the lowest *RMSE* and the highest *R*<sup>2</sup> were taken into account [53]. In some cases, where these parameters were equal, the ME value closest to 0 was used to determine the most reliable method.

*ME* gives an absolute value that determines the degree of bias obtained in the estimates. Higher values indicate larger differences between predicted and observed values [54]. *ME* values are calculated as follows:

$$ME = \frac{\sum\_{i=1}^{n} (\hat{Z}\left(s\_{i}\right) - z\left(s\_{i}\right))}{n} \tag{1}$$

where, *z* (*si*) is the observed value at point *si*, *Z*ˆ (*si*) is the predicted value at point *si* and *n* the number of samples.

The coefficient of determination or R-squared (*R*2) assesses the ability of a model to predict an outcome of a regression. Thus, the coefficient of determination indicates how well the data fit the model. Values vary between positive 1 and 0, values close to 1 indicate a better fitness of the model [55]. The equation is given as follows:

$$R^2 = \frac{\left[\sum\_{i=1}^{n} (P\_i - P\_{\text{ave}}) \left(Q\_i - Q\_{\text{ave}}\right)\right]^2}{\sum\_{i=1}^{n} \left(P\_i - P\_{\text{ave}}\right)^2 \sum\_{i=1}^{n} \left(Q\_i - Q\_{\text{ave}}\right)^2}$$

Being *Pave* the average of the estimated value; *Qave* is the average of measured value; and *n* is the points number used for estimation.

The root mean square error (*RMSE*) was used to quantify the accuracy of the interpolation model used. Low *RMSE* values indicate higher reliability of the model. The *RMSE* is calculated as follows:

$$RMSE = \sqrt{\frac{\sum\_{i=1}^{n} \left[ \hat{Z}\left(s\_i\right) - z\left(s\_i\right) \right]^2}{n}} \tag{2}$$

Being, *z* (*si*) the observed value at point *si*, *Z*ˆ (*si*) the predicted value at point *si*, and *n* the number of samples.

#### **3. Results**

#### *3.1. Descriptive Statistics*

Table 2 shows the descriptive statistics of each of the studied variables. A trend can be observed of clay, silt, and sand content, as well as pH, cation exchange capacity, calcium, and magnesium to increase with depth, showing other variables, i.e., nitrogen, phosphorus, potassium and organic matter, an opposite behaviour. The coefficient of variation (CV) reflects the variability of soil properties was low for pH, with values not exceeding 17%, being high for clay and silt content, as well as for some chemical properties such as cation exchange capacity, phosphorus, potassium, or soil organic matter. The CV was particularly high for calcium, magnesium, sodium, and nitrogen.

### *3.2. Optimal Interpolation Method by Soil Property*

#### 3.2.1. Particle Size Distribution

Model statistics for the three studied soil depths, obtained with the validation dataset when modelling particle size distribution, are shown in Table 3. In Appendix A, Figure A1 shows the regional cartographic representation of the best performing methods for clay, sand, and silt content. Regarding the clay content, the more accurate methods were EBK for 0–5 cm soil depth and IDW for larger depths. *RMSE* values were below the average reflected in Table 1, showing an increasing trend with depth (3.554%, 5.131% and 5.205%). In turn, R2 values decreased (0.767, 0.689 and 0.648). It is worth noting that the differences in *RMSE* between the most and least accurate methods were about 1% or higher being *R*<sup>2</sup> differences minor. In the case of silt content, IDW was the most accurate method for 0–5 and 5–10 cm depth, performing the OK method better for greater depths. As in the case of clay content, *RMSE* values were below the average measured for the reference dataset. Model accuracy decreased when increasing soil depth (6.22%, 5.103% and 5.605%), showing an opposite behaviour to that observed for clay content, while *R*<sup>2</sup> increased (0.831, 0.897 and 0.921). With regards to the sand content, the best performing methods were CRS

at 0–5 cm depth, IDW for 5–10 cm depth and SK at >10 cm depth. *RMSE* data increased in this case with increasing soil depth (4.627%, 5.217% and 6.354%), and the same is true for *R*<sup>2</sup> values (0.739, 0.816 and 0.799), contrasting with the trend observed in the clay model.


**Table 2.** Descriptive statistics of the studied variables.

#### 3.2.2. Selected Chemical and Sorption Complex Properties

Table 4 shows the main model statistics of the interpolation methods used for the selected chemical variables. The maps that represent the regional pattern of these variables with the more accurate interpolation methods are shown in Figure A2. For the pH, the best performing method at 0–5 cm depth was CRS (*R*<sup>2</sup> = 0.834 and *RMSE* = 0.333). The EBK algorithm provided very similar results to the former, but with slightly higher RMSE and lower *R*2. At 5–10 cm depth, OK method performed better (*R*<sup>2</sup> = 0.823 and *RMSE* = 0.328), followed by SWT with slightly higher *R*<sup>2</sup> and *RMSE*. At soil depths greater than 10 cm, EBK was the most accurate method (*R*<sup>2</sup> = 0.825 and *RMSE* = 0.399). Overall, results obtained at each of the three depths were comparable, with RMSE values that varied below 0.07.

Regarding the Cation Exchange Capacity (CEC), IDW was the most accurate method at 0–5 cm depth, with *RMSE* of 3.778 and *R*<sup>2</sup> of 0.854. At 5–10 cm, OK was the method that provided the best results, with *RMSE* and *R*<sup>2</sup> accounting to 4.533 and 0.702, respectively. The same was true at soil depths greater than 10 cm, where EBK was the best performing method, with *RMSE* of 4.415 and an *R*<sup>2</sup> of 0.697. In the case of Calcium (Ca), unlike the previous variables, it was a geostatistical method, EBK, that provided the best results at 0–5 cm depth, with *RMSE* of 0.864 and *R*<sup>2</sup> of 0.977. At 5–10 cm depth, on the other hand, it was the IDW method that showed better results, with *RMSE* of 1.879, one point higher than the topmost soil layer, but still showing a good fitness with an *R*<sup>2</sup> of 0.973. The *RMSE* increased at deeper depths (>10 cm), where EBK was identified as the most accurate method but still retaining a high *R*<sup>2</sup> value of 0.964. Magnesium (Mg) models for the 0–5 cm soil depth showed the SK method as the best performing one with *RMSE* of 0.741 and an *R*<sup>2</sup> of 0.569, indicating poor model fitness. Nevertheless, with increasing depths, results improve considerably, with the OK method being better at 5–10 cm that reduced the *RMSE* to 0.534 and improved the *R*<sup>2</sup> to 0.770. The same was true at >10 cm depth, where the IDW method provided an *RMSE* that increased again to 0.781 while still maintaining a good *R*<sup>2</sup> value of 0.708. Finally, for Sodium (Na) content, it was the CRS method that produced the best results at 0–5 cm depth, with *RMSE* and *R*<sup>2</sup> that amounted to 0.167 and 0.888, respectively. For the 5–10 cm soil layer, CRS was also the method showing the best results, although both *RMSE* and *R*<sup>2</sup> decreased to 0.138 and 0.511, respectively. At >10 cm soil layer, although IDW was the most accurate method, the model validation statistics obtained were poor (*RMSE* = 0.196 and *R*<sup>2</sup> = 0.044) indicating a very poor model fitness.

**Table 3.** Model validation statistics for the different interpolation methods used when modelling particle size distribution at the three studied soil depths. In bold is the optimal method.



**Table 4.** Model validation statistics for the different interpolation methods used when modelling the chemical properties selected at the three studied soil depths. In bold is the optimal method.

3.2.3. Selected Elements Content

The statistics obtained as a result of the validation of the selected interpolation methods used for some of the main so-called soil fertilisers are presented in Table 5, and their cartographic representation is shown in Figure A3.


**Table 5.** Model validation statistics for the different interpolation methods used when modelling nitrogen (N), phosphorus (P), potassium (K) and soil organic matter (SOM) at the three studied soil depths. In bold is the optimal method.

At the 0–5 cm soil layer, the most accurate method predicting nitrogen content was the SK, showing an *R*<sup>2</sup> of 0.634 and an *RMSE* of 0.053. SK was also optimal for the 5–10 cm soil layer. However, although the *RMSE* value of 0.033 is acceptable, the *R*<sup>2</sup> value obtained at this depth (0.161) indicates poor model fitness. At the depth >10 cm, IDW provided the best results, with *R*<sup>2</sup> of 0.685 and *RMSE* of 0.024.

As it is seen in Table 5, when the phosphorus content is interpolated, the optimal methods were the CRS for the uppermost soil layer (*R*<sup>2</sup> = 0.840 and *RMSE* = 6.726), the OK at 5–10 cm soil depth (*R*<sup>2</sup> =0.745 and *RMSE* = 5.366) and the IM-Q for the deepest layer (>10 cm) (*R*<sup>2</sup> = 0.625 and *RMSE* = 8.811).

Regarding the potassium, the IDW was the most accurate method at 0–5 cm (*R*<sup>2</sup> = 0.729 and *RMSE* = 0.170). In this case, model accuracies were considerably reduced with depth, with the CRS method being better at 5–10 cm with an *R*<sup>2</sup> of 0.582 and an *RMSE* of 0.133, and at >10 cm soil layer, EBK revealed the most accurate results, very similar to those obtained for the above layer.

For soil organic matter (SOM), IDW was the most accurate method for the 0–5 cm layer, with an *R*<sup>2</sup> of 0.754 and *RMSE* of 0.957. The model accuracy was also reduced when modelling SOM at deeper layers, with *R*<sup>2</sup> values of 0.657 at 5–10 cm, where the EBK was the optimal model, and 0.588 at >10 cm layer, with the OK as the best performing method.

#### *3.3. General Assessment of the Interpolation Methods*

The general performance of the different interpolation methods considered in this study, attending to the whole set of modelled variables, was evaluated considering their average *R*<sup>2</sup> and *RMSE* statistics, as reflected in Figure 3.

**Figure 3.** Analysis of the average values of *R*<sup>2</sup> and *RMSE* for the interpolation methods applied to each study variable. Whiskers show the mean value ± 0.95 confidence interval. The red rectangle groups the more accurate deterministic interpolation methods and the green rectangle groups the geostatistical interpolation methods. Letter (**a**) refers to *R*<sup>2</sup> values and letter (**b**) to *RMSE* values.

As expressed in Figure 3a, attending to the model accuracies, as expressed by the *R*2, the interpolation methods with the best general performance were IDW, CRS and SWT, reaching, respectively, 0.739, 0.755 and 0.747 on average. The geostatistical methods, on the other hand, showed lower accuracies in general, OK being the one with the highest *R*2, as compared to the similar values obtained with EBK and SK methods.

As for the average *RMSE* (Figure 3b), again the IDW, followed by CRS and SWT, were the more accurate methods. In this case, the geostatistical methods yielded similar or slightly higher values, with SK producing the highest mean *RMSE* overall.

When the analysis was performed as a function of the soil depth, the IDW was the method more repeatedly observed as more accurate. It was more reliable at 0–5 cm depth for properties such as silt, cation exchange capacity, potassium, and soil organic matter. At the 5–10 cm layer, it was also preferred when modelling clay, silt, sand, and calcium content. Finally, at depth >10 cm, it was also better when modelling clay, magnesium, sodium, and nitrogen content. Alternatively, CRS was more reliable for variables such as sand, pH, sodium, and phosphorus at the uppermost soil layer, also being good to express sodium and potassium content at 5–10 cm depth. Among the deterministic methods, IM-Q showed the best results when predicting phosphorus content at >10 cm.

As for the geostatistical methods, OK yielded the best results in six cases. At 5–10 cm depth, it was the most reliable method for variables such as pH, cation exchange capacity, and magnesium and phosphorus contents, and also for silt content and soil organic matter at the >10 cm soil layer. SK was the optimal method to model magnesium and nitrogen content at 0–5 cm and for nitrogen content at 5–10 cm and sand content at >10 cm. Finally, the EBK was selected as more accurate in seven cases.

#### **4. Discussion**

The lithological and land use characteristics existing in the region under study show a low variability. This highlights that the degree of variation observed was highly dependent on the property itself. The mineral fraction of the soil showed less variability than the selected elements content. In this sense, due to the illuviation process, the clay content showed greater variability as compared with silt and sand content. Those results agree with the findings of Keshavarzi et al. [56] and Addis et al. [57]. In turn, soil nutrients showed a higher variability, since more than 80% of them are concentrated in the upper and shallower soil layers. However, other chemical properties such as pH showed the opposite behaviour, with less variability and higher values at the deepest layers [34].

Currently, geostatistical methods are the most widely used methods for predicting and mapping soil properties [2,58]. However, in this study both deterministic and geostatistical methods have been used to ascertain the more accurate methods. The results of the interpolation analysis in this study showed the deterministic methods as having more potential to interpolate soil particle size distribution as compared to the geostatistical ones. The best results of deterministic methods in the topsoil layers are determined by a sufficiently dense data set. In this way, deterministic functions can capture the extent of local surface variation needed for the analysis and report more accurate results. In contrast to the findings of other authors such as Gozdowski et al. [59] or Radocaj et al. [60], in our study it was the IDW method that provided the best results. However, particularly when interpolating clay content at 0–5 cm or silt and sand content at the >10 cm layer, geostatistical methods were identified as more accurate, as found also by Li et al. [2].

Attending to the chemical variables, it was difficult to identify a unique optimal interpolation technique. The models generated for pH showed adequate fitness and acceptable mean square errors. Although other studies [61] found OK as the best performing method, our analysis showed the CRS as the most accurate for the topsoil, in agreement with Zandi et al. [34]. At the same time, also according with Zandi et al. [34] but contrasting with that observed by Robinson and Metternicht [62], the error and reliability statistics of the subsoil models indicate that kriging techniques were more accurate.

When studying CEC, models showed good performance, even when the average errors obtained were relatively high for two of the three depths. Our results showed that, when interpolating the cation exchange capacity at the 0–5 cm layer, IDW was the best method. Those results are in agreement with the findings of other authors [62,63]. Nevertheless, several works preferred geostatistical techniques, in concordance with the results obtained for the deeper layers, where OK and EBK proved to be more accurate [64–68].

In the case of calcium, models showed good results in general, even though the mean square error increased in depth due to the higher variability of the data. In this case, EBK was the most accurate method for predicting calcium at the shallowest (0–5 cm) and deepest (>10 cm) layers. Our results agree with other authors [69,70], although they disagree with the findings of others [62] that identified IDW as more accurate for interpolating calcium at 5–10 cm depth.

For magnesium, model results improve with depth. When interpolating the shallowest soil layers (0–5 and 5–10 cm), according to John et al. [9], geostatistical methods such as SK and OK were found to be the most accurate. However, IDW was the best method when interpolating Mg at more than 10 cm. This agreed with Schloeder et al. [71], who found no differences between IDW and OK models.

Sodium models provided good results at 0–5 cm depth, with CRS being the most accurate method. Even when at 5–10 cm depth results were less accurate, CRS was also the best method. At >10 cm soil depth, the model showed a weak fitness, indicating the difficulties for mapping Na at this soil depth. In this case, the better accuracy of deterministic vs. geostatistical methods for the first two soil depths coincides with the findings of Fu et al. [72]. However, it differs from Cruz-Cárdenas et al. [70] and Keshavarzi and Sarmadian [73], who found kriging methods as the most accurate.

When interpolating N, P, K and SOM, the results indicate that any of the interpolation techniques dominate in terms of the model's accuracy, i.e., depending on the property and soil depth, the results vary. Good results were observed for nitrogen at 0–5 cm and >10 cm layers, where SK and IDW, respectively, were the most accurate methods. The usefulness of SK coincides with Wang et al. [15]. Nevertheless, the adequacy of IDW at deeper layers in our case differs from the findings of other works [27,74,75]. However, the results of N models at 5–10 cm depth showed poor fitness, which does not allow for accurate predictions for the study area.

When interpolating phosphorus, deterministic techniques such as CRS and IM-Q were the most accurate at 0–5 cm and >10 cm soil depth. These results are in agreement with those obtained by Wollenhaupt et al. [31]. However, coinciding with Shen et al. [12] and Duan et al. [76], OK was the best method for predicting P at 5–10 cm. Furthermore, in our case, OK and IDW showed the same behaviour at 5–10 cm depth, as also found by Schloeder et al. [71].

In the case of potassium models, good results were obtained at 0–5 cm depth. However, ability to ascertain the predictive capacity of the models at 5–10 cm and >10 cm was poor. For the first depth, the deterministic IDW method was the most accurate. In this sense, the results are in agreement with the findings of Bogunovic et al. [77] and Lei et al. [78] in which radial basis functions were observed as more accurate for the shallowest soil layers. These findings agree with our results in the sense that the deterministic CRS was also better to interpolate potassium at 5–10 cm. Conversely, EBK was the best method for predicting values >10 cm. This was also observed by Lingling et al. [79] and Karwariya et al. [5], who found geostatistical methods as more accurate than deterministic ones.

The models developed for the prediction of soil organic matter were good in the case of the first two soil layers. However, at >10 cm, model results were relatively worse, although still keeping acceptable errors. IDW was the most reliable method for predicting SOM at 0–5 cm. Nevertheless, geostatistical methods were the most accurate in the case of Long et al. [3] or Bouasria et al. [80]. At 5–10 and >10 cm, EBK and OK were better. These results could partially agree with Durdevic et al. [7], who found EBK as the most accurate method to interpolate SOM at 0–30 cm, followed by OK and IDW.

In all the case studies analysed in this work, the model's performance varies according to the different properties. This highlights the spatial dependence of the data on intrinsic and soil formation factors. One of the main parameters determining model performance is soil depth. This, in turn, limits the number of samples in many cases due to the shallow soils in this study area. However, the sampling point's density does not imply that the model's performance is reduced but depends on other issues such as the variability of the data at depth, so much so that models with a lower number of samples provide better results than others with more samples, as is the case for silt. In the surface soil layers, the properties show a high variability. However, this spatial variability is considerably reduced at depth.

Another important issue in modelling soil properties is the date of the data used. In some environments (e.g., tropical climates) the use of soil samples exceeding 10–15 years may lead to unrealistic results due to the faster pattern rates of change in soil properties. However, in our environment (Mediterranean climate), soil properties do not usually undergo significant changes. Properties remain stable for 10–15 years, unless there are changes in land use that are very different from the previous ones, as pointed out by Pulido Fernández [81], who compared data from more than 30 years in similar environments. Nevertheless, it is important to add that soil organic matter content changes faster than nutrients and cations, as found by Llorente et al. [82], which shows how in this type of environment only organic matter increases, especially in farms without excessive stocking rates. It is also true that, logically, with proper management, levels increase, but slowly.

When the performance of all the studied interpolation techniques were jointly assessed by means of the analysis described in Section 3.3, it was clearly ascertained that three of the deterministic methods (IDW, CRS and SWT) provided better results than the geostatistical ones. Although the difference is minor in some cases, our results differed from the general consideration by which the geostatistical methods are more commonly used to spatially predict soil properties values.

Working on the search for the most accurate mapping technique is one of the main tasks of this group. However, the main idea is to clarify which parameters have the greatest influence on the model's performance. To this end, the use of deep learning techniques is intended, in which a multitude of environmental covariates can be incorporated. Moreover, new trends in soil mapping are along these lines, as is the use of hybrid techniques for soil mapping.

#### **5. Conclusions**

In this study, nine interpolation methods were used to predict 12 soil variables that were measured at three different soil depth intervals. Statistics such as mean error, coefficient of determination and mean square error were used to evaluate the accuracy of the methods. Although a general preference to use geostatistical methods is observed in general, we conclude that deterministic methods provide better results than geostatistical ones. Our results show that geostatistical methods were more accurate in 19 of the 36 case studies. However, the observed difference between the interpolation techniques is negligible in some cases, allowing different ones to be used interchangeably. In this regard, our results indicate that the accuracy of the methods varies depending on the case study. Results also varied, in general, when the different depths are considered, identifying deterministic methods as more accurate for the topsoil and geostatistical ones for the deeper layer. Therefore, we also conclude the necessity to use a variety of soil mapping methods and techniques to achieve the best results.

**Author Contributions:** This article was written by J.B.-G. and J.F.L.C., and reviewed and edited by M.P.F. The methodology was proposed by J.B.-G. and the data analysis was carried out by J.B.-G., J.F.L.C. and M.P.F. Fieldwork and data acquisition was carried out by J.B.-G. and M.P.F. All authors have read and agreed to the published version of the manuscript.

**Funding:** This publication has been made possible thanks to funding granted by the Consejería de Economía, Ciencia y Agenda Digital de la Junta de Extremadura and by the European Regional Development Fund of the European Union through the reference grant IB16052.

**Acknowledgments:** Thanks to the European Social Fund and the Junta de Extremadura for the funding granted to Jesús Barrena González (PD18016).

**Conflicts of Interest:** The authors declare no conflict of interest.

**Appendix A**

**Figure A1.** Cartographic representation of the different soil particle sizes and at each of the study depths using the most accurate interpolation method in each case.

**Figure A2.** Cartographic representation of the different chemical properties of the soil and at each of the study depths using the most accurate interpolation method in each case.

**Figure A3.** Cartographic representation of the different fertilisers plus soil organic matter and at each of the study depths using the interpolation method with the highest accuracy in each case.

#### **References**


### *Article* **Impact of Agricultural Land Use Types on Soil Moisture Retention of Loamy Soils**

**Szabolcs Czigány 1, Noémi Sarkadi 1,\*, Dénes Lóczy 1, Anikó Csépl ˝o 2, Richárd Balogh 2, Szabolcs Ákos Fábián 1, Rok Cigliˇc 3, Mateja Ferk 3, Gábor Pirisi 1, Marcell Imre 1, Gábor Nagy <sup>4</sup> and Ervin Pirkhoffer <sup>1</sup>**


**Abstract:** Increasingly severe hydrological extremes are predicted for the Pannonian Basin as one of the consequences of climate change. The challenges of extreme droughts require the adaptation of agriculture especially during the intense growth phase of crops. For dryland farming, the selections of the optimal land use type and sustainable agricultural land management are potential adaptation tools for facing the challenges posed by increased aridity. To this end, it is indispensable to understand soil moisture (SM) dynamics under different land use types over drought-affected periods. Within the framework of a Slovenian–Hungarian project, soil moisture, matric potential and rainfall time series have been collected at three pilot sites of different land use types (pasture, orchards and a ploughland) in SW Hungary since September 2018. Experiments were carried out in soils of silt, silt loam and clay loam texture. In the summers (June 1 to August 31) of 2019 and 2022, we identified normal and dry conditions, respectively, with regard to differences in water balance. Our results demonstrated that soil moisture is closely controlled by land use. Marked differences of the moisture regime were revealed among the three land use types based on statistical analyses. Soils under pasture had the most balanced regime, whereas ploughland soils indicated the highest amplitude of moisture dynamics. The orchard, however, showed responses to weather conditions in sharp contrast with the other two sites. Our results are applicable for loamy soils under humid and subhumid temperate climates and for periods of extreme droughts, a condition which is expected to be the norm for the future.

**Keywords:** drought; ecosystem services; land use; soil moisture dynamics; water stress

#### **1. Introduction**

Reports by the Intergovernmental Panel on Climate Change (IPCC) predict that the increasing frequency of both extreme precipitation and prolonged drought periods is very likely in the near future [1]. This pattern was exemplified in Hungary in 2010, with record high annual precipitation totals, followed by the record low annual rainfall total of 2011. However, thanks to the storage of surplus moisture from the previous year in soils, the 2011 drought did not cause remarkable losses in crop yields. Negative water balances, water stress and drought will likely be manifested in diverse ways geographically in the future [2]. Nonetheless, due to the rain shadow effect of the Alps and the Carpathians, the Pannonian Basin will likely be affected by water shortages in the near future [3] and alternating inundations by flash floods, inland excess waters [4–6] and soil erosion [7].

A novel element of sustainable adaptation to climatic conditions of negative water balances could be integrated water management, equally directed to the prevention of excess runoff and prolonged droughts [8,9]. In basin locations and areas of transit waters,

**Citation:** Czigány, S.; Sarkadi, N.; Lóczy, D.; Csépl˝o, A.; Balogh, R.; Fábián, S.Á.; Cigliˇc, R.; Ferk, M.; Pirisi, G.; Imre, M.; et al. Impact of Agricultural Land Use Types on Soil Moisture Retention of Loamy Soils. *Sustainability* **2023**, *15*, 4925. https:// doi.org/10.3390/su15064925

Academic Editors: Antonio Ganga, Blaž Repe and Mario Elia

Received: 18 January 2023 Revised: 20 February 2023 Accepted: 3 March 2023 Published: 9 March 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

such as the Pannonian Basin, water retention is of increasing importance [9,10]. This has been crucial since the river regulation works in the early- and mid-1800s when water conveyance had been accelerated artificially. Over the 1800s and 1900s, flood control measures meant the cost-intensive construction of hydrologic structures (levees, dykes and embankments). Nevertheless, recently, the negative consequences of the rapid conveyance and limited storage of water through the Pannonian Basin have been recognized. The main hydrological constraints are limited land availability and water retention and the reduced storage capacity of the soil [11].

By recognizing the beneficial role of ecosystem services [12–14], a paradigm change occurred in water management policies in many countries in Europe and North America. Water conservation in a sustainable way has priority, especially in floodplains and low-lying areas. Water should be retained in floodplains or various stormwater-mitigation facilities (e.g., raingardens and flood retention pools), in natural and manmade reservoirs and other water bodies instead of increasing the intensity of water conveyance [15].

Adaptation to the increased extremities of hydrologic phenomena (droughts and floods) and the retention of water are indispensable in light of climate change [16]. Therefore, analyzing the suitability of soil textural types, fertility, topography and crop varieties may increase the profitability of the given land use type if managed site-specifically [17–20].

Hence, for sustainable site-specific best management practices, the re-evaluations of landscape diversity and efficiency of increased water retention are essential [21]. As opposed to costly hydrologic structures that are often undesired for natural or seminatural environments [22,23], greener investments and eco-friendly solutions are needed [10], which may comply with the EU Water Framework Directive or other similar frameworks [24–26]. To maintain a more balanced water budget in the long run, the following specific goals should be stated [10]:


Land use and management can significantly affect both atmospheric (e.g., greenhouse gas emission rates and vapor content) and soil physical properties, including porosity hydraulic conductivity [16,27], rate of infiltration, and volume of plant-available water [28–31]. Land use changes may influence the intensity of certain elements of the water cycle, especially the magnitude and time of evapotranspiration [32]. Fu et al. revealed the beneficial impacts of intercropping and terraced agriculture on soil moisture [21]. Niu et al. demonstrated that grasslands had the highest mean soil moisture contents among five different land use types (grassland, cropland, poplar land, interdunes and shrubland) in north-eastern China [33]. The degradation of physical soil properties can directly affect moisture dynamics in the vadose zone. For example, soil productivity decreased by converting natural pastures to farmlands in Iran [28].

The present paper reveals the findings of a Hungarian–Slovenian joint research project titled "Possible ecological control of flood hazard in the hill regions of Hungary and Slovenia". The key objective of the project is providing data on moisture dynamics of silty and loamy soils found on surfaces of high relief. The goal of this study is to present the influence of land use type coupled with periods of different water balances ('normal' or dry summer periods) on moisture dynamics. According to our hypothesis, soils of ploughlands should have a lower water retention capacity, whereas balanced moisture dynamics characterize soils under closer-to-natural land use types. The novelty of our paper is to provide data on the effect of agricultural land use types on soil moisture dynamics. The selected area is markedly affected by a changing climate of increasing aridity. Sustainable adaptation to changing climates at local scales helps in maximizing the site-specific efficiency of ecosystem services. The present study complements previous research carried out in Central Hungary [16].

#### **2. Materials and Methods**

#### *2.1. Location of Study Sites*

For the analyses, three sites were selected in the Transdanubian Hills (SW Hungary) in the vicinity of the city of Pécs: at the villages of Boda (ploughland), Palkonya (orchard) and Almamellék (pasture) (Figure 1). It is a region of a subhumid continental climate influenced by the air masses (whether continental, Atlantic or Mediterranean) and the orographic effect. Although in the long-term precipitation shows an increasing gradient towards the western part of the country, in many years, field rainfall totals show a rather mosaic pattern.

**Figure 1.** Location of the study sites on the digital elevation model (DEM) generated from a LiDAR survey. (**a**) Almamellék (pasture); (**b**) Boda (ploughland); (**c**) Palkonya-Villánykövesd (orchard).

In all study sites, slightly eroded brown forest soils with clay illuviation (WRB: Endocalcic Luvisol) are found. All soils are formed on loess. The three sites are similar in terms of textural type (silt and silt loam) and diagnostic soil type (Calcaric Phaeozem, WRB).

#### 2.1.1. Ploughland Site (Foothills of the Mecsek Mountains)

This study site is located at a distance of 10 km west of city of Pécs, in the southern footslopes of the Mecsek Mountains, gently sloping to the direction of the Pécs half-basin, west of the village of Boda. The elevation of the lower and upper station is 172 and 182 m, respectively (Table 1). The average slope for this site is 2.63◦, whereas the maximum is 6.48◦. The land use type is large-scale farming of conventional tillage, with sugar-beet, cereals, sunflower, soybean and rape seed as the most common crops. In both study years, this site was cropped with soybean. The distance between the two monitoring stations of this site is 190 m. A derasional valley and an erosional gully are found at this site, uphill and downhill of the lower station, which is located at the southern margin of a small grove. These landforms and the grove likely influence the soil moisture budget around the station.

#### 2.1.2. Orchard Site

The second study site lies at the southern edge of the village of Palkonya in the northwestern foreland of the Villány Hills. Land utilization is a cherry orchard. The parent material is Pleistocene loess overlying a Mesozoic limestone. Elevation of the lower and

upper stations is 175 and 182.7 m. The slope of the uphill station is steeper, with an average slope of 18◦, whereas the slope is gradually decreasing to a footslope position closer to the foothill station and the reservoir located in the valley bottom. The two stations of this site were installed at a distance of 156 m.

**Table 1.** General geographical parameters of the three study sites.


#### 2.1.3. Pasture Site

This study site is situated in the Zselic Hills, where the elevation of the lower and upper is 112.6 and 124.6 m, respectively. The average slope in the Almamellék site is 5.92◦ uphill from the uphill station, whereas it reaches a maximum of 9.7◦ immediately downhill from the upper station. Furthermore, the land is utilized here as a natural pasture and meadow. The monitoring stations are located at a distance of 167 m from each other.

#### *2.2. Field Monitoring Setup*

To track local moisture dynamics, SM monitoring was installed for each study site in December 2018. At each site, two monitoring stations were deployed. Rainfall was measured using tipping-bucket rain gauges (ECRN-100, Meter Group Inc., Pullman, WA, USA) of 0.2 mm resolution. Rain gauges as well as WP4 temperature and relative humidity sensors were installed only at the uphill stations. At both stations of each site, TDR-type soil moisture sensors (Meter Group Inc., Teros 12) and tensiometers (Teros 21) were used to measure volumetric water contents and matric potential, respectively. At each station, 4 sensors were deployed at depths of 10 and 30 cm (one soil moisture sensor and one tensiometer at each depth). The depths were selected based on soil type (loamy soil) and the typical crops grown in SW Hungary. Soils to a depth of 30 cm experienced the largest fluctuation in soil moisture. TDR sensors had been laboratory-calibrated prior to their installation in the field. Data were logged and stored with EM-60 data loggers at a time interval of 15 min.

#### *2.3. Particle Size Analysis of the Soil Samples*

Soil samples were taken from the depths of the sensors. Organic matter and CaCO3 were removed from the samples using H2O2 and 10% HCl, respectively. The grain size distribution of the soil samples was determined with static light scattering using a Malvern MasterSizer 3000 (Malvern Inc., Malvern, England, United Kingdom) particle size analyzer. The textural type was determined using an MS Excel macro (https://www.nrcs. usda.gov/resources/education-and-teaching-materials/soil-texture-calculator, accessed on 10 January 2023) and clay–silt and silt–sand boundaries at 2 and 63 μm, respectively.

#### *2.4. Calculation of the Pálfai Drought Index and the Aridity Indices*

The Pálfai Drought Index (hereafter PaDI, [34]) was calculated for the three study sites using mean monthly temperatures and weighted monthly precipitation totals. Potential evapotranspiration was calculated using the Thornthwaite equation [35,36], widely applied for the estimation of PET under humid and subhumid temperate climates.

#### *2.5. Analysis of Field Data*

The field data were statistically analyzed using MATLAB R2020b and MS Excel programs. The statistics focused on descriptive statistical parameter calculations (mean, median, standard deviation, minimum, maximum and range of the data). Boxplots were generated by the MATLAB program from raw data (excluding missing or inappropriate values). The general description of the boxplots was the following: the boxes' sizes show the interquartile range (IQR, data fall into 25–75% percentile), the median was plotted on the boxplots (red lines) and the outliers were measured (all variables located at a distance from the median 1.5 times larger than the IQR were outliers). The whiskers showed the 5–95% percentiles.

#### **3. Results**

#### *3.1. Soil Textural Types*

Soil texture at the three sites was dominated by the silt fraction, hence the soils were classified as either silt or silt loam types (Table 2). In general, all samples of the ploughland and the foothill pasture had higher sand content (24–32%) than the other sites.

**Table 2.** Fine earth fractions and textural types of the soils of the monitoring sites.


#### *3.2. Water Balance*

Rainfall distribution showed a rather contrasting picture among the three sites. The highest rainfall for both the summer and the period of January to August was measured for the orchard site in 2019 and for the pasture in 2022, whereas the lowest rainfall total in 2022 was observed in the orchard (Table 3).

**Table 3.** Precipitation totals [mm] for the periods of January to August and June to August, 2019 and 2022.


Both the antecedent (January to May) and summer precipitation of all three study sites indicated marked contrasts between the two studied years (Figure 2). January and May's monthly precipitation totals demonstrated the largest variation between 2019 and 2022.

Mean monthly temperatures showed a less diverse pattern than rainfall among the three sites. Both the highest mean annual and the highest summer mean temperatures were recorded in the orchard in 2019 and 2022 (Table 4). Consistently, the lowest temperatures were registered at the pasture in both studied years. Mean summer and mean annual temperatures were about 1 ◦C and 0.2 ◦C higher in 2022 than in 2019, respectively. The differences between spring season average temperatures were minor; however, the growing season temperature at all sites was about 0.7 ◦C higher in 2022 than in 2019. The greatest

variation was measured in autumn and differed by 0.8 to 1.1 ◦C from site to site (higher in 2019).

**Figure 2.** Monthly precipitation totals from January to August in (**a**) 2019 and (**b**) 2022.

**Table 4.** Mean annual, growing season (April–October), spring (March to May), summer (June to August) and autumn (September to November) temperatures [◦C] in 2019 and 2022.


The higher summer temperatures and lower precipitations of 2022 compared to 2019 generated significant differences in PET, aridity index and PaDI both seasonally and annually. Monthly potential evapotranspiration revealed the greatest variations between 2019 and 2022 in the summer months. The close-to-record temperatures in July (Tmax = 39.2 ◦C in Palkonya) generated a monthly PET of almost 160 mm at all three sites. The largest differences in PET were registered in May.

PaDI and the aridity index demonstrated great variations between the two studied years. Due to the high rainfall totals of 2019 at the orchard site, its PaDI did not indicate any water stress in the first study year, whereas the other two sites had a PaDI of 4.2 and 4.6 which referred to mild drought conditions. Although drought conditions remained in the class of mild drought in 2022 at the pasture and the ploughland, all three sites experienced water stress with the highest increase in PaDI at the orchard site, which entered the class of moderate drought (Figure 3a).

Aridity indices were around 1 in 2019 (common for Hungary since the onset of meteorological measurements), whereas they turned into a negative water balance (AI > 1) for the ploughland and the orchard by 2022 (Figure 3b).

Figure 4 shows the aridity index, total monthly precipitation and monthly evapotranspiration in 2019 and in 2022 for the ploughland, pasture and orchard locations. All sites in 2022 experienced semiarid or arid conditions especially during the growing season (from April to October), except for September. However, in 2019 aridity was less severe, but still reached the semiarid category during this most critical season. The orchard site remained under subhumid conditions. Compared to Figure 4 (PaDI), the orchard site showed more

intense drought in 2022, since over the growing season it experienced a lower amount of plant-available water (less precipitation) against high evapotranspiration.

In terms of the average aridity index, precipitation and evapotranspiration, summer periods had intensively critical values for almost all sites, especially in 2022. The main difference was found in the growing season when higher evapotranspiration and lower precipitation in 2022 resulted in more arid conditions (Table 5). Nevertheless, on an annual basis and in autumn the differences were negligible. During the winter of 2021/2022, all sites received a substantial precipitation amount, but this was not able to compensate for the low rainfall during the growing season in 2022. It is to be pointed out that in 2021 the summer, and, in fact, the entire growing season were also moderately dry (mild drought, with around PaDI = 5, not shown here).

**Table 5.** Summary statistics of the annual, growing season (April–October) and seasonal (spring, summer, autumn and winter. Seasons are the same as in Table 4, and the winter periods have been selected as the following: 2018/2019: December 2018–February 2019 and 2021/2022: December 2021–February 2022 mean of aridity index (AI), precipitation (P) and evapotranspiration (ET0).


**Figure 4.** Monthly precipitation totals (prcp), evapotranspiration (ET0) and aridity indices (AI) for the different sites (pasture, ploughland and orchard) of 2019 and 2022. Solid lines depict the value of AI = 1, dashed lines AI = 1.5 and dotted lines AI = 4 as a reference for humid (AI < 1), subhumid (1 ≤ AI ≤ 1.5), semiarid (1.5 ≤ AI ≤ 4) and arid (AI ≥ 4) conditions, respectively.

#### *3.3. Soil Moisture Regime*

In general, the SM regime showed a rather variable picture for the three study sites. On average, the natural pasture had the highest volumetric water content and soils and that site had the most responsive behaviour to rainfall events. The foothill site of the ploughland, however, had the lowest SM contents and the least variability of SM among all sites (Figure 5). The orchard site revealed a contrasting picture between the two studied years due to (i) necrosis, tree removal above the upper monitoring station and (ii) the markedly less precipitation in 2022 compared to 2019. While the moisture regime at this site demonstrated a great variability in 2019, the SM content showed a monotonously decreasing trend over the summer of 2022.

**Figure 5.** Soil moisture dynamics of the study sites (first column ploughland, middle column pasture and right column orchard) in the summers of 2019 (1st and 3rd rows) and 2022 (2nd and 4th rows), for (**a**–**f**) 10 cm and (**g**–**l**) 30 cm. The values on the right upper corners in subplots from (**a**) to (**f**) represent the precipitation amount during the summer (June to August).

The statistics of SM indicated marked variations among the three land use types. Commonly, the pasture showed the highest SM content, and the ploughland had the lowest median SM values. The footslope station of the ploughland revealed the lowest SM content and the lowest SM range, likely influenced by the grove uphill. On the other hand, the orchard showed the greatest range of SM over the two studied summers due to its extreme water balance and the removal of tree canopy at the upper station (Figures 6 and 7). The natural pasture presented the greatest water stress tolerance and the most homogeneous water dynamics among the three land use types. At all stations, mean and median SM contents in 2022 were either equal to (at the ploughland footslope, 30 cm) or lower than those in 2019 (at all other stations).

**Figure 6.** Mean soil moisture values of the study sites in the summer of 2019.

**Figure 7.** Mean soil moisture values of the study sites in the summer of 2022.

#### **4. Discussion**

Our research confirmed former results [16,21,27,30,37–45], i.e., moisture content of the vadose zone is markedly influenced by land use type, distance from landscape and morphological features, local water and moisture balance and soil texture.

In both summers, the most optimal moisture content (i.e., the highest median with the lowest variability) was revealed in the pasture. The highest median SM content was found here over both studied summers. A higher drought risk during the two studied summers was found both at the ploughland and the orchard. SM contents were low in the ploughland on many occasions and for prolonged periods in 2019, during which the matric potential fell below the permanent wilting point (data are not shown here). The longest of such periods lasted for 73 and 101 days in 2019 and 2022, respectively, at the upper station of the ploughland. According to our a priori hypothesis, the ploughland should have had a large evaporation loss and hence low mean SM content. Yet, due to the dense canopy cover of soybean until late September, evaporation loss caused by direct solar radiation was limited. The orchard performed well when the canopy was present at the site (in 2019); however, when the land use type was changed by 2022, its water retention capacity deteriorated. As a consequence, the orchard showed a markedly more negative water balance in 2022 compared to 2019, further exacerbated by the effect of the absence of canopy. The negative water balance of 2022 in the orchard produced the lowest median SM content at both monitored depths of the upper stations.

Our results have been partly confirmed by the findings of Wang et al. [46], who found the second highest SM content in grasslands in eastern China. Nonetheless, in their study,

corn had the highest SM content due to the reduced evaporation from the soil. However, Shi et al. [47] pointed out that intense transpiration created extreme water stress in orchards in the Loess Plateau of China, underpinning our result concerning the high variations of SM at the orchard site.

Opposed to the findings of Tölgyesi et al. [48], we did not find compelling evidence of the drying effect of trees in the top 30 cm of the soil. This may be explained by (i) the removal of trees, (ii) the finer soil textural types of our site and (iii) the extreme water balance of the orchard site in 2022 compared to the ploughland and the pasture. Our results, however, indicated the enhanced water retention and water storage capacity of the soil due to the canopy cover and shading of the orchard (cherry) trees during the summer of 2019, and hence contributed to the overall roles of natural ecosystem services. This finding is corroborated by the results of Syrbe and Grunewald [41] and Ribeiro and Šmid Hribar [49]. In a good agreement with the findings of the present study, previous studies also demonstrated the benefit of low-impact agricultural practices for the reduction or possibly the termination of the decline of plant-available water [50,51].

Depending on the water balance and the proximity of the groundwater table, the direction of water motion may differ temporally or seasonally. During the summer, according to matric potential data, capillary rise was common at all monitoring stations [30]. Such capillary rise-dominated periods were intermittently interrupted by intense infiltration events such as surface runoff and probably also through flow rates also intensified during heavy thunderstorms (e.g., 2 August 2019: 51.5 mm and 9 June 2022: 55.2 mm). Leitinger et al. revealed the marked influence of the slope gradient on the distribution of SM along the hillslope in the Eastern Alps [40]. They also found a significant influence of water balance and land management type on infiltration and surface runoff [40]. Their findings revealed the impact by cattle trampling and treading; however, at our pasture site, no grazing animals were kept.

#### **5. Conclusions**

The marked variations among the three study sites can be partially explained by the difference in land use types, whereas the contrast between 2019 and 2022 can be explained by the influence of water balance. The most stable behaviour was found for the natural grazing land in both years. Our hypothesis, however, according to which the ploughland should have demonstrated the worst moisture dynamics, was not proven, especially in 2022. This is likely attributed to the (i) spatial variability of water balance and (ii) crop type, as soybean forms a relatively dense canopy early in the growing season and harvesting is commonly timed to late September and early October in SW Hungary. Hence, evaporation loss from the soil due to direct irradiation is limited. A third possible reason for the observed variation among the three sites is the insufficient spatial resolution of field-monitored SM data.

Our result may be utilized by stakeholders in the field of agricultural and farming businesses especially under subhumid climates of the temperate zone.

Although tackling the challenges posed by drought is not a novel phenomenon in the Pannonian Basin, adaptation to altered climates is not just an option presently: it is crucial. Therefore, a long-term analysis of climatic trends and water budget is indispensable, similar to the site-specific differentiation and optimization of agriculture. The present study aimed at providing data for studies of this type and delivering a more accurate understanding of these processes with the aim to adjust ecosystem services at a local scale. The present study could be improved with analyses performed (i) under more controlled conditions, (ii) within a more restricted geographical area, (iii) using hydrologic models and (iv) estimating local water balances at a higher spatial resolution.

**Author Contributions:** Conceptualization, S.C.; methodology, S.Á.F.; formal analysis, N.S., E.P., A.C. and M.I.; investigation, S.Á.F.; data curation, G.N., S.C. and R.B.; writing—original draft preparation, S.C.; writing—review and editing, N.S., D.L., G.P., R.C., M.F. and S.Á.F.; visualization, N.S. and

G.N.; project administration, D.L.; funding acquisition, D.L., R.C. and M.F. All authors have read and agreed to the published version of the manuscript.

**Funding:** The authors are grateful for the financial support from the National Research, Development and Innovation Office (NKFIH) within the framework of the Hungarian–Slovenian collaborative project "Possible ecological control of flood hazard in the hill regions of Hungary and Slovenia" (contract no SNN 125727) and within the framework of the program Excellence in Higher Education, Theme II. 3. ("Innovation for sustainable life and environment"). The study was financed by the project "Possible ecological control of flood hazard in the hill regions of Hungary and Slovenia" (SNN 125727; Slovenian Research Agency ARRS, N6-0070) and a research programme Geography of Slovenia (Slovenian Research Agency ARRS, P6-0101).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data are located on the University of Pécs data storage system (One Drive–Pécsi Tudományegyetem), which due to system safety issues are available upon request.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Assessing Soil Erosion by Monitoring Hilly Lakes Silting**

**Yamuna Giambastiani 1,\*, Riccardo Giusti 1, Lorenzo Gardin 1, Stefano Cecchi 1, Maurizio Iannuccilli 1, Stefano Romanelli 2, Lorenzo Bottai 2, Alberto Ortolani 1,2 and Bernardo Gozzini 1,2**


**Abstract:** Soil erosion continues to be a threat to soil quality, impacting crop production and ecosystem services delivery. The quantitative assessment of soil erosion, both by water and by wind, is mostly carried out by modeling the phenomenon via remote sensing approaches. Several empirical and process-based physical models are used for erosion estimation worldwide, including USLE (or RUSLE), MMF, WEPP, PESERA, SWAT, etc. Furthermore, the amount of sediment produced by erosion phenomena is obtained by direct measurements carried out in experimental sites. Data collection for this purpose is very complex and expensive; in fact, we have few cases of measures distributed at the basin scale to monitor this phenomenon. In this work, we propose a methodology based on an expeditious way to monitor the volume of hilly lakes with GPS, sonar sensor and aquatic drone. The volume is obtained by means of an automatic GIS procedure based on the measurements of lake depth and surface area. Hilly lakes can be considered as sediment containers. Time-lapse measurements make it possible to estimate the silting rate of the lake. The volume of 12 hilly lakes in Tuscany was measured in 2010 and 2018, and the results in terms of silting rate were compared with the estimates of soil loss obtained by RUSLE and MMF. The analyses show that all the lakes measured are subject to silting phenomena. The sediment estimated by the measurements corresponds well to the amount of soil loss estimated with the models used. The relationships found are significant and promising for a distributed application of the methodology, which allows rapid estimation of erosion phenomena. Substantial differences in the proposed comparison (mainly found in two cases) can be justified by particular conditions found on site, which are difficult to predict from the models. The proposed approach allows for a monitoring of basin-scale erosion, which can be extended to larger domains which have hilly lakes, such as, for example, the Tuscany region, where there are more than 10,000 lakes.

**Keywords:** sediment monitoring; remote sensing; lakes; water capacity; sonar; aquatic drone; USLE; MMF

#### **1. Introduction**

#### *1.1. Erosion: A Worldwide Threat*

After almost a century of research and studies on the territory, soil erosion caused by water, wind and tillage is known to be the greatest threat to soil health, and to the ecosystem services it provides, in many regions of the world [1–3]. Its impact on global crop production has been estimated at a reduction of 0.4% per year [4]. Some authors argue that nearly a third of the world's arable land has been lost due to erosion over the past 40 years and continues to decrease at a rate greater than ten million hectares per year [5]. Erosion is a natural phenomenon which consists of the loss of the most superficial layer of the soil due to the action of precipitation or wind. With the advent of modern agriculture and, above all, with (I) the introduction of extensive mechanization, (II) the leveling of the

**Citation:** Giambastiani, Y.; Giusti, R.; Gardin, L.; Cecchi, S.; Iannuccilli, M.; Romanelli, S.; Bottai, L.; Ortolani, A.; Gozzini, B. Assessing Soil Erosion by Monitoring Hilly Lakes Silting. *Sustainability* **2022**, *14*, 5649. https:// doi.org/10.3390/su14095649

Academic Editors: Blaž Repe, Mario Elia and Antonio Ganga

Received: 30 March 2022 Accepted: 3 May 2022 Published: 7 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

slopes, (III) the abandonment of traditional hydraulic-agricultural solutions and (IV) the specialization of crops, erosion has assumed worrying proportions [6–8]. Erosion is now a worldwide threat, especially in hilly areas with significant economic impacts, particularly in areas with valuable crops [9,10]. Water erosion represents one of the main threats to the correct functionality of the soil, through (I) the removal of the fertile surface soil horizon, (II) the denser subsoil incorporation in the surface layer and (III) the possible decrease in the root zone [11].

A reliable assessment of this phenomenon is therefore particularly useful as a decisionsupport tool for planning soil conservation interventions [12–14]. To address these issues, the Community Agricultural Policy has ensured that agriculture is in line with the EU soil protection policies. Effective management of these issues is considered essential for many strategies and priorities of the European Green Deal, as defined primarily in the (I) thematic strategy and the sustainable management of soil [15], (II) the fight against erosion, and (III) the fight against the loss of organic carbon and biodiversity in soil. The quantitative assessment of soil erosion, due to both water and wind, is generally carried out through modeling the phenomenon or with experimental tests (plots, rain simulators, etc.) carried out directly on the field [16]. In recent decades, researchers in Italy have also conducted several direct studies on the phenomenon of erosion [17–22].

#### *1.2. Models for Erosion Estimation*

The most commonly used erosion estimation model is the universal equation of soil loss (USLE) [23], and its revised version (RUSLE) [24], which estimates the annual mean long-term loss of soil due to sheet (interrill) and rill erosion. It should be noted that soil loss caused by (ephemeral) gully erosion is not predicted by RUSLE [25]. Despite its shortcomings, RUSLE is still the most widely used model on a large scale [26,27]. It can process data input for large regions and provides a basis for scenario analysis and taking actions against erosion [28]. A recent work [29] estimated erosion on a European scale using the most in-depth processing of the single factors [29–32]. USLE has been applied in comparative studies between various analysis methods, and the authors have shown that it does not lead to greater errors than process-based physical models (WEPP and PESERA), although it has some limitations due to the simple empirical nature of the model [33–35]. Soil loss is also analyzed worldwide through the revised MMF—Morgan–Morgan–Finney model [36,37], in order to evaluate the land degradation and ecological status of specific catchment areas or wider territories [38–40]. This model allows an erosion simulation to be developed in relation to the characteristics of the vegetation cover. The comparison between these empirical models shows similar results [41,42]. The Soil and Water Assessment Tool (SWAT) is a semi-empirical model used for the assessment of erosion phenomena at the basin scale [43–45]. It also allows analyses related to hydrological processes [46], land management and climate change [47,48]. Other simulations have been performed directly on reconstructions of hydrographic basins in miniature or in experimental sites [49–51].

In Tuscany in 2009, soil erosion estimates were made at a regional scale with the USLE model through the elaboration of single climatic, pedological, land use and morphometric factors at high resolution [52,53]. Direct measurements carried out over the years in experimental fields [54–56] have allowed the model to be properly constrained and tested.

#### *1.3. Scope of Work*

With this work, we propose a new approach to monitor erosion phenomena at the basin scale, based on an expeditious estimate of the hilly lakes silting rate, through remote sensing techniques. The basic assumption is that a relationship exists between the soil loss (or sediment production) from the basin with the volume loss of the reservoir. The silting rate, estimated by sonar and aquatic drone [57], is compared with the soil loss obtained from two models (RUSLE and MMF) in order to evaluate erosion through direct measurements of the sediment produced. The hilly lakes distributed throughout the territory can be considered the containers of the sediment coming from the erosion phenomena of the slope, and therefore constitute a net of distributed monitoring. This expeditious method of estimating the reservoir capacity for the study of erosion phenomena is an innovative tool that enables the estimation of the sediment produced by a slope. The aim is to demonstrate that through the repetition over time of a simple procedure of silting estimation, it is possible to observe the evolution of the erosive phenomena and better understand the impact of anthropogenic actions or climate change on the quality of soils. In Tuscany, there are about 5000 lakes with a surface greater than 1000 square meters, which can become the mean for the distributed monitoring of erosive phenomena.

#### **2. Materials and Methods**

#### *2.1. Lakes Analyzed and Volume Changes*

The study took into consideration 12 hilly lakes in Tuscany, shown in Figure 1, of which the volume calculated in 2010 is known, thanks to a past survey by the former Agency for Development and Innovation in Agriculture (ARSIA) of Regione Toscana (RT regional administration of Tuscany). The volume estimate was carried out by a private company (Aquaterra, Florence) using a boat, a depth sounder and GPS, thus carrying out a bathymetric survey [58]. LaMMA Consortium, applying the methodology described by Giambastiani et al. 2020 [57], measured the lakes again in 2018 thanks to a monitoring project. We assume that the two methodologies are comparable, as the same types of tools and procedures are used. Furthermore, the measurements were carried out for both years in early spring, when the reservoir tends to be full from winter rains and agricultural use is limited. Each lake and its basin was evaluated and investigated in order to carry out a modeling analysis of the surface erosion with the common models (RUSLE, MMF). These lakes are mainly used for irrigation of agricultural crops; however, some are used for sport fishing, forest-fire-fighting and other objectives. Table 1 shows the main characteristics of the basins corresponding to such lakes. Table 2 reports historical data regarding the lakes volume at the time of construction. The comparison between 2010 and 2018 is summarized in Table 3.

**Figure 1.** Lakes geographic distribution in Tuscany, Italy.

In order to implement the erosion estimation models (RUSLE and MMF), data for the hydrographic basins were collected relative to the precipitation (Figure 2) of the meteorological stations closest to the lakes; the hydrological network was elaborated, for each basin, from a DTM (Digital Terrain Model) with resolution 10 × 10 m (Figure 3); and land cover was processed via photointerpretation (Figure 4, Table 4) in 9 main classes.

**Table 1.** Main characteristics of the basins corresponding to the studied lakes (Appendix A): Altitude and slope are obtained from a DTM with 10 m resolution. Hydrographic networks are taken from a database of the regional administration of Tuscany (https://www.regione.toscana.it/-/geoscopio, accessed on 1 April 2018); viability is taken from the OpenStreetMap database.


**Figure 2.** Rainfall trends in the period object of study, for the nearest weather stations. The legend on the right side associates each weather station to the corresponding lake(s).

**Figure 3.** Relationships between lake surface and corresponding drainage basin.

**Figure 4.** Distribution of land-cover classes for each lake. The land-use classes are shown along the x axis, described in Table 4, while the legend shows the lake GID. In X axis: 100 = artificial surfaces; 210 = lands under a rotation system used for annually harvested plants and fallow lands; 220 = permanent crops (vineyards and olive groves); 301 = forest with a complete canopy closure or a little less; 302 = forest with a sparse canopy closure (40–60%), shrubs and max 10% of soil bare; 303 = degraded forest (canopy closure less of 40%), shrubs cover of 40% and bare soil max 30%; 320 = permanent shrub and/or herbaceous vegetation associations; 330 = degraded soil or bare rock; 500 = water bodies.

From research conducted in the archives of the body in charge of the regulation and authorization of artificial lakes, it was also possible to recover the lake volume on the project deposited during the authorizations phase. From what has been learned, however, unregistered changes have often occurred regarding the morphology of the reservoir, so the related dimensional parameters are to be considered just indicative (Table 2, Figure 5).

**Figure 5.** Trend of the reservoir capacity from construction to the analysis period. Dashed lines indicate the uncertain range, while solid lines indicate the trend found with the actual analysis.


**Table 2.** Construction year and project volumes.

**Table 3.** Lake parameters for the years 2010 and 2018: surface area, volume, variation (in volume, percentage and percentage per year), and silting rate, the latter normalized to the lake surface area. For 2010, we show harmonized volumes, indicated as 2010 h.


#### **Table 4.** Land cover classes corresponding to C and P factors.


Harmonization

In order to compare the two volume measures, a lake-surface-based harmonization was applied, as this parameter (surface area) was easily obtained from the Tuscany Region orthophotos at 20 cm resolution (https://www.regione.toscana.it/-/geoscopio, accessed on 1 April 2018). Harmonization is necessary because in two different years we could have a different reservoir capacity due to the water level, according to different previous rainfall. It is based on the surface variation between 2010 and 2018 (Figure 6), according to the following equations.

**Figure 6.** Orthophotos of lake "gid 8477" where it is possible to check the difference in water level.

We can write a generic expression for the lake volume *V*, according to Giambastiani [57], as:

$$V = \iint h(\mathbf{x}, \mathbf{y}) \cdot d\mathbf{x} dy = \int\_{S} h \cdot d\sigma \tag{1}$$

where *h* is the height that depends on the coordinate positions (*x*,*y*), which we have omitted in the second step, rewriting the integral as a surface one (*σ*). Using the integral mean value theorem, we can write *V* as:

$$V = \langle h \rangle\_S \cdot S \tag{2}$$

*h<sup>S</sup>* being the lake height average (over the surface *S*). If we assume that its variation is negligible for limited variation of *S*, we can write the variation Δ*V* of the volume as a linear function of Δ*S*, the latter being the variation of the surface with time (depending for instance to rainfall, evaporation, etc.).

$$
\Delta V = \langle h \rangle\_S \cdot \Delta S = \underline{h} \cdot \Delta S \tag{3}
$$

The last step is just to rename *h<sup>S</sup>* with *h*, both for simplicity and to highlight the assumption that it is no more dependent on the surface extension *S*.

In practice, we have measured *h* for the reference year *y*0, which for us was the year 2010, for which we had the ARSIA measurements of the surface areas with the corresponding volumes.

Where the "Surface measured 2010" is the ARSIA surface, measured simultaneously with the volume

$$
\underline{h} = \frac{V\_{y\_0}}{\mathcal{S}\_{y\_0}} \tag{4}
$$

For a generic year *y*, the volume to be compared with the one at the reference year *y*<sup>0</sup> becomes:

$$V\_y = V\_{y\_0} + \Delta V = V\_{y\_0} + \underline{h} \cdot \Delta S \tag{5}$$

with Δ*S* as the surface difference between 2018 and 2010; surfaces were obtained through photointerpretation of orthophotos.

#### *2.2. Erosion Simulation by the Morgan–Morgan–Finney Model*

The MMF model divides the soil erosion process into two phases: the phase related to the water component, which determines the energy of the rainfall, and the phase related to the production of sediments, based on the characteristics of the soil. Soil loss in relation to erosion is determined on the basis of precipitation and transport capacity, influenced by soil cover and slope [41,59]. The MMF model is implemented within the open-source SAGA GIS software. Input data come from various sources. Starting from the Digital Terrain Model (DTM), with a resolution of 10 m, the slope map and the channel network were elaborated, while the "plant height" map was obtained through the Crown Height Model (CHM—10 × 10 m). Canopy cover, permanent interception and ground cover derive from Sentinel-2 image processing, in particular based on the NDVI calculation [60], with 10 m resolution. The characteristics of the soils (bulk density, effective hydrological depth, percentages of clay, sand and silt, etc.) are derived from the soil database of RT (http://www502.regione.toscana.it/geoscope/pedologia.html, accessed on 1 April 2018). Other necessary input variables for the model were obtained from direct processing of the land use and land cover map, carried out by photointerpretation. Annual precipitation data were taken from the meteorological stations closest to the lakes in question. In particular, the average distance between the lakes and the rain gauges is 4.5 km, with a standard deviation of 1.8 km.

#### *2.3. Erosion Estimate by RUSLE Model*

The Universal Soil Loss Equation (USLE) [23] and subsequent revisions (RUSLE) [24], is an empirical relationship, as it derives from experimental plots carried out in the United States and from the mathematical definition of the results found from these plots, which models soil erosion as a process resulting from a set of six main factors: the energy and intensity of precipitation (R factor), the erodibility of the soil (K factor), the length and slope of the plot (LS factor), vegetation cover (C factor) and conservation practices (P factor). The employed R factor is derived from the SIAS project (ISPRA 2016), through which a simplified relationship is elaborated between the amount of rainfall and the erosion value [61]. This simplification does not critically affect the accuracy, as it emerges from comparison work [31]. In fact, the mean value of R factor for Tuscany is 1765 (standard deviation = 710) against 1748 (standard deviation 365), as found by Panagos 2015 [29]. Larger fieldwork, developed in 2006 for punctual soil data, allowed the calculation of K factor carried out following the original methodology [23]. The LS factor was calculated for the basins of each lake, using a digital elevation model (DEM) of 10 × 10 m, with the method developed by Desmet and Govers 1996 [62]. In order to avoid overestimation of the LS factor in heterogeneous landscapes, the lengths of long slopes were limited to a value of 333 m [24,62]. The length exponent (m) is based on the original USLE method [63]. The C factor was completely updated, carrying out a detailed photo interpretation on orthophotos, using functional classes for the purpose. For each soil cover class identified, the values C and P were attributed as in Table 4, consistent with the values reported in the bibliography [24,31,64,65].

From the photo interpretation, it was possible to verify that in the agricultural landscapes of the study areas, there were no particular conservation techniques similar to those already codified in the RUSLE model: for this reason, we decided to always adopt a factor P = 1.

#### Sediment Delivery Ratio

The Sediment delivery ratio (SDR) is applied to estimate the amount of sediments produced by the erosion phenomena that reaches the lake [66]. SDR is the erosion fraction, generated by each single source cell, which reaches the nearest permanent drainage line. As a first approximation, the SDR can be considered constant for the whole basin or sub-basin [67], but in recent works, it is calculated pixel by pixel as a function of the length and slope of the path in the downstream direction [68]. The most complete al-

gorithm for its modeling is proposed by [65], which accounts for the connectivity index (IC) for each pixel, considering the morphological and hydrological characteristics of both the hydrological upstream and downstream portion of the pixel. It describes the hydrological link between sediment sources and collection and transfer points such as streams [69]. For SDR calculation in the study area, we have used the InVEST model (https://naturalcapitalproject.stanford.edu/, accessed on 3 September 2021), which implemented the algorithm of Borselli [65], making some minor simplifications.

#### **3. Results**

The silting rate (SR) (Figure 7) is very variable among the lakes under study and no significant relationships appear between the lake volume or the surface of the basin, or other main dimensional parameters. Some lakes have a high volume variation (e.g., 12964, 5171) and this is not related to either the size of the lake or the basin. The annual silting rate is obtained by dividing the volume variation by 8. We consider this operation significant as the surveys were carried out, in both cases, during the months of March and April, with a difference of a few weeks. The lakes' surfaces did not vary much during the 8 years of analysis. Figure 8a shows the relationship between the area of 2010 and 2018. The strong correlation indicates that the variation in the surface responds to ordinary dynamics. The surface change is present in almost all lakes; from this, we can confirm the need to harmonize volumes. While this process may be the source of further processing errors, we believe it is robust as the volume variations are small. A greater variation may exist in the relationship between the lakes' mean depths (Figure 8b). For all cases, we found that the mean depth has decreased. This is further confirmation of the soundness of the harmonization process. The correlation matrix highlights significant relationships between several parameters considered (Figure 9). Among these, we find a good correlation between soil loss and the erodibility of the lithology (K\_lito\_Sl, r = −0.72), the quantity of specialized (olive grove, vineyard) agricultural land cover (r = 0.85) and with the presence of roads (r = 0.68).

In Figure 9, we summarize the many relationships developed between the silting and the characteristics of the lake or basin. Among these are the physical characteristics of the basin, such as the altimetry and the difference in height (alti-max and altit\_lake, disl\_basin), the length of the network present in the basin (hydro\_network), the presence of roads (road\_net), the various soil classes (sup\_ucs: Table 4) and rainfall (rain\_acc, num\_event).

**Figure 7.** Silting rate and water capacity of the lakes under study.

Comparing the silting rate (SR) of each lake, obtained with the proposed methodology [57], with the annual soil-loss values obtained from the described empirical models, the link between these methodologies is evident (Figure 10). In absolute terms, the sediment values produced (SL) were almost always lower than the SR values. SL by RUSLE was similar to SR for three lakes. In seven cases, however, it was different, even if it was

consistent with the MMF results. For example, Lake 5171 had the highest silting rate with 3701 Mg y<sup>−</sup>1, which was similar to the MMF estimate (3384), while with RUSLE, we found a third of the soil was lost. It should be noted that this lake has a much larger basin than all the others. Lake 8477 showed an inverse and very different trend compared to the models: 202 (SR), 630 (SL-RUSLE), 675 (SL-MMF). For other lakes, we found a close similarity. For example, lake 2629 had SR = 643; RUSLE = 499; MMF = 558 Mg y−1. Additionally, in statistical terms, SR showed higher absolute values (SR/ha mean = 15.1; SL MMF/ha mean = 7.9; SL RUSLE/ha mean = 6.3).

Using the RUSLE and SAGA MMF models, it was possible to estimate the soil-loss rate of each catchment area [70], which was compared with the average silting rate of the reservoirs (considered as the closure section of the basin), in the period 2010–2018. The average sediment produced per hectare for the entire analysis sample was in line with the work of Angeli et al. 2004 [71]. The average annual soil loss per hectare obtained by RUSLE was 6.35 Mg, while MMF returned 7.96 Mg. The average silting per hectare of catchment area was 15.13 Mg y−<sup>1</sup> (Table 5). Both models showed a good correlation with the silting rate measured for each lake (Figure 11), but with a stronger relationship with the RUSLE model. The good significance of such relationships suggests a close link between the loss of soil and the silting rate, as visible in Figure 10.

**Table 5.** Summary of sediment volumes and average values per hectare, obtained from field surveys (silting) and models (soil loss for RUSLE and MMF).


**Figure 8.** Relationships between variations in surface area (**a**) and mean depth (**b**).

**Figure 9.** Correlation matrix, based on Pearson's analysis.

**Figure 10.** Comparison between the soil loss (SL) calculated with the models (RUSLE and MMF) and the silting rate for each individual lake.

**Figure 11.** Relationship between the measured silting rate and the soil loss obtained from the MMF and RUSLE models.

#### **4. Discussion**

The methods that allow the monitoring of erosive phenomena are often very expensive to apply: the models require several input variables and therefore (I) challenging surveying campaigns, (II) data collection from different sources to be harmonized and (III) aerial images and high-resolution digital terrain models for GIS analysis and photointerpretation. All this involves high costs, low reaction time in carrying out analyses at critical moments, and possible relevant evaluation errors due to the high number of interactions and variables to take into account [4]. The proposed approach for the erosion evaluation is based on monitoring the water capacity of hilly lakes, which are considered as real sediment containers. The variation in lake volume, following silting, can be linked to the production of sediments from the afferent basin. Given the high distribution of artificial hilly reservoirs in the Tuscan territory, the proposed approach could create conditions for a periodic assessment of the degradation phenomena of the soil and the territory in general. The use of input data with greater accuracy and precision, such as using LiDAR Dems [70] and an in situ meteorological station for precipitation measurements, could lead to improvements. This approach is also easily applicable to different contexts, as long as they have a good distribution of hilly reservoirs. The methodology, however, has some limitations for calculating the volume of the lake, in particular due to the simplification of the shape of the lake profile [57]. Anyway, the application of this methodology, from a monitoring point of view, can be valuable for reducing major evaluation errors arising from too-long samplingtime intervals that we have when using classical approaches. In fact, field measurements for medium-sized reservoirs (up to 4–5 hectares of surface area) can be carried out in a short time (e.g., 30 min), and processing can be carried out automatically within a few minutes. This allows a wide and rapid application, analyzing wide domains with relatively low effort and costs. The results obtained show significant measures of the lake volume variation, well correlated with other physical lake variables. The comparison of the mean depth between 2010 and 2018 (Figure 8) shows a very strong relationship (R<sup>2</sup> = 0.89 and *<sup>p</sup>*-value = 3.619 × <sup>10</sup>−6), which indicates a good significance of the methodology. Furthermore, the values are located below the bisector, as in no case do we find negative silting (greater lake depth). This leads us to think further that the methodology is correct. Given the small variation in terms of volume in most cases (Figure 7), calculation errors, however accepted, could lead to mathematical anomalies. Mathematical models verify the physical process of sediment accumulation in the lake. The 12 basins analyzed are very different from each other (average surface = 78 ± 54 hectares); they mainly have soil covers relating to arable land and forest, so with high variability. We also found evident differences in terms of precipitation. The lakes examined are well distributed and are representative enough of the study area, Tuscany. Some lakes exhibit very high-volume variations (Figure 7). For example, Lake 12964 has a silting rate of 45 tons/year per hectare of basin. Close to the lake, we found a pig farm on an area of about seven hectares, a practice that has probably greatly affected erosion. Lake 5171 has the largest basin of all the lakes under study, with a greater total length of variability, another important source of sediment when maintenance is limited. Lake 8454 also has a strong silting; in this case, the data available do not show any trend and the lack of knowledge of the specific territory does not allow us to put forward hypotheses. Lake 11525 has a silting rate of 0.21 ton/year per hectare of basin, and has the basin completely covered by forest. The models used provided data compatible with the results of other authors [34,40]. The soil-loss values obtained from the basins show a significant correlation with the silting rate values (Figure 10), regardless of the intrinsic great variability. In some cases, soil loss is greater than the volume variation (8 out of 24, Table 5); in particular, this concerns large basins (about 100 hectares). One reason can be that the sediment produced by the basin does not reach the lake. Another factor that has not been taken into account concerns the suspended sediment lost by the reservoir spillway. Tauro [72] indirectly estimates suspended solids by means of water turbidity. The concentration of suspended solids typically increases with the flow speed. In a lake, the effect of containment and slowdown of the outflows allows greater sedimentation. In fact, the water flowing in the spillways usually appears clear [73]. The opposite behavior occurs in smaller basins (less than 50 hectares), which are characterized by higher silting rates and lower soil loss values. The actual erosion is caused by phenomena that, in some cases, the models are unable to consider, which concern the slopes closest to the lake (such as the case of the lake with pig breeding). Having such present shortcomings in mind, the applied methodology can be useful for building decision-support systems in spatial planning and monitoring areas that show critical characteristics related to hydrological processes [74].

#### **5. Conclusions**

A new approach for the soil erosion analysis was proposed. It consists of the use of an aquatic boat bringing a GPS sonar. In few minutes, it is able to detect data about water depth in the reservoir. Using an automatic GIS process, it is possible to obtain an estimation of the volume. The methodology was applied to 12 hilly lakes mainly for irrigation purposes, for which the volume (or reservoir capacity) was measured and repeated after 8 years, using comparable instruments (sonar with GPS). The volume variation in this period was compared with soil-loss estimates obtained from well-established models widely known in the scientific community (RUSLE and MMF), obtaining a clear relationship between the two variables. This approach allows low-cost monitoring of the soil erosion phenomena in relation to changes in land use or climate change. Being based on lakes, the analysis can refer to specific portions of the territory. The main advantage is the speed of carrying out the survey on the lake; the instrumentation is inexpensive and it is not necessary to acquire other parameters relating to the basin. The processing procedure can be automated in the GIS environment. The method can be applied to land management issues as a tool for a decision-support system. The harvesting of more data could permit the development of some estimate models about the erosion phenomena based on the silting rate of lakes.

**Author Contributions:** Conceptualization, S.C. and R.G.; methodology, L.G. and S.C.; software, L.B. and S.R.; validation, R.G. and M.I.; formal analysis, Y.G. and L.G.; investigation, Y.G., R.G., S.C.; resources, B.G. and A.O.; data curation, Y.G.; writing—original draft preparation, Y.G.; writing review and editing, A.O., S.C. and R.G.; visualization, M.I.; supervision, L.B. and A.O.; project administration, B.G.; funding acquisition, B.G. and L.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was carried out with the ordinary funds of the LaMMA Consortium and CNR-IBE and with external funding from the Tuscany Region: Decrees 100/2018 and 1345/2018 "Mappatura della totalità dei laghi in Regione Toscana e costituzione del catasto informatizzato".

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The data used for this article derive from the hilly lakes census of the Tuscany Region. **Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

**Figure A1.** Castelfalfi lakes and basins. GID: 8454 = Castelfalfi 1; 12964 = Castelfalfi 2; 8477 = Castelfalfi 3.

**Figure A2.** Cavalcanti lake and basin. GID: 2629.

**Figure A3.** Fabbrica lake and basin. GID: 5171.

**Figure A4.** Galliano lake and basin. GID: 3036.

**Figure A5.** Pavone lake and basin. GID: 7438.

**Figure A6.** Romena lake and basin. GID: 1156.

**Figure A7.** Schifanoia lake and basin. GID: 7719.

**Figure A8.** Cornia lakes and basins. GID: 8969 = Potenti 1; 8967 = Potenti 2; 11525 = Angiola.

#### **References**


### *Article* **Mapping of Land Degradation Vulnerability in the Semi-Arid Watershed of Rajasthan, India**

**Lal Chand Malav 1, Brijesh Yadav 1,\*, Bhagwati L. Tailor 1, Sarthak Pattanayak 2, Shruti V. Singh 3, Nirmal Kumar 4, Gangalakunta P. O. Reddy 4, Banshi L. Mina 1, Brahma S. Dwivedi <sup>4</sup> and Prakash Kumar Jha <sup>5</sup>**


<sup>5</sup> Feed the Future Innovation Lab for Collaborative Research on Sustainable Intensification, Kansas State University, Manhattan, KS 66506, USA

**\*** Correspondence: brijesh.yadav@icar.gov.in

**Abstract:** Global soils are under extreme pressure from various threats due to population expansion, economic development, and climate change. Mapping of land degradation vulnerability (LDV) using geospatial techniques play a significant role and has great importance, especially in semi-arid climates for the management of natural resources in a sustainable manner. The present study was conducted to assess the spatial distribution of land degradation hotspots based on some important parameters such as land use/land cover (LULC), Normalized Difference Vegetation Index (NDVI), terrain characteristics (Topographic Wetness Index and Multi-Resolution Index of Valley Bottom Flatness), climatic parameters (land surface temperature and mean annual rainfall), and pedological attributes (soil texture and soil organic carbon) by using Analytical Hierarchical Process (AHP) and GIS techniques in the semi-arid region of the Bundi district, Rajasthan, India. Land surface temperature (LST) and NDVI products were derived from time-series Moderate-Resolution Imaging Spectroradiometer (MODIS) datasets, rainfall data products from Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS), terrain characteristics from Shuttle Radar Topography Mission (SRTM), LULC from Landsat 9, and pedological variables from legacy soil datasets. Weights derived for thematic layers from the AHP in the studied area were as follows: LULC (0.38) > NDVI (0.23) > ST (0.15) > LST (0.08) > TWI (0.06) > MAR (0.05) > SOC (0.03) > MRVBF (0.02). The consistency ratio (CR) for all studied parameters was <0.10, indicating the high accuracy of the AHP. The results show that about 20.52% and 23.54% of study area was under moderate and high to very high vulnerability of land degradation, respectively. Validation of LDV zones with the help of ultra-highresolution Google Earth imageries indicates good agreement with the model outputs. The research aids in a better understanding of the influence of land degradation on long-term land management and development at the watershed level.

**Keywords:** analytical hierarchical process; land degradation vulnerability; NDVI; land surface temperature; soil properties

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

### **1. Introduction**

Land is a vital and precious resource to produce food, fiber, fuel, and other ecosystem services for the survival of humans and animals [1,2]. However, the constant pace of degradation and deterioration due to persistent human-induced disturbances and climatic irregularities [3] places livelihood and sustainable progress under acute threat [4]. Land degradation is a major environmental problem all around the world and influences human society and its livelihoods. Globally, the life of around 3.2 billion people totally depends on degraded lands, and around one-third of the world's lands are affected by land degradation [5,6]. In recent years, land degradation has been considered a pivotal factor

Tailor, B.L.; Pattanayak, S.; Singh, S.V.; Kumar, N.; Reddy, G.P.O.; Mina, B.L.; Dwivedi, B.S.; Jha, P.K. Mapping of Land Degradation Vulnerability in the Semi-Arid Watershed of Rajasthan, India. *Sustainability* **2022**, *14*, 10198. https://doi.org/10.3390/ su141610198

Academic Editors: Antonio Ganga, Blaž Repe and Mario Elia

**Citation:** Malav, L.C.; Yadav, B.;

Received: 19 June 2022 Accepted: 11 August 2022 Published: 17 August 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

in environmental issues and has attracted the attention of all stakeholders [7]. The United Nations General Assembly adopted Sustainable Development Goal 15.3 in September 2015, which focuses on achieving land degradation neutrality (LDN) by implementing the best management practices that reduce the loss of healthy land and maintain or improve the productivity of the land [8,9]. Land degradation can be defined as a spatio-temporal deterioration of physico-chemical and biological properties of land, making it unsuitable for human society, and a deterioration of the soil ecosystem, influencing agricultural production and ecological instability [10,11].

Around 24% of the world's total geographic area (approximately 3500 Mha) is severely affected by land degradation [11,12]. Around 20% of cropland, 10% of grassland, and 30% of forests are under the process of land degradation throughout the world [13]. In India, around 36.7% of total geographical area (TGA) (120.7 Mha) is under different types of land degradation such as soil erosion, soil acidity, soil salinity and alkalinity, and waterlogging [14], and soil salinity and alkalinity alone affect 6.73 Mha in different arid, semi-arid, and sub-humid areas [15]. According to the Indian Space Research Organization (ISRO), land degradation accounts for around 29.32% of the TGA of India. It covers 96.4 Mha of agricultural, forest, and non-forest land spread across the country [16]. India joined the Bonn Challenge and the United Nations Decade on Ecosystem Restoration 2021–2030 to maximize ecological and economic advantages from the restoration of degraded ecosystems, pledging to rehabilitate 26 Mha of degraded land by 2030 [17].

The problem of land degradation is especially severe in arid and semi-arid areas of the country, such as the state of Rajasthan. Land degradation affects 67% of Rajasthan's land, where wind erosion contributes to the maximum percentage (44.2%), and water erosion (11.2%), vegetal degradation (6.25%), and salinization (1.07%) are the next most common forms of degradation. Chambal ravines in the state of Rajasthan are perhaps among the worst physically degraded lands, as cultivated fertile lands were engulfed by ravines and rendered unsuitable for agricultural activities [18]. The Chambal ravines are very typical as they are deep to very deep (>20 m) and are devoid of any kind of vegetation, with ravines and gullies being the typical forms of degradation [19]. For the development of effective strategies to minimize and lessen the effects of land degradation, it is a prerequisite to understand the process of land degradation, including the causes and its consequences for major functions of the ecosystem and the proper identification of the affected area and the regions at high risk.

Modeling and assessing the vulnerability of land degradation play a pivotal role in land degradation neutrality planning and prioritization processes and in fulfilling targets for restoration. Assessment of land degradation requires various information such as climate, soil properties, topography, land use, etc. Several techniques are being adopted in monitoring and evaluating the area, rate, and type of land degradation. A survey using satellite images overcomes the time-consuming and expensive traditional survey, particularly in areas tough to assess [20]. Geospatial techniques such as remote sensing (RS) and geographic information system (GIS) play an important role in the assessment and monitoring of land degradation vulnerability. Satellite imageries with precise spatial and spectral resolution are excellent resources for detecting, mapping, and monitoring various degradation kinds and issues in a rapid, consistent, reliable, and cost-effective manner [21–25].

The integrated use of geospatial techniques with the multi-criterion decision analysis (MCDA) method is the most feasible option to assess and map land degradation vulnerability. This MCDA technique has numerous applications in multiple areas such as groundwater potential mapping, crop suitability zonation, and land degradation vulnerable mapping. It is mostly used to solve complex problems by breaking them up into sections, then solving and integrating each section to obtain the ultimate results. The AHP, which was first developed by Saaty (1980), is the most widely used multi-criterion decision method for the mapping of vulnerable zones [26,27]. Decisions may be made using this strategy based on judgements, hierarchical structure, and accurate perception, all of which have a dominant influence on the final decision [27,28]. The AHP approach is a widely recognized, basic, and well-structured decision-making technique. Few research findings have been generated by other researchers [12,13,29] with respect to the assessment and mapping of land degradation vulnerability zones (LDVZ) based on AHP and GIS modeling approaches and their validation with Google Earth imageries. Considering the importance of land degradation vulnerability assessment through remote sensing and GIS and AHP approaches, the present study was carried out in the semi-arid region of Rajasthan, western India. In the present study area, water erosion is the most important cause of land degradation due to favorable erosion geology, vegetal degradation, and the perennial Chambal River. Despite this fact, so far, no studies have been carried out in this area to assess and prepare a land degradation susceptibility map. The core objectives of the study are to (i) characterize the terrain, climatic, vegetative, and pedological variables of the watershed and (ii) identify the most vulnerable areas to land degradation using remote sensing and geospatial techniques. Furthermore, the research provides important information for long-term land use management and development.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The Chanda Kalan Watershed is in the Bundi district of Rajasthan, western India, and it lies between latitude 25◦41 N to 25◦46 N and longitude 76◦16 E to 76◦22 E. Geographically, it covers an area of 2629 hectares (Figure 1). The watershed falls within the Northern Plain (and Central Highlands) including Aravalli, a hot semi-arid eco-region (4.2) denoted as an agro-ecological sub-region (AESR). The climate of the study area is semi-arid with an average annual rainfall of 681 mm, in which the southwest (SW) monsoon contributes roughly 90% of the rainfall. The altitude ranges from 187 to 459 m from the mean sea level (MSL). The watershed is mainly drained by the Chambal River and its tributaries. Major soils are deep brown loamy and brown clayey. The important crops cultivated in the study area are wheat, maize, rapeseed, soybean, paddy, etc. Geologically, the watershed is exposed by rock formations belonging to the Vindhyan Super Group. Vindhyan sedimentary sequences have occupied a major part of the watershed. The Bhander Group of the Vindhyan Super Group and their formations (Upper Bhander shale, Balwan Limestone, Maihar Sandstone) are well exposed in the study area [30]. The watershed has a systematic drainage system, and most of the study area is drained by the southwest to northeast flowing Chambal River and its tributaries. The aquifer area formed in the watershed comes under younger alluvium.

#### *2.2. Dataset Used*

In the current study, eight thematic layers were considered to identify the land degradation vulnerable zones, including Moderate Resolution Imaging Spectroradiometer (MODIS) Normalized Difference Vegetation Index (NDVI), MODIS land surface temperature (LST), Climate Hazards Group Infrared Precipitation with Station data (CHIRPS) rainfall, land use/land cover (LULC), Topographical Wetness Index (TWI), Multi-Resolution Index of Valley Bottom Flatness (MRVBF), and soil texture and soil organic carbon. The Landsat 9 images and Shuttle Radar Topography Mission (SRTM) DEM data were collected from the United States Geological Survey (USGS) website (https://earthexplorer.usgs.gov, accessed on 10 February 2022). The CHIRPS rainfall, MODIS NDVI, and MODIS LST products were downloaded for the period of 10 years (2011–2020) using Google Earth Engine. Soil organic carbon data were downloaded from Soil Grids (https://soilgrids.org/, accessed on 11 February 2022). In addition, soil texture data were collected from the ICAR—National Bureau of Soil Survey and Land Use Planning, Nagpur, at 1:250,000 scale. Various datasets and their specifications are summarized in Table 1.

**Figure 1.** Location map of the study area.



#### *2.3. Processing of Data*

#### 2.3.1. Processing of Terrain Parameters

The SRTM DEM was downloaded and reprojected to Universal Transverse Mercator (UTM), 43 N coordinate system, and filled in QGIS. After that, the filled DEM was used to produce TWI and MRVBF of the watershed. The TWI is widely used to evaluate the impact of topography on different hydrological processes, and it is considered an important indicator of the wetness conditions of a particular region [31]. TWI depicts the water accumulation tendency of a region [32]. Therefore, with respect to land degradation, a higher value of the TWI indicates less vulnerability to degradation or water erosion, and vice versa. In this study, TWI was computed in the SAGA GIS using the following equation:

$$\text{TWI} = \ln(\text{As}/\tan \beta \text{ }) \tag{1}$$

where As is the area of the ascending slope and β is the gradient of the slope.

The flatness and lowness of valley bottoms are measured by an index called the Multi-Resolution Index of Valley Bottom Flatness (MRVBF). A higher value of MRVBF denotes a flatter valley with higher deposition, and vice versa. MRVBF values range from 0 to a positive integer value. In this study, MRVBF was computed in the SAGA GIS.

#### 2.3.2. Processing of Climate Parameters

The combined effects of climate, physical processes, and land use practices are often the cause of land degradation. Rainfall is the most significant factor in land degradation, and it has a direct impact on the detachment of soil particles and migration of eroded sediment [33]. As a result, it is recognized as a major factor in assessing land degradation. In this study, CHRIPS-based rainfall products of 5 km spatial resolution were downloaded for 10 years (2011–2020) using Google Earth Engine and reprojected from the Geographic Coordinate System (GCS) to the UTM 43N coordinate system in QGIS. The downloaded products were resampled to 30 m resolution in QGIS by using the bilinear interpolation technique. The intensity and distribution of land surface temperature are directly linked to the vegetative condition of a region [13]. Therefore, land surface temperature is considered as an important indicator of land degradation. In the present study, MODIS MOD11A2 products for the period of 10 years (2011–2020) were downloaded using Google Earth Engine. The data were converted to degrees Celsius (◦C) by using Equation (2).

$$\text{LST} = 0.02 \times \text{DN} - 273.15 \tag{2}$$

Subsequently, the datasets were reprojected from the sinusoidal coordinates system to the geographical coordinate system and resampled to 30 m by using the bilinear interpolation technique in QGIS. Finally, the thematic layer was classified into five subclasses: <32.70 ◦C, 32.70–33.30 ◦C, 33.30–33.91 ◦C, 33.91–34.52 ◦C, and >34.52 ◦C.

#### 2.3.3. Processing of Vegetation Parameters

Vegetal degradation is a direct indicator of land degradation. Therefore, LULC and NDVI were taken as important thematic layers for assessing land degradation. The LULC map was prepared from the downloaded Landsat 9 images using supervised classification in the QGIS environment. In the present study, MODISMOD13Q1 NDVI products were downloaded for a period of 10 years (2011–2020) using Google Earth Engine. NDVI, which is a dimensionless index, depicts the difference of reflectance between near-infrared and red bands and can be used to analyze vegetative greenness over an area. It ranges from −1 to +1, where low NDVI values indicate stressed vegetation and higher values indicate healthy vegetation. Temporal smoothing of the NDVI time series data was carried out with the Savitzky–Golay (SG) filter [34]. SG filter fits a polynomial function based on a weighted least squares regression approach. The processing was executed in Google Earth Engine and downloaded. It was then reprojected from GCS to the UTM 43N coordinate system in QGIS. Datasets were resampled to 30 m resolution by using the bilinear interpolation technique in the QGIS. The layer was classified into six subclasses: <0.15, 0.15–0.20, 0.20–0.25, 0.25–0.30, 0.35–0.40, and >0.40.

#### 2.3.4. Processing of Soil Parameters

Soil organic carbon data downloaded from Soil Grids were reprojected to the geographical coordinate system and resampled to 30 m resolution in QGIS. The layer was classified into five subclasses: <122.5, 122.5–176.5, 176.5–230.5, 230.5–284.5 and >284.5 decigram/kg. Soil texture data were taken from NBSS&LUP and resampled to 30 m in the QGIS environment. Soil texture data were classified into three classes, namely, fine loamy, clayey, and rock outcrops. The detailed methodology is given in Figure 2.

**Figure 2.** Methodology followed in the present study.

#### *2.4. Analytical Hierarchical Process (AHP)*

The most widely used and well-known GIS-based method for demarcating land degradation vulnerability zones is MCDA using the AHP technique. To make an organized decision of priorities, we need to make comparisons and a scale of numbers that show how much more important one parameter is in comparison to another in terms of the criterion being compared. The AHP is a pairwise comparison assessment theory, where parameters are compared with each other using Saaty's scale of relative importance (Table 2) [31,35].

**Table 2.** Saaty's 1–9 scale of relative importance in AHP.


The relative weight of each variable was determined by a knowledge-based spatial decision support system and referring to the literature [13,29]. We selected LULC as the first significant layer since LULC changes are one of the main human-induced activities affecting the land degradation of a region. NDVI was selected as the second most important parameter in the hierarchy as it is the most significant indicator of vegetal degradation. The soil texture was chosen as the third element in the hierarchy because soil erosion is directly controlled by the size and distribution of soil particles. The LST was selected as the fourth layer in the hierarchy, and it was mainly based on the assumption that higher LST zones have low vegetation cover compared to low LST. Other remaining layers were assigned lower order in the hierarchy. Consequently, all layers were compared to one another in a pair-wise comparison matrix.

#### *2.5. Consistency Analysis*

To authenticate the decision on the pair-wise comparison of the thematic layers and their sub-classes, the consistency ratio (CR) was utilized [28]. For computing the CR, the following equation was used:

$$\text{CR} = \frac{\text{CI}}{\text{RCI}} \tag{3}$$

where RCI stands for Random Consistency Index, and its values are based on Saaty's standard (Table 3). CI indicates consistency index, which was computed using the following equation:

$$\text{CI} = \frac{(\lambda \text{max} - \text{n})}{(\text{n} - \text{1})} \tag{4}$$

where λmax is the principal eigenvalue and n is the total number of thematic layers used in the study.

**Table 3.** Saaty's Random Consistency Index.


A CR value ≤0.10 is acceptable to conduct a weighted overlay analysis using AHP. If the CR is >0.10, the judgement must be revised to identify the cause of the inconsistency and fix it until the CR ≤0.10 is reached.

#### *2.6. Mapping of Land Degradation Vulnerability Zones*

In GIS-based modeling, AHP-based weights were given to thematic layers and their sub-classes to demarcate the land degradation vulnerability (LDV) zones. In the present study, the following equation was used to delineate the land degradation vulnerability map:

LDV = LULCCwi × LULCSCwi + NDVICwi × NDVISCwi + STCwi × STSCwi + LSTCwi × LSTSCwi + TWICwi × TWISCwi + MARCwi × MARSCwi + SOCCwi × SOCSCwi0.05 × MRVBFCwi + MRVBFSCwi (5)

where LULC, NDVI, ST, LST, TWI, MAR, SOC, and MRVBF indicate land use/land cover, Normalized Difference Vegetation Index, soil texture, land surface temperature, Topographic Wetness Index, mean annual rainfall, soil organic carbon, and Multi-Resolution Index of Valley Bottom Flatness, respectively; Cwi is the class weight and SCwi is the sub-class weight. The generated LDV map was classified into five classes, namely, very low, low, moderate, high, and very high. Using the ultra-high resolution Google Earth imagery of 2022, the very high and high LDV classes were validated at five randomly selected sites. Finally, validation of the results was performed using the ROC curve generated from the site selected from the Google Earth image. The area under the curve (AUC) was estimated from the ROC curve and its values range from 0.5 to 1. The AUC value closer to 1 implies great model performance, whereas a value near to 0.5 indicates poor prediction accuracy.

#### **3. Result**

#### *3.1. Input Parameters and Their Variability*

TWI of the watershed ranged between 1.46 and 25.94, as illustrated in Figure 3a. Five classes less than 6.13, 6.13–9.45, 9.45–12.77, 12.77–16.10, and more than 16.10—were generated after reclassification. Nearly 64% of the district area came under the first and second subclass of TWI and the remaining 36% came under other subclasses. TWI values show the parts in the study area that are more prone to water erosion.

**Figure 3.** Thematic layers: (**a**) TWI, (**b**) MRVBF, (**c**) LST, (**d**) MAR.

In this study, MRVBF is classified into three classes, namely, first class (<1.33), second class (1.33–2.77), and third class (>2.77) (Figure 3b). The highest percentage of the study area comes under third class (43%), followed by first class (35%) and second class (22%). The land surface temperature of the study area divides the whole area into six subclasses: <32.7 ◦C, 32.70–33.30 ◦C, 33.30–33.91 ◦C, 33.91–34.52 ◦C, and >34.52 ◦C (Figure 3c). The highest area 801.54 ha (30.51%) comes under subclass LST 33.91–34.52 ◦C, followed by subclass LST >34.54 ◦C occupying 630.9 ha (24.02%), and the lowest area comes under subclass LST <32.7 ◦C of an area 271.89 (10.35%). Nearly 54.53% of the study area came under LST values >34.52 ◦C and 33.91–34.52 ◦C, which lies in the central part of the watershed, and the remaining 45.37% of the area came under other subclasses. The analysis of decadal (2011–2020) mean annual rainfall trends shows that the study area is divided into two subclasses (Figure 3d), where northern, northeastern, eastern, and the majority of the central area received relatively higher annual rainfall (>843.10 mm), covering an area of 1760.04 ha that accounts for 66.88% of the total area, while southwestern and southern parts received relatively less annual rainfall (<843.1 mm), covering an area of 871.65 ha that accounts for 33.12% of the total study area.

The spatial analysis of the mean NDVI values from 2011 to 2022 divides the whole area of the Chanda Kalan Watershed (2626.56 ha) into six subclasses: 0.15–0.20, 0.20–0.25, 0.25–0.30, 0.30–0.35, 0.35–0.40, and >0.40 (Figure 4a). The highest area, 926.19 ha (35.26%), comes under the low vegetal degradation category with NDVI values of 0.30 to 0.35, followed by an area of 851.13 ha (32.4%) covering maximum greenness with very low vegetal degradation with NDVI values of 0.35–0.40. The lowest area, 40.5 ha (1.54%), falls under the very severe vegetal degradation category with NDVI values of 0.15 to 0.2. Nearly 67.66% of the study area came under NDVI values 0.30–0.35 and 0.35–0.40, which lies in the southeast and southwest watershed, and the remaining 32.34% of the area came under other subclasses. The LULC map was prepared from Landsat 9 using supervised classification classes (Figure 4b). The study area of Chanda Kalan Watershed is divided into seven classes, namely, Agriculture, Bare Ground, Shrubs/Scrub, Open Forest, Ravines, Built Up, and Water Bodies, based on the LULC map, where the area under Agriculture covers the highest area of the

watershed, which lies in the central and northeast part of the watershed. The soil textural map of the study area divides the whole area into three textural classes, namely, (i) clayey, (ii) fine loamy soil, and (iii) rock outcrops (Figure 4c). Most of the study area comes under clayey soil, particularly the central and the southeastern parts, while the northern and northeastern areas come under the fine loamy soil category and the southwestern area comes under the rock outcrop category. The study area is divided into five soil organic carbon class for soil depth of 0–15 cm based on soil organic carbon content (decigram/kg): <122.5, 122.5–176.5, 176.5–230.5, 230.5–284.5, and >284.5 (Figure 4d). The maximum area of 1707.84 ha covering 65.26% comes under the subclass with SOC content <122.5 decigram/kg and lies in the northwest and central part of the watershed, while the minimum area with <284.5 decigram/kg covers 57.24 ha (2.19%) and lies in the lower part of the watershed.

**Figure 4.** Thematic layers: (**a**) NDVI, (**b**) LULC, (**c**) soil texture, (**d**) soil organic carbon.

#### *3.2. Land Degradation Vulnerability*

Consistency ratios for each thematic layer (Table 4), normalized matrix (Table 5), and subcategories of each thematic layer (Table 6) were calculated before the integration of thematic layers. The results revealed that the judgement matrices utilized in the investigation were accurate (CR < 0.10) and had reasonable consistency. The reclassified thematic layers are combined using the weighted overlay approach based on their respective weight. In this study, five LDVZ categories, namely, very low, low, moderate, high, and very high, were identified through the AHP- and GIS-based modeling approach. The quantile breaks were used for the above classification of the integrated product. The results represent that about 1444.68 hectares of the total study area (55%) are under very low to low classes of land degradation vulnerability, and these lands covered almost half the area of the watershed (Figure 5). About 530.01 hectares (20.52%) of the watershed came under the moderate class of the LDVZ and covered mainly southern to southeastern parts of the watershed. High and very high classes of LDVZ zones covered about 607.95 hectares (23.54%) of the watershed. These classes covered the area mostly the northern and somewhat central and southern parts of the study area. These two classes showed high to very high severity of land degradation, such as ravines and gullies.


**Table 4.** AHP pairwise comparison matrix for thematic layers.

LULC, land use/land cover; NDVI, Normalized Difference Vegetation Index; ST, soil texture; LST, land surface temperature; TWI, Topographic Wetness Index; MAR, mean annual rainfall; SOC, soil organic carbon; MRVBF, Multi-Resolution Index of Valley Bottom Flatness.

**Table 5.** Normalized matrix for thematic layers.


**Table 6.** Weighting of sub-classes.


**Figure 5.** Land degradation vulnerable zones and validation with Google Earth images (Dark green color indicates low vulnerability and deep red indicates higher vulnerability).

#### *3.3. Validation of Land Degradation Vulnerability Zones*

Validation of LDVZ of the study area was conducted by the visual validation method. In this process, validation was performed with the help of high-resolution Google Earth images. Five sites from the degraded part of the study area were validated with the high-resolution Google Earth images. The visual assessment of high and very high classes using high-resolution Google Earth images of growing season of 2022 indicated that the degree of land degradation (ravines, gullies, and bare grounds) in the selected watershed is in accordance with the outcomes of the model used in this study (Figures 5 and 6).

**Figure 6.** Validation of land degradation vulnerable zones with field photographs.

Figure 7 shows the ROC curve of the LDVZ map generated using the AHP method. The AUC value of the ROC curve was found to be 86%. Hence, it was determined that the AHP model derives reasonable results in predicting land degradation vulnerability zones in the study area.

**Figure 7.** ROC curve of the LDVZ map using AHP model.

#### **4. Discussion**

Land degradation is generally considered one of world's most serious environmental issues. In India, the western state of Rajasthan is a part of the Thar Desert, where degradation is a severe issue. The Chambal River valley in the state of Rajasthan is one of the severely affected regions in the country, where gully erosion/ravines have major physical and economic implications [18,19,36]. For sustainable agricultural planning and development, the identification of vulnerable hotspots to soil/water erosion is the need of the hour. Therefore, the present research was carried out to identify hot spots of land degradation in a small watershed using an AHP- and GIS-based modeling approach. Previous research has found that only a few variables play an important role in the assessment of land degradation [13,37]. In the present investigation, LULC, NDVI, TWI, MRVBF, LST, MAR, soil texture, and SOC were considered for the mapping of land degradation vulnerable zones. LULC and NDVI were taken as the most influential layers for land degradation vulnerability. Land use/land cover implies man-made and natural modification of the land surface, and it is a major cause of land degradation [38,39]. The NDVI has long been recognized as a useful measure for determining the greenness of flora and it is well accepted in science that a decrease in NDVI is a sign of land degradation and is closely linked to climatic conditions [40,41]. The most basic soil physical property, on the other hand, is soil texture, which impacts hydraulic properties and surface soil loss [42].

LST is an important parameter in the semi-arid region as it is directly linked with soil moisture availability and indirectly linked with the flora conditions of the study area [43,44]. A rise in the LST might result in a reduction in vegetative greenness and an increase in land degradation. Increased rainfall during the monsoon season increases the risk of topsoil loss by higher water velocity, which causes more soil erosion [37]. SOC is a universal biomarker of soil degradation since its decrease may have severe consequences for soil-derived ecosystem services [45]. Similarly, TWI is one of the most important terrain parameters and plays a significant role in assessing land degradation vulnerability. A higher TWI value is associated with good vegetation cover, and vice versa [46]. As a result, vegetation cover promotes infiltration, reduces surface runoff, and thus greatly delays the incidence of soil erosion [47]. Therefore, thematic layers were given weights based on their importance. The AHP model assigned the weightage of each factor, i.e., LULC (0.38), NDVI (0.23), soil texture (0.15), LST (0.08), TWI (0.06), MAR (0.05), SOC (0.03), and MRVBF (0.02). The higher the index value, the more exposed the area is to land degradation, whereas the lower the value, the less vulnerable it is. Consistency ratios for each thematic layer and subclasses of each thematic layer were calculated before the integration of thematic layers. The computed CR value was less than 0.1, which shows that all the parameters' assumptions about their impact on soil erosion are valid.

Research findings showed that five land degradation vulnerability zones (LDVZ) namely, very low, low, moderate, high, and very high, were identified in the study area. Very low, low, moderate, high, and very high classes covered 27%, 29%, 20%, 11%, and 12% of the area of the watershed, respectively. Parmar et al. (2021) [48] also conducted a study to assess land degradation vulnerability using the geospatial technique in the Kutch district of Gujarat, India, and the results revealed that 67% of the land area has high vulnerability to land degradation, and 27% of the area falls under

the moderate class. Similarly, an assessment of potential land degradation using the geospatial technique and multi-influencing factor technique was carried by Senapati et al. (2020) [49] in the Akarsa Watershed, West Bengal, and they also classified the study area into five land degraded zones. The analysis revealed that the very low to low categories of land degradation vulnerability covered almost half of the area of the watershed, and this portion of the study area is associated with good vegetative coverage with open forest, very low vegetative degradation, adequate rainfall (843.21–846 mm), and well-drained soils deep in nature with clayey texture.

All the environmental covariables are linked with each other, e.g., adequate rainfall positively correlates with NDVI and LULC and a good amount of LULC links with optimum SOC content and better soil health, which directly relate to a lower chance of land degradation [50]. The moderate class of LDVZ was related to very less vegetative coverage in the scrub/shrub class of LU/LC, with normal rainfall (<843.20 mm) and rock outcrops. This class also represented a low to medium MRVBF value with low TWI. High and very high classes of LDVZ covered about 607.95 hectares (23.54%) of the watershed. These two classes showed high to very high severity of land degradation, such as ravines and gullies. This section of the watershed had no or very little vegetation cover, higher rainfall (>846 mm), high valley bottom flatness, higher LST, and clay to fine loamy texture soils with low soil organic carbon. In this research, the AHP- and GIS-based modeling approach showed its potential for the assessment of vulnerability to land degradation by compiling different parameters. Validation of LDVZ was carried out with the help of Google Earth images of high resolution and the results were very well in agreement with the AHP–GIS model-based approach. Similar work has been conducted by several researchers [12,13,29,51–54] with respect to assessment and mapping of LDVZ based on an AHP and GIS modeling approach and their validation with Google Earth imageries.

This study has identified areas that are more prone to land degradation, which can help prioritize and implement soil water conservation practices to reduce the consequences of degradation. Farmers should be encouraged to grow cover crops and crop rotation practices to maintain soil quality over time. Farmers should maintain crop residue and biomass over soil surface after harvesting to avoid exposing the topsoil. Furthermore, the findings of this research may be useful in developing better soil and water management policies. Although this study was carried out at the watershed level, it should be replicated to the sub-district or district level. The pedological parameters used in this study are available at coarse resolution, which caused some challenge and gaps in the results. Future research should concentrate on high-resolution satellite and soil survey data to delineate degradation zones with higher accuracy.

#### **5. Conclusions**

In the study, LULC, NDVI, soil texture, LST, MAR, TWI, SOC, and MRVBF were considered major contributing factors in the identification of land degradation vulnerability zones through the GIS- and AHP-based model. The AHP- and GIS-based modeling shows that about 607.95 hectares of the total study area are in the high and very high categories of LDV, and 530.01 hectares are in the moderate LDV category. Validation of moderate, high, and very high LDV classes using high-resolution Google Earth imagery demonstrates that the degree of land degradation features of Google Earth imagery of the selected study area was in agreement with the AHP–GIS modelbased approach. This study demonstrates the potential of high-resolution satellite data and the robustness of GIS-based spatial modeling in obtaining accurate, reliable, and cost-effective results for the assessment of land degradation in semi-arid ecosystems. The prevalence and severity of LDV were determined using AHP- and GIS-based modeling, which will be extremely useful in recommending soil conservation and management measures that are suited for each site, particularly in highly and extremely vulnerable regions, for long-term land resources management. These data were derived from satellite data that could cause some challenges and gaps in the results. Therefore, macro- and micro-scale observations are required to account for the high environmental variability and to distinguish between the influences of anthropogenic actions and climate variability on land degradation processes.

**Author Contributions:** Conceptualization, B.Y. and L.C.M.; methodology, G.P.O.R. and N.K.; software, B.L.T. and L.C.M.; validation, S.P. and B.L.M.; formal analysis, B.Y. and L.C.M.; investigation, B.Y.; resources, B.S.D.; data curation, N.K. and S.V.S.; writing—original draft preparation, B.Y., L.C.M., and S.P.; writing—review and editing, P.K.J., G.P.O.R., and S.V.S.; visualization, P.K.J. and B.L.T.; supervision, B.L.M. and B.S.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors are grateful to the ICAR-NBSS&LUP and DST for help and support.

**Conflicts of Interest:** The authors declare that there is no conflict of interest, either financially or otherwise.

#### **References**


### *Article* **Spatial Mapping of Soil Salinity Using Machine Learning and Remote Sensing in Kot Addu, Pakistan**

**Yasin ul Haq 1,\*,†, Muhammad Shahbaz 1,†, H. M. Shahzad Asif 1, Ali Al-Laith <sup>2</sup> and Wesam H. Alsabban <sup>3</sup>**


**Abstract:** The accumulation of salt through natural causes and human artifice, such as saline inundation or mineral weathering, is marked as salinization, but the hindrance toward spatial mapping of soil salinity has somewhat remained a consistent riddle despite decades of efforts. The purpose of the current study is the spatial mapping of soil salinity in Kot Addu (situated in the south of the Punjab province, Pakistan) using Landsat 8 data in five advanced machine learning regression models, i.e., Random Forest Regressor, AdaBoost Regressor, Decision Tree Regressor, Partial Least Squares Regression and Ridge Regressor. For this purpose, spectral data were obtained between 20 and 27 of January 2017 and a field survey was carried out to gather a total of fifty-five soil samples. To evaluate and compare the model's performances, the coefficient of determination (R2), Mean Squared Error (MSE), Mean Absolute Error (MAE) and the Root-Mean-Squared Error (RMSE) were used. Spectral data of band values, salinity indices and vegetation indices were employed to study the salinity of soil. The results revealed that the Random Forest Regressor outperformed the other models in terms of prediction, achieving an R2 of 0.94, MAE of 1.42 dS/m, MSE of 3.58 dS/m and RMSE of 1.89 dS/m when using the Differential Vegetation Index (DVI). Alternatively, when using the Soil Adjusted Vegetation Index (SAVI), the Random Forest Regressor achieved an R2 of 0.93, MAE of 1.46 dS/m, MSE of 3.90 dS/m and RMSE of 1.97 dS/m. Hence, remote sensing technology with machine learning models is an efficient method for the assessment of soil salinity at local scales. This study will contribute to mitigating osmotic stress and minimizing the risk of soil erosion by providing early warnings regarding soil salinity. Additionally, it will assist agriculture officers in estimating soil salinity levels within a shorter time frame and at a reduced cost, enabling effective resource allocation.

**Keywords:** DVI; machine learning; remote sensing; random forest; spatial mapping; soil salinity; salinity indices; vegetation indices

#### **1. Introduction**

Soil salinization refers to the accumulation of salts in the soil, leading to adverse effects on water quality, agricultural yield, soil composition and economic growth [1]. This type of land degradation, known as salinization, is particularly prevalent in arid and semi-arid regions where evaporation rates exceed precipitation rates [2]. According to the findings of the Food and Agriculture Organization (FAO), a total of 831 million hectares (mha) of land area are affected by salt, with 397 mha classified as saline soils and 434 mha as sodic soils [3]. Soil salinity negatively impacts both water and soil quality [4], leading to adverse consequences for agricultural production. Salinity obstructs plant development and poses challenges for the sustainable use of land resources [5–7], resulting in an annual decline of 1–2% in Pakistan. Based on FAO estimates from 2010, it is approximated that

**Citation:** Haq, Y.u.; Shahbaz, M.; Asif, H.M.S.; Al-Laith, A.; Alsabban, W.H. Spatial Mapping of Soil Salinity Using Machine Learning and Remote Sensing in Kot Addu, Pakistan. *Sustainability* **2023**, *15*, 12943. https://doi.org/10.3390/ su151712943

Academic Editors: Antonio Ganga, Blaž Repe and Mario Elia

Received: 18 July 2023 Revised: 24 August 2023 Accepted: 25 August 2023 Published: 28 August 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

around 60% of the world's farmlands are significantly afflicted by salinization. Additionally, approximately 3% of the world's resources are impacted by salt [5,7]. The detrimental impacts of salinity can be further intensified by climate change, drought, water resource scarcity and changes in land use [8]. According to the FAO report from 2011, salinization has had adverse effects on 25% of Pakistan's irrigated lands. Consequently, a substantial portion of agricultural land, specifically 1.40 mha, has been abandoned due to the impacts of salinization [7]. In areas where salinization has affected soils, crop losses ranging from 30% to 60% have been observed, leading to the abandonment of approximately 20% to 30% of these affected regions, as reported by the World Bank in 2006 [6].

With the world's population growth and the anticipated demand for large agricultural lands, it has become critically important to monitor soil salinization in real-time and detect its early warning signs to improve the land utilization [5]. Making informed decisions regarding the proper reclamation and management of such lands necessitates ongoing salinity monitoring [9]. The traditional techniques for measuring salinity are laboratory analysis and field survey. Salinity monitoring and mapping typically involve conducting a comprehensive soil survey and extrapolating data obtained from analytical samples. The spatial mapping of the salinity of soil in an area requires dense sampling, which is a laborious, expensive and hectic job [10–14]. Indeed, remote sensing offers a faster, more cost-effective and accurate approach to examining and plotting soil salinity [5,15,16].

Almost 65 years ago, both color and black-and-white images were employed to identify soil salinity and gather information about various elements present on the Earth's surface [17]. In the present day, satellite remote sensing offers a cost-effective means to explore salinization across several geographical and time spans. Remote sensing employs the reflected electromagnetic energy from the land surface to acquire data pertaining to diverse objects at varying levels of intricacy. The salinity of soil can be adequately assessed using this approach. The emergence of GIS and remote sensing techniques has provided an opportunity for technology to potentially replace or supplement traditional approaches in soil salinity assessment. In terms of predicting accuracy, employing spectral reflectance derived directly from sensors or applying spectral transformations like PCA [18,19], tasseled cap transformation [20] and spectral indices [21,22], has yielded promising outcomes. Many researchers highlight the worth of spectral reflectance in remote sensing investigations, considering it a fundamental perception in the field [23–28]. Numerous research initiatives have been dedicated to digital soil mapping, employing diverse forms of satellite data, and employing geostatistical or statistical methodologies.

In the Tafilalet plain of Morocco, a study demonstrated the Successfulness of Landsat 8 OLI imagery for modeling the salinity of soil [29]. The findings revealed that the RMSE varied from 0.62 to 0.80 dS/m, whereas the R<sup>2</sup> varied between 0.53 and 0.75. Another research study conducted by Hihi et al. [30] utilized a simple linear regression model and found a moderate correlation (R2 = 0.48) between spectral indices and electrical conductivity (EC) extracted from a Sentinel 2 MSI imagery. Similarly, an association was found between the canopy temperature received from MODIS data and soil salinity in a study conducted in Uzbekistan [31]. According to Hoa et al. [32], employing the Gaussian processes technique on SAR Sentinel-1 imagery, along with modern ML models, resulted in an exceptionally accurate model with an R2 of 0.808. This model effectively captures the correlation with satellite data and EC. Taghadosi et al. [33] indicated the effectiveness of radiance images obtained from VH and VV polarizations of SAR Sentinel-1 data in differentiating the soil's salinity. They achieved the most accurate technique by applying the SVR approach with an RBF kernel, which resulted in an R<sup>2</sup> of 0.9783 and an RMSE of 0.3561. In Algeria [11] demonstrated the value of EO-1 ALI and Landsat ETM<sup>+</sup> images in recognizing and defining sodic and saline soils. In their study on the temporal–spatial variation in the salinity of soil in Libya, Zurqani et al. [34] utilized the temporal data of Landsat images covering a period of 29 years (1972–2001), in conjunction with ground truth data. To enhance precision, [35] advocated the adoption of hyperspectral photography. They introduced an innovative salinity index gained from EO-1 data and achieved R<sup>2</sup> = 0.873 in univariate regression analysis. Similarly, Sahbeni [36] utilized Sentinel 2 MSI data and multiple linear regression to simulate the dispersion of salinity of the soil in the Great Hungarian Plain. In their findings, Sahbeni [36] investigated the effectiveness of geographically weighted regression (GWR), MLR, and RFR models for predicting salinity of the soil. They found that topographic factors and geographical position have an essential task in modeling methods. Additionally, they observed that satellite imagery captured in the arid season is particularly useful for predicting EC. The results indicated that the final model achieved a significant level of intermediate accuracy, having an RMSE of 0.1942 g/kg and an R2 of 0.51. It was found that the MLR model exhibited the least level of accuracy compared to the GWR and RFR models. However, the RFR model demonstrated superior estimated accuracy compared to the GWR technique. In a different investigation by [37], eight distinct prototypes for the mapping of salinity of the soil in a desert region were explored by spectral signatures measurements and Landsat 8 OLI data. Significant analysis conducted on methods built on VNIR bands yielded poor results, with an exceptionally high RMSE of 0.65 or higher and an R2 of 0.41. Conversely, the models based on SWIR bands have given valuable findings, with an R<sup>2</sup> of 0.97 and an RMSE of 0.13.

Similarly, it was observed by Zhang X. and Huang B. [38] that spectral transformations and smoothing techniques had an impact on the accuracy of models used for predicting soil salinity based on soil-reflected spectra. Among the different methods evaluated, the Principal Component Regression model with the median filtering data smoothing method demonstrated the precise outcomes, with an R2 of 0.7206 and an RMSE of 0.3929. In addition to the salinity index, such as the NDSI, vegetation indices like the NDVI and SAVI have also demonstrated their effectiveness in indirectly detecting soil salinity. These indices utilize markers such as vegetation health and halophytic plant identification, as noted by [39–42]. These indices provide a strong relationship with the EC of the soil, making them valuable in identifying areas damaged by salt.

Salinization impacts the district of kot Addu heavily and it seemed crucial to use satellite data for monitoring soil salinity in the district. Consequent to that, the district has been marked as a model with an objective of mapping soil salinity in the region. The approach may not only protect the land of agriculture rather to save the additional areas at risk through the proposed approach. The monitoring and mapping process aims to reduce the risk of soil erosion by promptly alerting stakeholders about soil salinity levels. By utilizing various spectral indices, the study seeks to identify salt-affected areas and evaluate the effectiveness of these indices in this specific context.

The objectives of this research are:


#### **2. Material and Method**

#### *2.1. Study Area*

Kot Addu city is situated in the center of Pakistan, in the southern region of the Punjab Province, in the District of Muzaffargarh Figure 1. Sugarcane, wheat and cotton are the principal crops farmed in the alluvial plain that surrounds the city; rice, maize, mash, ground nuts, bajra and oil seeds (rapeseed) are cultivated in a very small ratio. Frequently, certain areas remain flooded. Mangoes, citrus, dates and pomegranates are the principal fruit trees grown; however, many citrus and mango farms also have a small amount of space for pears, dates and bananas [43]. Most of the agricultural lands in this region are mildly to moderately salinized. Kot Addu experiences mild winter and scorching summer seasons due to its desert climate. The city has encountered some of the most severe climate

conditions in Pakistan. The peak temperature ever noted was roughly 51 °C (324.15 °K), and the lowest temperature ever noted was roughly −1 °C (272.15 °K). About 127 mm of rain falls every year on average (5.0 in.). Such a climate exacerbates the salinity issue; therefore, research is being conducted to track the salinity issue in this region.

**Figure 1.** Site map of Kot Addu.

#### *2.2. Satellite Data*

The Landsat 8 data used in this study was obtained from January 10 to 30, 2017. Images with a cloud cover of less than 5 percent were exclusively chosen. Spectral indices were calculated utilizing spectral band values, as shown in Table 1. Band 8 panchromatic was excluded due to its proximity and susceptibility to cloud interference.

**Table 1.** Spectral indices for current study.


#### *2.3. Soil Sample*

To gather soil samples, a survey study was carried out by the Soil and Environmental Sciences Institute at Agriculture University Faisalabad in the Kot Addu region of the Muzaffargarh District in January 2017 [43]. The main objective was to monitor the salt status across the expansive 32,457-acre study area. A total of fifty-five soil samples were randomly collected from the top 15 cm of the soil surface using an auger. Geographical coordinates for each sample location were recorded using a GPS device. The collected soil samples were properly labeled, placed in polybags, and sent to the Soil and Water Testing Laboratory at the University of Agriculture, Faisalabad, for further analysis [43]. The soil specimens underwent grinding, air drying and filtration through a 2 mm sieve. The sieved soil was then transformed into a soil-saturated paste using purified water and left to rest overnight. The concentrated solution extracted from this paste was measured for electrical conductivity (EC) using a conductivity meter, to determine the soil EC [43].

#### *2.4. Spectral Indices*

Using Arc Map 10.3, twelve spectral indices were calculated from the preprocessed data based on the literature review [22,44–52]. The selection of these indices for the study was facilitated by their potential link to the detection of salinity.

The formulas for spectral indices are listed in Table 1.

After computing the spectral indices and forming the dataset, the subsequent step involved applying the standard scaler technique. This was carried out to ensure uniform contributions from the features before proceeding with the training of the machine learning models. The objective is to comprehend the association among satellite data (spectral indices) and the salt content of the soil specimens.

**Figure 2.** Methodology of current study.

#### *2.5. Random Forest Regressor*

Breiman [53] and Cutler and Stevens [54] proposed an RF algorithm that works on the basis of several decision trees which are not correlated with each other [55]. It works as a classifier if labels are given, and like a regressor for continuous or numeric values. Random samples are selected from the calibration set to build the decision tree. For the complete division of variable space, random selections are made from n inputs to split each node in the decision tree. The average result of all decision trees is the final value of the RF model. Significant attention needs to be given to the tuning of the RF model for prediction, specifically in regards to the count of decision trees (ntree), the count of randomly sampled

variables as candidates for each split (mtree) and the least count of specimens essential for a node to be considered a leaf (nodesize). The RF model is more stable, having a higher value of ntree. The nodesize value is determined by repeated testing (nodesize = 1). The The Residual Sum of Square (RSS) formula is used in the RFR model for regression. In this study, the RFR model is implemented by using the "RandomForestRegressor" from sklearn in jupyter notebook.

$$RSS = \sum\_{left} (y\_i - y\_L^\*)^2 + \sum\_{right} (y\_i - y\_R^\*)^2 \tag{1}$$

where *y*∗ *<sup>L</sup>* is the mean *y* value for the left node and *y*<sup>∗</sup> *<sup>R</sup>* is the mean *y* value for the right node.

#### *2.6. AdaBoost Regressor*

AdaBoost, known as Adaptive Boosting, is an ensemble learning technique utilized for classification and regression tasks. For regression problems, it is referred to as AdaBoost Regressor. This widely used machine learning algorithm blends the predictions of several weak learners, typically shallow decision trees, to construct a robust ensemble model. The fundamental concept behind the ABR involves iteratively training a sequence of weak learners, with each subsequent learner emphasizing examples that previous ones struggled to predict accurately. This adaptive nature enables the model to enhance its performance through successive iterations.

#### *2.7. Decision Tree Regressor*

The Decision Tree Regressor functions as a supervised machine learning technique utilized for regression purposes. Unlike classification challenges that seek to predict categorical labels, regression tasks focus on forecasting continuous numerical values. A decision tree takes the form of a tree-like model in which internal nodes indicate decisions based on specific features, while leaf nodes correspond to the predicted output values. In the case of the DTR, the value at each leaf node is derived from the average (or another measure) of the target values from the training samples that lead to that particular leaf.

#### *2.8. Ridge Regressor*

The Ridge Regressor is a supervised linear regression algorithm, commonly employed in machine learning tasks, particularly when dealing with multicollinearity among predictor variables. It is a modified version of standard linear regression that incorporates a penalty term in the cost function to prevent overfitting and enhance generalization. In ordinary linear regression, the objective is to determine coefficients for predictor variables that minimize the sum of squared differences between predicted and actual target values. However, when predictor variables are highly correlated, the coefficients might grow large, leading to heightened sensitivity to noise and potential overfitting. To tackle this issue, the RR introduces an L2 regularization term to the linear regression cost function. This regularization term imposes a penalty based on the squared magnitudes of the coefficients. By applying this penalty, the RR encourages the model not only to fit the data but also to maintain small coefficients, effectively reducing the influence of individual predictors.

#### *2.9. Partial Least Squares Regression*

Partial Least Squares Regression is a statistical technique utilized to model the association between a group of independent variables (X) and a dependent variable (Y). It proves particularly valuable when working with datasets that exhibit high dimensionality or when there is a possibility of multicollinearity among the predictor variables. The primary objective of PLSR is to discover a reduced representation of both the independent and dependent variables by forming new latent variables (also called components) as linear combinations of the original variables. These components are crafted in a manner that maximizes the covariance between the independent and dependent variables within each subsequent component.

#### *2.10. Evaluation*

The regression algorithms received spectral indices as input, with the observed soil salinity serving as the target value. Overall, 55 soil specimens were gathered between January 20 and 27 January 2017. The dataset was then distributed into training and testing sets, with a proportion of 80% for calibration and 20% for validation. Out of the 55 samples, 45 were utilized for training the model, while the remaining 10 were used for testing. Various statistical parameters, including the R2, MAE, MSE, and RMSE, were employed to assess the proficiency of the ML techniques.

$$R^2 = 1 - \frac{\sum\_{k=1}^{n} (y\_k - \hat{y}\_k)^2}{\sum\_{k=1}^{n} (y\_k - \bar{y}\_k)^2} \tag{2}$$

$$MAE = \frac{1}{n} \sum\_{k=1}^{n} |y\_k - \mathfrak{H}\_k| \tag{3}$$

$$MSE = \frac{1}{n} \sum\_{k=1}^{n} (y\_k - \hat{y}\_k)^2 \tag{4}$$

$$RMSE = \sqrt{\frac{1}{n} \sum\_{k=1}^{n} (y\_k - \hat{y}\_k)^2} \tag{5}$$

Here, *yk* is the kth observed salinity, *y*ˆ*<sup>k</sup>* is the kth predicted salinity, *y*¯*<sup>k</sup>* is the mean salinity of all the soil samples, and *n* is the total count of specimens.

#### **3. Results**

#### *3.1. Models' Performance*

This study predicted the salinity of soil using spectral indices, and various statistical parameters, including the R2, MAE, MSE and RMSE, were employed to assess the performance of ML models. Among these parameters, R2 played a significant role in evaluating the model's performance. It is a measure of how well the model has learned the data, and a higher R2 value reveals a better fit of the model to the data. Twelve different spectral indices were used in the study, and their R2 values were compared to determine which index is more efficient in mapping the salinity of soil. In Figure 3, a comprehensive analysis of R<sup>2</sup> values across a variety of spectral indices, including the NDVI, SAVI, MSR, NDSI, SR, DVI and RSI, is presented. The evaluation encompassed the utilization of multiple models, specifically the RFR, ADR, DTR, RR and PLSR. After a thorough analysis of the results, it becomes clear that the RFR model exhibited exceptional performance, particularly in relation to the DVI and SAVI spectral indices. Impressively, it achieved noteworthy R2 values of 0.94 and 0.93 for these indices, respectively. In contrast, the implementation of the ADR model led to a distinct decline in R2 values for these same indices, producing R<sup>2</sup> values of 0.59 and 0.56 for DVI and SAVI, respectively. Furthermore, the RFR model displayed superior R2 values across several other spectral indices, including NDVI, MSR, NDSI, SR, and RSI, when compared to the ADR model. Conversely, the ADR model presented lower R2 values for certain spectral indices like DTR, RR, and PLSR, when compared to the RFR model. As a result, the RFR model stands as a more suitable choice for accurately mapping soil salinity in this study. For a detailed account of the R2 values for different spectral indices, refer to Figure 3.

The appropriate selection of spectral indices is crucial when mapping soil salinity using satellite data. Spectral indices play a fundamental role as input features for ML models in remote sensing applications. In the context of this study, the spectral indices VSSI, SI1, SI2, SI3 and SI4 displayed lower R2 values, as depicted in Figure 4. These lower R2 values indicate that, when utilizing these spectral indices as input features for the ML models, the models performed poorly, and their predictions did not closely match the actual soil salinity levels.

**Figure 3.** Comparison of R<sup>2</sup> of machine learning models using spectral indices NDVI, SAVI, MSR, NDSI, SR, DVI and RSI.

**Figure 4.** Comparison of R2 of machine learning models using spectral indices VSSI, SI1, SI2, SI3 and SI4.

#### *3.2. Model Calibration and Validation*

For calibrating ML models using Landsat 8 OLI data, a total of twelve spectral indices were employed. The models were trained using 80% of the available samples, while the remaining 20% was used for testing. The implementation was carried out in Python, and four statistical methods (R2, MAE, MSE and RMSE) were utilized to assess the models' performance. The results presented in Figure 3 indicate that certain spectral indices, namely DVI, SAVI, NDVI, MSR, NDSI, SR and RSI, achieved high R2 values, reflecting a strong correlation between the predictions made by the ML models and the actual soil salinity levels. On the contrary, Figure 5 reveals that specific spectral indices, such as VSSI, S1, S2, S3 and S4, produced very low R2 values. These findings indicate that the ML models utilizing these indices were not effective in accurately predicting soil salinity levels, as their predictions did not closely match the actual data. Conversely, the exceptional performance of the RFR technique is particularly noteworthy, especially when integrating the DVI spectral index. The RFR model accomplished a significant R2 value of 0.94, signifying a robust correlation between its predictions and the observed soil salinity levels. Furthermore, the model exhibited minimal errors with an MAE of 1.42, MSE of 3.58, and RMSE of 1.89. Collectively, these metrics emphasize the RFR model's exceptional capability to precisely forecast soil salinity levels. Contrasting the error rates across various models using the DVI spectral index further highlights the superiority of the RFR model. The RFR model outperforms other models, illustrating notably reduced error rates in its soil salinity predictions, particularly when making use of the DVI spectral index. Remarkably, the RFR model attains an MAE of 1.42, an MSE of 3.58, and an RMSE of 1.89. In contrast, the ADR model shows higher error rates, presenting an MAE of 2.89, MSE of 22.70, and RMSE of 4.76. Similarly, the DTR and RR models demonstrate elevated errors, with MAE values

of 2.75 and 4.32, along with corresponding MSE values of 22.40 and 30.78, respectively. Furthermore, the PLSR model exhibits comparable error rates to the RR model, featuring an MAE of 4.32, MSE of 30.67, and RMSE of 5.54. The significant difference in error rates between the RFR model and the alternative models highlights the RFR model's superiority in precisely forecasting soil salinity, particularly when utilizing the DVI spectral index as an incorporated feature.

**Figure 5.** Errors comparison of ML Models (**a**) NDVI, (**b**) SAVI, (**c**) MSR, (**d**) NDSI, (**e**) SR, (**f**) DVI, (**g**) RSI, (**h**) VSSI, (**i**) SI1, (**j**) SI2, (**k**) SI3, (**l**) SI4.

The line graphs depicting the RFR (Figure 6), ABR (Figure 7), DTR (Figure 8), RR (Figure 9), and PLSR (Figure 10) illustrate the visualization of the observed and estimated soil salinity values. On the x-axis, sample numbers are represented, while the y-axis displays both actual and predicted EC values, distinguished by different colors. Within the used spectral indices, encompassing DVI, SAVI, NDVI, MSR, NDSI, SR, and RSI, a close alignment is observed between the lines representing actual and predicted values in the line graphs. This close alignment indicates that these spectral indices yield better performance when used with the RFR model, resulting in more accurate predictions of soil salinity. On the other hand, spectral indices VSSI, SI1, SI2, SI3 and SI4 exhibit larger discrepancies between the actual and predicted lines in the line graphs. This observation suggests that these spectral indices are associated with higher error rates when analyzed with the RFR model, indicating that they may not be as effective in accurately predicting soil salinity. Overall, the visual analysis of the line graphs supports the finding that the RFR model, especially when utilizing distinct spectral indices, outperforms other ML models and offers more reliable predictions of soil salinity levels.

**Figure 6.** Line graphs of EC values using Random Forest Regressor (**a**) NDVI, (**b**) SAVI, (**c**) MSR, (**d**) NDSI, (**e**) SR, (**f**) DVI, (**g**) RSI, (**h**) VSSI, (**i**) SI1, (**j**) SI2, (**k**) SI3, (**l**) SI4.

In Figure 6a–g, the predicted and actual lines using the RFR model are positioned closely together, indicating a strong alignment and accurate predictions. This finding suggests that certain spectral indices utilized in this section had lower error rates and resulted in the best model fit for predicting the salinity of soil. The close proximity of the lines in these graphs signifies that the model's predictions closely match the actual soil salinity values, leading to more reliable results. However, a different scenario is observed in Figure 6i–l where spectral indices VSSI, SI1, SI2, SI3 and SI4 are used. In these graphs, the predicted and actual lines are more dispersed, indicating higher error rates and less accurate predictions by the RFR model. This outcome implies that, when mapping soil salinity, these specific spectral indices did not perform well in capturing the true salinity variations in the soil, resulting in less reliable predictions.

In Figure 7a–g, the predicted and observed lines obtained through the ABR model align closely when utilizing spectral indices such as the NDVI, SAVI, MSR, NDSI, SR, DVI and RSI. This alignment implies that these spectral indices demonstrate reduced error rates and yield more precise predictions for soil salinity mapping. The proximity of the lines in these graphs signifies the strong performance of the ABR model with these specific spectral indices, effectively capturing authentic salinity fluctuations within the soil and yielding dependable forecasts. Conversely, in Figure 7i–l, where the spectral indices VSSI, SI1, SI2, SI3 and SI4 are employed, the predicted and observed lines exhibit greater dispersion. This observation suggests elevated error rates associated with these spectral indices when employed for soil salinity mapping. The wider distribution of lines indicates that the ABR model's efficacy with these indices is diminished, resulting in less precise predictions and increased uncertainties.

**Figure 7.** Line graphs of EC values using AdaBoost Regressor (**a**) NDVI, (**b**) SAVI, (**c**) MSR, (**d**) NDSI, (**e**) SR, (**f**) DVI, (**g**) RSI, (**h**) VSSI, (**i**) SI1, (**j**) SI2, (**k**) SI3, (**l**) SI4.

In Figure 8a–g, the predicted and actual lines using the DTR (Decision Tree Regression) model are not as closely aligned as those in Figure 6a–g, which indicates that the DTR model has higher error rates compared to the RFR (Random Forest Regression) model when predicting soil salinity. The spread between the predicted and actual lines in Figure 8a–g suggests that the DTR model did not perform as well as the RFR model, leading to less accurate predictions and higher uncertainties. However, when comparing the DTR model (Figure 8) to the RR model (Figure 9) and the PLSR model (Figure 10), the predicted and actual lines are closer. This observation indicates that the DTR model has lower error rates compared to the RR and PLSR models when using the same spectral indices for predicting soil salinity.

**Figure 8.** Line graphs of EC values using Decision Tree Regressor (**a**) NDVI, (**b**) SAVI, (**c**) MSR, (**d**) NDSI, (**e**) SR, (**f**) DVI, (**g**) RSI, (**h**) VSSI, (**i**) SI1, (**j**) SI2, (**k**) SI3, (**l**) SI4.

When comparing the predicted and actual lines of the PLSR model (Figure 10) to those of the RFR (Figure 6), ABR (Figure 7) and DTR (Figure 8) models, the gap is more pronounced. This visual analysis underscores that the PLSR model exhibits the highest error rate among the models under consideration. The comparative evaluation of line graphs underscores that the PLSR model is characterized by elevated error rates and less accurate predictions compared to the other models.

**Figure 9.** Line graphs of EC values using Ridge Regressor (**a**) NDVI, (**b**) SAVI, (**c**) MSR, (**d**) NDSI, (**e**) SR, (**f**) DVI, (**g**) RSI, (**h**) VSSI, (**i**) SI1, (**j**) SI2, (**k**) SI3, (**l**) SI4.

**Figure 10.** Line graphs of EC values using Partial Least Square Regressor (**a**) NDVI, (**b**) SAVI, (**c**) MSR, (**d**) NDSI, (**e**) SR, (**f**) DVI, (**g**) RSI, (**h**) VSSI, (**i**) SI1, (**j**) SI2, (**k**) SI3, (**l**) SI4.

#### **4. Discussion**

The first research objective was to assess the accuracy of predicting soil salinity using Landsat 8 OLI data with the aid of ML models. We successfully achieved soil salinity predictions by employing these ML models in conjunction with various spectral indices. The RFR model performed exceptionally well with an R2 of 0.93 (Figure 3), MAE of 1.46, MSE of 3.90 and RMSE of 1.97 when using the SAVI spectral index as shown in (Figure 5). However, upon further analysis, we found that the RFR model provided even more precise predictions with an R<sup>2</sup> of 0.94 (Figure 3), MAE of 1.42, MSE of 3.58 and RMSE of 1.89 when utilizing the DVI spectral index (Figure 5). The predicted and actual lines using the RFR model were closely aligned in Figure 6f, indicating the model's strong performance in accurately predicting soil salinity. Conversely, when using the SAVI spectral index, the ADT model exhibited higher error rates with an R<sup>2</sup> of 0.56, MAE of 3.03, MSE of 24.23 and RMSE of 4.92. However, the ADT model showed improved predictions with an R<sup>2</sup> of 0.59, MAE of 2.89, MSE of 22.70 and RMSE of 4.76 when utilizing the DVI spectral index. Similarly, the DTR model revealed increased error rates, presenting an R2 value of 0.58, a MAE of 3.07, an MSE of 23.24 and an RMSE of 4.82 when making use of the SAVI spectral index.

On the other hand, the DTR model illustrated enhanced performance, achieving an R<sup>2</sup> of 0.59, a MAE of 2.75, an MSE of 22.40 and an RMSE of 4.73 through the utilization of the DVI spectral index. In contrast, the RR model demonstrated diminished R2 values and elevated error rates, indicating a suboptimal fit when employing the SAVI spectral index, resulting in an R2 of 0.47, a MAE of 4.25, an MSE of 29.63 and an RMSE of 5.44. A slight improvement in results was noted as the RR model integrated the DVI spectral index, resulting in an R2 value of 0.47, a MAE of 4.25, an MSE of 29.63, and an RMSE of 5.44. Similarly, the utilization of the SAVI spectral index by the PLSR model led to higher error rates, yielding an R<sup>2</sup> of 0.47, a MAE of 4.25, an MSE of 29.51, and an RMSE of 5.43. A slight increase in error rates was observed when the PLSR model applied the DVI spectral index, resulting in an R2 of 0.47, an MAE of 4.32, an MSE of 30.67 and an RMSE of 5.54.

Conversely, when utilizing the VSSI, SI1, SI2, SI3, and SI4 spectral indices for soil salinity prediction, the ML models exhibited high error rates and notably low R<sup>2</sup> values. These outcomes clearly indicated that these specific spectral indices were not well-suited for the models and they had significant errors in predicting the actual salinity values, as depicted in Figure 4. The large discrepancies between the forecast and real salinity values for these spectral indices suggest that they may not capture the essential information needed for accurate soil salinity predictions, highlighting the importance of careful selection of appropriate spectral indices for improving model performance. The RFR model outperformed the ADR, DTR, RR and PLSR models due to its implementation of random sampling [55], superior fitting on small datasets [56] and consequent enhancement in decision-making accuracy [54]. Overall, the RFR model outperformed the other models in predicting soil salinity, especially when using the DVI spectral index. The selection of appropriate spectral indices is crucial in optimizing the performance of the ML models for accurate soil salinity predictions.

The second research objective aimed to identify the most efficient spectral indices for mapping the salinity of soil using Landsat 8 OLI data. For this purpose, twelve spectral indices were employed for mapping the salinity of soil. Higher R2 values and minimum error rates were found during the prediction of salinity of soil by seven specific spectral indices (DVI, SAVI, NDVI, NDSI, MSR, SR, and RSI). Likewise, notable correlation between NDSI, NDVI, and SAVI indices to the level of salinity of soil was revealed in preliminary analysis [38]. More importantly, a unique demonstration of performance was made by DVI index with an impressive R<sup>2</sup> value of 0.94. Moreover, it demonstrated outstanding accuracy in predicting the salinity of soil with the minimal values of MAE of 1.42, an MSE of 3.58, and an RMSE of 1.89 (Figure 6f). This underscores its strong alignment with the RFR model and its significant contribution to precise soil salinity predictions. The findings indicated that spectral indices utilizing the NIR band [37] displayed greater accuracy in predicting soil salinity compared to those relying on visible bands. In contrast, the VSSI, SI1, SI2, SI3 and SI4 indices exhibited lower R<sup>2</sup> values (Figure 4) and higher error rates (Figure 4), indicating their reduced effectiveness in accurately estimating soil salinity.

The third research objective was to identify the most efficient ML model for determining the salinity of soil using Landsat 8 OLI data. The performance of five ML techniques were compared based on R2, MAE, MSE and RMSE metrics. The results clearly demonstrated that the RFR model, specifically using the DVI index, outperformed the other ML models, including the ABR, DTR, RR and PLSR, when using the selected spectral indices (excluding VSSI, SI1, SI2, SI3 and SI4) from Landsat 8 OLI data to forecast the salinity of soil. The RFR model achieved an impressive R2 of 0.94, indicating a strong correlation between predicted and observed soil salinity values, and exhibited minimum error rates as depicted in Figure 3. When comparing the predicted and actual salinity lines, the RFR model (Figure 6) demonstrated a closer alignment, highlighting its superior performance compared to the ABR (Figure 7), DTR (Figure 8), RR (Figure 9) and PLSR (Figure 10) models. This indicates that the RFR model provided more accurate predictions and better captured the variations in soil salinity, making it the most efficient ML model for this specific task [54,55]. These findings emphasize the significance of selecting an appropriate

ML algorithm and spectral indices when mapping soil salinity using satellite data, as it directly impacts the accuracy and reliability of the predictions.

This study makes a valuable addition to the application of ML models for the mapping of salinity of soil based on spectral indices, because we have successfully predicted the salinity of soil by Landsat 8 imagery with the highest 0.94 of R2. We can determine the soil salinity of barren land by just entering the geo-coordinates. It can be useful to predict the soil salinity of barren land in Pakistan and the global region. If the soil salinity of the land is determined in time, then we can take necessary steps to decrease the soil erosion. The proper soil salinity value (EC < 4) has a significant impact on the growth of the plants. The soil salinity level is an important factor in deciding the suitable crop for cultivation on a specific plot of land. So, this study helps to improve agriculture management practices.

#### **5. Conclusions**

The salinity of soil is a significant global issue, notably in arid and semiarid regions, and it poses detrimental effects on food security worldwide. It deteriorates the soil environment and has adverse impacts on climate, hydrology, agriculture, geochemistry and the economy. Regular monitoring of the scale and intensity of soil salinity is crucial to mitigate these environmental risks. Hence, remote sensing presents a suitable option for remotely determining the salinity of soil in specific regions. The current study demonstrates the effectiveness of Landsat 8 OLI data, specifically spectral indices, in characterizing and evaluating soil salinity. Satellite data reveal that saline soils exhibit higher reflectance in the visible, NIR and SWIR spectra compared to regular soils. While salinity indices SI1, SI2, SI3 and SI4 are not useful in vegetated regions, they can be employed for comprehensive observations and the assessment of saline soils. Among the five ML models utilized for estimating the salinity of soil derived from spectral indices, the RFR model demonstrated the best performance in evaluating salinity in study area with an R2 0.94, MAE 1.42, MSE 3.58 and RMSE 1.89 as compared to ADB, DTR, RR and PLSR. Among the twelve spectral indices utilized, DVI, SAVI, NDVI, NDSI, MSR, SR and RSI exhibited significant correlations with soil salinity. These indices are recommended to spatially map the salinity of soil. On the other hand, VSSI, SI1, SI2, SI3 and SI4 had unclear correlations with the salinity of soil. Based on the results, it is possible to develop a web-based application that allows users to map soil salinity by entering the geocoordinates. Such an application would be beneficial for farmers and agricultural management, enabling them to make informed decisions regarding crop selection to mitigate monetary losses caused by climate change. This method offers a cost-effective and efficient approach for detecting soil salinity in specific regions. Furthermore, since Landsat 8 OLI data are freely available, the results of this study can be optimized by increasing the dataset and applying deep learning models.

**Author Contributions:** Conceptualization, Y.u.H. and M.S.; methodology, Y.u.H.; software, Y.u.H.; validation, Y.u.H., M.S. and H.M.S.A.; formal analysis, Y.u.H., M.S., H.M.S.A. and A.A.-L.; investigation, Y.u.H., M.S.; resources, Y.u.H., M.S., H.M.S.A., A.A.-L. and W.H.A.; data curation, Y.u.H.; writing—original draft preparation, Y.u.H.; writing—review and editing, Y.u.H., M.S., H.M.S.A., A.A.-L. and W.H.A.; visualization, Y.u.H., M.S. and H.M.S.A.; supervision, M.S. and H.M.S.A., A.A.-L. and W.H.A.; project administration, M.S. and H.M.S.A.; funding acquisition, W.H.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** The research was sponsored by NRPU Project No. 17006, granted by the Higher Education Commission of Pakistan.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Upon reasonable request, the underlying data for this article will be made available by the corresponding author.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **GIS-Based Soil Erosion Risk Assessment in the Watersheds of Bukidnon, Philippines Using the RUSLE Model**

**Indie G. Dapin 1,2,\* and Victor B. Ella <sup>2</sup>**


**Abstract:** The sustainability of watersheds for supplying water and for carbon sequestration and other environmental services depends to a large extent on their susceptibility to soil erosion, particularly under changing climate. This study aimed to assess the risk of soil erosion in the watersheds in Bukidnon, Philippines, determine the spatial distribution of soil loss based on recent land cover maps, and predict soil loss under various rainfall scenarios based on recently reported climate change projections. The soil erosion risk assessment and soil loss prediction made use of GIS and the RUSLE model, while the rainfall scenarios were formulated based on PAGASA's prediction of drier years for Bukidnon in the early-future to late-future. Results showed that a general increase in soil loss was observed in 2015, over the period from 2010 to 2020, although some watershed clusters also showed a declining trend of soil erosion, particularly the Agusan-Cugman and Maridugao watershed clusters. Nearly 60% of Bukidnon has high to very severe soil loss rates. Under extreme rainfall change scenario with 12.61% less annual rainfall, the soil loss changes were only +1.37% and −2.87% in the category of none-to-slight and very severe, respectively. Results showed that a decrease in rainfall would have little effect on resolving the excessive soil erosion problem in Bukidnon. Results of this study suggest that having more vegetative land cover and employing soil conservation measures may prove to be effective in minimizing the risk of soil erosion in the watersheds. This study provides valuable information to enhance the sustainability of the watersheds. The erosion-prone areas identified will help decision-makers identify priority areas for soil conservation and environmental protection.

**Keywords:** RUSLE; GIS; climate change; soil erosion; erosion risk assessment

### **1. Introduction**

Soil erosion by water is a naturally occurring process associated with the hydrologic cycle. At small spatial scales, soil particle detachment by rainfall impact may predominate, but at larger spatial scales other water erosion processes are likely to dominate. Erosion of soil from catchment areas and sediment deposition in waterways can reduce storage capacity in the reservoir and degrade downstream water quality. Increasing soil erosion can also decrease soil fertility and crop yield [1], severely threatening local, national, and global food production systems and environmental sustainability [2,3]. Soil erosion severely restricts agricultural land use by reducing the productive potential of soils. It also contributes to water pollution by introducing suspended matter and nutrients into bodies of water. In addition, land that has been eroded becomes susceptible to other environmental impacts. Soil erosion is primarily caused by unsustainable agricultural practices, forest clearance, overgrazing, mining, and construction activities [4].

Soil erosion is considered one of the worst environmental issues in the Philippines [2,5]. Many parts of the country are highly susceptible to soil erosion because of their steep topographic conditions, which are compounded by the occurrence of severe rainfall events,

**Citation:** Dapin, I.G.; Ella, V.B. GIS-Based Soil Erosion Risk Assessment in the Watersheds of Bukidnon, Philippines Using the RUSLE Model. *Sustainability* **2023**, *15*, 3325. https://doi.org/10.3390/ su15043325

Academic Editors: Mario Elia, Antonio Ganga and Blaž Repe

Received: 21 December 2022 Revised: 30 January 2023 Accepted: 2 February 2023 Published: 11 February 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

land cover degradation, poor farming practices, and other soil-related factors [2]. Reduction in reservoir storage capacity is significantly related to increased soil erosion in the catchment areas. This is true in the case of the Pulangi IV reservoir in Bukidnon, Philippines. It was reported by Deutsch et al. [6] that the Pulangi IV reservoir was silting up at a rate of about one meter per year at the dam and that sedimentation had reduced the reservoir capacity by approximately 50%. Moreover, this contributed to the premature deterioration of hydropower turbines and frequent power outages [6]. Bukidnon is a watershed–landlocked province and the catchment area of various rivers in Mindanao.

To some extent, soil erosion can be mitigated. Better land use planning can help reduce the long-term threat of soil erosion [7]. There is a need for a straightforward and practical approach to estimating and mapping soil erosion risk that uses readily available data to improve water and soil conservation initiatives [3]. Estimating the risk of soil loss and its spatial distribution is critical for a successful assessment of soil erosion. Erosion-induced soil loss can be calculated using a prediction model such as the Revised Universal Soil Loss Equation (RUSLE) [8]. RUSLE is an empirical model for estimating soil erosion and is practical for predicting soil loss on hillslopes [2,9]. The model has been widely adopted due to its simple and straightforward computational input requirements compared to other conceptual and process-based models [10]. Numerous studies on soil erosion risk assessment and soil loss prediction around the world have been conducted using RUSLE in conjunction with Geographic Information Systems (GIS) and remote sensing techniques [1–4,8,10–14]. GIS has great potential in soil erosion inventory for soil erosion modeling and erosion hazard assessment [1] because it facilitates the manipulation, integration, analysis, and display of large amounts of spatial data, and can provide spatial distribution information on erosion [15,16].

According to the climate projection of the Philippine Atmospheric, Geophysical, and Astronomical Services Administration (PAGASA) [17], Mindanao will generally have a decreasing rainfall trend in 2050 (2036–2065). In 2050, seasonal rainfall in Bukidnon is projected to decrease significantly under high- and medium-range emission scenarios, most notably during the December–January–February (DJF) and March–April–May (MAM) seasons. Overall, the projected annual rainfall will reduce as well. The equivalent annual rainfall change based on the projected seasonal rainfall change under high-range and medium-range emission scenarios were estimated at −1.74% and −8.32%, respectively. A more recent report about the Philippine's climate extremes revealed that the projected climate will be drier across the country, with more severe conditions expected in Visayas and Mindanao. Bukidnon, in the early-future (2020–2039), mid-future (2046–2065), and late-future (2080–2099), is still projected to have drier years [18]. Therefore, it is critical to incorporate climate projection scenarios when assessing the future risk of soil erosion.

While previous studies have attempted to characterize the soil erosion characteristics in some watersheds in Bukidnon [5,19,20], no study on GIS-based soil erosion assessments in all Bukidnon watersheds exists in the published literature, particularly in a predictive mode based on recent land use data and on recently reported climate change scenarios by PAGASA. Adornado and Yoshida [21] have previously used the RUSLE model to assess soil erosion in Bukidnon. However, the land cover in Bukidnon has changed significantly over time, and the climatic conditions are also expected to change. Additionally, technological advancements have occurred in recent years, land cover maps are updated regularly, the DEM data has much higher resolution, and satellite precipitation data are already available. Thus, this study aims to generate a more updated GIS-based soil erosion risk assessment in the watersheds in Bukidnon, assess the spatial distribution of soil loss based on the land cover maps in 2010, 2015, and 2020, and predict soil loss under various rainfall scenarios based on recently reported climate change projections.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The province of Bukidnon is a landlocked province in the middle of Mindanao Island, southern Philippines, and is geographically located between 7◦18.42 –8◦38.22 N latitude and 124◦15.6 –125◦31.74 E longitude, as shown in Figure 1. The topography of the province is predominantly hilly and mountainous, especially in the eastern portion, and the other two mountain ranges in the west, have an average elevation of 915 m, and a range of 22 to 2867 m above sea level (see Figure 2a). It has rolling uplands, deep canyons, valleys alternating with the low plains, and terrain characterized by deep ravines and dense forest mountains in several mountain ranges. Due to its relatively high elevation, Bukidnon remains relatively cool and moist throughout the year. Bukidnon has a developing agricultural-based economy and is primarily a producer of rice, corn, sugar, coffee, rubber, flower, fruits, vegetables, poultry, and livestock.

**Figure 1.** Location map of the province of Bukidnon with watershed cluster map.

**Figure 2.** The (**a**) elevation map and (**b**) soil map of Bukidnon.

#### *2.2. Datasets*

For rainfall, the TRMM\_3B42\_Daily v7 was used in this study. These data are a daily accumulated precipitation product generated from the research-quality three-hourly Tropical Rainfall Measuring Mission (TRMM) Multi-Satellite Precipitation Analysis (TMPA) [22]. Accordingly, it is produced at the National Aeronautics and Space Administration (NASA) Goddard Earth Sciences Data and Information Services Center (GES DISC), as a valueadded product. A simple summation of valid retrievals in a grid cell was applied for the day data. These rainfall data in millimeters are available in raster format with a resolution of 0.25◦ × 0.25◦ (https://disc.gsfc.nasa.gov/datasets/TRMM\_3B42\_Daily\_7/summary (accessed on 10 November 2022)). For this study, the annual accumulated TRMM\_3B42\_Daily v7 was taken from the Geospatial Interactive Online Visualization ANd aNalysis Infrastructure (GIOVANNI) website (https://giovanni.gsfc.nasa.gov/giovanni (accessed on 10 November 2022)). GIOVANNI is a web-based application developed by the GES DISC that provides access to Earth science remote sensing data (https://gpm.nasa.gov/data/ sources/giovanni (accessed on 10 November 2022)). The TRMM 3B42\_Daily v7 was used in this study because it has the least overall monthly bias and most closely matches the rainfall distribution observed at weather stations, particularly the dry days and torrential rain days, across the entire Philippines, compared to other gridded rainfall products [23]. During this study, the available TRMM\_3B42\_Daily v7 data were from 1998 to 2019.

The Interferometric Synthetic Aperture Radar (IfSAR) Digital Elevation Model (DEM) with 5 × 5 m resolution was used to delineate the province's major river watersheds. This was obtained from the National Mapping and Resource Information Authority (NAMRIA) of the Philippines. The DEM was projected into WGS 84/UTM zone 51 N. The elevation map of Bukidnon is shown in Figure 2a.

The soil map of Bukidnon was extracted from the soil map of the Philippines and was downloaded from the Geoportal Philippines website (https://geoportal.gov.ph (accessed on 1 December 2022)). For some areas in the soil map with soil texture classified as undifferentiated, the Food and Agriculture Organization (FAO) Soil Map of the Philippines [24] and the Harmonized World Soil Database (HWSD) [25] were used as a reference to determine the soil textural class in those areas. The soil map of Bukidnon is shown in Figure 2b.

Three land cover maps for 2010, 2015, and 2020 were used in this study. The land cover maps of Bukidnon were extracted from the 2010, 2015, and 2020 land cover maps of the Philippines. The land cover maps and the administrative boundary map were downloaded via the Geoportal Philippines website (https://geoportal.gov.ph (accessed on 1 December 2022)), managed by NAMRIA. The land cover maps are shown in Figure 3.

**Figure 3.** The land cover maps of Bukidnon in (**a**) 2010, (**b**) 2015, and (**c**) 2020.

#### *2.3. RUSLE*

The Revised Universal Soil Loss Equation, or the RUSLE model, is a well-known and widely used empirical model for estimating soil erosion [9]. It was developed using the five following factors to estimate the average annual soil loss (*A*): rainfall erosivity factor (*R*), soil erodibility factor (*K*), slope length and steepness factor (*LS*), cover management factor (*C*), and conservation practice factor (*P*). RUSLE is expressed as:

$$A = R \times K \times LS \times \mathcal{C} \times P; \text{ in t/ha/y} \tag{1}$$

The rainfall erosivity factor (*R*) quantifies the kinetic energy impact of rainfall and predicts the rate and amount of runoff associated with that precipitation [9]. Originally, to determine the *R*-factor, rainfall intensity data is required [10,26]. Due to the unavailability of rainfall intensity records for Bukidnon, the equation for *R*-factor (Equation (2)) used by Salvacion [2] in Marinduque, Blanco and Nadaoka [27] in Laguna Lake Watershed, and Adornado et al. [28] in Quezon, Philippines was adopted. Accordingly, this simplified equation produced a result that was acceptable for tropical and humid subtropical climatic zones [9,28]. The equation is as follows:

$$R = 38.5 + 0.35P \tag{2}$$

where *R* is the rainfall erosivity factor (MJ mm/ha/h/y), and P is the average annual precipitation (mm). Three sets of *R*-factor maps were generated with three different sets of average annual precipitation maps. The three sets of the average annual precipitation maps represented the average annual precipitation of 1998–2009, 2003–2014, and 2008–2019. Each period consists of an equal number of years, with a seven-year overlap between subsequent periods. This was done because this study used three land cover maps at different periods, as previously mentioned. Furthermore, the calculation of the average annual precipitation using no less than ten years of precipitation data was based on the methodology of the previous studies [28–31].

The soil erodibility factor (*K*) indicates the soil's resistance to erosion caused by raindrop impact, as well as by the runoff generated from the rainfall. Soil erodibility is determined by geological and soil characteristics such as structure, texture, parent material, porosity, and organic matter content. Regardless of the soil concentration of sand and clay, the silt content is directly related to soil erodibility [9,26]. This study adapted the K-factor values from David [32] for each soil textural class, which was also adapted by Salvacion [2].

The *LS*-factor in RUSLE is the slope length (*L*) and steepness (*S*) factors combined to reflect the effect of regional topography on the rate of soil erosion. Cumulative runoff increases in both amount and rate as the slope lengthens. As the land slope increases, the runoff velocity increases proportionately, resulting in massive erosion [9,28]. The *LS*-factor was generated using Equation (4). Equation (4) is a method developed by Moore and Burch [33] and Moore and Wilson [34] and was used by Hrabalíková and Janeˇcek [35] in which it was proven as one of the better options to calculate the *LS*-factor. According to Andreoli [11], this expression is appropriate for areas with complex topography, including plateaus, terraced ledges, and mountains, because it considers the convergence and divergence of the flows. Using the DEM, flow accumulation and slope were generated in Quantum GIS (QGIS) through the r.flow Geographic Resources Analysis Support System (GRASS) algorithm, and slope algorithm of the Geospatial Data Abstraction Library (GDAL), respectively, in the QGIS toolbox. The equations are as follows:

$$A\_s = FA \,\,\, \* \,\, cell \,\, size \,\,\, \tag{3}$$

$$LS = \left(A\_s / 22.1\right)^m \* \left(\frac{\sin \beta}{0.0896}\right)^n \tag{4}$$

where *As* is the specific catchment area, *FA* is the flow accumulation in each grid cell and its value corresponds to the number of flowlines that traverse that grid cell, cell size is the resolution of the grid (for this study 5 m), the value of m is 0.4, n is equal to 1.3, and *β* is the slope angle [9,11,35].

The *C*-factor values represent how cropping and management practices affect the erosion rate. It is inextricably linked to land use types and is a factor in reducing soil erosion vulnerability. It is defined as the ratio of soil loss from land under specific conditions to the equivalent loss from clean-tilled, continuous fallow. Essentially, vegetative cover prevents raindrops from colliding with the soil surface and dissipates the kinetic energy of rainfall before it reaches the soil surface, slowing down runoff, thus facilitating infiltration; hence, the amount and type of vegetation cover has a significant effect on soil loss. *C*-factor is directly related to the vegetation type, stage of growth, and percentage of cover [1,9]. In this study, the *C*-factor values from David [32] and Delgado and Canters [36] were used as a reference in assigning the *C*-factor for each land cover class of the three land cover maps.

The conservation or support practice factor (*P*) indicates the effects of implementations that decrease the rate and amount of runoff, thereby reducing the amount and rate of soil erosion. It indicates the proportion of soil loss caused by a particular support practice compared to soil loss caused by upward and downward slope, contour farming, and tillage. Primary support practices include strip cropping, contour farming, terracing, cross-slope cultivation, and grassed waterways. *P*-factor values are calculated as the ratio of soil loss caused by a specific support practice to soil loss caused by row farming in both upward and downward slope conditions [9]. The value of the *P*-factor ranges from 0 to 1, where a value close to 0 indicates good conservation practice while values approaching 1 indicate poor or no erosion control practice [37]. Since no records document the extent and adoption of conservation practices in the province, though some may have adopted them, a value of 1 was assigned for the *P*-factor for the entire province.

#### *2.4. Annual Rainfall Change Scenario*

Initially, a baseline average annual rainfall scenario was established before formulating annual rainfall change scenarios. Since the accumulated annual TRMM\_3B42\_Daily data used in this study was from 1998 to 2019, the same period was used in generating the baseline scenario condition. Thus, the average annual rainfall for the period 1998–2019 was used to generate the *R*-factor of the baseline scenario.

As previously mentioned, Bukidnon is expected to experience drier years in the future. In Bukidnon, the projected rainfall on the early-future (2020–2039), mid-future (2046–2065), and late-future (2080–2099), and under both the moderate emission (RCP4.5) and high emission (RCP8.5) scenarios, range from −3.70 to −12.61% from the baseline value [18]. On this basis, three annual rainfall scenarios were generated (Table 1) to assess the risk of soil erosion under the projected rainfall conditions. Rather than using the six scenarios proposed by DOST-PAGASA et al. [18] to represent each future category and emission scenario, only three scenarios were considered because all projected annual rainfall values are consistently lower than the baseline value. To generate the rainfall amount of each scenario, the amount of rainfall equivalent to the percent change in rainfall, as shown in Table 1, was subtracted from the baseline average annual rainfall scenario. The results in each scenario were used to obtain the corresponding *R*-factor.


**Table 1.** Rainfall change scenarios.

#### **3. Results**

#### *3.1. RUSLE Factor Distribution*

GIS was used in this study to generate the RUSLE factors, calculate the average annual soil erosion rates, and produce maps showing the distribution of these factors, the soil erosion rates, and the soil erosion risk map in Bukidnon. Cell-by-cell calculations of the

mean annual soil erosion rates [1] were conducted; thus, the RUSLE factors were prepared in raster format using QGIS. Figures 4–6 show the map of the RUSLE factors used in calculating the annual soil erosion rates.

**Figure 4.** The *R*-factor map of Bukidnon derived from the average annual rainfall for the period (**a**) 1998–2009, (**b**) 2003–2014, and (**c**) 2008–2019.

**Figure 5.** The (**a**) *LS*-factor and (**b**) K-factor map of Bukidnon.

As previously mentioned, the TRMM\_3B42\_Daily accumulated annual precipitation data in raster format downloaded from GIOVANNI were used as the dataset to generate the average annual rainfall in the province. Since three land cover maps were used in the study, three maps for the average annual precipitation were also created. Each represents the average for 1998–2009, 2003–2014, and 2008–2019, respectively. Using the average annual rainfall from 1998–2019 as a reference, it was observed that most of the northern and western parts of the province in 1998–2009 were wetter while the eastern areas were drier. In addition, the average annual rainfall in 2003–2014 was generally higher in all areas of the province compared to other time periods. In 2008–2019, however, the province was predominantly wetter, with a few drier areas on the western side. The average annual precipitation maps were used to calculate the *R*-factor maps using Equation (2), thus resulting in three *R*-factor maps, as shown in Figure 4. Generally, values of the *R*-factor are much higher in the northeastern and eastern parts of the province, but lower values of the *R*-factor can be found in the northwestern part of the province, as shown in Figure 4.

**Figure 6.** The *C*-factor maps of Bukidnon for (**a**) 2010, (**b**) 2015, and (**c**) 2020.

Using Equations (3) and (4), the *LS*-factor of the province was obtained. The spatial distribution of the *LS*-factor is shown in Figure 5a. Approximately, 48.3% of the province's *LS*-factors are below 5, followed by 22.98% within 5–10, 24.16% within 10–25, 4.12% within 25–50, and only approximately 0.54% above 50. There are certain areas in the province with *LS*-factors over 500, but they only represent approximately 0.00008% of the province's total area. Higher *LS*-factor values can be found in the mountain ranges, in deep canyons, and in nearby river networks, whereas lower values can be found in the plains and perhaps the cropland areas in the province. Extremely high values of the *LS*-factor are expected, given that the range elevation in the province is relatively high, having a standard deviation of 445 m. Additionally, the slope in the province ranges from 0 to 77◦ with 16.08◦ as the average and 12.27◦ as standard deviation, and approximately 20.33% of the province is mountainous and extremely steep, with slopes exceeding 26.6◦ or 50%, resulting in some extremely high *LS*-factor. Most of the *LS*-factors over 50 are in areas with a slope above 50%. *LS*-factors over 50 were also observed by Salvacion [2] in Marinduque, Philippines. Using the same expression of the *LS*-factor used in this study, Andreoli [11] was able to obtain higher *LS*-factors over 350 and others over 2000 [38,39].

Figure 5b shows the soil erodibility factor (*K*) of the province, which ranges between 0.19 and 0.6. Lower K-factor values are more prevalent in the eastern side of the province, while higher values are generally located in the northern side and a few other areas. Most of the province has a lower K-factor, ranging between 0.19 and 0.3.

Three *C*-factor maps were generated based on the land cover maps of 2010, 2015, and 2020. As shown in Figure 6, the *C*-factor values range from 0 to 1. During *C*-factor classification based on the land cover classes, the 0 value was assigned for water bodies while 1 was set for barren and built-up areas. The rest of the land cover classes were reclassified based on data available from the existing literature [2,32,36,40]. As land cover changes over time, so does the *C*-factor distribution in the province.

#### *3.2. Soil Loss*

Figure 7 shows the spatial distribution of soil erosion rate in Bukidnon during 2010, 2015, and 2020. Because the province is a watershed area of the neighboring provinces, the Bukidnon watershed areas were clustered into seven watershed clusters based on the classification considered by Rola et al. [41], as shown previously in Figure 1. These watershed clusters include Tagoloan in the north; Agusan-Cugman, and Cagayan in the northwest; Upper Pulangi in the central and eastern side; Maridugao in the southwest; Lower Pulangi in the south; and Davao-Salug in the southeastern side of Bukidnon.

**Figure 7.** The annual soil loss map of Bukidnon in (**a**) 2010, (**b**) 2015, and (**c**) 2020.

On average, the RUSLE model-predicted soil erosion rates in Bukidnon range between 312 to 363 t/ha/y based on the periods under consideration. The predicted values range from 0 to 114,275 t/ha/y. The predicted soil erosion rates may appear to be exceedingly high. However, a study conducted in Marinduque, Philippines, showed that the RUSLE predicted soil erosion rates is around 120 t/ha/y on average, and can vary from 0 to as high as 20,767 t/ha/y [2]. These high erosion rates were observed in an island province that has a drier climate and less complex topography than Bukidnon. Additionally, the plot experiment on soil erosion in a few selected areas in Bukidnon revealed that a month of 5 mm rainfall can cause up to 229 t/ha of soil loss on certain plots with a slope of 15%. Although soil deposition can also occur, reaching 400 t/ha in some plots [42], this could also mean that somewhere near the plots an accumulated soil loss equal to that amount is also possible. Thus, the RUSLE-predicted soil erosion rates obtained in this study are comparable with the empirical evidence obtained from previous studies in the Philippines. As illustrated in Figure 7, areas prone to soil erosion are primarily found along the buffer zones of the major mountain ranges in Bukidnon. These areas are frequently found in deep river canyons and rolling areas that were previously used for crop production. The predicted soil erosion rates in these areas are extremely high, exceeding 300 t/ha/y on average. It can be observed in Figure 7 that the areas with lower soil erosion rates are the plains and mountain ranges that have good vegetative cover, the protected areas of the province.

Figure 8 illustrates the mean value of the average annual soil erosion rate for each watershed cluster. The thin lines in the graph represent the magnitude of standard deviations on top of the mean soil erosion rates. The predicted soil loss in 2020 is lower than in 2015 in most areas of the watershed clusters, except for Tagoloan. This decline may be attributable to a wetter climate in 2003–2014, compared to 2008–2019, that was used in deriving the *R*-factor. The exception in Tagoloan may be attributed to the decrease in vegetative cover in 2020, especially in the forest areas, despite the climate in 2008–2019 being drier than in 2003–2014. The soil erosion rates in the Maridugao watershed cluster are slightly decreasing. In contrast, the Upper and Lower Pulangi, and the Davao-Salug watershed clusters have an increasing soil erosion rate from 2010 to 2020, of which the highest can be observed during 2015. The increase in soil erosion in the Upper Pulangi watershed cluster may have contributed more to the accumulation of sediments in the Pulangi reservoir.

As shown in Table 2 and Figure 8, Maridugao and Tagoloan have the highest mean soil erosion rates among the watershed clusters. From 2010 to 2020, the Tagoloan watershed had the highest maximum soil erosion rates. This can be attributed to a combination of higher *LS*-factor and *R*-factor values in the region. In fact, the cluster with the highest *LS*-factor can be found in Tagoloan watershed cluster. In contrast, the Davao-Saug cluster in 2010, the Agusan-Cugman cluster in 2015, and the Cagayan watershed cluster in 2020 all have lower maximum values of soil loss rates. Table 2 contains statistical values for soil erosion rates predicted by RUSLE by watershed cluster.


Several areas in Bukidnon had extremely high values of the predicted soil erosion rates, as shown in Figure 7. This could be the result of setting the *P*-factor value equal to one for the entire province and could also be attributed to the resolution of the DEM that was used to generate the *LS*-factor. The resolution may affect the computation of the *LS*-factor. Hrabalíková and Janeˇcek [35] predicted soil loss rates closer to those observed while using a similar *LS*-factor expression to that used in this study, and a DEM with1mresolution. Hence, to avoid overestimating the predicted soil loss, a much higher DEM resolution would be preferable, as would documentation of conservation practices in the area.

Based on the predicted values of soil erosion rates shown in Table 3, the areas with very severe soil erosion in Bukidnon decreased slightly in 2020 compared to 2015, by about 1.77%, equivalent to 16,564 ha. Though it can be observed that there was a slight increase in the severe areas from 2015 to 2020, the moderate and high soil erosion areas had decreased by 2.03, and 3.66%, respectively. This decrease has led to an increase in the extent of areas under none to slight soil loss categories by about 7.1%, equivalent to 66,445 ha. When the results in Table 3 were compared with those reported by Adornado and Yoshida [21], the extent of severe and very severe areas had increased by about twofold. The extent of severe areas went from 6.66% to 13.45% on average, while very severe areas went from 15.19% to as high as 34.16% on average. The twofold increase on these areas has been prevalent since 2010. This may be due to the decrease in the areas classified as none to slight, high, and very high soil loss categories. The differences of the results may also be attributed to the nature of the model inputs used in the earlier study. In that study, a DEM was derived from 100 m interval contour lines to obtain an *LS*-factor. The generated DEM they used had less details about the topography of the province and this could have affected the calculation of the *LS*-factor. Additionally, their *C*-factor was generated from an analog land cover map and the ASTER image available during that time [21]. Nevertheless, both results indicate the potential severity of soil erosion in Bukidnon as the rates of the very high to very severe soil erosion areas have not dropped significantly from 2015 to 2020, based on the predicted soil erosion rates, and this poses a serious problem on the sustainability of land and water resources in this province.


**Table 3.** The extent of erosion in Bukidnon 1.

Note: <sup>1</sup> Bukidnon total area = 935,846 ha, based on the calculation using GIS.

#### *3.3. Soil Loss under Annual Rainfall Change Scenarios*

As expected, the predicted rate of soil loss (Table 4) decreased under the very severe category while it increased under none to slight, high, very high, and severe soil loss categories, in all the rainfall scenarios, as all generated rainfall scenarios had negative percent changes. The areas categorized as none to slight increased with a range of 0.4 to 1.37% against the baseline scenario, equivalent to 3743 to 12,821 ha. The extent of the very severe areas fell within the range of 0.79 to 2.87%, which means around 7393 to 26,858 ha of land will no longer experience very severe soil loss if annual rainfall decreases by 3.17 to 12.61% in future decades. However, this reduction in very severe areas will also result in an expansion of areas classified as having high, very high, and severe soil loss. In addition, approximately no more than 0.9% (8423 ha) of the province of Bukidnon may experience a decrease in the moderate soil loss rates areas. The extent of very severe soil loss areas may reduce significantly but remains small compared to the total size of the province.

Tables 5 and 6 depict the extent of areas under the baseline scenario and the third rainfall change scenario (R3) at various soil loss categories for each watershed cluster. The extent of areas for each soil loss category are expressed in percent relative to the total area of each watershed cluster. The results indicate that the extent of the none to slight soil loss category for most of the watershed clusters will increase relative to baseline values under the R3 scenario. By contrast, the extent of the very severe areas will decrease relative to baseline values under the R3 scenario. The extent of the none to slight soil loss categories will increase between 0.48 and 1.98%, whereas the extent of the very severe areas will decrease between 2.49 and 3.89% of the area of the watershed clusters.


**Table 4.** The extent of soil erosion under rainfall change scenarios in Bukidnon 1.

Note: <sup>1</sup> Total area considered = 935,846 ha, based on the calculation using GIS.

**Table 5.** Percent (%) of the extent of soil erosion under baseline rainfall scenario in Bukidnon 1.


Note: <sup>1</sup> Total area considered = 935,846 ha, based on the calculation using GIS.

**Table 6.** Percent (%) of the extent of soil erosion under R3 rainfall change scenario in Bukidnon.


As shown in Table 6, the Upper Pulangi, Davao-Salug, and Cagayan watershed clusters will consistently and significantly have larger areas with none to slight soil loss. On the other hand, Lower Pulangi, Maridugao, and Agusan-Cugman watershed clusters will have a smaller proportion of areas with none to slight soil loss but will have a greater proportion of areas with very severe soil loss in future decades. Relative to the watershed size, an extremely large extent of area with very severe soil loss will be in the Lower Pulangi watershed cluster. The distribution of soil loss, particularly in Maridugao and Lower Pulangi watershed clusters, appears to be negatively skewed. This means that there is a greater proportion of areas that have experienced very high to very severe soil loss, as shown in Table 5, and this will still happen, as shown in Table 6, despite an overall decrease in the very severe areas and an increase in the none to slight soil loss areas, as shown in Table 4. Significant variation can be found when comparing the soil erosion rates of the baseline scenario (Table 5) and the R3 scenario (Table 6) in all soil loss categories, except for the none to slight and very severe categories. The extent of areas under the moderate soil loss category in Agusan-Cugman and Lower Pulangi will potentially increase while the rest of the watershed clusters will decrease. All the watershed clusters will experience increases in the high, very high and severe soil loss rate categories.

The predicted soil loss using the RUSLE model at various time periods and under the rainfall change scenarios provided insights on the soil loss behavior in Bukidnon watershed areas. GIS was crucial in identifying the distribution of soil loss both in provincial and watershed cluster scales. The predicted soil loss under various rainfall change scenario may serve as baseline information for determining the potential soil loss in future decades under future rainfall conditions. Results showed that a reduction of approximately 12.61% of the annual rainfall could reduce the extent of very severe areas up to 2.87% and could increase the none to slight soil loss areas at 1.37% of the total area of Bukidnon. This implies that the reduction in rainfall alone will have little effect on the severity of erosion in the province. This suggests that increasing land cover and adapting soil conservation measures may be a more effective way to mitigate the severity of soil erosion than just anticipating for the province to experience less rainfall. Moreover, it is necessary to map and monitor areas adopting and implementing soil conservation measures and practices in order to compare the soil erosion in the areas with and without conservation measures. Not only will this help reduce soil loss, but it will also allow for a more accurate prediction of soil erosion. Nevertheless, the maps presented in this study may help planners in identifying priority areas for soil conservation measures, particularly those with excessive soil loss. Neighboring provinces downstream of the major rivers in Bukidnon should consider coordinating their efforts to implement conservation measures with those in the province of Bukidnon, as they are not exempted from the effects of excessive soil erosion in Bukidnon.

#### **4. Discussion**

The RUSLE model has proven to be a practical method for assessing soil erosion risk at the watershed scale and the impact of the rainfall change to soil erosion-prone areas in the province of Bukidnon. Excessive soil loss occurs on steep hillslopes with less vegetation that are devoid of support and conservation practices. As observed during the generation of the *LS*-factor, the *LS*-factor is sensitive to the resolution of DEM. The higher the resolution of DEM, the wider the range of values of the *LS*-factor. This is because the expression of the *LS*-factor resulted in much higher maximum values at a higher resolution [39], which may add to the uncertainty of the predicted soil erosion rates. Michalopoulou et al. [39] suggested a method to avoid overestimation of the *LS*-factor; however, it has not been evaluated and validated whether this strategy prevents overestimation or can lead to underestimation of the *LS*-factor. In addition, different land cover classification of land cover maps at different time periods made it difficult to compare the predicted soil loss rates between each period and from earlier studies, as changes in land cover classification entail a different *C*-factor value. Nevertheless, the results indicated that predicted soil loss under drier rainfall change scenarios was less significant compared to the size of the province, implying that the severity of soil erosion in the province may not necessarily be reduced just by experiencing less rainfall alone. Increased land cover and the adoption of conservation measures such as conservation agriculture may reduce the risk of extreme soil erosion more than anticipating future rainfall reductions.

The new updated information generated in this study could serve as the basis for the formulation of policies geared towards soil conservation and environmental protection in the province. The approach developed and employed may also be extended to other erosion-prone provinces and regions in the country. The application and integration of RUSLE and GIS to identify the areas most vulnerable to soil erosion will allow policymakers and decision-makers to identify priority areas to focus in implementing soil erosion control measures in the future. It is highly recommended to use DEM with higher resolution, whenever available, and promote the implementation and documentation of soil conservation and support practices in the province. It is also recommended that future studies consider alternative expressions for generating the *LS*-factor that would limit its overestimation and underestimation, and to use field measurements for evaluation and validation purposes.

**Author Contributions:** Conceptualization, I.G.D. and V.B.E.; methodology, I.G.D.; formal analysis, I.G.D. and V.B.E.; writing—original draft preparation, I.G.D. and V.B.E.; writing—review and editing, I.G.D. and V.B.E.; visualization, I.G.D.; supervision, V.B.E.; funding acquisition, I.G.D. and V.B.E. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was funded by the Department of Science and Technology—Science Education Institute (DOST-SEI) and Engineering Research and Development for Technology (DOST-ERDT) of the Philippines.

**Institutional Review Board Statement:** The study was approved by the University of the Philippines Los Baños, Graduate School.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** This study was conducted under the auspices of DOST-SEI and ERDT program. Acknowledgement is due to all agencies that provided the needed data used in this study.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Assessing Landslide Susceptibility by Coupling Spatial Data Analysis and Logistic Model**

**Antonio Ganga 1, Mario Elia 2,\*, Ersilia D'Ambrosio 2, Simona Tripaldi 3, Gian Franco Capra 1, Francesco Gentile <sup>2</sup> and Giovanni Sanesi <sup>2</sup>**


**Abstract:** Landslides represent one of the most critical issues for landscape managers. They can cause injuries and loss of human life and damage properties and infrastructure. The spatial and temporal distribution of these detrimental events makes them almost unpredictable. Studies on landslide susceptibility assessment can significantly contribute to prioritizing critical risk zones. Further, landslide prevention and mitigation and the relative importance of the affecting drivers acquire even more significance in areas characterized by seismicity. This study aimed to investigate the relationship between a set of environmental variables and the occurrence of landslide events in an area of the Apulia Region (Italy). Logistic regression was applied to a landslide-prone area in the Apulia Region (Italy) to identify the main causative factors using a large dataset of environmental predictors (47). The results of this case study show that the logistic regression achieved a good performance, with an AUC (Area Under Curve) >70%. Therefore, the model developed would be a useful tool to define and assess areas for landslide occurrence and contribute to implementing risk mitigation strategy and land use policy.

**Keywords:** landslide; logistic regression; environmental hazard; risk assessment

#### **1. Introduction**

Landslides, one of the most devastating natural/human-induced disasters worldwide, cause injuries and loss of human life as well as damage to properties and infrastructure [1,2]. Haque et al. [3] estimated that from 1995 to 2014 in Europe, there were a total of 476 landslide events, with 1370 deaths and 740 injuries recorded. These alarming numbers illustrate the need to enhance national and regional efforts to prevent or minimize such impacts on human and natural assets.

With 620,808 recorded events, Italy is one of the most affected European countries [4]. Every year landslide events cause the displacement of hundreds of thousands of people, building and infrastructure damages, and conspicuous loss of cultural heritage. To show the magnitude of the recent status, 172 main events were recorded in 2017, 146 in 2016, 311 in 2015, 211 in 2014, and 112 in 2013. Related economic damages have been estimated at c.a. EUR 2.5 billion per year [4] in the period between 2014 and 2020.

Identifying critical zones becomes essential to managing landslide susceptibility, especially in areas where landslides severely threaten human and natural resources and the country's massive cultural heritage. In this regard, the scientific community can help decision-makers by developing ad hoc risk analysis models based on empirical evidence and drivers of landslide occurrence. Geomorphological and topographic conditions, seismicity, intense rainfall, and anthropogenic factors, such as road and railway infrastructures,

**Citation:** Ganga, A.; Elia, M.; D'Ambrosio, E.; Tripaldi, S.; Capra, G.F.; Gentile, F.; Sanesi, G. Assessing Landslide Susceptibility by Coupling Spatial Data Analysis and Logistic Model. *Sustainability* **2022**, *14*, 8426. https://doi.org/10.3390/su14148426

Academic Editor: Salvador García-Ayllón Veintimilla

Received: 11 May 2022 Accepted: 4 July 2022 Published: 9 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

urban sprawl, land cover changes, and agricultural practices, represent the main predisposing and triggering factors of landslide events [5–7].

Studies focusing on landslide susceptibility assessment can provide an outstanding contribution to these purposes since they estimate the spatial distribution of landslide occurrence probability in a given area based on predisposing factors [1]. The literature is characterized by inventory-based studies, data-driven (including bivariate and multivariate statistics) analysis, knowledge-driven works, and probabilistic, physically-based, and deterministic approaches [2]. Recently, several machine learning-based techniques have been tested to develop landslide monitoring tools. However, these methods still feature significant drawbacks for modeling complex algorithms, including uncertain model performances, scarce aptitude to detect local variability, and the use of "black boxes", which is not appreciated yet by the operative world. Consequently, most landscape managers still prefer more traditional and sound models, such as logistic regression [8].

In fact, the most widely used method at local and regional scales is logistic regression [2,9,10], which has an advantage over other multivariate statistical methods in that it is independent of data distribution and able to handle continuous, categorical, and binary data [11]. In a logistic regression approach, the binary dependent variables are related to a vast dataset of independent drivers, such as topographic parameters, land uses, and others [12]. This study aims to test logistic regression in a survey area (Apulia, Italy), characterized by frequent seismic events and historically affected by widespread landslides. Specifically, we aimed to (*i*) understand the relationship between selected environmental variables and the occurrence probability of landslide events and (*ii*) produce landslide susceptibility maps for management, mitigation, and prevention purposes.

The proposed analytical framework can be applied to other areas worldwide with similar environmental features. Consequently, results may be helpful for landscape/environmental planners involved in landslide risk mitigation and land-use planning activities.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The research was carried out across the Dauni Mountains (902.9 km2; Min. Lat. 41◦4 32.01", Min. Long.14◦55 59", Max. Lat. 41◦55 35.27", Max. Long. 15◦31 51.6") which are located in the western part of the Apulia Region (Italy) (Figure 1). They belong to the Dauno Sub-Apennine dominion, representing the marginal external front of the Southern Apennine chain [13,14]. The chain units, affected by intense tectonic deformations, are thrusted over the Bradanic Trough unit, which in turn overlays the Apulian Foreland units. Folds and thrust structures and eastward verging, along with clay-rich flysch formations, characterize the structural and geological framework of the area [15,16].

As reported by the parametric catalog of Italian earthquakes CPTI15 [17,18], the study area has low seismic activity. However, more intense seismic activity characterizes the nearby Southern Apennines and Gargano Promontory areas, located less than 50 km from the study area [19,20]. Due to the more frequent and more energetic seismic activity of the Southern Apennine chain, the study area may be affected by seismic shaking able to induce landslides, as reported by Del Gaudio et al. (2012), who created a seismic hazard map of an area of the Daunia.

The elevation ranges between 73 and 1135 m a.s.l. (mean 566 m a.s.l.). The morphology is typically hilly-mountainous, strongly shaped by landslides favored by lithological and soil features, seismicity, and the area steepness (*vide infra*). The most diffuse lithotype are clay-limestone (CL), sands and conglomerates (SC), and clayey and clayey-limestone (CCL) units. Soils are strongly influenced by both parent material and the whole environmental features, being more developed in CL and CCL units (Alfisols and Ultisols) [21,22] while less (Inceptisols and Entisols) in SC units and where landslide movements modify their original nature. Land uses are featured by common arable crops (70%, with olives as main tree crops), broad-leaved, coniferous, and mixed forests (21%). Urban areas are very limited (1%). Mean annual rainfall (1918–2007) ranges from 533 to 931 mm. Mean monthly

temperatures range from 5 ◦C in January to 25 ◦C in August, negatively correlated with the altitude.

**Figure 1.** Dauni Mountains study area location with landslide detachment niches identification.

The study area is characterized by a high seismic hazard that, together with the previously reported environmental features, makes it extremely prone to landslide events; between 1918 and 2007, 515 mass movements were registered (Table 1).


**Table 1.** Number of landslides registered within the study area (1918–2007) with their main features.

### *2.2. Data Collection*

We extracted data from multisource and multiscale geographic databases (Table S2). A total of 47 predictors were selected according to 7 main macro-class classification/factors: topography, seismic, geolithologic, pedologic, land use, morphologic, and climatic. Some of them are represented in the Figure S1.

The whole investigated area was divided into several grids of 250 square meters each. For each grid, the value of all predictors (vide supra) was calculated using GIS software. The seismic factor was evaluated using data from the national database and additionally evaluating the fault distance by applying the Euclidean distance algorithm.

Concerning the dependent variable, the location of landslide detachment niches was obtained from the landslides regional inventory. Since this database provides information concerning landslides verified between 1918 and 2007 without providing a specific occurrence date, the causative factors' assessment was limited to these years. Thus, climate causative factors were quantified as mean values for the same period. Meanwhile, the study area was reduced to subareas in which land use did not vary throughout this period in order not to consider the effect of land cover variation on slope stability. Available Corine Land Cover Maps (1990, 2000, 2006) were crossed, and only land cover persistence areas (582.2 km2) were retained for further analysis. Overall, the total number of landslide detachment niches considered within the study area was 328.

Finally, the modified study area was divided into 9316 cells of 250 × 250 m, and, for each of them, the mean values of the causative factors were assessed. In addition, a dichotomic value (i.e., 0 or 1) linked to the absence or presence of landslide detachment niches was assigned to each cell as the dependent variable's value.

#### *2.3. Causative Factor Selection*

The first fundamental step was factor selection. Several factors occur in the process of landslide hazard assessment [1,2,23]. In fact, they are the result of interaction between intrinsic and external factors [24]. As a matter of fact, there are no definitive factor selection lists, and the tendency is often to rely on accessible databases [9]. However, the selected model allows for the use of a large number of independent variables, and therefore, a large number of databases have been considered, which are collected in the categories below (Figure 2).

**Figure 2.** Flow chart of the methodological approach used.

The attached table (Table S1) shows all the calculated factors, the original source, and the spatial/temporal resolution. These data were acquired from public data infrastructures managed by individual regions, Italian Environmental Agencies (such as ISPRA), and the European Environmental Agency (EEA). According to [23], we classified the factors into (*i*) predisposing factors (e.g., topography and geological substrate), (*ii*) triggering factors (e.g., precipitations and earthquakes), and (*iii*) accelerating factors (e.g., humanaltered systems).

Regarding predisposing factors, the following groups of predictors were considered:


characteristics of hydrological processes [25]. The main topographical indices are reported in Table S1.


**Table 2.** List of causative factors.


Considered triggering factors were:


ing it. This is especially true when located in areas with critical topographical and lithological conditions [39].


#### Variable Selection

Predictors were selected starting from the original 47 × 9316 observation matrix dataset. In a multivariate statistical analysis, the selection process is one of the most important tasks [40]. In this work, the elimination of highly correlated predictors was carried out in the first variables selection according to previous works [41,42]. In this step, the predictors with Pearson's correlation coefficient >0.75 were removed. The final correlation matrix is reported in Table S1.

#### *2.4. Logistic Regression*

Logistic regression analyzes the relationship between multiple independent variables and a categorical dependent variable while estimating the occurrence probability of an event by fitting data to a logistic curve [43]. The logistic function is described as [31]:

$$P = \frac{1}{1 + (e\varepsilon p^{-Z})} \tag{1}$$

where *P* is the probability of landslide occurrence and *Z* a linear combination of casual *X*<sup>i</sup> factors:

$$Z = \beta\_0 + \beta\_1 \mathbf{x}\_1 + \beta\_2 \mathbf{x}\_2 + \beta\_3 \mathbf{x}\_3 + \dots + \beta\_n \mathbf{x}\_n \tag{2}$$

where *β<sup>i</sup>* are the landslide casual factor's coefficients.

The models were performed using R statistical software. In order to operate a predictor's selection, the regressions were performed using a stepwise model. Stepwise regression is a set of iterative search and model comparison procedures that identify which independent variables have the strongest association with the dependent one (Draper and Smith, 1981; Hauser, 1974). Using this approach, the model was performed for each type of landslide and for the total landslide.

For the four models, the Area under Curve (AUC), the Accuracy, and the Akaike information criterion (AIC) were calculated. In order to evaluate the predictive capacity of the logistic model, the McFadden pseudo R2 area was calculated. This is intended as a logistic regression analog of R2 and uses ordinary least-squares (OLS) regression [44].

#### **3. Results**

#### *3.1. Main Factors Affecting Slope Stability*

Table 3 reports the results of significant drivers for each model. In particular, stepwise logistic regression was performed using four different sub-datasets, from the whole landslide dataset, according to the following typology: (*i*) total landslides (TL); (*ii*) rotational/translational landslide (RTL); (*iii*) slow earth flow (SEF), and (*iv*) complex landslide (CL).

**Table 3.** List of coefficient drivers. Table S2 reports a complete list of factors with their significant metadata. The value of factors removed by stepwise logistic regression was not reported.



**Table 3.** *Cont.*

\*\*\* *p* < 0.001, \*\* *p* < 0.01, \* *p* < 0.05, **.** *p <* 0.1.

#### *3.2. Model Performance*

In Table 4, performances of the model applied on four sub-datasets (*vide supra*) were reported.

**Table 4.** Models performance.


\* AIC = Akaike Information Criterion. \*\* AUC = Area Under Curve.

Slow earth flow showed the best performance, with an accuracy >0.75, meaning that this can be considered an accurate model. Overall, all implemented models showed values > 0.70, thus considered suitable. The area under curve (AUC) values are comparable with previous similar studies [45]. To evaluate the model's performance, McFadden's Pseudo R squared, instead of R squared, was applied. A McFadden's Pseudo R squared between 0.2–0.4 indicates a very good fit. The RTL and SEF landslide models showed the best fit with 0.19 and 0.21, respectively.

#### *3.3. Susceptibility Mapping*

According to the method performed by Elia et al. (2019), four susceptibility maps were created; each recorded landslide's detachment niche was georeferenced.

Implemented models and resulting maps show the landslide occurrence probability within the study area (Figure 3). Based on probability values, areas were classified in terms of susceptibility as: (i) high (>0.65, red areas); (ii) moderate (0.50–0.65, orange); (iii) moderate-low (0.30–0.50, yellow); (iv) low (20–30, light green); and (v) very low (0.00–0.20, green). In the total landslide map, areas affected by high-level susceptibility are distributed mainly in the northwest area. Conversely, in RTL maps, areas featured by high-level susceptibility are localized in the southern part of the region. In the slow heart flow model maps, the susceptibility distribution is closer to that observed for the total model; however, areas with high susceptibility are more concentrated in the extreme northwestern part. In the complex landslides map, high-level susceptibility spots were identified in several parts of the investigated area. Overall, in every implemented map, the low susceptibility is concentrated in the central–east part of the study area, in correspondence with the less hilly areas. In fact, the study area shows an elevation gradient from west to east, and the landslides distribution follows this gradient, especially in RTL and CL.

**Figure 3.** Landslide triggering probability distribution maps. The black dots represent the landslide positions.

#### **4. Discussion**

#### *4.1. Logistic Regression Metrics*

The assessment of areas likely to produce land movement phenomena was carried out by adopting a logistic regression model, one of the most used approaches in landslide susceptibility analysis. In fact, it allows predicting where landslides are more likely to occur based on the relationship between past events and ad hoc selected causative factors. The logistic regression was demonstrated to be highly suitable in assessing what the most correlated and the most predictive factors in estimating the landslide occurrence susceptibility are. This method provided consistent and significant results in the investigated area. By carrying out a cutoff of the probability of occurrence equal to 0.5, it appears that about 80% of the territory has been classified correctly, i.e., that for 80% of cases where no event has occurred, the probability is lower than 0.5; on the opposite, it is higher where events occurred.

According to the literature, AUC values of 0.5–0.7 are considered low accuracy, 0.7–0.9 suggest valuable applications, and values around 0.9 indicate high accuracy (Swets 1988). In our study, the AUC indicates 0.76, 0.68, 0.80, and 0.69 for TL, RTL, SEF, and CL, respectively, between predicted probabilities and observed outcomes. The findings are consistent with other models developed to estimate landslide susceptibility worldwide, especially given the high rate of human and natural variability across the study area [34,46,47].

#### *4.2. Correlation Analysis*

As shown in Table 2, the correlation analysis revealed that only 37 of the 47 causative factors were uncorrelated and were thus included in the subsequent logistic regression. These factors are related to topography (T\_E, T\_F\_acc, T\_FLAT, T\_NW, T\_TPI, T\_S, T\_TWI, T\_VRM, T\_W), seismicity (S\_Epic), geolithology (G\_TA, G\_SSC, G\_CM, G\_C, G\_DDB, G\_LG, G\_SC, G\_SSM, G\_CCL, G\_ML), land cover (LU\_urb, LU\_grs, LU\_for, LU\_nat), morphology (D\_riv, D\_road), and climate (C\_p\_max). It is well known [48–51] that the slope length and steepness factor (T\_Ls) is one of the main landslide predictors, being a parameter used to characterize the effects of topography and hydrology on soil loss [52,53]. In our study, the T\_Ls showed the highest coefficient in predicting the totality of recorded events and one of the most predicting variables for the other landslide types, evidencing its influence in the evolution of landslide types. Similarly, Huangfu et al. (2021) [54] found that slope is one of the three most important predictors affecting landslide geo-hazard risk in China.

Generally speaking, landslides occurring on steep slopes are triggered by a reduction of the apparent cohesion of colluvium (soil and debris accumulated upon an impermeable bedrock), resulting from water infiltration into the soil. Concave shape also affects the development of rotational/translational mass movement, reflecting the importance of the hydrological regime in slope stability. In these areas, a convergence of surface and subsurface water streams saturates rapidly the soil, which becomes more prone to movements [51].

As expected, most of the other topographic variables showed a high predicting power in landslide occurrence. For example, T\_TPI showed an elevated ability to predict the landslide, particularly the SEF. This finding is consistent with other studies [55,56], showing how the TPI is essential for slope stability analysis aiming at classifying priority zones in landslide occurrence [57]. However, the configuration of such a set of predictors is quite different among the four landslide types. Table 2 suggests that topographic and land cover variables are the most significant predictors for RTL type; seismic and geolithological variables had the highest predicting power for SEF, while the presence of forest and annual maximum rainfall significantly affects CL. These findings demonstrated that different landslide types are regulated by different predisposing or preparatory factors [58]. For example, distance to earthquake epicenter (S\_Epic) had only a positive and significant relationship with the SEF landslide type. This is probably because mountainous landscapes, susceptible to landslides, are often characterized by moderate rates of earthquake events. Many authors similarly found this relationship [59–61], arguing that the presence of a fault damage

zone is the primary control on the distribution of earth flows; this was primarily due to ground quaking and the subsequent expansion of superficial debris favoring accelerated water infiltration.

On the contrary, all types of landslides showed a negative relationship with the distance from the roads, while only two types with the distance from rivers. The more the distance from the road increases, the more the probability of a landslide decreases. This negative trend was observed by other authors worldwide [62–65], highlighting the importance of anthropogenic impact on slope stability. Additionally, other authors [26,66] reported that the landslide susceptibility gradually increases up to a proximity zone of 200 m from a road network.

#### *4.3. Perspectives*

Our findings in the Apulia region are of primary importance for a more broad assessment of landslide and seismic risk management. The proposed approach can be applied to a larger scale of analysis which constitutes one of the future perspectives. Indeed, we recognize the limits of our study. Firstly, to achieve a significant landslide susceptibility assessment, we used the most detailed available datasets on landslide occurrence and environmental predictors. However, although we employed the most harmonized and available dataset, we are aware that there are several limitations in its accuracy. Numerous factors such as the phenomenon's complexity, the high diversity of geomorphic landslide features, and the use of different mapping and sampling procedures undoubtedly affected the accuracy and precision of the applied landslide dataset [67–69]. Additionally, our model results from data collected in dynamic landscapes and ecosystems constantly subjected to changes. Therefore, starting from these findings, new and innovative approaches must be developed as new knowledge is obtained from a geological, hydrological, and geotechnical point of view.

#### **5. Conclusions**

Landslides are one of the most devastating natural disasters, causing human and economical losses. In order to mitigate landslide risk, it is fundamental to identify critical risk zones. In the current study, logistic regression was applied to a landslide-prone area in the Apulia Region (Italy) in order to identify the main causative factors and produce a landslide susceptibility map. The model has never been applied before in the study area.

In this case study, it was found that logistic regression achieved a good performance, with an AUC value higher than 70%. Thus, it could constitute a useful tool in identifying critical areas for landslide occurrence and defining a risk mitigation strategy and land use policy. For example, in high–moderate susceptibility areas, mitigation infrastructures, resilience building, and a relocation strategy for families living in those areas could be provided. In addition, as a result of this study, it was found that human infrastructures, such as road networks, are one of the main elements triggering landslides. Thus, peculiar attention must be paid to the choice of the route when designing the transport networks.

However, further detailed studies should be conducted to compare different models, widen the number of causative factors considered (i.e., geotechnical data), and assess landslide susceptibility individually for each type of landslide since the current study combined different types of landslides. Nonetheless, this study gives a comprehensive and preliminary understanding of the likely future insights for researchers and policy makers.

**Supplementary Materials:** The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/su14148426/s1; Table S1: Correlation matrix; Table S2: Factors table; Figure S1: maps of some important driving factors.

**Author Contributions:** Conceptualization, A.G., M.E. and E.D.; methodology A.G. and M.E.; software, A.G. and E.D.; validation A.G., M.E., F.G. and G.S.; formal analysis, A.G.; investigation, A.G., M.E. and E.D.; resources, A.G., M.E., F.G. and G.S.; data curation, A.G., M.E. and E.D.; writing—original draft preparation, A.G., M.E., F.G., G.F.C., S.T. and G.S.; writing—review and

editing, A.G., M.E., F.G., G.F.C., S.T. and G.S.; visualization A.G.; supervision, F.G. and G.S.; project administration, F.G. and G.S.; funding acquisition, F.G. and G.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research has been developed within the project "Earthquake disasters management integrated system—ERMIS" (Code 5003751) by INTERREG V-A Greece-Italy (EL-IT) 2014–2020.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Soil Order-Land Use Index Using Field-Satellite Spectroradiometry in the Ecuadorian Andean Territory for Modeling Soil Quality**

**Susana Arciniegas-Ortega 1,2,\*, Iñigo Molina 1,\* and Cesar Garcia-Aranda 1,\***


**Abstract:** Land use conversion is the main cause for soil degradation, influencing the sustainability of agricultural activities in the Ecuadorian Andean region. The possibility to identify the quality based on the spectral properties allows remote sensing methods to offer an alternative form of monitoring the environment. This study used laboratory spectroscopy and multi-spectral images (Sentinel 2) with environmental covariates (physicochemical parameters) to find an affordable method that can be used to present spatial prediction models as a tool for the evaluation of the quality of Andean soils. The models were developed using statistical techniques of logistic regression and linear discriminant analysis to generate an index based on soil order and three indexes based on the combination of soil order and land use. This combined approach offers an effective method, relative to traditional laboratory methods, to derive estimates of the content and composition of soil constituents, such as electrical conductivity (CE), organic matter (OM), pH, and soil moisture (HU). For Mollisol index.3 with Páramo land use, a value of organic matter (OM) ≥8.6% was obtained, whereas for Mollisol index.4 with Shrub land use, OM was ≥6.1%. These results reveal good predictive (estimation) capabilities for these soil order–land use groups. This provides a new way to monitor soil quality using remote sensing techniques, opening promising prospects for operational applications in land use planning.

**Keywords:** remote sensing; soil quality; soil properties; indices; Andean region

### **1. Introduction**

Soil is the main natural resource for food and energy production [1]. It controls the movement of water in the landscape and functions as a biological filter for the possible leaching of pollutants into environmental spheres [2]. However, soil can be degraded by chemical and physical processes, which reduces its ability to function as a base for the development of a healthy layer for vegetation. Therefore, acknowledging soil conditions by the effects on vegetation can represent site conditions [3,4].

Gholizadeh and Kopaˇcková (2019) [1] considered that conventional methods of soil health evaluation in large areas involve several expensive and time-consuming variables such as collection of field data, chemical analysis in a laboratory, and geostatistical interpolation. Alternately, several studies have shown the possibility of characterizing soils and identifying their quality by correlating both physicochemical and spectral parameters [5]. Therefore, the use of remote sensing spectrometry products in environmental evaluation studies offers a complementary alternative to in situ monitoring procedures to aid in research, control, and monitoring of the soil component. The application field for these tools in the soil component is extensive [6] through the study of soil characteristics such

**Citation:** Arciniegas-Ortega, S.; Molina, I.; Garcia-Aranda, C. Soil Order-Land Use Index Using Field-Satellite Spectroradiometry in the Ecuadorian Andean Territory for Modeling Soil Quality. *Sustainability* **2022**, *14*, 7426. https://doi.org/ 10.3390/su14127426

Academic Editors: Mario Elia, Antonio Ganga and Blaž Repe

Received: 16 May 2022 Accepted: 15 June 2022 Published: 17 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

as reflectance, degradation, and possible polluting agents with the processing of satellite images permitting the inspection and monitoring of large areas in a fixed time and place [7].

In Ecuador, the properties and pedogenetic processes of soil have been studied in terms of rock type, geomorphology, taxonomic classification, and soil order. An increasing breach between the available information on the main soils and their quality [8] leads to understanding the condition of the soil to allow for the planification of healthy and sustainable territories, as determined by goals 12 and 15 of the Sustainable Development Goals (SDG) [9]. These goals correspond to responsible consumption and production, and life on land. Therefore, it is necessary to determine the quality of soil to develop fast, feasible, and affordable estimation methods for monitoring and assessing areas.

In the study area, the predominate soils are Andisol and Mollisol, which originate from weathering of volcanic material (ash) [10]. These relatively young soils can convey high agricultural potential [11]. Andisols, also known as *páramos*, are clay loam soils capable of retaining enormous amounts of water; on the contrary, Mollisols are fertile soils with a high organic matter content that cover approximately 70% of the Cayambe canton, a political–administrative unit of Ecuador, where the research's basin is located. However, the lack of land use and occupation policies has caused the expansion of agricultural activity boundaries [12], causing the loss of the *páramo*. The main goal is to understand whether there is any relationship between the spectra measured in the samples collected in field with the corresponding bands measured by satellite, combining physicochemical analysis to quantify and model the quality of Andean soils caused by agricultural activity. This will be achieved by (i) compiling and analyzing the physicochemical parameters of soil based on quality standards for agricultural activities, obtaining indices that classify the soils based on their order (Andisol and Mollisol) and use, and (ii) determining whether the soil is associated with some of the physicochemical qualities considered, validating land use and order models based on field reflectance data, satellite reflectance, and physicochemical qualities.

There are several studies based on national models capable of predicting spectra limited in an infrared laboratory with statistical algorithm analysis [13,14]. In this research, with the use of these spectroscopic methods for the evaluation of soil quality, models were developed for estimating indicators based on the combination of soil order, land use, and physicochemical characteristics, using logistic regression analysis, linear discriminant analysis, and regression trees. The approach offers a method that derives the estimates using the ratio of laboratory/satellite spectra when the soil is well represented by the calibration samples used to build the predictive models [13]. Therefore, the performance of these local models can be used in other geographic spaces by incorporating the spectra into a dataset for that area [15].

Consequently, the combination of laboratory spectroscopy and multispectral images with environmental covariates is an adequate methodological alternative to obtain models that are adjusted for the prediction of the quality of Andean soils, independently of other methodological approaches that have been used [16–19].

#### **2. Materials and Methods**

#### *2.1. Study Area*

This study was done at Río Blanco basin, located in the Cayambe canton, Pichincha province, Ecuador (Figure 1), where agricultural activities are mainly related to the dairy– floricultural corridor via the cultivation of exportation products, such as roses and summer flowers (17.32 km2), livestock for milk production (399.201 km2), and, in smaller quantities, agricultural activity, especially high Andean crops (10.984 km2) [12]. Unfortunately, in Río Blanco basin, no studies have been aimed towards soil quality. The few studies available are related to land use and occupation, risks associated with the presence of the Cayambe volcano, agriculture [20], soil rehabilitation with Cangahua [21], and watershed management [22].

**Figure 1.** Geographical location of the study area and location of the soil sampling points (Google Maps).

The existing black and brown soils in the basin can retain high amounts of water and organic matter content, known as Mollisol and Andisol [10]. The soils have been influenced by the volcanic activity of the Cayambe during its genesis, causing slopes ranging from gentle to steep [7]. Vegetation coverage is characterized by the presence of cultivated grass (27%), herbaceous and shrubby moorland (45.6%), flowers and short-cycle crops (16.5%), and eucalyptus trees (8.1%) [10].

#### *2.2. Sentinel-2 Satellite Images*

The satellite imagery used for this research was acquired by the Sentinel-2 (S2) satellite constellation (2A and 2B Earth Observing Missions) launched in 2015 and 2016, which has been used extensively for monitoring land cover and vegetation [23,24]. The S2 satellites are identical and operate in a sun-synchronous orbit at a mean altitude of 786 km. The main S2 payload is a multi-spectral instrument (MSI), which is a push-broom sensor that registers the radiation reflected from the Earth passing through the atmosphere in 13 spectral bands distributed in four bands at 10 m, six bands at 20 m, and three bands at 60 m spatial resolution. Figure 2 depicts the range and spectral response functions of the S2A/S2B MSI instruments for these bands [25]. The S2 satellites have a swath width of 290 km. The visible bands (VIS) B1, B2, B3, B4 at 10 m resolution; near-infrared bands (NIR) B5, B6, B7, B8A at 20 m resolution and B8 at 10 m; and shortwave infrared (SWIR) B11 and B12 are most useful for retrieving geophysical surface parameters [26].

Meanwhile, the 60 m resolution bands are used for atmospheric corrections, which are of crucial importance for most EOS applications and permit the development and evaluation of robust atmospheric correction algorithms such as Sen2Cor. In the study area, located inside a swath overlap, the revisit frequency of each satellite is four to five days in an 11◦ forward-looking view angle; the presence of two identical satellites allows a geometric revisit time between two and three days, supporting near-continuous monitoring of vegetation and land surface processes [26].

The imagery used in this research was acquired on 16 July 2018 by the Sentinel 2B platform (see Supplementary Materials). The Level-2A product was used after the processing carried out in the Level-1C product available in the Open Access Hub of the European Space Agency (ESA) DataHUB server [27]. The identifier of the product is referenced as S2B\_MSIL1C\_20180716T153619\_N0206\_R068\_T17NRA\_20180716T202613. Furthermore, in the text, it will be referred to as S2BL2A.

**Figure 2.** S2A and S2B-MSI spectral response functions (SRF) [28].

#### *2.3. Soil Sampling*

Soil samples were collected during four field trips during the months of June, July, and August 2018. The samples were placed in hermetically sealed sleeves for physicochemical analysis (Figure 3a,b), and in two bulk density cylinders (Figure 3c) for spectroradiometry analysis. The cylinder lids were covered with geomembranes to keep the samples unaltered, as explained by Yánez and Arciniegas (2019) [7], to comply with the provisions of Section 2.5.

**Figure 3.** Soil sample collected in the field. (**a**) Shows site preparation. (**b**) Shows sample put in sealed sleeves for physicochemical analysis (**c**) Shows bulk density cylinders for spectroradiometry analysis.

A total of 36 surface soil samples (0–10 cm) were collected from the study site using core drilling, cylinders, a stainless-steel shovel, and a GARMIN GPSMAP 62sc handheld navigator (accurate to within ±4 m). After removing vegetation from the soil surface in a quadrant of approximately 30 × 30 cm, 1 kg of soil sample was collected from each sampling point. The number of samples was calculated by the type of composite sample according to Ecuadorian environmental standards (Book VI, Annex2) [29], establishing two samples per homogeneous zone, and one sample for zones ID02 and ID08, which presented collection problems.

#### *2.4. Physicochemical Parameter Measurements*

To determine the quality of soil, the physical, chemical, and biological components of the soil and their interactions must be considered; despite the different measurements possible, not all parameters are relevant for the soil in a particular scenario [30]. In this study, the physicochemical components were selected considering two criteria: The first was based on the reflectivity of the soils that are conditioned to organic matter, that interfere with the spectral curves [31], and the second according to Friedman et al. 2001 [32], who set a minimum number of indicators for agricultural activities, due to human use and management. The parameters measured were soil moisture, pH, electrical conductivity (CE), and organic matter (OM). Other parameters, such as heavy metals, could not be measured for economic reasons.

For the 36 soil samples, the analysis was carried out as follows: For soil moisture, pH, and conductivity, the methods established in NOM-021-RECNAT-2000 (Mexican official standard that establishes the specifications of fertility, salinity, and soil classification) were used [33]. The AS-05 gravimetric method was applied to soil moisture, the AS-02 electrometric method was applied to pH, and the AS-20 method was applied to electrical measure conductivity.

For the determination of OM, the soil samples were sent to the certified soil, foliar, and water laboratory of the Agency for the Regulation and Control of Phytosanitary and Zoosanitary (Agrocalidad) of Ecuador, where the volumetric PEE/SFA 09 method (Walkley Black's analytical method consisting of wet oxidation of the soil sample) was applied [34].

#### *2.5. Spectral Measurements in Laboratory and Satellite Image Sentinel-2*

Spectral analysis of the soil samples was carried out in the Ecuadorian Space Institute (IEE) laboratory, which facilitated the use of the ASD FieldSpec4 spectroradiometer (Analytical Spectral Devices). The equipment has a spectral resolution of 3 nm in the range of 350–1000 nm and 10 nm in the range of 1001–2500 nm. In this regard, the spectral bands of the MSI sensor onboard satellite S2 are within the spectral range of the ASD instrument.

In the laboratory, the soil spectra were collected using a small spectralon placed in the HiBrite MudLight device with an optical fiber connected to generate an artificial light source (Figure 4). The ASD FieldSpec4 spectroradiometer was used to obtain soil spectral data in the laboratory. The measurement protocol followed the methodology described in Figure 5 [35].

**Figure 4.** Location of cylinder with respect to HiBrite MudLight device on support.

**Figure 5.** Methodology for the generation of spectral signatures [35,36].

After the common configuration and control settings, i.e., output folders, connecting optical fiber, etc., the appropriate integration time was set given the lighting conditions to optimize the measurements. Subsequently, the dark current was also recorded. Then, a reference target or white reference (spectralon) was measured until a horizontal line with a reflectance value of 1 was presented. Laboratory soil spectra measurements followed these pre-operation phases, and the resulting measured spectra were processed with the corresponding software (ViewSpec Pro). An important issue that also needed to be addressed was that of the signal-to-noise ratio, or SNR, during measurements, which is related to the signal component. The spectra measurement procedure was performed using the principle of a continuous fiber optic cable, as specified in [36]. This technique has the advantage of significantly degrading the SNR, as it avoids interactions with other media between the recording device and the sample. Soil measurements were carried out by controlling the direction of the optical fiber to always point towards the target in order to avoid anisotropy effects. Subsequently, an average of all measurements was made to obtain the spectrum. Finally, the displays of the spectra that were selected were shown.

The S2BL2A product specified in 2.2 was used for the satellite image, with bottomof-atmosphere (BOA) reflectance, in which only the bands matching the S2 product were considered. The centroid of each pixel was determined to obtain the spectral values per band on a 20 × 20 m grid.

These laboratory spectroradiometry measurements and satellite images were used to establish indices (the table in Section 2.8.1) that compare the physicochemical parameters of the soil with the spectral bands of the S2-MSI sensor. Since it was agreed to only use the equipment supplied in the Laboratory of the Ecuadorian Space Institute, in-field spectroradiometry measurements were not possible. The measurements of the spectra were carried out only in the laboratory, as in [37–39], generating a different model to those already known to evaluate soil quality [40,41].

#### *2.6. Land Use/Soil Order Dataset*

Before the processing specified in Section 2.8, a dataset was built based on the in-field soil samples (spectra and physicochemical parameters) and reflectance per homogeneous area in the S2BL2A satellite image. These data consisted 345,408 observations as a product of the combination of two datasets and constituted the geographic population of the area under study. The data were then treated based on the combination of the variables "land order" and "land use", whose total set of observations was as follows (Table 1). The combinations shown in Table 1 allowed different models to be established based on soil order and land use.


**Table 1.** Soil samples classified according to the order of the soil and by land use.

\* A *páramo* is a fragile neotropical high mountain ecosystem. In Ecuador it has an average height of 3300 m above sea level.

Model 1 was established from the dataset by applying a simple random method of 5% of the population to compare what was provided by each sample until one dataset was left as a result of the information provided between one model and another being the same. The data were divided into training and testing groups. The training dataset consisted of 70% of the total number of observations in the sample, and the test dataset consisted of 30% of the total number of observations in the sample. With the dataset the model was elaborated and later executed.

Model 2 was developed from the dataset by selecting those corresponding to the Andisol soil order (Table 1), with 199,660 observations. Three categories were selected: Shrub, Páramo, and Pasture, leaving Forest out of the analysis due to its low frequency. Composite samples of 70% of each land use category were randomly selected, thus leaving a composite sample of 139,734 random observations, which were part of the training dataset. Therefore, the test dataset contained the remaining 30% of the 59,886 observations.

For the Mollisol order, considering the data's trend, based on the geographical distribution of the different land uses (Figure 6), it was observed that some of the soils had a particular use owing to their nature and population growth, among other characteristics of the area.

**Figure 6.** Geographical distribution of land uses.

The dataset corresponding to the Mollisol soil order (Table 1), with 145,748 observations, was considered to form Model 3 from a stratified random sample of 70% of the total. This allowed us to consider two subsets based on land use, resulting in two more models. The first subset (Model 3), called "Mollisol 1", was made up of Forest, Páramo, and Pasture; the second subset (Model 4), "Mollisol 2", was composed of Agriculture and Shrub. Each

training dataset represented a 70% stratified random sample based on the categories of Mollisol 1 (Table 2) and Mollisol 2 (Table 3). Consequently, the test dataset included the remaining 30% of the data. For Mollisol 1 it was 38,391, and for Mollisol 2, 5335.

**Table 2.** Training sample to obtain the discriminant function in the classification of land use of the Mollisol 1 order.


**Table 3.** Training sample to obtain the discriminant function in the classification of land use of the Mollisol 2 order.


#### *2.7. Homogeneous Zones*

The area of the Rio Blanco basin was extracted from the hydrological database of the Cayambe canton. Land use was extracted from the national productive systems database [10], and slope data and soil order data were obtained from a geopedology base map [10]. The multi-criteria technique was applied to create homogeneous zones [7] considering the following criteria: soil order and land use. This made it possible to establish a spatial analysis unit for correlation evaluation tests, determining the number of samples to be taken in the field. In total, within the Rio Blanco basin, 31 zones with similar characteristics were identified in previous work [7], of which 19 were analyzed for easy access in the study area (Figure 7).

**Figure 7.** Homogeneous areas analyzed in the study.

Using the physicochemical laboratory analysis results, the Thiessen polygons were calculated in each homogeneous zone from the soil-sampling points. Likewise, the topology of spatial relations (intersection and inside) was applied [42] with the spectral values per satellite band to identify the spatial relationship of the satellite spectral values concerning the physicochemical characteristics of each homogeneous zone [43].

The Mexican (NOM-021-RECNAT-2000) [33] and Ecuadorian (Book VI, Annex 2) environmental standards [29] were also considered to establish homogeneous areas as a

unit of spatial analysis. This allowed sampling points to be defined based on the criteria of slope, soil order, and land use, resulting in 19 homogeneous areas (Figure 7).

#### *2.8. Regression and Spatial Analysis*

To identify the relationship between the spectral behavior of soil in the laboratory and satellite images associated with the physicochemical qualities of soils, we proceeded to apply regression techniques based on the theory of machine learning, which makes them useful for predicting outcomes, identifying patterns, and making decisions with minimal human intervention [44]. Subsequently, discriminant analysis techniques, geostatistics, and non-parametric models (regression trees) were applied.

This study started by creating a logistic regression model (Figure 8) with the dependent variable "soil order" (Andisol, Mollisol), and as explanatory variables, the reflectance levels of the spectral behavior of soil in laboratory and satellite images, called Model 1, to later select those variables that were statistically significant, from which this model generated the soil index required (Table 4). The basic form of the logistic regression model is as follows (Equation (1)) [45]:

$$\text{logit}\,(\text{yy}) = \beta \text{o} + \beta \text{1} \times \text{1} + \beta \text{2} \times \text{2} + \cdots + \beta \text{ix} \,\text{i}\,\tag{1}$$

where

y: soil order dependent variables Andisol and Mollisol;

xi: independent variables (predictors): wavelengths of the spectroradiometer convoluted by the spectral response of soils in the laboratory (B04c, B05c, B06c, B07c, B08c, B08Ac) and bands corresponding to the MSI sensor of satellite S2 (B04s, B05s, B06s, B07s, B08s, B08As); βo: constant (intercept);

βi: coefficients of predictor variables.

**Figure 8.** Flowchart indicating the methodological steps implemented to create the model, indexes, and predictions of physicochemical parameters.


**Table 4.** Models and indices by soil order and soil order–land use.

Several logistic regression trials with the dependent variable of soil order were tested. The statistically significant variables (*p* ≤ 0.05) were verified until the most highly significant set of variables was obtained. The statistically significant model consisted of a combination of independent variables related to the reflectance levels of the soil spectra in the laboratory and satellite image.

The land use variable was related to the taxonomic classification of soil, so when considering only this variable, there was no way to separate the different land uses and generate an index that allowed us to estimate the physicochemical characteristics based on the land use variable of soil. To resolve this difficulty, the soil order variable was considered, and within each of these categories—Andisol and Mollisol—the land use variable that yielded the best classification levels by soil order was analyzed, as discussed in Section 2.6. Considering land use as the dependent variable, a statistical methodology known as linear discriminant analysis was applied (Figure 8) [46], which allowed a linear combination of variables to be identified that could be used to determine the group to which each individual belonged. In this case, the individuals were identified as homogeneous areas in which different soil samples were collected to develop Model 2, Model 3, and Model 4 (Table 4).

#### 2.8.1. Obtaining Index

The standardization of the coefficients from the logistic regression model (Model 1) followed a pattern where each coefficient was divided by the sum of its coefficients, in such a way that the sum of the coefficients of the index was equal to 1 (Equation (2)).

$$\text{index.ma.1} = \mathbf{k} + \text{cso}[\mathbf{i}] \* \mathbf{x} \mathbf{i} + \dots \tag{2}$$

where

k: constant;

cso[i] = standardized soil order coefficients;

xi: independent variables (predictors): wavelengths of the spectroradiometer convoluted by the spectral response of soils in the laboratory (B04c, B05c, B06c, B07c, B08c, B08Ac) and bands corresponding to the MSI sensor of satellite S2 (B04s, B05s, B06s, B07s, B08s, B08As).

From the linear discriminant function, the coefficients of the models were standardized by land use by soil order of the models: Model 2, Model 3, and Model 4, where each coefficient of the model was divided by the sum of its coefficients in such a way that the sum of the coefficients of the indices (Index 2, Index 3, Index 4) was equal to 1 Equation (3).

$$\text{index} = \text{cso}[1] \* \ge 1 + \dots \dots + \text{cso}[i] \* \ge \text{i} \tag{3}$$

where

cso[i] = standardized coefficients of land use by soil order;

xi: independent variables (predictors): wavelengths of the spectroradiometer convoluted by the spectral response of soils in the laboratory (B02c, B03c, B04c, B05c, B06c, B07c, B08c, B08Ac, B011c, B012c) and bands corresponding to the MSI sensor of satellite S2 (B02s, B03s, B04s, B05s, B06s, B07s, B08s, B08As, B011c, B012c).

In this case, the data were treated based on the combination of soil order and land use, and the entire set of observations is shown in Table 1.

An association analysis was applied between each physicochemical parameter and the indices, which determined no linear trend between the different pairs of variables that were compared. The geostatistical surface was created using the inverse distance weighted (IDW) method, which assumes that closer objects are similar to those far apart. Therefore, any unknown location will probably have an equal value to the nearest known locations [47]. It was possible to determine how the order of soil and land use were spatially distributed in terms of probability and to establish predictions of each physicochemical parameter through a set of non-parametric models known as decision trees [48], where the dependent variable can be categorical or numeric. The regression tree model was applied since the dependent variable was each of the calculated indices and was numerical, and the explanatory variables (covariate) were the order of the soil and the order–land use, together with each physicochemical parameter, considered as independent variables, as shown examples in Table 5, which are explained in Section 3.5. The space defined by the regression tree models, as part of a non-parametric analysis, consisted of dividing the predictor space into boxes (regions) [49]. For example, the areas were a function of Index 1 and the order of soil to estimate the physicochemical organic matter (OM) parameter to make the prediction. Consequently, a regression tree model was generated to describe the association between each index and each physicochemical parameter, as shown in Figure 8.


**Table 5.** Regression trees.

For a better understanding, the general methodological framework was divided into three parts, as shown in Figure 8. All statistical and graphical calculations were performed using RStudio software [50].

#### 2.8.2. Validation

The validity of Model 1 was tested by calculating the confusion matrix [49] to determine the classification error of the samples and the accuracy, together with the Kappa statistic to indicate the degree of agreement between the measured data and the predicted value by the model, concerning their order in Andisol or Mollisol, as well as the sensitivity and specificity, which indicate the probability of correctly classifying the soil samples from the model. The Kappa coefficient must be equal to zero. When the Kappa coefficient differs from 0, it means that the data obtained from the validation model, as predicted data, agree with the measured data used to generate the model. Sensitivity refers to the ability of a model to identify the order of soils. In contrast, specificity indicates the ability of the model to identify soil samples that do not correspond to the order of the soil to be classified.

In Model 2, among the different land uses of the order Andisol with each of the two sets of test data selected randomly, the classification errors were evaluated using the confusion matrix and the accuracy metric.

For Model 3, among land uses such as Forest, Páramo, and Pasture of soil order Mollisol, those that did not participate in the elaboration of the model needed to be classified based on Model 3 and have their corresponding confusion matrices calculated in order to determine the classification error and accuracy metric.

The same criteria applied for Model 4, between the agricultural and shrubland uses of the soil order Mollisol.

Once each model was validated, the effects of soil order and land use on each physicochemical parameter were determined; one-way analysis of variance (ANOVA) was used [49], with a dependent variable for each physicochemical parameter and an independent variable for the order and land use of the soil. Several ANOVA models were tested for soil order and other land use models within each order. For each case, the null hypothesis was that the mean of each physicochemical parameter is the same in each order of soil, or the mean of each physicochemical parameter is the same for each land use within each order, versus the alternate hypotheses indicating that at least one pair of mean values is different. The statistical decision criteria are based on a significance level of 5% (α = 0.05); thus, if the *p*-value is less than or equal to 0.05 the null hypothesis is rejected, and it can be concluded that the effect of the independent variable is significant in relation to the mean of the dependent variable. Otherwise, if the *p*-value is greater than 0.05, then the available data do not yield enough information to conclude that the independent variable is significant in relation to some variation of the dependent variable.

#### **3. Results**

#### *3.1. Physicochemical Analysis*

The different soil samples were analyzed using standardized physicochemical methods. The soil moisture results were in a range from 12.23% to 74.99%. The areas that predominated with the highest percentage of soil moisture were moors, and those with the lowest soil moisture were forest and shrub areas. The lowest water content was observed in Mollisol.

Regarding pH, the most acidic soils corresponded to undisturbed moors, whereas the least acidic soils corresponded to cultivated pastures. The range was 4.55 to 5.76, which, according to Mexican regulations, ranges from moderately acidic to strongly acidic, and according to Ecuadorian regulations, it would be out of range.

The electrical conductivity of the soils was within the limits established by both the Ecuadorian and Mexican regulations, <200 μS/cm and <1 dS/m, respectively. The areas with the lowest electrical conductivity were moors, whereas the highest electrical conductivity was observed in grasses.

For OM, the values ranged from 2.78% to 16.06%. According to Mexican standards, the zones range from very low to very high levels for soils of volcanic origin. The zone with the lowest OM percentage was forest, followed by passage areas, and the zone with the highest OM content was *páramo* areas.

#### *3.2. Analysis of Spectral Signatures*

In this section the behavior of each band was determined with respect to the intensity of reflectance of the soil samples. For each cylinder, two spectral measurements were made in opposite sections of the same tube, and for each soil sample 10 spectral measurements per section were averaged for a single representative spectrum per homogeneous area, which resulted in graphs (Figure 9a,b) as a function of reflectance and wavelength per sample in each homogeneous area.

**Figure 9.** Spectral measurements in the laboratory according to land use. (**a**) Shows the curves by homogeneous zones ID10, ID11, ID13, ID1, ID15. (**b**) Shows the curve by homogeneous zone ID16.

The reflectance of the spectra was graphically analyzed in the laboratory to determine the behavior of the soils related to their spectral signatures of the Andisol and Mollisol orders. The spectral signatures obtained in the laboratory presented a pattern related to the typical spectral signature of soils, ranging from the visible range (VNIR) to near-infrared (NIR) to short-wave infrared (SWIR).

The graph in Figure 9a shows the pasture curves, where sample ID11 of the Andisol soil order presented the same intensity of reflectance as sample ID14 of the Mollisol order, which were the highest compared to the other samples. Sample ID10 of the Mollisol order had a medium intensity of reflectance, unlike sample ID15 of the Mollisol order and sample ID13 of the Mollisol order with lower values of reflectance intensity. This variation in the curves is related to the properties and state of these soils [51], considering the variation of each of the land uses. Thus, in the graph in Figure 9b, sample ID16 of the Mollisol order, with agricultural land use, may indicate changes in the characteristics and status of agricultural use in the months of June, July, and August.

It can be said that the graphs made a difference in the behavior of the soil order based on the associated land use.

This could be related to the reflectance records of the Sentinel-2 satellite images (Figure 10a,b) to improve spectral differences by calculating soil order indices based on land use and physicochemical parameters, as explained in the next section.

**Figure 10.** Spectral signatures of Sentinel 2B. (**a**) Shows the spectral signatures of Pasture by homogeneous zones. (**b**) Shows the average spectral signature of Agriculture obtained in the homogeneous zone 16 of Mollisol soil.

#### *3.3. Development and Validation of Models Based on Soil Reflectance Levels in Laboratory and Satellite Image*

3.3.1. Model 1, by the Orders of Andisol and Mollisol Soils

From the logistic regression calculation, Model 1 was obtained, whose structure is shown in Table 6.

**Table 6.** Logistic regression of Model 1 based on the reflectance levels of the soil spectra in the laboratory and satellite image.


*p*-value of the model: *p* < 0.0001.

The coefficients of Model 1 were both positive and negative. This model was composed of explanatory variables, consisting of a combination of the spectral behavior of the soil in the laboratory and satellite image, with the particularity that there are reflectance levels related to the characteristics of red, red border, and near-infrared. Classical vegetation indices were composed, but in this case, the objective was to classify the order of the soil in Andisol and Mollisol. Furthermore, one of the independent variables was not significant (B08c), which did not influence the global significance of this logistic regression model (*p* < 0.0001).

Based on the training dataset, we obtained a confusion matrix (Table 7).


**Table 7.** Training dataset from Model 1 based on reflectance levels of soil spectra in laboratory and satellite image.

The confusion matrix indicated a training error of 3.5%, which means that the model was good for classifying soils in relation to their order in Andisol or Mollisol based on the spectral behavior of the soil in laboratory and satellite image, and satellite related to the red reflectance of any modality. The false-positive and false-negative coefficients were relatively low (Table 7), at 1.9% (2615/139,762) and 5.89% (6006/102,024), respectively. In other words, 2615 soil samples of the Andisol order were classified as Mollisol, and 6006 Mollisol soil samples were classified as Andisol. We then evaluated the model using a test dataset to describe the validation process.

Model 1 Validation

The diagnostic evaluation of Model 1, from the diagnostic statistics using the test dataset, was generally good because the accuracy, sensitivity, and specificity were above 95%. On the other hand, the *p*-value of the Kappa statistic (Table 8) was more significant than 0.05 (*p* > 0.05), indicating that the null hypothesis that the measurements obtained through Model 1 were equivalent to the real data is not rejected.

**Table 8.** Diagnostic statistics for the validation of Model 1.


3.3.2. Model 2 for Variable Land Use of the Andisol Order

As explained in Section 2.8.1, to classify land use according to the Andisol soil order, linear discriminant analysis was applied (Figure 8). The following results were obtained (Table 9).

**Table 9.** Median reflectance levels of soil spectra in laboratory and satellite image, according to the group defined by land use for the Andisol order.


The spectral values of the soil in the laboratory had greater weight in the classification of the different land uses than in the satellite image as a function of spectrometry in the laboratory and satellite image. Regardless of the sign, the coefficients of the soil spectral values in the laboratory were greater than the coefficients of the values in the satellite image. Consequently, the first component of this linear discriminant function explains that 96.5% of the total variability of the three different land uses had lower coefficients; even though the reflectance values in the satellite image were lower, these variables were important for the classification of land use as a function of the Andisol soil order. The first and second discriminant components were the linear combinations of the variables that best discriminate between the three land uses of the Andisol order, which in this case corresponded to the entire spectrum of soil in the laboratory and satellite image, respectively. Figure 11 shows the results of the soil classification based on the linear discriminant function model (Model 2).

The numbers 1, 2, and 3 represent the mean of each dataset. The means were quite separate, which implies a good classification of the land use of the Andisol order. In addition, based on the first linear discriminator, better discrimination was observed between the soils of Pasture and Páramo or between soils of Shrub and Páramo use than between the soils of Shrub and Pasture use. This situation could be because these land uses, in some cases, have relatively small neighboring units. Based on the training dataset, a confusion matrix was obtained (Table 10). In Figure 11 and Table 10, a good classification of the land uses of the Andisol order was observed, with a classification error of 0.51%.

**Figure 11.** Classification of land uses of the Andisol order.



#### Model 2 Validation

From the first group data for Shrub, Páramo, and Pasture land uses of the Andisol order, corresponding to the 30% that were not part of the model calculation, a confusion matrix was obtained (Table 11) from which a good classification of the land uses of the Andisol order was obtained, whose classification error was only 0.50% and accuracy 99.5%. Similar results were obtained for the second set of randomly selected data.


**Table 11.** Classification of land uses of the Andisol order based on Model 2 for a first test set.

3.3.3. Model for Variable Land Use of the Mollisol Order

(a) Model 3 for the variable of land use of the Mollisol order 1 from all wavelengths of the soil spectrum in laboratory and satellite image

The results were obtained from the application of LDA (Table 12).


**Table 12.** Coefficients of the linear discriminant function with dependent variable of the land use of the Mollisol 1 order and independent variables of reflectance levels of soil spectral behavior in laboratory and satellite image.

For the Mollisol 1 order, observing the coefficients of the linear discriminant function (Table 12), it resulted that the spectral behavior of soils measured in the laboratory exhibited a higher contribution to discriminate land uses Forest, Páramo, and Pasture for the Mollisol 1 order, compared to the coefficients derived from the satellite image. Consequently, the first component of this linear discriminant function (LD1) explained 70.9% of the total variability of the three different land uses, implying that although the reflectance values in the satellite image had lower coefficients, these variables were important for the classification of land use from the Mollisol 1 soil order. The first and second discriminant components were the linear combinations of the variables that best discriminated between the three Mollisol 1 land use types. Figure 12 shows a representation of the linear discriminant function for this particular case of Mollisol 1 land use, with a minimum overlap between Páramo and Pasture, with a classification error of only 0.47% and an accuracy of 99.52%.

**Figure 12.** Representation of the linear discriminant function: land use—Mollisol 1.

(b) Model 4 for the variable of land use of the Mollisol 2 order from all wavelengths of the soil spectra in laboratory and satellite image

The following results were obtained from the application of LDA (Table 13):


**Table 13.** Coefficients of the linear discriminant function with dependent variable of the use of soil of the Mollisol 2 order and independent variables of the reflectance levels of the soil spectra in laboratory and satellite image.

In relation to the coefficients of the linear discriminant function (Table 13), it was found that the soil spectral values measured in the laboratory had a higher contribution for the classification of the considered land uses compared to those of the satellite image. Regardless of the sign, the coefficients of the soil spectrum in the laboratory were higher than the coefficients of the spectrum in the satellite image. Consequently, the only component of this linear discriminant function explained 100% of the total variability of the two different land uses, which implies that even though the spectral values of the satellite image had lower coefficients, these variables were important for the classification of these land uses as a function of the Mollisol 2 soil order. Figure 13 represents a good classification of the uses of soils order Mollisol 2.

**Figure 13.** Representation of the linear discriminant function: land use—Mollisol 2.

Model 4 Validation

For the validation of Model 4, we tested 30% of the remaining data, called the test dataset, to classify the soils based on Model 4, and obtained the following confusion matrix, shown in Table 14.


**Table 14.** Classification of land use of the Mollisol 2 order based on Model 4 for the test dataset.

In Table 14, for the remaining 30%, a good classification of the land use was observed in Agriculture and Shrub for the Mollisol 2 order, considering the same behavior indicated in the training data.

#### *3.4. Index Development*

3.4.1. Index for Andisol and Mollisol Soil Orders from Model 1

According to the methodological process indicated in Section 2.8.1, the index.ma.1 (Mollisol Andisol Index) was obtained (Equation (4)):

$$\text{index.ma.1} = -0.21 - 272.64 \text{B04c} + 336.76 \text{B05c} + 328.37 \text{B06c}$$

$$-500.64 \text{B08c} - 0.13 \text{B08c} + 108.45 \text{B08Ac} + 0.20 \text{B04s} \tag{4}$$

$$+ 1.58 \text{B05s} - 2.11 \text{B06S} - 2.11 \text{B07s} + 0.29 \text{B08s} + 3.20 \text{B08As}$$

The index.ma.1 separates the soils according to their order into Andisol and Mollisol. If the index values are positive, they correspond to soils of the Andisol order; if index.ma.1 takes negative values, they correspond to soils of the Mollisol order. The descriptive statistics of index.ma.1 are shown in Table 15.



For the soils of the Andisol order, the mean level of the index was 0.2377 with a relatively low level of variability, equal to 0.1792. For soils of the Mollisol order, the mean level of the index was lower, −1.4961, presenting a higher level of variability equal to 0.5688, which can also be classified as a high level of variability. In this way, index.ma.1 classifies soils according to their order.

3.4.2. Indices Depending on the Variable of Land Use of the Andisol and Mollisol Orders

(a) Index for the Land Use of the Andisol Order from Model 2

The index obtained from the discriminant function of Model 2 was expressed as follows (Equation (5)):

$$\text{index}.2 = -22.24 \text{B02c} - 67.95 \text{B03c} - 149.75 \text{B04c} + 402.09 \text{B05c}$$

+ 438.95B06c − 705.75B07c + 108.45B08Ac + 0.20B04s

+ 29.83B11x − 23.95B12c − 2.67B02s − 2.49B03s (5)

+ 2.90B04s + 1.13B05s + 1.56B06 − 0.99B07s

+ 0.18B08s − 0.89B08As + 0.67B11s − 1.55B12s


The descriptive statistics of Index 2 are displayed in Table 16.

**Table 16.** Descriptive statistics of Index 2 for land use of the Andisol order.

For land use of the Andisol order, the mean level of Index 2 was higher in Páramo (0.54), with a level of variability equal to 0.06 (the table in Section 3.5.2). For the land use of the Pasture type of the Andisol order, the average level of Index 2 was −1.12, with a level of variability equal to 0.22 (the table in Section 3.5.2). In the use of bushland of the Andisol order, the mean level of Index 2 was −1.04, with a level of variability of 0.33. The level of variability of the groups defined according to the land use of the Andisol order was very different, representing the natural behavior of these variables.

(b) Index for the Land Use of the Mollisol 1 Order from Model 3.

The index obtained from the discriminant function of Model 3 is expressed as follows (Equation (6)).

> index.3 = 127.77B02c − 383.23B03c + 665.89B04c − 159.98B05c − 428.71B06c + 145.25B07c − 32.47B08c + 69.99B08Ac + 5.89B11c − 9.11B12c + 0.99B02s − 3.65B03s (6) + 2.51B04s + 0.92B05s − 0.54B06s − 0.64B07s + 0.16B08s + 1.014B08As − 0.84B11s − 0.21B12s

For land uses of the Mollisol 1 order, the mean level of Index 3 was higher in forest land use, at 0.79, with the highest level of variability, equal to 0.22 (the table in Section 3.5.3). For the use of páramo land of the Mollisol 1 order, the mean level of Index 5 was −0.06, with the lowest level of variability, equal to 0.12 (Table 17). The level of variability of the groups defined as a function of the land use of the Mollisol order was different, representing the natural behavior of these variables.



(c) Index for the land Use of the Mollisol 2 Order from Model 4

The fourth index obtained from the discriminant function (Model 4) was obtained by standardizing the coefficients of this model. Each coefficient of Model 4 was divided by the sum of its coefficients in such a way that the sum of the coefficients of Index 4 was equal to 1, obtaining (Equation (7)):

index.4 = − 27.40B02c + 38.90B03c + 65.66B04c − 177.93B05c + 106.03B06c + 3.05B07c + 14.15B08c − 21.51B08Ac − 0.34B11c + 0.35B12c + 0.02B02s + 0.02B03s (7) − 0.05B04s − 0.03B05s − 0.04B06s + 0.02B07s

− 0.002B08s + 0.02B08As − 0.01B11s + 0.08B12s

For land uses of the Mollisol 2 order, the mean Index 4 was higher in agricultural land use, at 0.12, with the highest level of variability, equal to 0.02 (Table 18). For the use of shrub soil of the Mollisol 2 order, the mean level of Index 6 was −0.07, with the lowest level of variability, equal to 0.01 (Table 18). The level of variability of the groups defined as a function of the land use of the Mollisol order was different, representing the natural behavior of these variables.

**Table 18.** Descriptive statistics of Index 4 for the land use of the Mollisol 2 order.


*3.5. Regression Tree Models to Define Association between Indices and Physicochemical Parameters* 3.5.1. Regression Tree Models of Physicochemical Parameters as a Function of Soil Order through Model 1 (index.ma.1)

The first model predicted the values of index.ma.1. as a function of the covariates, soil order, and physicochemical parameters. An example with soil moisture is presented, where for soils of the Andisol order, the mean of the index.ma.1 (Figure 14) is equal to −0.134. If the soil moisture (HU) is greater than or equal to 36.7 for soils of the Andisol order, the predicted value of the index.ma.1 is on average equal to 0.109 (i = 0.109) for *n* = 74,000 soil samples. However, if the soil moisture is less than 36.7, the predicted value of index.ma.1 is on average equal to 0.382 (i = 0.382), for *n* = 65,700 soil samples. The same interpretation for soils of Mollisol order.

Likewise, if the value of the index is positive, it corresponds to an Andisol soil order and negative to a Mollisol soil order. The predicted moisture value is at least 36.7.

3.5.2. Regression Tree Models of Physicochemical Parameters as a Function of Land Use of the Andisol Order through Model 2 (index.2)

As shown in Table 19, it was possible to obtain the effect of land use of the Andisol order on the physicochemical parameters.

**Table 19.** Application of one-way ANOVA to determine the effect of land use of the Andisol order on PFQ.


An example of the non-parametric regression tree model is presented below (Figure 15), with the dependent variable as index.2 and the independent variables as soil use and organic matter (OM) for soils of the Andisol order (Table 20) (ARUSAMO).

**Figure 14.** Moisture Regression Tree Model for Soil Order Andisol-Mollisol (ARMAH).

**Figure 15.** Regression tree for Index 2, Andisol soil, and OM.

**Table 20.** Classification of land uses of the Andisol order according to the mean of organic matter.


3.5.3. Regression Tree Models of Physicochemical Parameters as a Function of Land Use of the Mollisol Order through Model 3 (index.3)

In Figure 16, the regression tree model of index.3 can be observed as an example in the function of land use of soil order Mollisol 1 and organic matter, which are related based on Table 21 (ARUSM3MO).

**Figure 16.** Regression tree for index 3: land use of Mollisol 1 and OM.

**Table 21.** Classification of the soil samples of the Mollisol 1 order according to the use of the soil by the mean of OM.


OM was greater in Páramo (≥8.6%) than Forest and Pasture, with a misclassification of 22% for Forest and 17% for Pasture (Table 21).

3.5.4. Regression Tree Models of Physicochemical Parameters as a Function of Land Use of the Mollisol Order through Model 4 (index.4)

The land use regression tree model for soil order Mollisol 2 and OM, which are related based on Table 22 (ARUSM4MO), are shown in Figure 17.

**Table 22.** Classification of the soil samples of the Mollisol 2 order according to the use of the soil by the mean of OM.


**Figure 17.** Regression tree for Index.4: land use of Mollisol 2 and OM.

OM was greater in Shrub (≥6.1%) than Agriculture (<6.1%), with a misclassification of 1% for Shrub (Table 22).

#### **4. Discussion**

The results of this study allow for a description of the correlation between the physicochemical parameters with index.2 (Andisol), index.3 (Mollisol 1), index.4 (Mollisol 2), according to the soil order–land use homogeneous zones defined in Section 2.7 and based on the criteria of Zebrowski 1997 [52]. In the case of the Mollisol order soil, the prediction values for Páramo were ≥8.6% (Table 21), which maintained the characteristic behavior of the Ecuadorian Andean zone, as cited by Podwojewski (1999) [53]. On the contrary, for Forest and Pasture, the prediction models presented a behavior with a lower value in organic matter (<8.6%) (Table 21), a less acidic pH and lower soil moisture percentage, and a higher electrical conductivity [10]. This behavior shows the effects of the impact of human activity, with a lesser value of OM in the Agriculture land use (<6.1%) (Table 22).

The results presented in this study differ from other studies that compared different classification techniques using Sentinel-2 images [18,54], or considered the capacity of satellite observations to monitor and determine the state of the vegetation due to environmental stress factors by evaluating vegetation and chlorophyll indices [1].

Unlike other methodological approaches [17,55–57], this study demonstrates that the combination of laboratory spectroscopy and multispectral images with environmental covariates is an adequate alternative to establish spatial analysis models to predict the quality of Andean soils in terms of physicochemical variables such as CE, OM, pH, and HU. For this purpose, performing soil order–land use associations was revealed to be an important possible tool for assessing the accomplished predictive models.

(1) Performance of the Models

Equation (4) shows the distributions of the logistic regression coefficients in the R-NIR spectral range for soil order, with low false-positive (1.9%) and false-negative (5.89%) coefficients. For the Andisol soil order, the mean level of the index (index.ma.1) was 0.2377, with a level of variability equal to 0.1792 (Table 15). For the Mollisol soil order, the mean level of the index (index.ma.1) was −1.4961, with a level of variability equal to 0.5688 (Table 15). The index values ranged from approximately −6 to 2, with some outliers below −4 and a very low frequency of occurrence. This corroborates the potential of promoting soil studies based on laboratory spectral data and remote sensors, such as Ali Aldabaa et al. (2014) [19], who evaluated the feasibility of the methods for the prediction of soil surface salinity by visible near-infrared diffuse reflectance spectroscopy (VisNIR) and remote sensing (RS). Equations (5)–(7) show the distributions of the coefficients of

the linear discriminant function in the VIS-NIR-SWIR spectral range for the order–land use both in laboratory and S2 configurations. For land uses of the Andisol order, the level of variability of the defined groups was very different, which represents the natural behavior of these variables (Shrub, Páramo, Pasture), which, unlike previous research on the VIS-NIR, presented greater sturdiness considering SWIR.

#### (2) Predictions of Physicochemical Parameters

The prediction performance of the R-NIR model, based on the Student *t*-test with *p* < 0.0001 for OM, CE, pH, and soil moisture, shows that the mean of each parameter in the Andisol and Mollisol soil order were different, concluding that the mean of each one was lower for the Mollisol soil order, unlike CE, where its mean was higher in Mollisol.

Regarding the results obtained from the VIS-NIR-SWIR or full spectrum model, using non-parametric regression tree models, excellent results were obtained for OM, pH, CE, and soil moisture as explanatory variables of order–land use [57]. For Mollisol 1, the 95% confidence intervals for the difference in means for the set of physicochemical parameters (CE, OM, pH, HU) were negative for Pasture and Páramo, and for Forest. This means that on average the given set of parameters had higher values in Forest. For the Mollisol 2 soil type, the 95% confidence intervals for the difference in means for the considered set of physicochemical parameters were negative for Shrub and positive for Agriculture. For Andisol-type soils, the 95% confidence intervals for the difference in means for OM in Páramo are higher than the average OM in Shrub. Similarly, but in an opposite direction, when comparing the mean OM in Pasture and Páramo land uses (*p* = 0.0000 < 0.05), the 95% confidence interval for the mean difference was negative, which implies that the average OM in soils used for pasture was less than the average OM in soils used for Páramo. Very similar results were obtained in relation to the pH physicochemical parameter, and in relation to CE and HU in all pairs of established comparisons there were statistically significant differences.

In the methodological process, the nonparametric regression tree method was successfully applied to predict the values of the model covariates by soil order or land use order (Figure 8). This statistical analysis methodology differed from those applied to date, like Adeline (2017) [41], Bao (2017) [40], Soriano-Disla (2014) [17], and Ali Aldabaa (2014) [19], where it was established that soil properties were derived from reflectance spectra that can be applied from various sources of spectral measurements, such as measurements in the laboratory, in the field, or from remote sensing systems.

These regression tree models were more flexible than those presented by Hill (2011) [55], because they did not consider non-compliance with statistical assumptions such as normality or collinearity problems between predictor variables. The regression tree models allowed for approximate estimates supported by 95% confidence intervals as a measure of the variation range of each physicochemical parameter, allowing for a reading of this from the top to the final nodes and vice versa, which was not possible in other applied models (Figures 14–17; Tables 21 and 22) [19,55,56].

Soil quality and soil degradation are crucial to develop sustainable agricultural activities [58]. The usual methods for environmental soil monitoring are very labor intensive and costly to cover large areas of land [1,13]. Satellite data in this field open up new research opportunities with great applications, as large areas of land can be analyzed and soil quality can be assessed in areas that are difficult to access [6,51].

Finally, this study is very valuable for the Ecuadorian Andean region for soil sustainability. Additionally, the results obtained in this study could be adapted in future research to other geographical regions after reviewing the soil order and land use that allow the relationships observed in the proposed model indices to be confirmed.

#### **5. Conclusions**

Soil quality is an important factor in sustainable land management. Its evaluation allows for the development and implementation of sustainable agriculture management techniques. Thus, in this study, an alternative method for the prediction of the parameters OM, CE, pH, and soil moisture based on the R-NIR and VIS-NIR-SWIR models is presented to demonstrate its applicability in the Ecuadorian Andean region. For this purpose, logistic regression analysis and linear discriminant function analysis were used. This required the establishment of homogeneous zones defined by soil order and land use combinations to design and implement soil-sampling strategies and field–satellite spectral measurements. The findings of this study suggest that soil + RS spectroscopy is a useful technique to predict soil properties, presenting good potential as an impetus towards future soil studies. According to the results of this study:


Therefore, because of the achieved results, the proposed methodology might be applied to other regions and adapted to predict soil properties as a function of the site-specific soil order and land use properties. Future research should explore the variability of soil quality parameters geographically with the aim of building regional models.

**Supplementary Materials:** The Sentinel 2 dataset used in this study can be downloaded from https://scihub.copernicus.eu/dhus/ (accessed on 20 August 2018).

**Author Contributions:** S.A.-O.: conceptualization, methodology, fieldwork, software, validation, formal analysis, visualization, writing and original draft. I.M.: conceptualization, writing, review, and editing. C.G.-A.: writing, review, and editing. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The first author thanks Maria Eugenia Jacome for the logistical support for taking soil samples in the field.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Review* **Current Status and Development Trend of Soil Salinity Monitoring Research in China**

**Yingxuan Ma 1,2,3 and Nigara Tashpolat 1,2,3,\***


**Abstract:** Soil salinization is a resource and ecological problem that currently exists on a large scale in all countries of the world. This problem is seriously restricting the development of agricultural production, the sustainable use of land resources, and the stability of the ecological environment. Salinized soils in China are characterized by extensive land area, complex saline species, and prominent salinization problems. Therefore, strengthening the management and utilization of salinized soils, monitoring and identifying accurate salinization information, and mastering the degree of regional salinization are important goals that researchers have been trying to explore and overcome. Based on a large amount of soil salinization research, this paper reviews the developmental history of saline soil management research in China, discusses the research progress of soil salinization monitoring, and summarizes the main modeling methods for remote sensing monitoring of saline soils. Additionally, this paper also proposes and analyzes the limitations of China's soil salinity monitoring research and its future development trend, taking into account the real needs and frontier hotspots of the country in related research. This is of great practical significance to comprehensively grasp the current situation of salinization research, further clarify and sort out research ideas of salinization monitoring, enrich the remote sensing monitoring methods of saline soils, and solve practical problems of soil salinization in China.

**Keywords:** soil salinization; saline soil treatment; remote sensing monitoring; model construction

### **1. Introduction**

Soil salinization refers to the accumulation of soluble salts in soil caused by certain natural factors such as climate, hydrology, and topography or caused by the combination of destructive human factors and fragile ecological environments, thus leading to the deterioration of soil quality to form saline soils [1]. Soil salinization, as a resource and ecological problem that currently exists on a large scale in all countries of the world, is one of the main types of land desertification and soil degradation [2]. It seriously restricts the production and development of the agricultural industry, the sustainable use of land resources, and the security and stability of the ecological environment. Saline soils are a collective term for all types of soils that are negatively affected by saline components. The unique physicochemical-biological properties of saline soils have a variety of negative impacts. These include reduction in soil fertility and productivity levels, reduction in crop yields and harvests [3,4], waste of agricultural resources, destabilization of the ecological environment, and other secondary hazards [5]. Therefore, strengthening the management and utilization of salinized soils, monitoring and identifying accurate salinization information, and mastering the salinization level of regional arable farmland have been important goals for scientists to research and overcome.

**Citation:** Ma, Y.; Tashpolat, N. Current Status and Development Trend of Soil Salinity Monitoring Research in China. *Sustainability* **2023**, *15*, 5874. https://doi.org/10.3390/ su15075874

Academic Editors: Blaž Repe, Mario Elia and Antonio Ganga

Received: 23 February 2023 Revised: 22 March 2023 Accepted: 24 March 2023 Published: 28 March 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

The total area containing saline soils worldwide is currently about 1.1 × <sup>10</sup><sup>9</sup> hm2, which is widely distributed in more than a hundred countries and regions around the world, and the global soil salinization level is still showing a rising trend [6]. The total area of saline soils in China has reached 3.69 × 107 hm2, which is close to 4.88% of the available land area in China [7]. It is mainly distributed in arid and semi-arid regions and coastal areas with arid climate and little rainfall, high soil evaporation, a high groundwater table, and more soluble salts [8–10]. Examples include semi-humid regions such as the Yellow River Basin in North China, the plains in northeast China, and arid and semi-arid regions in northwest China such as Gansu, Ningxia, and Xinjiang. Therefore, study of the spatial distribution and management of soil salinity prevention as well as improvement of monitoring accuracy and early warning capability are gradually becoming a hot spot of concern in the field of saline soil research today.

In order to explore the current research status and current research hotspots of soil salinity monitoring, this paper searched in the China National Knowledge Infrastructure (CNKI) and the Web of Science (WOS) databases using "soil salinity monitoring" as the key search term. CiteSpace software was used to perform keyword co-occurrence analysis on the large number of highly relevant literature datasets obtained from the search (Figures 1 and 2). By extracting the frequency distribution of keywords that express the core content of the literature, a co-word matrix is thus generated based on the keyword matrix. The co-word matrix was visualized as a network to study the development trends and research hotspots in the field of salinity monitoring. The core nodes in the figure can fully reflect the focus and branches of research in the field in recent years. The size of the node represents the frequency of the keyword; the larger the node, the more frequently the keyword appears and the higher the relevance to the topic. Among them, Figure 1 shows the keyword co-occurrence analysis graph based on the relevant research articles in the CNKI database. The analysis shows that the nodal framework consisting of "saline soil", "remote sensing monitoring", "hyperspectral", "arid zone", "multisource remote sensing", and "salinity index" appears more frequently and has stronger correlations among the research articles published in Chinese database. Figure 2 shows a keyword co-occurrence analysis graph based on the relevant international research articles in the WOS database. It can be seen that the nodal framework consisting of "soil salinity", "model", "spatial distribution", "change detection", "prediction", and "remote sensing" appears more frequently and has stronger correlations among the research articles published in international databases. These keywords provide important information for us to analyze the progress of research on soil salinity monitoring in China, and they are the focus of our attention.

**Figure 1.** Keyword co-occurrence mapping based on the literature related to soil salinity monitoring in the China National Knowledge Infrastructure (CNKI) database.

**Figure 2.** Keyword co-occurrence mapping based on the literature related to soil salinity monitoring in the Web of Science (WOS) database.

This paper reviews the developmental history of saline soil management research in China, discusses the research progress of soil salinity monitoring, and summarizes the main modeling methods for remote sensing monitoring of saline soils. Based on a large amount of domestic and foreign soil salinization research, this paper combines the real needs and frontier hotspots of the country in related research, proposes limitations of soil salinization monitoring research in China, and analyzes the future developmental trend of soil salinization monitoring research in China. This is of great practical significance to comprehensively grasp the current situation of salinization research, further clarify and sort out the research ideas of salinization monitoring, enrich the remote sensing monitoring methods of saline soils, and solve practical problems of soil salinization in China.

#### **2. Research History and Importance of Saline Soil Management in China**

As an important actual and potential arable land resource in China, saline soils have strong development and utilization value. Different types of saline soils can be managed and improved in terms of their physicochemical and biological properties by using various types of effective soil improvement tools and other comprehensive measures, thus improving soil quality and productivity levels [11]. Theoretical and technological research on saline soil management in China has been steadily developing. The country attaches great importance to the treatment and utilization of saline soils, policy research, and technological innovation. In the 1950s, the State organized a lot of research on saline resources and mastered the situation of many different regions and different types of saline lands [12–14]. In the 1990s, researchers started to study the regional water and salt movement of saline soils and its regulation and management on the basis of the regional water table, water quality, and a soil water and salt co-forecasting model, and this helped facilitate the improvement of saline soils [15,16]. Thus, during the 20th century, Chinese research in the field of soil salinization focused on research and classification, investigation of causes, and improvement and prevention. Several provinces have conducted focused analyses for regional saline soils [17,18] and have conducted in-depth research while gaining a basic understanding of saline soils, laying the foundation for future monitoring and management of saline soils.

In the 21st century, during the 11th Five-Year Plan, the Chinese Academy of Sciences, together with the relevant domestic forces, organized and implemented the "Research and Demonstration of Supporting Technologies for the Efficient Utilization of Saline Soil

in Agriculture", which is a public welfare industry special project for the whole country, and carried out comprehensive research on saline soil management [19,20]. During the 12th Five-Year Plan period, the report of the Chinese Academy of Sciences, "The National Demonstration of Saline Land Management Technology", was received by the national leaders [21–24]. During the 13th Five-Year Plan period, China deployed the national key project of "Typical Fragile Ecological Restoration and Protection Research", which has strongly driven enthusiasm and encouraged researchers to publish related research articles [25–27]. Internationally, the theme of the 8th World Soil Day (WSD) in 2021 is "Preventing Soil Salinization and Improving Soil Productivity" [28]. This theme aims to raise awareness of soils, strengthen national research capacity, and work together to preserve the Earth's environmental carrying capacity. Summarizing the research history of soil salinization management in China since the beginning of the 21st century, it can be seen that during this period, China started to focus on advancing the theory and technology of salinization prevention and control. Comprehensive measures have been applied to manage salinized soils and improve their physicochemical and biological properties. The potential of saline soils as an important arable resource has been fully exploited.

At present, with policy support and implementation of various departments at the national level, research and development of saline soil management and monitoring technology in China has achieved certain results. The research not only covers the direction and content of international saline soil research but also highlights domestic characteristics, with a richer connotation, broader coverage, and more common interdisciplinary association [29]. This further indicates that the nation has recognized the importance and necessity of saline soil treatment to ensure food security and promote ecological stability.

#### **3. Advances in Soil Salinity Monitoring Research**

The literature in CNKI and WOS databases was searched based on the similarity of literature keywords. A keyword clustering analysis was performed on the retrieved literature datasets related to soil salinity monitoring (Figures 3 and 4). Keyword clustering analysis is the process of analyzing the set of keywords extracted from the literature into multiple categories consisting of similar objects. There are many different algorithms for clustering analysis, and this study chose Logarithmic Likelihood Ratio (LLR) as the base calculation for the analysis. The labels of each cluster are the key keywords in the co-occurrence network. Based on this, closely related keywords are clustered. The higher the ranking of the cluster number, the more keywords are included in the cluster. Conversely, the more backward the ordinal number, the fewer keywords are contained in that cluster. The modularity value of the clustering metric Q ranges 0–1. The larger the value, the better the clustering effect. Usually, when Q is less than 0.3, it indicates that the literature data set analyzed by this clustering is not well structured. In Figure 3, a value of Q = 0.8293 was obtained from cluster analysis of the literature in the CNKI database. In Figure 4, a value of Q = 0.7128 was obtained from cluster analysis of the literature in the WOS database. This indicates that the data collected in the Chinese literature database and the international literature database were reliable and the keyword clustering analysis structure was significant.

A total of 16 clustering tags were obtained from the keyword clustering analysis based on the CNKI database (Figure 3). Among the top seven clusters, three of them are related to remote sensing monitoring, namely "quantitative remote sensing", "remote sensing monitoring", and "monitoring models". A total of 10 clustering tags were obtained from the keyword clustering analysis based on the WOS database (Figure 4). Among them, both "machine learning" and "feature space" are specific remote sensing monitoring modeling methods. Based on these methods, the researcher explores the "change detection", "spatial distribution", and "evolution trend" of saline soils in China, making full use of the advanced remote sensing information technology. This demonstrates that to achieve the purpose of saline soil management and utilization on a large scale, it is an important prerequisite to use scientific means to quickly and accurately grasp the information of

saline soil distribution and to clarify the spatial and temporal variability characteristics of salinization [30]. With the decades of continuous exploration and practice of soil salinization research in China, there are more and more methods and means of soil salinization monitoring. In summary, they can be divided into two main categories: (1) traditional field investigation and experimental methods and (2) modern remote sensing information technology monitoring methods.

**Figure 3.** Keyword clustering mapping based on the literature related to soil salinity monitoring in the CNKI database.

**Figure 4.** Keyword clustering mapping based on the literature related to soil salinity monitoring in the WOS database.

#### *3.1. Field Investigations and Experiments*

The field survey is to select test areas in the field where salinity characterization exists and to obtain visual information on soil salinity from soil samples to provide an accurate and reliable data source for the study. During the sampling process, the soil sample points should be evenly distributed throughout the work area. The sample points should also be divided into typical sampling areas containing four different landscapes, vegetation cover, soil types, etc. that are more representative. For soil samples collected at different depths, soil conductivity can be measured using the EM38 Geodetic Conductivity Probe [31]. The content of each ion in the soil can also be detected, and the corresponding total soil salt content can be calculated [32]. We can also use dry weight and wet weight to determine soil water content as well as soil evapotranspiration [33], and we can use a portable spectrometer to obtain spectral curve data from different sampling points [34]. In this way, we can achieve the purpose of extracting information on soil salinization.

The field survey method of collecting soil samples in the field and then analyzing them has become a basic monitoring tool for accurate information on soil salinization [35]. Kai Deng et al. [36] established a linear mixed model based on the linear relationship between magnetic susceptibility apparent conductivity and actual measured soil salinity to assess the spatial distribution of salinity in the soil profile. Wenping Xie et al. [37] constructed a multiple regression model between magnetic susceptibility geodetic conductivity and soil multiple regression models between geodetic conductivity and soil salinity to quantitatively assess the spatial and temporal evolution of soil salinity in the estuary over the past decade. Yasenjiang Kahaer et al. [38] performed indoor hyperspectral measurements and conductivity measurements on soil samples obtained from fieldwork and established a hyperspectral estimation model of soil conductivity after screening parameters. Finally, effective monitoring of soil salinity was achieved.

By analyzing the results of scientists' investigations, we can find that field surveys and experimental methods can extract saline soil information very precisely. However, this method is mainly based on manual point-by-point examination, which is less efficient and difficult to obtain the salinity variation characteristics of large areas at a macroscopic scale [39–41]. Especially in the study area where the vegetation cover is complex and the natural environment is harsh, the number of monitoring stations is not sufficient, and the difficulty of field investigation is increased. All these problems render the traditional monitoring method of field surveys insufficient to meet research needs [42,43].

#### *3.2. Remote Sensing Information Technology Monitoring*

Among the many soil physicochemical parameters, soil salinity content is the primary parameter for measuring salinization. The higher the soil salinity, the higher the risk of soil salinization. When the accumulation of salts in the soil exceeds a certain level, it leads to weakening of the bond between soil particles, loosening of soil structure, reduction of soil fertility, and even directly affects the survival of vegetation [2,44]. At the same time, there is a close relationship between soil conductivity and soil salinity. Soil conductivity is a measure of the ability of ions in the soil to conduct electricity, and it reflects the amount of dissolved ions in the soil, including salt ions and other dissolved substances. Similar to soil salinity, conductivity can be an important factor in measuring salinity [45,46]. Moisture in the soil can also largely reflect the salinity of the soil. Salt moves with water, and soil salts are prone to shift with changes in moisture. Under strong evaporation, salts in groundwater and deep soil rise to the surface along soil capillaries and accumulate, resulting in salinization of regional soils [47–49]. Additionally, the heavy metal content in soil also interacts with soil salinization and constrains it. Some heavy metal elements, such as cadmium and lead, can affect the growth of soil microorganisms and form insoluble complexes when combined with salts. They can reduce the activity of salts in the soil, thus affecting the process of conversion and circulation of salts in the soil and aggravating the degree of soil salinization [50,51].

Different levels of soil salinity, water content, and some heavy metals can be distinguished by using the different spectral reflectance of remote sensing images for different features. The multidimensional combination of different bands in spectral images can also construct a variety of model indices that can monitor soil salinization. Therefore, the use of remote sensing to track physicochemical parameters such as soil salinity, conductivity, water content, and heavy metals can all be effective in monitoring saline soils. In recent

years, remote sensing information technology has been innovating, and the spectral resolution and spatial resolution of remote sensing images have been increasing. Efficient, convenient, and large-scale means of monitoring soil salinity have been rapidly developed. The method of monitoring salinity based on remote sensing images is gradually becoming common and is rapidly developing into an important tool for studies such as soil salinity information extraction, monitoring, and forecasting [52–54].

A keyword timeline analysis was performed on the literature datasets related to soil salinity monitoring from 1981 to 2022 and 2002 to 2022 retrieved from the CNKI and the WOS databases, respectively (Figures 5 and 6). The keywords in the figure are spread out in the clusters they belong to according to the chronological order in which they appear in the corresponding years, showing the development of keywords in each cluster. The size of the keyword node represents the frequency of the keyword occurrence, and the warm and cold colors of the node periphery represent the emergence and the duration of the keyword. The larger the keyword node, the warmer the color of the edge of the node, the more frequently the hotspot appears, and the longer it lasts. Conversely, the smaller the node, the cooler the color of the node edge, the less frequently the hotspot appears, and the shorter the duration of the hotspot. The analysis of the development of soil salinity monitoring showed that in the mid-1990s, results for "dynamic monitoring" clustering began to appear in the Chinese literature database, and attention was focused on "remote sensing monitoring" (Figure 5). In the 21st century, "machine learning" and "feature space" methods for monitoring "soil salinity" and "soil moisture" have begun to appear in the international literature database (Figure 6). While a large number of studies on soil salinity monitoring based on remote sensing have gradually emerged, keyword nodes such as "remote sensing technology", "hyperspectral", "radar remote sensing", "feature space", "inversion models", "random forest", "classification", and "index" have also begun to appear in other clusters one after another. There are a large number of keyword links between them, both within and across clusters. These remote sensing monitoring-related research terms link the whole clustering trend and become important research hotspots in domestic and international literature databases.

Based on data analysis and processing of multiple remote sensing images, Yanhua Li et al. [55] established a soil salinity monitoring index model using Landsat-TM multispectral remote sensing image data to invert and estimate soil salinity in the Weigan River-Kuche River basin. Also using field collection samples and Landsat8 image data, Mingkuan Wang et al. [56] collected soil samples in the key study area in the Yellow River Delta region in the field and acquired simultaneous phase Landsat8 image data. They constructed multiple models for remote sensing inversion of soil salinity and inversion of the spatial distribution of soil salinity in the study area based on the optimal model. The results showed that the relationship between reflectance of remote sensing images and soil salinity content is not purely linear, and the constructed salt estimation model can better simulate the relationship between soil salinity and spectral data. Yanling Li et al. [39] established a machine learning and statistical regression model based on the fusion of multispectral and hyperspectral images, which significantly improved the accuracy of salt inversion. Similarly, Wumuti Aishanjiang et al. [57] matched field-measured hyperspectral data with WorldView-2 remote sensing images to improve the prediction accuracy and mapping accuracy of soil salinity. The quantitative inversion model developed in this paper considering vegetation and moisture does not require complex parameters. To a certain extent, it meets the needs of salinity monitoring in arid and semi-arid regions. This can promote further applications of high-spatial-resolution satellites such as WorldView-2 in salinity monitoring. In the meantime, some scientists have also used radar remote sensing data combined with soil moisture and pH factors to achieve predictive inversion of soil salinity. In terms of the variability and correlation used to reveal soil properties, M. Samiee et al. [58] used geostatistical methods to simulate the spatial correlation of soil salinity, and the results were used to predict the spatial distribution of soil properties by spatial interpolation methods. In addition, more and more kinds of remote sensing

images have been applied to salinity monitoring. For example, Junying Chen et al. [59] used unmanned aerial vehicle (UAV) aerial imagery and high-resolution remote sensing imagery to construct a salinity inversion model and used an improved scale conversion method to achieve soil salinity monitoring at the ascending scale. Predictive inversion of soil salinity was achieved by using radar remote sensing data combined with soil moisture and pH factors by Zaytungul Yakup et al. [60]. The monitoring model developed in this paper does not need to consider complex dielectric constants. This can meet the needs of soil salinity monitoring to a certain extent and promotes the application of Phased Array type L-band Synthetic Aperture Radar data in soil salinity monitoring. Yumei Li et al. [61] focused on the current status of the application of lidar three-dimensional remote sensing observation technology in the three-dimensional dynamic monitoring of various natural resources. The paper provides a comprehensive analysis of the potential and limitations of lidar applications in natural resource surveys. These authors are also considering how to combine multi-source, multi-scale, and multi-platform remote sensing data with artificial intelligence. The goal is to build an integrated "sky-air-ground" natural resources survey and monitoring technology system. This is the developmental direction of future methods for three-dimensional dynamic monitoring of natural resources.

**Figure 5.** Keyword timeline mapping based on the literature related to soil salinity monitoring in the CNKI database from 1981 to 2022.

**Figure 6.** Keyword timeline mapping based on the literature related to soil salinity monitoring in the WOS database from 2002 to 2022.

New theories and technologies in image processing [62,63], machine learning [64], remote sensing monitoring [65], and other research fields are gradually developing and innovating. By summarizing a large number of domestic and foreign researchers' studies in recent years, it is clear that research using satellite remote sensing data to extract salinity information by classification or soil salinity inversion by multiple spectral indices has further deepened [66,67]. At the same time, more researchers are also focusing on the application of integration between different observation elements, different scales, and different data. We try to improve the accuracy of inversion of soil salinization information in terms of classification algorithms and model construction. This will provide reliable information for the development of saline soil management, ecological maintenance, and sustainable agricultural development in China.

#### **4. Main Modeling Approaches for Soil Salinity Remote Sensing Monitoring**

In recent years, countless domestic and international researchers have created many additional soil salinity monitoring techniques by tirelessly decoding an enormous amount of sensing image data. Keyword burst detection analysis was performed in CiteSpace software using the retrieved literature dataset related to soil salinity monitoring from 1981 to 2022 obtained by searching in the CNKI and the WOS databases (Figure 7). This analysis can be used to detect large changes in the number of citations of keywords in the literature during a certain time period. Thus, it helps to obtain the latest information on the frontiers of research in the field. The keyword burst detection mapping shows that since the information extraction and dynamic monitoring of salinized soil, many researchers have started to focus on the construction of models for soil salinization information by remote sensing. For example, "classification" appeared in 2012, "feature space" appeared in 2013, "quantitative models" appeared in 2015, "neural networks" and "random forest" appeared in 2016, "methods plsr" appeared in 2018, and "model", "machine learning", and "monitoring models" appeared in 2020. Among the 35 main keywords obtained from the analysis, a variety of keywords related to remote sensing monitoring modeling studies have begun to appear frequently in the last decade. These modeling studies are updated quickly and span a long period of time, and all of them have gradually achieved valuable research results while broadening the methods of soil salinity monitoring. This indicates that modeling studies have become a hot method in remote sensing monitoring of soil salinity.


**Figure 7.** Keyword burst detection mapping based on the literature related to soil salinity monitoring from 1981 to 2022 in the CNKI and WOS databases. Note: In the Figure, the blue lines correspond to the year in which the keywords first started to appear; the red lines correspond to the year in which the keywords started and ended as research frontiers.

The core method of quantitative assessment of soil salinity using remote sensing data is to explore the correlation between the content of relevant salinity indicators and remote sensing data [68]. Among them, salinity indexes can be obtained by bringing soil samples collected from field surveys back to the laboratory and performing measurement experiments for analysis. Pre-processed remote sensing image data are rich in spectral information of features, which are contained in the reflectance of different wavelength bands of remote sensing data. The spectral information contained in a single band is limited. Therefore, after extracting the spectral reflectance of the bands of the remotely sensed image, a combination operation between different bands can be performed. These indices are usually a priori formulas derived from existing studies (Tables 1 and 2).


**Table 1.** Salinity indexes and related calculation formulas.



A remote sensing monitoring model was constructed with the aim of establishing the relationship between soil salinity content and the abovementioned modeling factors. Therefore, the research hotspot of remote sensing monitoring of soil salinity mainly lies in the construction of an efficient and reliable remote sensing monitoring model [107,108]. Various models are used to extract salinity information mainly by remote sensing technology to obtain more accurate inversion results and to assess the regional soil salinity distribution. Among them, the most commonly used remote sensing models for salinity monitoring mainly include the following.

#### *4.1. Partial Least Squares Regression*

Partial least squares regression (PLSR) is a new multivariate statistical data analysis algorithm that combines the advantages of multiple stepwise linear regression models, principal component models, and simple linear regression models in one [109,110]. Partial least squares regression implements a combination of simplified data, structural regression modeling, and the analysis of correlations between two groups of variables. It also implements cross-validity validation of component extraction in the calculation process, reorganization and screening of information in the variable system, and selection of several new components with the best systematic explanatory power for regression modeling. This provides a great convenience for statistical analysis of multivariate data [111]. Therefore, partial least squares regression, which can automatically filter variables based on correlation and ensure model stability, has been widely used in recent years to model spectral data and achieve high-accuracy predictions.

Viscarra Rossel et al. [112] showed that the partial least squares regression method has better soil nutrient prediction capability. This method allows modeling when the number of sample points is smaller than the number of variables and provides excellent processing and analysis capabilities for complex problems. Numerous researchers have also used this model to invert soil organic matter and to filter the most important organic matter content variables from spectral data [113], resulting in better robustness of the established models. For example, Yumiti Maiming et al. [114] set the independent variable as the original spectral reflectance and the dependent variable as the soil organic matter content to construct an estimation model based on methods such as partial least squares regression. It was shown that the inverse logarithmic first-order differential transformation could help to improve the accuracy of the partial least squares regression estimation model.

#### *4.2. Support Vector Machines*

Support vector machines (SVM) are a method that better implements the idea of structural risk minimization. It can better solve practical problems such as small samples, nonlinearity, and high-dimensional data. It can be extended to other machine learning problems such as function fitting [115]. Support vector machines are a kind of machine learning method based on statistical theory, which has a strong mathematical foundation and is intuitive, stable, accurate, and efficient. Compared with traditional statistical methods, it has the advantages of strong functional expression, good generalization ability, and high learning efficiency. It is widely used in the fields of soil salinity information extraction because it is easy to combine with multiple sources of information and has relatively high accuracy [116]. However, at the same time, support vector machines are also too sensitive to the selection of parameters and functions and have shortcomings in solving multi-classification problems [117].

Using the spectral information of Landsat ETM+ images, Fei Zhang et al. [118] found that the support vector machines regression method based on texture features had an improved effect on the monitoring accuracy of soil salinity information. Similarly, Yiliyas Jiang [119] referred to the salinity classification system and used support vector machines to determine the optimal parameters, thus obtaining more accurate soil salinity classification information. In addition, Xi Wang et al. [120] selected support vector machines as a modeling method for machine learning based on the small sample size and practical situation of the study and performed remote sensing inversion of salinized soil organic matter.

#### *4.3. BP Neural Network*

BP neural network (BPNN) is a multilayer feedforward neural network trained according to the error back-propagation algorithm. It mainly contains two processes: forward propagation of signals and backward propagation of errors. BP neural networks can identify nonlinear relationships between input and output data sets in complex systems without constructing mathematical equations. It can eliminate the influence of specific values to a certain extent, has strong self-learning ability, strong adaptability, and anti-interference ability, and it is widely used by researchers in the inversion monitoring of salinity [121].

Wenzhe Feng et al. [122] simultaneously introduced BP neural network, support vector machines, and extreme learning machines to build a soil salinity monitoring model so as to improve the monitoring accuracy of soil salinity by satellite remote sensing. Among them, a three-layer BP neural network was used to build a soil salinity dynamics model according to the principle of relatively small error in training results. The results showed that the BP neural network model based on multi-source remote sensing images is more accurate. Xueli Feng et al. [123] found that the stability and prediction accuracy of the BP neural network model combining the second-order derivatives of spectral feature bands, radar backscattering characteristics, and combined surface roughness were higher than the rest of the models. This shows that the BP neural network model combined with multi-source remote sensing data can quickly and accurately monitor the distribution of soil salinization, which provides important guidance for the prevention and control of soil degradation.

#### *4.4. Random Forest*

Random forest (RF) is a new classification and prediction model that uses multiple decision trees for training and prediction of samples. The model randomly selects the training sample set with split attribute set, which is a combination of multiple decision trees and is less sensitive to outliers [124]. This gives random forest models high noise immunity and nonlinear mining ability, and the distribution of data does not need to conform to any assumptions, which performs well in the randomization training phase of samples and variables. Random forest models have been increasingly used for classification and regression in recent years because of their high accuracy in predicting results, the importance of computable variables, and the ability to model complex interactions among a large number of predictor variables [125,126].

Jie Hu [127] compared partial least squares regression and random forest regression methods using hyperspectral first-order differentiation, broad-band spectral indices, and narrow-band spectral indices as independent variables. The results showed that the random forest regression model could better predict soil salinity using spectral data, and the model for the bare soil area had the highest prediction accuracy compared to the model for the other sample areas. Meanwhile, in order to monitor the spatial variability of soil salinity as accurately as possible on a large scale, Lina Meng [128] used ordinary kriging, geographically weighted regression, and a random forest model combined with several environmental auxiliary variables to map the distribution of surface soil salinity in the study area. The results show that the random forest model has the highest prediction accuracy among the various prediction methods, indicating that the model is effective in quantitatively estimating soil salinity at the regional scale.

#### *4.5. Feature Space*

The modeling method based on the spectral feature space can also be applied to construct a soil remote sensing monitoring model [129]. The distance to a certain feature point in the two-dimensional or three-dimensional feature space of soil salinization covariates can reflect different degrees of salinity and determine the change trend between different covariates. We can analyze the scattered spatial map with practical experience

so that we can use spatial characteristic covariates of the scattered map to build the corresponding model [130]. Among them, two-dimensional feature space can clearly express the distribution pattern of factors affecting the soil salinization process. For example, the two-dimensional scatter diagram between vegetation and soil salinity has been elaborated and discussed by domestic and foreign researchers. In view of the future development path of soil salinity monitoring, two-dimensional feature space can no longer satisfy the analysis and display of multiple factors involved in salinization at the same time. With the development of remote sensing technology, the emergence of three-dimensional technology can help promote the development of soil salinity feature space research to three-dimensional or multi-dimensional space.

On the one hand, in the study of two-dimensional feature space application, Jianli Ding et al. [131] constructed a two-dimensional feature space based on a modified soil-adjusted vegetation index and moisture index to derive a remote sensing monitoring index model for soil salinization. The results showed that the two-dimensional feature space correlated well with the soil surface salinity of the oasis in the arid zone. Fei Wang et al. [132] constructed a quantitative relationship between surface-based feature vectors and the occurrence of the salinization process. It was found that the feature space model could invert the soil salinity distribution of the delta oasis in the study area more accurately. In addition, Lingling Bian et al. [133] quantitatively explored the patterns between soil salinity and surface biophysical parameters and also constructed a soil salinity eigenspace model. The results showed that the salinity index and albedo characteristic spatial model has a strong predictive ability and is most suitable for the inversion of salinization in coastal areas.

On the other hand, the use of three-dimensional feature spaces is increasingly developed. The relationship between soil salinization, albedo, and the modified type of soil adjusting the vegetation index was used by Juan Feng et al. to construct the feature space [134] The monitoring model constructed by surface albedo and soil-adjusted vegetation index was found to have a high correlation with soil salinity, which can better quantify and monitor the degree of soil salinization in the study area. Not only that, XuePing Ha et al. [135] used the relationship between salinity index, surface albedo, and soil salinity based on three-dimensional feature space to efficiently and accurately extract salinization information of the oasis in the study area.

#### **5. Discussion**

The extraction of soil salinization information by remote sensing monitoring technology has recently become a hot spot in the field of remote sensing research. Many researchers domestic and abroad are gradually developing new technical means and research methods to expand research in the field of high-precision monitoring of regional soil salinization.

Various models including those summarized above have been widely used for remote sensing monitoring of soil salinity. Many of these improved and optimized mathematical models have been continuously applied to the study of the spatial distribution of soil salinity (Table 3). In addition, the back trajectory model can be used to track the trajectory of saline sands and salt dusts. The vertical distribution characteristics of sands and dusts in the atmosphere are detected by using satellite remote sensing data or UAV remote sensing data combined with the back trajectory model for the purpose of monitoring the time-series trajectory of saline soils [136,137]. Many of these models, which have achieved high accuracy, provide a favorable basis for spatial and temporal analysis and prediction of regional salinization. At the same time, however, this paper argues that there are still some shortcomings in the current research on soil salinity monitoring in China that need to be further discussed and addressed by researchers.


**Table 3.** Multiple modeling approaches for remote sensing to monitor soil salinity.


**Table 3.** *Cont.*

First of all, the most important shortcoming is that the universal applicability of monitoring models needs to be further improved. Saline soil resources in China are distributed in different climatic zones, and there are large differences in soil types, water and heat conditions, and planting methods. Therefore, the physical and chemical properties of soils in the current study area, temporal and climatic conditions, relevant environmental factors, and model application preferences considered by researchers in constructing monitoring models can lead to a generally low applicability of the optimal models derived from the final experiments. However, this can invert the soil salinization information of the current study area with high accuracy. Nevertheless, it is difficult for it to be applied to other seasons in the same study area or to other study areas with different environmental states.

Secondly, sometimes there are more types and a greater abundance of salt-tolerant vegetation cover in the study area. The soil salinity monitoring model constructed in the study area needs to be adjusted because of the large differences in spectral characteristics between different vegetation types and bare soil. Therefore, the model can be applied to areas with different vegetation growth stages and areas with different arable, grassland, and woodland coverage to eliminate the influence of vegetation on monitoring results as much as possible. This not only increases the difficulty of designing and executing the distribution of soil sampling points during fieldwork but also incorporates more variables into the construction of the model. This will lead to a decrease in the accuracy of monitoring models to invert salinity. Therefore, how to construct a salinity monitoring model that can efficiently and accurately reflect the salinity of vegetation cover areas will be a major problem to be faced in this field in the future.

#### **6. Research Perspectives on Soil Salinity Monitoring in China**

A synthesis of previous studies carried out on soil salinization shows that most monitoring methods are based on the spectral response characteristics of salinized soils. Researchers have combined the spectral information obtained from different remote sensing data with non-remote sensing parameters to establish inverse models for regional soil salinity monitoring. However, due to the high correlation between water and salt in soil salinization, salt moves with water, and soil salinity is easily shifted with changes in moisture. In the monitoring of salinity, not only the salt but also the water will be considered

to further develop water–salt synergistic monitoring so as to achieve the purpose of high monitoring accuracy and effective monitoring.

For the scale studies of salinized soils, researchers have mostly conducted a single scale or chosen the administrative district as the scale indicator for analysis. Some results have been obtained for soil salinity monitoring at different scales, such as field and regional scales. However, the correlation between different scales and their transformation have been less studied. Various reasons, such as surface heterogeneity, complexity of the salinization process, and significant variability in the dominant factors affecting the spatial variability of soil salinity at different scales [176], can lead to large differences in the autocorrelation of the same variable at different scales. There are limitations in studying salinization at a single scale.

The variability of soil salinity shows obvious scale effects with spatial scales. If the analysis is performed only at large scales, it may cause the spatial structural features at small scales to be obscured, making it difficult to analyze the structural features of soil spatial variability in depth. Soil salinity research based on multi-scale methods as well as scale-transformation methods can solve this problem well, which will provide new research ideas for soil salinity monitoring.

#### **7. Conclusions**

The proper management of saline soil resources in China is related to national food security and ecological stability. The leaders of the Party and the State attach great importance to the management and utilization of saline soils. As an important land resource in China, saline soils of different types, vast areas, and great potential provide unique research conditions for our researchers. Efficient and accurate monitoring of salinization information along with the management and development of unused saline soils provides more room for development to expand the country's arable land resources and expand the path of agricultural development.

With the continuous progress of remote sensing technology, research on soil salinization information extraction methods has developed over a long period. In order to obtain more accurate inversion data to characterize the interrelationship between soil salinity status and its influencing factors, the construction of remote sensing monitoring models has gradually become a research hotspot in the field of soil salinity monitoring. In future research, the exploration of soil salinization information extraction will become more extensive, will generate more data, and will be more accurate in judgment [176]. In general, according to current scientific needs and national demands, soil salinization research in China will play an important role for national food security [177], arable land security [178], saline land improvement [179], land use protection [180], ecology, and sustainable agricultural development. In the face of today's increasingly serious soil salinization situation, it is important to seek more efficient, reliable, accurate, and economical soil salinization monitoring technologies.

**Author Contributions:** Conceptualization, methodology, formal analysis, investigation, data curation, and writing—original draft preparation, Y.M.; writing—review and editing, supervision, project administration, and funding acquisition, N.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was carried out with the financial support provided by the National Natural Science Foundation of China "Study on the optimal scale for soil salinization water–salt remote sensing monitoring", grant number 41761077.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are openly available in the China National Knowledge Infrastructure at https://www.cnki.net (accessed on 12 January 2023), and the Web of Science at https://www.webofscience.com/wos/woscc/basic-search (accessed on 17 March 2023).

**Acknowledgments:** We extend our heartfelt gratitude to the anonymous reviewers of this manuscript for their constructive comments and helpful suggestions, which strengthened the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI St. Alban-Anlage 66 4052 Basel Switzerland www.mdpi.com

*Sustainability* Editorial Office E-mail: sustainability@mdpi.com www.mdpi.com/journal/sustainability

Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Academic Open Access Publishing

mdpi.com ISBN 978-3-0365-9478-1