**Table 1.** List of drivers' datasets for the development of archetypes.





#### 2.4.3. Linking Archetypes of Land-Degradation Drivers to State Administrations and Land Status

In a previous study [11], LD status was captured using rainfall-corrected vegetation greenness as a proxy (Figure 3, Box 4). By overlaying the archetype clusters (Figure 3, Box 3) with the LD status of the area (i.e., degraded, stable, and improvement) (Figure 3, Box 4), the percentage share of each archetype per LD status was determined. This enabled the nine archetypes to be grouped as undergoing either large-area or small-area degradation: archetypes covering <10% of the total degraded area were classified as small-area degradation clusters, and those covering ≥10% of the total degraded area as large-area degradation clusters [56]. Spatial overlay [23] of the archetypes with the states' administrative boundaries was likewise used to determine the share of each archetype per state (Figure 3, Box 4). The emerging results were used to explain and discuss the implications for land governance and SLM in the NGS (Figure 3, Box 5).
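The overlay-and-threshold step can be illustrated with a short NumPy sketch. It is not the authors' implementation: it assumes that the archetype map and the LD-status map are co-registered rasters on the same equal-area grid (so that pixel counts are proportional to area), and the status codes, function names, and toy arrays are hypothetical.

```python
import numpy as np

# Hypothetical integer codes for the LD-status raster
DEGRADED, STABLE, IMPROVED = 0, 1, 2


def share_per_archetype(archetypes, status, target_status):
    """Percentage of all `target_status` pixels that fall in each archetype."""
    mask = status == target_status
    ids, counts = np.unique(archetypes[mask], return_counts=True)
    total = counts.sum()
    return {int(a): 100.0 * c / total for a, c in zip(ids, counts)}


def classify_extent(shares, threshold_pct=10.0):
    """Label archetypes covering >= threshold_pct of the status area as large-area."""
    return {a: "large-area" if s >= threshold_pct else "small-area"
            for a, s in shares.items()}


# Toy example: archetype IDs and LD status for 35 equal-area pixels (hypothetical)
archetypes = np.array([1] * 10 + [2] * 5 + [3] * 20)
status = np.array([DEGRADED] * 10                       # archetype 1: all degraded
                  + [DEGRADED] + [STABLE] * 4           # archetype 2: 1 of 5 degraded
                  + [DEGRADED] * 10 + [IMPROVED] * 10)  # archetype 3: half degraded

shares = share_per_archetype(archetypes, status, DEGRADED)
classes = classify_extent(shares)
```

Calling `share_per_archetype` with `STABLE` or `IMPROVED` instead gives the corresponding shares for the stable and improvement classes.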

#### **3. Results**

#### *3.1. Land-Degradation Status*

Figure 4 shows the spatial distribution of LD in the NGS as mapped by [11], using the Normalized Difference Vegetation Index (NDVI) as a proxy for degradation status. About 38% (251,401 km²) of the NGS is degraded, while 14% (91,258 km²) shows improvement and 48% (319,470 km²) remains stable. Improved and stable areas are found mostly in the north and, to a certain extent, in the south of the NGS, whereas large-area degradation is found predominantly in the centre, stretching from the north-western to the eastern border.
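As a quick consistency check, the three percentages follow from the reported areas. The short Python sketch below assumes, as the figures imply, that the degraded, improved, and stable classes together make up the whole analysed NGS area.

```python
# Areas (km²) per LD status class in the NGS, as reported from [11]
areas_km2 = {"degraded": 251_401, "improvement": 91_258, "stable": 319_470}

total_km2 = sum(areas_km2.values())  # assumed to equal the analysed NGS area
shares_pct = {status: 100.0 * area / total_km2 for status, area in areas_km2.items()}

for status, pct in shares_pct.items():
    print(f"{status:12s} {pct:5.1f}%")
```

The implied total is 662,129 km², and the rounded shares (38%, 14%, 48%) match the values given in the text.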

**Figure 4.** Land-degradation (LD) status for the Nigerian Guinea Savannah (NGS) (Source [11]) using Normalized Difference Vegetation Index (NDVI) as a proxy after correcting for rainfall.

#### *3.2. Land-Degradation Archetypes*

This section presents the archetypes and how they can improve the understanding of LD in the NGS. Figure 5 displays each archetype according to the percentage contributions of driver categories, while Figure 6 shows the spatial distribution of the archetypes. Based on the 12 input drivers, nine archetypes of LD drivers were identified (Figure 5). Five archetypes were dominated by land-use management practices (NGSA 1 and NGSA 5–8), three by socio-economic drivers (NGSA 2–4), and one (NGSA 9) by environmental drivers (see supplementary material, Table S1, for a description of the archetypes).
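Figure 5 reports the archetypes as Z-score normalized driver values, with zero marking the NGS mean. A minimal NumPy sketch of that profiling step follows. It assumes a pixel-by-driver matrix `X` (12 columns in this study) and an archetype label per pixel; the function names and toy data are hypothetical, and the category-contribution rule (share of the summed absolute mean Z-scores) is an assumption for illustration only, since the exact rule behind the percentages in Figure 5 is not given in this section.

```python
import numpy as np


def zscore(X):
    """Standardize each driver (column) to mean 0, SD 1 across all NGS pixels."""
    std = X.std(axis=0)
    if np.any(std == 0):
        raise ValueError("a driver has zero variance and cannot be standardized")
    return (X - X.mean(axis=0)) / std


def archetype_profiles(Z, labels):
    """Mean Z-score of every driver within each archetype (0 = NGS-wide mean)."""
    return {int(k): Z[labels == k].mean(axis=0) for k in np.unique(labels)}


def category_contributions(profile, categories):
    """ASSUMED rule: % of summed |mean Z| attributable to each driver category."""
    weights = np.abs(profile)
    total = weights.sum()
    return {name: 100.0 * weights[idx].sum() / total
            for name, idx in categories.items()}


# Hypothetical toy data: 6 pixels x 4 drivers, 2 archetypes
X = np.array([[1.0, 200.0, 3.0, 0.2],
              [2.0, 180.0, 2.5, 0.4],
              [3.0, 220.0, 4.0, 0.1],
              [8.0, 50.0, 9.0, 0.9],
              [9.0, 60.0, 8.5, 0.8],
              [7.0, 40.0, 9.5, 0.7]])
labels = np.array([1, 1, 1, 2, 2, 2])
# Hypothetical assignment of driver columns to categories
categories = {"land-use management": [0, 2], "socio-economic": [1], "environmental": [3]}

Z = zscore(X)
profiles = archetype_profiles(Z, labels)
contrib = category_contributions(profiles[1], categories)
```

Because each driver is centred on the NGS mean, a positive profile value means the archetype lies above the regional average for that driver and a negative value means it lies below.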

#### *3.3. Spatial Distribution of Archetypes*

The spatial distribution of the nine archetypes of LD drivers is shown in Figure 6. In Table 2, a brief description and share of each archetype in the NGS is provided.

According to Table 2, six clusters (NGSA 2–5, NGSA 7, and NGSA 9), each covering more than 10% of the NGS, together cover 78.5% of its area, while the remaining three archetypes, each covering less than 10%, together cover 21.5%.

**Figure 5.** The Z-score normalized values of drivers characterizing the nine archetypes of LD drivers, (zero depicts the mean for the NGS; the percentage contribution of driver categories into archetype clusters is presented in the boxes: red: land-use management practice; blue: socio-economic; green: environmental drivers).

**Figure 6.** Spatial characterization of archetypes of LD drivers. See supplementary material, Section 6 for the full description and ranking of the archetypes in relation to administrative and land status.



**Table 2.** *Cont.*

#### *3.4. Categories of Archetypes According to State Administrative Boundaries and LD Status*

#### 3.4.1. Degree of Land-Degradation Status per Archetype

Linking the archetypes (Figure 6) with the LD status (degradation, stable, improvement; Figure 4) highlights their proportions and potential interplay (Figure 7). Four archetypes, namely NGSA 2 (very high population density), NGSA 3 (moderate–high information/knowledge access), NGSA 4 (moderate–high poverty level), and NGSA 5 (very remote from a major town), account for 61.3% of the degraded area (large-area degradation), while the other archetypes account for the remaining 38.7% (small-area degradation) (Figure 7). Six archetypes, NGSA 2 to NGSA 5 together with NGSA 7 (very high livestock density) and NGSA 8 (dominant land-use management practices and nearly level terrain), account for 78.4% of the stable area (large-area stable status), and the other archetypes for 21.6% (small-area stable status) (Figure 7). For improvement, six archetypes, NGSA 2 to NGSA 4 and NGSA 7 to NGSA 9 (NGSA 9 being characterized by rugged terrain, i.e., very high slope and moderate elevation), cover 78.7% of the improved area (large-area improvement), while the other archetypes account for 21.3% (small-area improvement) (Figure 7). The complementary table is provided in supplementary material, Table S3, and the full description and ranking of the archetypes in relation to LD status in Section 6.

**Figure 7.** Archetypes in percentage of associated LD status (For full percentages, see supplementary material, Table S3).

#### 3.4.2. Share of Land-Degradation Archetypes per State Administration Unit

State administrations manage land within their jurisdictions and hence influence land-use decisions in Nigeria. Figure 8 shows the percentage share of the archetypes within each state's boundary (for the complementary table, see supplementary material, Table S2). Seven states (Bauchi, Kaduna, Kwara, Nasarawa, Niger, Plateau, and Zamfara) contain all nine archetypes, while four (Benue, Kebbi, Taraba, and the Federal Capital Territory (FCT)) contain eight of the nine. The remaining states contain uneven combinations of the archetypes; the portion of Abia State within the NGS contains only one archetype (NGSA 2). The five states with the largest shares of NGSA 4, i.e., a moderate–high poverty level, are Niger (40.5%), Oyo (29.6%), Kwara (24.4%), Nasarawa (18.6%), and Ekiti (17.6%). Supplementary material, Table S2, contains the grouping of the archetypes according to the state administrative units.
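The per-state tallies above (how many archetypes occur in each state, and which states have the largest share of a given archetype) amount to simple zonal counting. A minimal pure-Python sketch, assuming each equal-area grid cell has already been tagged with its state and archetype by spatial overlay; the function names, state names, and cell counts are hypothetical placeholders, not data from this study.

```python
from collections import Counter, defaultdict


def archetype_shares_by_state(cells):
    """cells: iterable of (state, archetype) pairs, one per equal-area grid cell.
    Returns {state: {archetype: % of the state's NGS area}}."""
    counts = defaultdict(Counter)
    for state, archetype in cells:
        counts[state][archetype] += 1
    shares = {}
    for state, c in counts.items():
        n = sum(c.values())
        shares[state] = {a: 100.0 * k / n for a, k in sorted(c.items())}
    return shares


def archetypes_present(shares):
    """Number of distinct archetypes occurring in each state."""
    return {state: len(s) for state, s in shares.items()}


def top_states(shares, archetype, n=5):
    """States ranked by the share of their NGS area covered by `archetype`."""
    return sorted(shares, key=lambda s: shares[s].get(archetype, 0.0), reverse=True)[:n]


# Hypothetical toy overlay result
cells = ([("State A", 2)] * 4
         + [("State B", 4)] * 3 + [("State B", 2)]
         + [("State C", 1), ("State C", 4), ("State C", 7), ("State C", 9)])

shares = archetype_shares_by_state(cells)
```

Here `archetypes_present(shares)` corresponds to the "all nine" or "eight of nine" counts, and `top_states(shares, 4)` to the ranking of states by their NGSA 4 share.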

**Figure 8.** Share of LD archetypes by states in the NGS. Note: shares outside the NGS, that were not analyzed, are in grey. (For full percentages, see supplementary material, Table S2).

#### **4. Discussion**

#### *4.1. Understanding the Archetypes of Large-Area Degradation*

Archetypes under large-area degradation are those that each account for at least 10% of the total degraded area, with degradation measured as NDVI-based biomass decline. Four such archetypes were identified (Figures 5 and 7). Of these, NGSA 3, with prolonged fire occurrence, and NGSA 5, with rural remoteness from a major town (Figure 5), highlight land-use management practices as drivers of large-area degradation in the NGS. NGSA 3 thus confirms studies that have implicated fire-related activities, such as charcoal making, farming, and hunting with bush burning, as causes of LD [57,58]. The dominant characteristics of NGSA 5, on the other hand, contradict the notion that land areas closer to major towns are more prone to LD than those farther away [16,17,29]. NGSA 5 instead indicates that natural resource users in remote rural areas who adopt unsustainable land-use practices (as in NGSA 3), such as continuous bush burning and deforestation, trigger LD [59].

Apart from NGSA 5, with its high remoteness from a major town (Figure 5), the other three large-area archetypes, i.e., those with a very high population density (NGSA 2), moderately high information/knowledge access (NGSA 3), and a moderate–high poverty level (NGSA 4), are dominated by high percentages of socio-economic drivers. Socio-economic factors are thus major underlying causes that indirectly push other proximate drivers of large-area degradation in the NGS [29,60,61]. Given that three (NGSA 3–NGSA 5) of the four archetypes experiencing large-area degradation combine low population density with high poverty and low literacy, this combination can be inferred to be associated with large-area degradation in the NGS context [59,61,62]. This confirms studies such as [62,63], which reported that poverty intensifies the tendency to change vegetation cover, as many people deplete natural vegetation for fuel, food, and income because they have few or no alternative livelihood options. This points to the areas with high poverty and low population density, i.e., with a rural population, covering the northwest, centre, and northeast of the NGS, encompassing Kebbi and Niger States, parts of northern Kwara, the FCT (mainly around Abuja, see the area illustrated in Figure 5), and parts of Nasarawa, Plateau, Taraba, Kaduna, and Adamawa States, which have experienced extensive degradation [11]. Although NGSA 2, with its very high population density, suggests that urban areas are also associated with large-area degradation in the NGS, the degradation there is not as extensive as in the remote areas with low population density. This result therefore deviates from the general notion that a high population density is the main cause of LD in Nigeria [13,33]. Hence, a high population density alone does not drive LD without certain complementary factors, such as poverty, illiteracy that restricts information or knowledge access, and poor national policies, which are prevalent in Sub-Saharan African contexts [64]. The three socio-economic sub-drivers (poverty, literacy, and population density) are therefore core interrelated drivers of large-area LD that require attention in addressing LD in the NGS [29,62,65].

While land-use management practices and socio-economic factors dominate as potential drivers of large-area LD, specific environmental drivers of the archetypes also underpin it (Figure 5). The nearly level terrain, i.e., the low-elevation, flat terrain of the four large-area archetypes, is known to encourage land cultivation in Nigeria [13]. In addition, the low bulk density of NGSA 2, NGSA 3, and NGSA 5, and the high bulk density of NGSA 4 (moderate–high poverty level) highlight soil characteristics that encourage large-area degradation [42,43]. The low-bulk-density archetypes signify the few areas in the southern part of the NGS with soil suitable for cultivation. The archetypes characterized by high bulk density, on the other hand, correspond to areas with the highest impact of agricultural management practices, such as machinery use and intensive cropping [42]. This represents 23% of the large-area degradation archetypes and in turn reflects the widespread LD due to the high agricultural engagement of rural dwellers in the zone [63].

#### *4.2. Understanding the Archetypes of Small-Area Degradation*

Five archetypes of small-area degradation were identified, that is, archetypes that each account for less than 10% of the total degraded area. LD and its drivers thus differ locally and are context-specific [5,60], and an archetype approach can help identify these socio-ecological contexts [66]. Of the five small-area archetypes, three are characterized by high percentages of land-use management practices (Figure 5): a very high presence of protected areas (NGSA 1), being very remote from a major road (NGSA 6), and a very high livestock density (NGSA 7). While NGSA 1 reflects its conservation and restricted-use status, its additional association with high and rugged elevations, as in NGSA 9, further explains its small extent of degradation. However, given its remoteness from major roads and major towns, the degradation in protected areas captured in NGSA 1 calls for an investigation into the specific activities driving degradation in protected areas, such as encroachment by human activities [11,63] despite government regulations, particularly around communities that host protected areas [67,68].

In archetypes NGSA 6 and NGSA 9, small-area degradation is driven mainly by remoteness from a major road, with little influence from other sub-drivers (Figure 5). Proximity to major roads is a measure of infrastructural development that influences accessibility and the spread of land-use management practices, including information [69]. NGSA 6 thus represents remote areas with restricted access, which can hamper the propagation of sustainable land-management initiatives [29,38,49]. As with NGSA 1 and NGSA 6, the very high livestock density of NGSA 7 (Figure 5) is also associated with small-area degradation (Figure 7). The Guinea savannah in particular is currently under pressure from intensive grazing because the Sahel and Sudan savannahs have been extensively degraded by overgrazing [15,70], leading to competition for grazing resources and conflicts in the region [71]. Overgrazing has been associated with the disappearance of typical savannah vegetation and the emergence of Sudan–Sahelian savannah in the NGS [15,72]. This combination of drivers points to a critical need for improved management of grazing resources and protected areas, and for better governance of land resources in the NGS [11,36,68]. While all small-area archetypes are mostly dominated by land-use management drivers, NGSA 6 and NGSA 8 are the only small-area archetypes distinctly driven by socio-economic drivers, characterized by low population density, i.e., a rural population, with correspondingly moderate information/knowledge access. Unlike other factors, poverty (high or low) is not a distinctive feature of these archetypes (Figure 5); hence, this study cannot confirm the notion that 'the higher the poverty, the more the degradation', held by many studies of small-area degradation [38], as noted in Table 1. Considering that poverty is widespread in the NGS, there is a need to integrate other socio-demographic and social-relational data for a better understanding of the interactions between poverty and LD.

#### *4.3. Archetypes and Policy Insights*

As multiple factors are associated with LD, policy interventions aimed at achieving SLM need to be inter-sectoral. However, many policies in Nigeria, such as the Nigerian National Agricultural Policy, focus on single sectors and often do not have LD reduction as a primary objective [10]. Key policy topics related to these findings are sustainable use and management of natural resources, poverty reduction, environmental awareness and education, strategy to reduce dependence on land and natural resources for livelihoods, and inclusion of LD in land-use planning.

With the extensive LD in the NGS, policies for the sustainable management of natural resources, including water, soil, and biodiversity, as well as their coherence, are essential [73]. While several response programmes, such as the Nigeria National Policy on Environment, have been promulgated to address activities that cause LD [74], they remain reactive and have not been effectively scaled up to tackle the drivers of LD. For example, the proposed national policy on the rediscovery of grazing routes and reserves is ill-equipped to address LD because its advocates focus on profit and on the pressing need to end farmer–herder clashes in Nigeria [67,71], without regard to the fact that spatial developments in Nigeria have overtaken several historical grazing spaces [67]. Nigeria's polarization along sectional and ethnic lines has further provoked counter-arguments with socio-political undertones to such policy moves by the government [75]. This subsequently affected the acceptance of related policies such as the 10-year National Livestock Transformation Plan (NLTP), the ranching plan, and open grazing [67]. While degradation clearly induces competition and tension among natural resource users, policy decisions on land-based issues require a special focus on LD and land restoration.

Although the results did not explicitly show that low or high poverty alone is associated with large- or small-area degradation, poverty contributes to the different archetypes identified. About 83 million Nigerians (40% of the population) live below \$1.90 per day, including 52% of rural dwellers, whose livelihoods are predominantly tied to agricultural activities [76,77]. About 30 million more Nigerians are expected to be living in extreme poverty by 2030 [77]. Many of the poor depend on livelihood activities that promote LD, such as charcoal making and hunting with bush burning [57,58]. Archetype NGSA 4, which covers the largest proportion of the NGS (20%) and has moderate–high poverty and a low level of both male and female literacy, is characterized by large-area degradation. Two broad groups of poverty alleviation programmes (PAPs) have been identified in Nigeria [78]: (a) the Core Poverty Alleviation Programmes (CPAPs), such as the Better Life Programme for Rural Dwellers (BLP) and the Family Economic Advancement Programme (FEAP), and (b) the Non-Core Poverty Alleviation Programmes (NCPAPs), which include the National Agricultural Land Development Authority (NALDA). Such policies had no long-lasting effects [78,79] and did not focus on the intersections between poverty and LD. Other policies, such as the Agricultural Development Projects (ADPs) and Vision 20-2020, are moribund [78,79]; most of their interventions aim at increasing farmer revenue and reducing poverty and mainly focus on improving input supply, without attention to improving land management.

Nigeria has an adult literacy rate of about 56.9% [80], with variations across states and regions and between urban (74.6%) and rural (48.7%) areas. NGSA 3 shows the linkage of low male and female literacy with large-area degradation. Most farmers and herders in Nigeria did not finish primary education; they are less likely to access and understand the little knowledge and information disseminated through extension services, or they lack the resources to access this information themselves [79,81,82]. While the use of mobile phones is promoted to improve access to information, little or no information on sustainable land use and management is provided through them. According to [83], technology is necessary to scale up the adoption of initiatives among resource-poor users. With widespread poverty, a weak industrial base to absorb the growing population, and reliance on the primary sector, the quest to exploit environmental resources supersedes interest in environmental protection and management [84]. Hence, pathways to improving land management need to be sought both outside agriculture (e.g., creating employment opportunities outside agriculture) and within agriculture, by improving farmers' and herders' access to sustainable land management practices and motivating their adoption [85].

With growing LD, an effective land-use planning policy is critical for degradation response in Nigeria, as current land-use policies and practices do not adequately consider sustainable land management [10,86]. The historic shortcomings of the National Land Use Act (LUA) of 1978 persist: the Act only recognizes land ownership and promotes land access, without a sustainable land-use plan or governance framework to cater for the pressure from a growing population. Calls to review the LUA to make room for more sustainable land-use planning and governance in Nigeria [10,13] remain unheeded. A challenging question is thus: what opportunities can be identified for promoting SLM [10]?

#### *4.4. Archetypes and Sustainable Land Management (SLM)*

Based on literature and the study results, sustainable fuelwood management/energy efficiency, reforestation and afforestation, sustainable pastoralism, and structural land management measures are potential interventions to address LD.

In Nigeria, over 70% of the population relies on wood fuel for cooking, which is an underlying driver of deforestation and associated LD [63]. In recognition of this reliance, sustainable fuelwood management (SFM) is being promoted, for example in Kaduna State, under the United Nations Development Programme (UNDP)/Global Environment Facility (GEF) project on fuelwood management for mitigating the effects of climate change [87,88]. While such initiatives have made some progress in establishing woodlots, producing energy-efficient cooking stoves, and establishing local forest management committees, low community buy-in, land tenure, and governance remain key constraints [88]. A review of such initiatives can provide insights on how to improve states' SLM outcomes and out-scale them to other states in the NGS. SFM thus has some potential for promoting landscape stewardship in the region [87,89].

Tree-based programmes also hold potential to reduce LD and remain the principal focus of restoration programmes [90]. Reforestation, afforestation, and agroforestry have been found advantageous and successful around the world and in Nigeria, particularly in LD response [91]. In Nigeria, successive governments at all levels have worked collaboratively to encourage and implement various afforestation projects [92]. For instance, in the frontline states of the Great Green Wall Afforestation Programme, tree planting campaigns with eucalyptus species and shelter belts for sand-dune fixation and degradation control are commonly practiced [74,92]. Land users in Nigeria can therefore be incentivised to participate in tree-based initiatives, which have recorded successes elsewhere, such as in Kenya [64], to reawaken interest in combating LD. Such initiatives could focus on Niger, Nasarawa, Kwara, and Kogi, as well as Kaduna and Oyo States, given the prevalence of large-area LD archetypes there. Agroforestry, a multifunctional practice of cropping with trees and shrubs on arable land, is also a potential SLM practice: it is a low-cost, adaptable tree-based initiative that can improve land productivity [91,92] and contributes to food security and land resource conservation [91].

Archetype NGSA 7 shows evidence of high livestock grazing activity (Figure 5). Traditionally, pastoralism is practiced predominantly in northern Nigeria, with herds moving southward following the rains and in search of pasture and water during the dry season. Overgrazing causes LD, and indiscriminate overgrazing has fuelled negative stereotyping and tensions between pastoralists and non-pastoralist actors, causing loss of lives and property as well as communal crises [67,71]. In some cases, overgrazing by livestock and excessive open grazing lead to the failure of afforestation programmes, including severe violations of protected areas across West Africa [67,93]. While pastoralism under a proper management system is ecologically, economically, and socially viable [94,95], climate change and poor land management, in the face of a growing national population and pressure from neighbouring herding countries, make traditional pastoralism unsustainable in Nigeria [71,93,94]. Studies thus call for controlling open grazing to check indiscriminate overgrazing and secure livestock production in Nigeria [75,96].

Given the environmental consequences associated with the terrain-driven archetype, avoiding degradation-prone rugged terrain is key to maintaining the remaining biomass of the zone. Investing in SLM structures such as land levelling, terracing, and contour farming is critical to tackling LD [97,98], particularly in agricultural landscapes like the NGS, where farming remains necessary for livelihoods [12,13] and biodiversity and natural conditions are threatened largely by agricultural expansion [13]. Terracing and high-altitude afforestation for erosion control, for example, have been recognized to reduce soil loss and LD on sloped terrain [99]. Similarly, contour farming, which involves planting crops across a slope along elevation contour lines, and staggered contour trenching are also effective for managing degradation on rugged, steep terrain [99].

#### **5. Conclusions**

This study identified nine archetypes of LD drivers in the NGS, which are mostly dominated by socio-economic and land-use management drivers, with a slight influence from environmental drivers. Specifically, four archetypes, characterized by a very high population density, moderately high information/knowledge access, a moderate–high poverty level, and remoteness from a major town, account for 61.3% of the total degraded area, while the large-area stable and improvement archetypes (six in each case) account for 78.4% and 78.7% of the stable and improved areas, respectively. LD is most evident in states stretching from the northwest through the centre to the northeast of the NGS, such as Niger State, which have predominantly large rural farming communities. Besides revealing the LD drivers, the archetype characteristics provide a basis for determining and prioritizing relevant SLM policies and practices, such as poverty reduction, creating environmental awareness, promoting sustainable pastoralism, and robust land-use planning to strengthen land governance in Nigeria. Despite the limitations of spatial data on the driving factors, the outputs of this study provide a useful guide on how archetypes can serve as a tool for progressing Nigeria's LDN through SLM. As with most unsupervised classification techniques, including the self-organizing maps adopted here, the archetype results require field validation. However, this could not be conducted because of mobility limitations, and the scarcity of spatially explicit data limited the number of variables that could be used in this study. As more spatially explicit data on Nigeria and Africa become available, they need to be integrated into future archetype studies, and the archetypes validated with field observations.
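The archetypes were derived with self-organizing maps (SOMs). For readers unfamiliar with the technique, a minimal NumPy SOM is sketched below. It is not the configuration used in this study: the grid size, the linear learning-rate and neighbourhood schedules, the function names, and the toy input are illustrative assumptions, and the subsequent step of grouping the trained neurons into the nine archetypes is not shown.

```python
import numpy as np


def train_som(Z, rows=3, cols=3, n_iter=2000, lr0=0.5, seed=0):
    """Train a rectangular self-organizing map on standardized drivers Z (n x d)."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize codebook vectors with randomly drawn input samples
    W = Z[rng.integers(0, len(Z), size=rows * cols)].astype(float)
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)    # shrinking neighbourhood radius
        x = Z[rng.integers(0, len(Z))]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))   # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)    # squared distance on the grid
        h = np.exp(-d2 / (2.0 * sigma ** 2))          # Gaussian neighbourhood
        W += lr * h[:, None] * (x - W)                # pull BMU and neighbours toward x
    return W


def map_to_neurons(Z, W):
    """Index of the best-matching neuron for every sample."""
    d2 = ((Z[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


# Hypothetical toy input: two well-separated groups of "pixels" with 2 drivers
rng = np.random.default_rng(1)
Z = np.vstack([rng.normal(0.0, 0.1, (30, 2)), rng.normal(10.0, 0.1, (30, 2))])
W = train_som(Z)
bmus = map_to_neurons(Z, W)
```

In practice the drivers would first be Z-score standardized, as in Figure 5, so that no single driver dominates the Euclidean distances used to find the best-matching unit.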

**Supplementary Materials:** The following are available online at https://www.mdpi.com/2072-4292/13/1/32/s1, Figure S1: Correlation matrix between the selected drivers of land degradation archetypes, Figure S2: Training process for determining the nine cluster separations that are suitable for the land-degradation archetypes, Figure S3: The plot showing the aggregated profile of each neuron as synthesized by the selected drivers of land degradation, Figure S4: Quality of the nine clusters in the codebook, Figure S5: Neighbor distances between the neurons for the clusters, Table S1: Calculated Z-score normalized values of drivers characterizing the nine (9) distinctive archetypes of land degradation, Table S2: Archetypes and percentage share per state administration unit, Table S3: Archetypes and share of land status.

**Author Contributions:** A.A.A.: Conceptualization, Methodology, Software, Data curation, Writing— Original draft preparation. C.I.S.: Conceptualization of the research project, Supervision, Writing— Review & Editing. All authors have read and agreed to the published version of the manuscript.

**Funding:** Ademola A. Adenle is funded by the UniBE International 2021, Initiative of the Vice-Rectorate Development, University of Bern, Switzerland, and acknowledges the small grant funding from the Rufford Foundation for the research project [27153-1] in Nigeria, as well as the Small Equipment Grant from IDEA WILD towards the acquisition of a DJI Mavic Air Quadcopter with Remote Controller.

**Acknowledgments:** Ademola A. Adenle acknowledges the support from the UniBE International 2021, Initiative of the Vice-Rectorate Development, University of Bern, Switzerland. We are also grateful to the reviewers whose constructive feedback helped to improve this paper. This study contributes to the Programme on Ecosystem Change and Society (www.pecs-science.org) and the Global Land Programme (www.glp.earth).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Using Sentinel-1 and Sentinel-2 Time Series for Slangbos Mapping in the Free State Province, South Africa**

**Marcel Urban <sup>1,\*</sup>, Konstantin Schellenberg <sup>1</sup>, Theunis Morgenthal <sup>2</sup>, Clémence Dubois <sup>1</sup>, Andreas Hirner <sup>3</sup>, Ursula Gessner <sup>3</sup>, Buster Mogonong <sup>4</sup>, Zhenyu Zhang <sup>5</sup>, Jussi Baade <sup>6</sup>, Anneliza Collett <sup>2</sup> and Christiane Schmullius <sup>1</sup>**


**Abstract:** Increasing woody cover and overgrazing in semi-arid ecosystems are known to be the major factors driving land degradation. This study focuses on mapping the distribution of the slangbos shrub (*Seriphium plumosum*) in a test region in the Free State Province of South Africa. The goal of this study is to monitor slangbos encroachment on cultivated land by synergistically combining Synthetic Aperture Radar (SAR) (Sentinel-1) and optical (Sentinel-2) Earth observation information. Optical and radar satellite data are sensitive to different vegetation properties and to different surface reflection or scattering mechanisms, owing to their specific sensor characteristics. We used a supervised random forest classification to predict slangbos encroachment for each individual crop year between 2015 and 2020. Training data were derived from expert knowledge and in situ information from the Department of Agriculture, Land Reform and Rural Development (DALRRD). We found that the Sentinel-1 VH (cross-polarization) and Sentinel-2 SAVI (Soil Adjusted Vegetation Index) time series have the highest importance for the random forest classifier among all input parameters. The modelling results confirm the in situ observations that pastures are most affected by slangbos encroachment. Model accuracy was estimated via spatial cross-validation (SpCV), which yielded a classification precision of around 80% for the slangbos class within each time step.
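For reference, SAVI is defined as SAVI = (1 + L)(NIR − Red)/(NIR + Red + L), where L is a soil-brightness correction factor (commonly 0.5; L = 0 reduces the index to NDVI). A minimal sketch follows; it assumes surface-reflectance inputs scaled to 0–1 and, for Sentinel-2, the use of band B8 (NIR) and band B4 (red), which is an assumption rather than a detail stated here.

```python
import numpy as np


def savi(nir, red, L=0.5):
    """Soil Adjusted Vegetation Index from reflectance values scaled to 0-1.

    L is the soil-brightness correction factor; L=0 gives NDVI.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (1.0 + L) * (nir - red) / (nir + red + L)


# Hypothetical reflectances for two pixels (e.g., Sentinel-2 B8 and B4)
nir = np.array([0.45, 0.25])
red = np.array([0.08, 0.20])
print(savi(nir, red))
```

The L term damps the influence of bright bare soil between sparse plants, which is why SAVI is often preferred over NDVI in open, partly vegetated landscapes.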

**Keywords:** shrub encroachment; slangbos; land degradation; Earth observation; time series; Sentinel-1; Sentinel-2; Synthetic Aperture Radar (SAR); Soil Adjusted Vegetation Index (SAVI); machine learning

#### **1. Introduction**

Increasing woody cover and overgrazing in open semi-arid ecosystems are known to be among the major factors driving land degradation [1]. In the context of this study, land degradation is defined as "the many human-caused processes that drive the decline or loss in biodiversity, ecosystem functions or ecosystem services in any terrestrial [ ... ] ecosystems" [2] (p. 28).

During recent decades, woody cover encroachment in open ecosystems has increased significantly in southern Africa, leading to crucial environmental, land-cover, and land-use changes [3–7]. Venter et al. [8] concluded that the intensification of woody cover is connected not only to rising atmospheric CO₂ concentrations resulting from human-induced global climate change, but also to regional factors (e.g., the extinction of large herbivore herds, and fire). CO₂ fertilization improves water-use efficiency through the reduced stomatal conductance of woody plants [9]. Consequently, even with constant water uptake, rising CO₂ concentrations would lead to increased growth rates of woody biomass relative to grass communities. Nevertheless, water availability and temperature are the key constraints on woody plant growth and the crucial parameters in explaining encroachment patterns [5].

**Citation:** Urban, M.; Schellenberg, K.; Morgenthal, T.; Dubois, C.; Hirner, A.; Gessner, U.; Mogonong, B.; Zhang, Z.; Baade, J.; Collett, A.; et al. Using Sentinel-1 and Sentinel-2 Time Series for Slangbos Mapping in the Free State Province, South Africa. *Remote Sens.* **2021**, *13*, 3342. https://doi.org/10.3390/rs13173342

Academic Editor: Elias Symeonakis

Received: 18 June 2021; Accepted: 6 August 2021; Published: 24 August 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

This study focuses on analyzing the spread of slangbos or bankrupt bush (*Seriphium plumosum*) in a selected test region in the Free State Province of South Africa. Although indigenous to South Africa, slangbos has been documented as the main encroacher on the grassvelds (South African grassland biomes) in the Free State, North West, Mpumalanga, Eastern Cape and Gauteng provinces [10,11].

The rainfall optimum of *Seriphium plumosum* lies between 620 and 750 mm per year [12], which places it between the semi-arid and mesic biomes [13]. The shrub reaches a height and diameter of up to 0.6 m and has small, light green leaves, which make it well adapted to long dry periods compared to grass communities; the leaves are unpalatable to grazers due to their high oil content [14]. However, du Toit et al. [15] found that cattle graze young slangbos plants that had regrown one year after active fire control measures. Slangbos prefers low-fertility loamy soils, which makes hilltops and dry areas in the Free State vulnerable to this encroacher [16]. Slangbos encroachment reduces the productivity of grasslands and pastures and their livestock carrying capacity. The root system of *Seriphium plumosum*, which reaches a depth of up to 1.8 m and an extent of 1 m<sup>2</sup> around the plant [14], competes with the surrounding grass for water and nutrients, which consequently leads to a reduction in the grass layer [17,18]. Because slangbos is unpalatable to grazers, the shrub puts greater pressure on the remaining open grassland, which becomes vulnerable to overgrazing, increasing the potential for land degradation [16]. Moreover, slangbos has a high allelopathic potential, which prevents other plant species from finding suitable growing conditions in its surroundings [19]. This poses great challenges both to farmers and to the local biosphere. Approximately 11 million hectares of rangeland could become unsuitable for grazing if measures to eradicate the plant do not succeed. For a 1000 ha livestock farm with slangbos infesting 50% of the pasture area, this could result in an annual loss of about ZAR 760,000 (South African Rand; about EUR 44,000) [20].

Field observations of shrub distribution are cost-intensive and time-consuming and often do not capture the spatial heterogeneity of encroachment patterns [21]. Earth Observation (EO) data from different sources and across different wavelengths (e.g., from ESA's Copernicus Sentinel Programme) provide a suitable tool for mapping the extent and the rate of woody encroachment [7,22–25]. The freely available optical and radar satellite data from the Copernicus Programme have short revisit times (5 to 12 days) and high spatial resolution (10 m). Hence, they allow woody cover encroachment to be monitored in near real time and support transferability and reproducibility in other regions. Recent studies have investigated bush encroachment mapping in southern Africa using various sources of EO information (e.g., optical and radar) [8,26–31]. These studies analyzed shrub cover increase with high to low spatial resolution EO data and different approaches (e.g., land cover classification [27,31], random forest [8,26,29], trend analysis [28,30]). However, few studies have focused on a single shrub species. In general, all studies found an intensification of woody cover, especially in open rangelands. Local studies revealed that protected areas with large herbivore populations (e.g., elephants) occasionally show a loss in woody or tree cover [27].

Remote sensing techniques are not likely to replace field measurements completely, as validating EO-derived woody vegetation composition remains highly important [21,32]. However, they allow continuous, wall-to-wall monitoring over large spatial extents of woody cover, which is considered an essential biodiversity variable [33].

The goal of this study is to monitor slangbos encroachment on rangeland and pastures in the Free State Province, South Africa, between 2015 and 2020 by synergistically combining Synthetic Aperture Radar (SAR) (Sentinel-1) and optical (Sentinel-2) EO information. Owing to their sensor-specific characteristics, optical and radar data are sensitive to different vegetation properties and to different surface scattering or reflection mechanisms. This study aims (1) to investigate the sensitivity of optical and radar remote sensing to *Seriphium plumosum* and (2) to use combined dense SAR and optical time series for encroachment mapping on a regional scale in the South African grassland biomes. Using these time series, specific features in the hyper-temporal radar and spectral data are examined to characterize the temporal variations associated with *Seriphium plumosum*.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The monitoring of slangbos encroachment was carried out in an area of approx. 90 km by 50 km (4500 km<sup>2</sup>) located in the Free State Province between the towns of Ladybrand in the east and Botshabelo in the west (Figure 1).

**Figure 1.** The study area is located in the Free State Province, South Africa, close to the border of the Kingdom of Lesotho (in white). (Source: National Land Cover Classification of 2018 [34], Roads of South Africa [35]).

The region is part of the Highveld grassland ecosystem (1300–1700 m a.s.l.), which is characterized by a continental climate with a mean annual temperature of around 14 °C; nevertheless, days with temperatures below freezing are common during the winter season. The mean annual precipitation ranges from 500 mm to 700 mm, and most rain falls in the summer season between November and March [11].

The study area is characterized by extensive plains and plateaus, with large cultivated areas. The region is sparsely populated and mostly used for cattle and arable farming, as well as for water reservoirs. The primary land cover types are rangelands for grazing, cultivated land, shrubland and open grassland. The Highveld grassland biome is affected by three major land degradation phenomena, namely (1) changes in plant composition, (2) loss of vegetation cover with a subsequent increase in wind and water erosion and (3) bush encroachment [11]. Moreover, the study area is one of the regions most strongly affected by invasive alien plant species in South Africa [36].

#### *2.2. Data*

#### 2.2.1. Sentinel-1 and Sentinel-2

The Sentinel-1 and Sentinel-2 missions of the European Space Agency (ESA) Copernicus Programme have increased the availability of open-access EO data covering both the optical and the microwave spectra. This opens new possibilities for analyzing data at high spatial and temporal resolution for various applications, e.g., agricultural monitoring and vegetation change analysis. The synergistic use of these EO data is especially valuable, as both missions acquire data in parallel but measure different properties of the Earth's surface. Unlike Sentinel-2, Sentinel-1 acquires images of the Earth's surface largely independently of clouds, atmospheric conditions and sun illumination.

For this study, dual-polarized (VV: vertical/vertical; VH: vertical/horizontal) Sentinel-1A C-band SAR scenes (5.405 GHz, approx. 5.5 cm wavelength) covering the period between 2015 and 2020 were used, with a spatial resolution of up to 10 m. The Sentinel-1 revisit time ranges from 3 to 12 days, depending on the geographic location and on whether both Sentinel-1A and -1B acquire data. In South Africa, the repeat interval is twelve days [37]. In the Free State Province, only Sentinel-1A data from the ascending orbit are available, acquired at around 5:00 p.m. local time. The study area is covered by relative orbits 14 and 116, which were used jointly.

The Sentinel-2 constellation consists of two satellites, namely Sentinel-2A (start of acquisition: November 2015) and Sentinel-2B (start of acquisition: August 2017). Since Sentinel-2B became operational, the revisit time has been approximately five days. In this study, data acquired between 2016 and 2020 from the optical Sentinel-2 constellation were used. Each of the two satellites (Sentinel-2A and -2B) has 13 bands covering the spectrum from the visible to the Short-Wave-Infrared (SWIR) and a maximum spatial resolution of 10 m. The images are acquired at around 10:30 a.m. local time [38].

In this study, Sentinel-1 Single Look Complex (SLC) and Ground Range Detected (GRD) images were used. The SLC data contain the phase information needed to derive the interferometric coherence, whereas the GRD data contain only the amplitude and are already multilooked. In total, 313 Sentinel-1 GRD scenes, 145 Sentinel-1 SLC scenes (144 coherence pairs) and 503 Sentinel-2 images were used. For the coherence estimation, Sentinel-1 SLC scenes from relative orbit 14 were used.

#### 2.2.2. Agricultural Statistics 2014–2018

Spatial information on the crop types planted in the study area was provided by the Department of Agriculture, Land Reform and Rural Development (DALRRD) [39]. The data cover the crop types planted between 2014 and 2018, with each dataset representing an individual crop year. A crop year runs from June until May of the following year and includes both the winter (June–September) and summer (October–May) planting seasons.
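The crop-year convention can be made explicit with a minimal Python sketch that maps an acquisition date to its crop year and planting season. The function names and the "YYYY/YY" label format are illustrative assumptions, not part of the DALRRD dataset.

```python
from datetime import date


def crop_year(d: date) -> str:
    """Return the crop-year label for a date; crop years run from June to May.

    The 'YYYY/YY' label format (e.g., '2016/17') is an illustrative assumption.
    """
    start = d.year if d.month >= 6 else d.year - 1
    return f"{start}/{(start + 1) % 100:02d}"


def planting_season(d: date) -> str:
    """Winter planting season: June-September; summer: October-May."""
    return "winter" if 6 <= d.month <= 9 else "summer"


# A scene acquired on 30 August 2017 falls into crop year 2017/18
print(crop_year(date(2017, 8, 30)), planting_season(date(2017, 8, 30)))
```

This makes it easy to assign each Sentinel-1 or Sentinel-2 acquisition to the crop year whose agricultural statistics it should be compared with.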

In general, the Free State Province has the highest number of farming units, almost 8000 in total, which represent approximately 20% of the national total. Approximately 460,000 km<sup>2</sup> are used for agriculture, of which 80% is used for grazing and the remaining 20% mainly for dryland crop production [40].

Between 2014 and 2018, pasture dominated the cultivated land in the study area (Figure 2). Maize, sunflower and soybeans were the major crops during the same period, whereas the area under wheat and sorghum was negligible. However, fallow land, which is particularly prone to slangbos invasion, increased markedly between 2014 and 2018, with the largest expansion between 2016/17 and 2017/18.

**Figure 2.** Area for different cover types of the cultivated land within the study area for the crop years between 2014 and 2018 (Source: [39]).

In 2014/15, cropland (approx. 380 km<sup>2</sup>, 33%) and pasture (approx. 730 km<sup>2</sup>, 63%) covered most of the cultivated land within the study area, while fallow land covered 23 km<sup>2</sup> (2%). In 2015/16, fallow land increased by about 7 percentage points to 110 km<sup>2</sup>, while pasture and cropland both declined by equal percentages. Soybeans showed the largest decline during that time, followed by a slightly increasing trend toward 2018. In 2016/17, the statistics show that mainly cropland was converted to fallow land, which increased by approximately 7 percentage points to 190 km<sup>2</sup>. Sunflower showed the largest decline during that period (60 km<sup>2</sup>, 5%). In 2017/18, fallow land showed the largest increase, to 43% or around 480 km<sup>2</sup>. At this time, the area previously covered by pasture decreased by about 280 km<sup>2</sup> (23%), whereas the cropland area showed no substantial loss. However, maize declined considerably in this year (50 km<sup>2</sup>, 4%), as these areas were converted to soybeans or sunflowers.
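The percentages above are shares of the total cultivated area, so year-to-year changes are differences in percentage points rather than relative growth. A short worked check in Python, under the assumption (inferred here, not stated in the text) that the total cultivated area follows from "380 km<sup>2</sup> (33%)", i.e., roughly 1150 km<sup>2</sup>:

```python
# Fallow-land areas (km^2) per crop year, as quoted in the text
fallow_km2 = {"2014/15": 23, "2015/16": 110, "2016/17": 190, "2017/18": 480}

# Assumption: total cultivated area inferred from "cropland approx. 380 km^2 (33%)"
total_cultivated_km2 = 380 / 0.33  # roughly 1150 km^2


def share_percent(area_km2: float, total_km2: float = total_cultivated_km2) -> float:
    """Share of the total cultivated area, in percent."""
    return 100 * area_km2 / total_km2


shares = {year: round(share_percent(a), 1) for year, a in fallow_km2.items()}
years = list(fallow_km2)

# Year-to-year change in percentage points (difference of shares)
change_pp = {years[i]: round(shares[years[i]] - shares[years[i - 1]], 1)
             for i in range(1, len(years))}

# Relative change of the fallow area itself, for contrast
relative_2015_16 = 100 * (fallow_km2["2015/16"] - fallow_km2["2014/15"]) / fallow_km2["2014/15"]

print(shares)
print(change_pp)
print(round(relative_2015_16))
```

The fallow area itself grew almost fivefold between 2014/15 and 2015/16, while its share rose by roughly 7 percentage points. Small deviations from the quoted values (e.g., about 42% instead of 43% for 2017/18) are expected, because the input areas and shares are rounded.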

#### 2.2.3. Reference Data

Ground reference sites were obtained through field surveys, aerial photo documentation and local expert knowledge. To increase the amount of reference data, Google Earth high-resolution time series imagery [41] and very high resolution aerial photos from National Geospatial Information (NGI) [42] were used. The previously identified slangbos sites served as a blueprint for creating a set of labeled fields in the area through manual mapping in cooperation with local partners at DALRRD [39]. The ground references consist of binary polygons indicating the presence or absence of slangbos. The critical slangbos and grassland classes were oversampled to focus the classification algorithm on this discrimination task. In total, the labeled dataset covers roughly 14 km<sup>2</sup>, which corresponds to 1.2% of the agricultural area and 0.3% of the entire study area.

#### *2.3. Methods*

#### 2.3.1. Sentinel-1 Pre-Processing

The Sentinel-1 GRD data were pre-processed using *PyroSAR* [43], which is designed for large-scale SAR satellite data processing within a Python framework. It offers a complete solution for organizing and processing SAR data from different historical and current satellite missions, together with functionality for use after pre-processing (e.g., mosaicking and resampling images to common pixel boundaries suited for time series analysis). *PyroSAR* can use either ESA's open-source Sentinel Application Platform (SNAP) or, for licensed users, the GAMMA Remote Sensing software.

Within *PyroSAR*, the Sentinel-1 GRD data were processed with GAMMA (Software Version: July 2018, GAMMA Remote Sensing AG, 3073 Gümligen, Switzerland) [44]. (1) A radiometric calibration was applied to each dataset to convert the digital numbers (DNs) recorded by the sensor to physical units. (2) The data were orthorectified using the precise orbit state vectors (precise orbit ephemerides, POE) and height information from the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM) [45] at 30 m spatial resolution. (3) Beta naught (β<sup>0</sup>) values were converted to gamma naught (*γ*<sup>0</sup>) values by terrain flattening [46], using the SRTM DEM.
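To illustrate what steps (1) and (3) do numerically, here is a minimal NumPy sketch under stated assumptions. It uses the calibration relation value = |DN|²/A² that ESA documents for Sentinel-1 Level-1 products, where A is the calibration look-up-table value for the pixel. In place of terrain flattening, it uses the simpler ellipsoid relation γ⁰ = β⁰·tan θ. This is neither GAMMA's implementation nor a substitute for the DEM-based terrain flattening [46], and the function names are hypothetical.

```python
import numpy as np


def calibrate(dn: np.ndarray, a_lut: np.ndarray) -> np.ndarray:
    """Radiometric calibration of Sentinel-1 Level-1 data: |DN|^2 / A^2.

    `a_lut` holds the calibration LUT value (beta, sigma or gamma) already
    interpolated to each pixel. Thermal noise removal is not included here.
    """
    return np.abs(dn) ** 2 / np.asarray(a_lut, dtype=float) ** 2


def beta0_to_gamma0_ellipsoid(beta0, incidence_deg):
    """Flat-terrain approximation: sigma0 = beta0*sin(theta), gamma0 = sigma0/cos(theta).

    This is NOT the DEM-based terrain flattening used in the study; it ignores
    local topography and only uses the ellipsoidal incidence angle theta.
    """
    theta = np.deg2rad(incidence_deg)
    return np.asarray(beta0, dtype=float) * np.tan(theta)


def to_db(linear):
    """Convert linear backscatter to decibels."""
    return 10.0 * np.log10(linear)


# Hypothetical complex DNs and LUT values for two pixels
beta0 = calibrate(np.array([120 + 0j, 60 + 80j]), np.array([400.0, 400.0]))
gamma0 = beta0_to_gamma0_ellipsoid(beta0, incidence_deg=39.0)
print(to_db(gamma0))
```

Terrain flattening replaces the ellipsoid term with a DEM-simulated illuminated area, which is why it requires the SRTM DEM.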

The interferometric coherence (*γ*) represents a measure of the correlation between two complex SAR images and can be described as follows:

$$\gamma = \frac{\left|E\left[u_1 u_2^{*}\right]\right|}{\sqrt{E\left[\left|u_1\right|^2\right]\,E\left[\left|u_2\right|^2\right]}}\tag{1}$$

where *u*<sub>1</sub> and *u*<sub>2</sub> are two co-registered complex (SLC) acquisitions, the asterisk denotes the complex conjugate, |*u*<sub>1</sub>| and |*u*<sub>2</sub>| are their amplitudes, and *E*[*x*] is the expected value of the variable *x*, which in practice is estimated by averaging over a local window. The coherence magnitude takes values between 0 and 1, where 0 represents no correlation and 1 represents perfect correlation between the image phases [47,48]. Because the coherence varies with the backscattering properties of the surfaces and their changes over time, it is useful for land cover classification [49,50] and biomass estimation [51,52], as well as for change detection (e.g., mapping deforestation) [53,54].
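The expectation operator in Equation (1) is in practice replaced by a local spatial average. A minimal NumPy sketch of this estimator follows, assuming two co-registered complex SLC arrays and a 5 × 5 boxcar window. The window size is an assumption, since the estimation window used in SNAP for this study is not stated.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def coherence(u1: np.ndarray, u2: np.ndarray, win=(5, 5)) -> np.ndarray:
    """Coherence magnitude of Equation (1), with E[.] estimated by a boxcar mean.

    u1, u2: co-registered complex SLC arrays of equal shape.
    Returns an array of shape (rows - win[0] + 1, cols - win[1] + 1).
    """
    def boxcar(x):
        # Mean over every win[0] x win[1] neighbourhood ("valid" positions only)
        return sliding_window_view(x, win).mean(axis=(-2, -1))

    numerator = np.abs(boxcar(u1 * np.conj(u2)))
    denominator = np.sqrt(boxcar(np.abs(u1) ** 2) * boxcar(np.abs(u2) ** 2))
    return numerator / denominator


rng = np.random.default_rng(0)
shape = (60, 60)
u1 = rng.normal(size=shape) + 1j * rng.normal(size=shape)
# Same scatterers with a constant phase shift: perfectly coherent
u_shifted = u1 * np.exp(1j * 0.7)
# Independent speckle: decorrelated (only a small estimation bias remains)
u_noise = rng.normal(size=shape) + 1j * rng.normal(size=shape)

print(coherence(u1, u_shifted).mean(), coherence(u1, u_noise).mean())
```

With small windows, the estimate for fully decorrelated surfaces is biased upward. This is one reason why low-coherence classes such as woodland never reach exactly 0.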

The Sentinel-1 SLC data were pre-processed using SNAP. Coherence was calculated only for the co-polarized channel (VV) because of its stronger signal and thus higher signal-to-noise ratio. Coherence estimates were created for all pairs of consecutive acquisitions with a temporal baseline of 12 days. However, in 2015 and 2016, eight interferometric coherence pairs (four in each year) had a 24-day temporal baseline, caused by interruptions in the acquisitions due to maintenance or other technical issues early in the satellite's lifetime.
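Pairing consecutive acquisitions and flagging longer baselines can be sketched in a few lines of Python. The function name and the 12-day nominal baseline parameter are illustrative assumptions. The sketch also shows why N acquisitions yield N − 1 pairs, consistent with 145 SLC scenes giving 144 coherence pairs.

```python
from datetime import date


def consecutive_pairs(acquisitions, nominal_baseline_days=12):
    """Pair each acquisition with the next one in time.

    Returns (reference, secondary, baseline_days, longer_than_nominal) tuples;
    the flag marks pairs spanning a missing acquisition (e.g., a 24-day gap).
    """
    dates = sorted(acquisitions)
    return [
        (ref, sec, (sec - ref).days, (sec - ref).days > nominal_baseline_days)
        for ref, sec in zip(dates, dates[1:])
    ]


# Hypothetical acquisition dates with one missed acquisition
scenes = [date(2016, 1, 1), date(2016, 1, 13), date(2016, 2, 6), date(2016, 2, 18)]
for ref, sec, days, gap in consecutive_pairs(scenes):
    print(ref, sec, days, "longer baseline" if gap else "")
```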

(1) First, the precise orbit state vectors (precise orbit ephemerides, POE) were applied to the SLC data. (2) Because the SLC images are acquired in three sub-swaths (one image per sub-swath and polarization) in TOPSAR mode (Terrain Observation with Progressive Scans SAR), the TOPSAR split function was applied so that the coherence could be calculated for each sub-swath. (3) To co-register the SLC sub-swaths of an image pair, back-geocoding was applied using the height information from the SRTM DEM [45] at 30 m spatial resolution. (4) The coherence between both SLC datasets was calculated using the interferogram module in SNAP. (5) The TOPSAR merge function was used to merge the individual sub-swaths before (6) terrain correction, which also used the SRTM DEM as height information.

#### 2.3.2. Sentinel-2 Pre-Processing

Sentinel-2 L1C products consist of TOA (top of atmosphere) reflectances in cartographic geometry. The Sentinel-2 data are provided in tiles with a fixed coverage of about 10,000 km<sup>2</sup> (100 by 100 km) projected in UTM/WGS84 along a single orbit [38].

To obtain bottom-of-atmosphere reflectance, the Sen2Cor processor [55] was applied with additional parameter settings, such as aerosol optical thickness and water vapor, but without cirrus and terrain correction. Each band of the resulting Sentinel-2 L2A products has the same spatial resolution as the original L1C input band, namely 10 m for bands 2, 3, 4 and 8; 20 m for bands 5, 6, 7, 8a, 11 and 12; and 60 m for bands 1, 9 and 10. In addition, the Python implementation [56] of the FMask algorithm [57–59] was used to produce cloud masks with a resolution of 20 m from the Sentinel-2 L1C data.

To assess the sensitivity of the satellite image data concerning vegetation processes, the NDVI (Normalized Difference Vegetation Index) [60] and SAVI (Soil Adjusted Vegetation Index) [61] were retrieved from each Sentinel-2 L2A scene with a spatial resolution of 10 m:

$$\text{NDVI} = \frac{(N - R)}{(N + R)} \tag{2}$$

$$\text{SAVI} = \frac{(N-R)}{(N+R+L)}(1+L) \tag{3}$$

where *N* corresponds to the near-infrared (NIR) channel of Sentinel-2 (band 8), *R* corresponds to the red channel of Sentinel-2 (band 4) and *L* is a factor, which is used to minimize the influence of the soil brightness in the SAVI (here *L* = 0.5) [61].
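Equations (2) and (3) translate directly into array code. The following minimal NumPy sketch assumes that the L2A bands have already been converted from stored integers to reflectance (0–1). The scale factor of 10,000 and the optional offset are assumptions to check against the product metadata, because newer processing baselines also apply an offset.

```python
import numpy as np


def to_reflectance(dn, scale=10000.0, offset=0.0):
    """Convert stored L2A integers to reflectance; scale/offset are assumptions."""
    return (np.asarray(dn, dtype=float) + offset) / scale


def ndvi(nir, red):
    """Equation (2): N = band 8 (NIR), R = band 4 (red)."""
    return (nir - red) / (nir + red)


def savi(nir, red, L=0.5):
    """Equation (3): soil-adjusted index with soil-brightness factor L."""
    return (nir - red) / (nir + red + L) * (1 + L)


b8 = to_reflectance(np.array([3000, 2500]))   # NIR, hypothetical values
b4 = to_reflectance(np.array([1000, 2000]))   # red, hypothetical values
print(ndvi(b8, b4), savi(b8, b4))
```

L = 0.5 is added to the reflectance values, so applying Equation (3) to unscaled integers would make L negligible and largely remove the soil adjustment.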

Vegetation dynamics for a specific area are often analyzed using time series constructed from indices. Here, time series were constructed with the entire Sentinel-2 data archive from 2016 to 2020. To avoid spurious data due to atmospheric distortions, each index file was masked with the corresponding FMask product before stacking. In some regions, depending on the weather conditions in combination with the topography, each time series along a pixel stack can have substantial gaps lasting up to eight weeks. However, with the introduction of Sentinel-2B, gaps in areas with intensive cloud cover are considerably shorter, with at least one suitable acquisition each month.
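The masking, stacking and gap-length steps can be sketched with NumPy as follows. Two details are assumptions, not taken from the study. First, FMask's output class codes are assumed, with 1 denoting clear land (as in the python-fmask implementation; verify against the version used). Second, the 20 m mask is upsampled to the 10 m index grid by 2 × 2 pixel replication. All function names are hypothetical.

```python
import numpy as np

# Assumed FMask output codes: 0 null, 1 clear land, 2 cloud, 3 shadow, 4 snow, 5 water
CLEAR_CODES = (1,)


def mask_scene(index_10m, fmask_20m, clear_codes=CLEAR_CODES):
    """Set index pixels to NaN wherever the (upsampled) FMask class is not clear."""
    # Replicate each 20 m mask pixel into a 2 x 2 block on the 10 m grid
    mask_10m = np.repeat(np.repeat(fmask_20m, 2, axis=0), 2, axis=1)
    mask_10m = mask_10m[: index_10m.shape[0], : index_10m.shape[1]]
    masked = np.asarray(index_10m, dtype=float).copy()
    masked[~np.isin(mask_10m, clear_codes)] = np.nan
    return masked


def stack_series(index_scenes, fmask_scenes):
    """Mask every scene before stacking into a (time, rows, cols) array."""
    return np.stack([mask_scene(i, m) for i, m in zip(index_scenes, fmask_scenes)])


def longest_gap_days(days, pixel_series):
    """Longest interval (days) between consecutive valid observations of one pixel."""
    valid_days = np.asarray(days)[~np.isnan(pixel_series)]
    return int(np.diff(valid_days).max()) if valid_days.size > 1 else None


scene = np.full((4, 4), 0.6)
fmask = np.array([[1, 2], [1, 1]])   # cloud over the upper-right 20 m pixel
masked = mask_scene(scene, fmask)    # upper-right 2 x 2 block becomes NaN
gap = longest_gap_days([0, 5, 10, 15, 60], np.array([0.3, np.nan, np.nan, 0.4, 0.5]))
```

Applying `longest_gap_days` along the time axis of each pixel stack is one way to quantify the gaps of up to eight weeks described above.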

#### 2.3.3. Combined Time Series Analysis

To enhance the interpretability of the dense time series, we employed Friedman's "super smoother" [62], an adaptive, variable-span linear smoother. Rather than using a fixed window, it chooses the span locally, based on which of several candidate spans yields the smallest cross-validated residuals. The default settings of the R implementation were found to be the most suitable for interpreting the time series.
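The idea of local span selection can be illustrated with a simplified Python sketch; note that the study itself used the R implementation. The sketch uses three running-line smoothers with spans of 5%, 20% and 50% of the observations, the spans conventionally used by Friedman's method. It smooths the leave-one-out residuals with the middle span and, at each point, takes the fit of the span with the smallest smoothed residual. It omits parts of the original algorithm (e.g., the bass control, interpolation between spans and the final smoothing pass), so it is not a reimplementation of R's `supsmu`, and the function names are hypothetical.

```python
import numpy as np


def running_line(x, y, span, leave_one_out=False):
    """Local linear fit using a symmetric window of about span * n neighbours (by rank)."""
    n = len(x)
    half = max(1, int(span * n) // 2)
    fitted = np.empty(n)
    for i in range(n):
        idx = np.arange(max(0, i - half), min(n, i + half + 1))
        if leave_one_out:
            idx = idx[idx != i]  # cross-validation: exclude the point itself
        xs, ys = x[idx], y[idx]
        if np.ptp(xs) == 0:  # degenerate window: fall back to the local mean
            fitted[i] = ys.mean()
        else:
            slope, intercept = np.polyfit(xs, ys, 1)
            fitted[i] = slope * x[i] + intercept
    return fitted


def super_smoother_sketch(x, y, spans=(0.05, 0.2, 0.5)):
    """Simplified variable-span smoother (NOT a full reimplementation of R's supsmu)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(xs)
    fits = np.array([running_line(xs, ys, s) for s in spans])
    cv_abs_resid = np.array(
        [np.abs(ys - running_line(xs, ys, s, leave_one_out=True)) for s in spans]
    )
    # Smooth the absolute CV residuals with a running mean of the middle span
    k = max(1, int(0.2 * n) // 2)
    kernel = np.ones(2 * k + 1) / (2 * k + 1)
    smoothed_resid = np.array([np.convolve(r, kernel, mode="same") for r in cv_abs_resid])
    best_span = smoothed_resid.argmin(axis=0)  # index of the best span at each point
    result = np.empty(n)
    result[order] = fits[best_span, np.arange(n)]  # restore the original point order
    return result


rng = np.random.default_rng(1)
day = np.sort(rng.uniform(0, 5 * 365, 150))           # irregular acquisition days
signal = 0.3 + 0.15 * np.sin(2 * np.pi * day / 365)   # hypothetical seasonal index cycle
observed = signal + rng.normal(0, 0.03, day.size)     # noisy observations
smoothed = super_smoother_sketch(day, observed)
```

Selecting the span locally lets the smoother follow sharp changes (e.g., harvest or clearing) with short spans while averaging more strongly where the signal varies slowly.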

#### 2.3.4. Predictive Modeling and Interpretation

In order to predict the spatial distribution of slangbos on cultivated land, a random forest model was fitted for each crop year and validated using spatial cross-validation (SpCV) [63]. Random forest models are widely used in remote sensing applications due to their robustness to overfitting and flexible approach to recognizing non-linear structures in feature spaces [64,65].

In this study, the predictor set consists of the optical indices from Sentinel-2 (NDVI, SAVI), the co-polarized (VV) and cross-polarized (VH) Sentinel-1 backscatter and the Sentinel-1 coherence (VV), and comprises 67, 154 and 119 features for the investigated crop years 2015/16, 2016/17 and 2017/18, respectively. The model was trained on a labeled set of 5000 randomly selected pixels from slangbos and non-slangbos fields. Because of the anticipated spatial autocorrelation, the model was cross-validated with 100 repetitions and 5 folds of spatially segmented test datasets (k-nearest-neighbor segmentation) to derive accurate performance metrics for the presented model predictions.
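Repeated, spatially separated folds can be generated from pixel coordinates, as the following minimal NumPy sketch shows. It swaps in plain k-means clustering of the coordinates, one common partitioning strategy, for the k-nearest-neighbor segmentation mentioned above. Each repetition re-partitions the data with a new random seed. The study itself used mlr3 with sperrorest in R. The function names and coordinates below are hypothetical, and the classifier training on each split is not shown.

```python
import numpy as np


def kmeans_labels(coords, k=5, n_iter=100, seed=0):
    """Cluster point coordinates into k spatially compact groups (plain k-means)."""
    coords = np.asarray(coords, dtype=float)
    rng = np.random.default_rng(seed)
    centers = coords[rng.choice(len(coords), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance of every point to every center, shape (n, k)
        dist = ((coords[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            coords[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def repeated_spatial_folds(coords, k=5, repetitions=100):
    """Yield (repetition, fold, train_idx, test_idx) for repeated spatial CV."""
    for rep in range(repetitions):
        labels = kmeans_labels(coords, k=k, seed=rep)  # new partition per repetition
        for fold in range(k):
            yield rep, fold, np.flatnonzero(labels != fold), np.flatnonzero(labels == fold)


# Hypothetical coordinates (e.g., UTM metres) of 5000 labelled pixels
rng = np.random.default_rng(42)
xy = rng.uniform(0, 50_000, size=(5000, 2))
n_splits = sum(1 for _ in repeated_spatial_folds(xy, k=5, repetitions=2))
```

Holding out whole spatial clusters keeps neighboring, autocorrelated pixels from appearing in both the training and the test set of the same split.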

The random forest models were assessed using the overall accuracy (OA) across all model runs. Because the OA reflects performance on both slangbos and non-slangbos sites, it can give an overoptimistic impression of how well slangbos itself is classified. Therefore, the averages of recall (the share of slangbos pixels that were detected; its complement is the share missed) and precision (the share of pixels classified as slangbos that are actually slangbos; its complement is the share falsely assigned) were also estimated to assess the capacity of the model to distinguish slangbos. An independent validation using in-situ data not involved in the machine learning workflow was not possible, as no such additional ground data were available. Hence, this study relies on the results of the internal spatial cross-validation of the random forest approach. This is a feasible approach, which other studies have used to reduce the overoptimistic performance estimates obtained with randomly distributed points instead of spatially separated folds [66].
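The three metrics can be computed from the confusion counts of each fold, as in the following minimal Python sketch with hypothetical toy labels. The reported values are averages over all folds and repetitions. The example also shows why OA alone can be overoptimistic when slangbos pixels are a minority.

```python
import math


def binary_metrics(y_true, y_pred, positive="slangbos"):
    """Overall accuracy, precision and recall for one class of a binary map."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "OA": (tp + tn) / len(pairs),
        # Share of pixels predicted as slangbos that really are slangbos
        "precision": tp / (tp + fp) if tp + fp else math.nan,
        # Share of slangbos pixels that were detected (1 - recall = missed)
        "recall": tp / (tp + fn) if tp + fn else math.nan,
    }


# Imbalanced toy example: 95 non-slangbos and 5 slangbos pixels,
# with a model that never predicts slangbos
truth = ["other"] * 95 + ["slangbos"] * 5
never = ["other"] * 100
print(binary_metrics(truth, never))  # high OA, but recall is 0
```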

Since random forest hyperparameter settings are relatively stable [67–69], no separate tuning was set up within the cross-validation. The number of trees (*ntrees*) was set to 300, which allows large and stable averaging, and *mtry* = √P, where P is the number of features in the model [70]. These two parameters were optimized prior to model fitting, as they were found to be sensitive with respect to model accuracy and computation time; this so-called model tuning was carried out to identify the best parameter set for the SpCV and training [71,72]. All other parameters were left at their default values.
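Applied to the feature counts listed above, the rule gives the following *mtry* values. This assumes √P is rounded down, the default rule for classification in common R random forest implementations such as randomForest and ranger; the rounding used in the study is not stated.

```python
import math

# Number of predictors per crop year, as listed above
n_features = {"2015/16": 67, "2016/17": 154, "2017/18": 119}

# Assumption: mtry = sqrt(P) rounded down (integer square root)
mtry = {year: math.isqrt(p) for year, p in n_features.items()}
print(mtry)  # {'2015/16': 8, '2016/17': 12, '2017/18': 10}
```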

The random forest feature importance quantifies how much each feature contributes to decreasing node impurity, i.e., how much impurity reduction is lost when the feature is not available to the model. It has proven to be a reliable means of assessing the importance of features for the model predictions while accounting for all other predictors in the model [73]. Note, however, that correlated predictors can lower each other's importance ranks, because one can compensate for the information lost when the other is left out. The model setup, the retrieval of feature importance and the SpCV were implemented in the R package mlr3 [74] with bindings to *sperrorest* [63].

#### **3. Results**

#### *3.1. Combined Time Series Analysis of Sentinel-1 Backscatter and Coherence and Sentinel-2 NDVI and SAVI*

The input variables for the random forest classification of slangbos spread in the study area were (1) the Sentinel-1 backscatter (S1 VV and S1 VH), (2) the Sentinel-1 coherence (S1 VV) and (3) the Sentinel-2 NDVI and SAVI time series. This section presents the temporal profiles of these variables for different land cover classes (slangbos, grassland, woodland and cultivated areas) (Figure 3) to investigate which parameters are best suited to distinguish the slangbos class from the other classes. In addition, precipitation information from the CHIRPS (Climate Hazards Group Infrared Precipitation with Stations) product [75], which combines rain gauge and satellite observations, was used to relate the time series signals to the dry and wet seasons.

Areas with slangbos encroachment show a clear increasing trend in cross-polarized VH backscatter intensity, especially between 2016 and 2017. Afterwards, the signal appears to stabilize, as also observed for the cultivated and grassland classes. At the beginning of the time series, the severe drought of 2015/2016, which strongly affected vegetation development during these years [76–78], is clearly visible. The sparse precipitation indicated by the CHIRPS product in 2016 is an additional indicator of the drought. InSAR coherence similarly shows low amplitudes in the first two years of the time series, but this pattern fades in the years that were not affected by drought. Therefore, the drought may have affected slangbos growth, which appears stronger afterwards. For slangbos, the coherence only increases and then stabilizes, whereas for the other cover types it decreases between 2015 and 2016 and increases afterwards. The SAVI time series has low amplitudes, with values mainly between 0.1 and 0.2, while the NDVI ranges between 0.2 and 0.6 with high amplitudes; throughout the time series, the NDVI thus shows both higher values and higher amplitudes than the SAVI. During the drought, the amplitudes of both optical indices were lower.

Grasslands exhibit lower cross-polarized VH backscatter values than slangbos areas. This is due to the higher proportion of volume scattering in the slangbos shrubs compared to the vertically oriented grasses. The impact of the drought is also visible at the beginning of the time series: dry grasslands, or grasslands left bare by water scarcity, result in reduced backscatter. The amplitude of the grassland coherence is similar to that of slangbos throughout the entire period but shows larger dynamics than that of the other classes. Throughout the time series, the SAVI shows slightly higher temporal dynamics than in areas infested by slangbos.

**Figure 3.** Temporal dynamics of the Sentinel-1 VH backscatter, Sentinel-1 VV coherence, Sentinel-2 NDVI and SAVI time series, together with precipitation information from the CHIRPS (Climate Hazards Group Infrared Precipitation with Stations) product [75].

Woodland areas show higher backscatter values than the other classes, as a result of volume scattering in the crowns of the upper tree canopy. The amplitude of the cross-polarized backscatter is low, reflecting almost no seasonality in the woodland areas. The coherence is consistently low (decorrelation) and also shows little seasonality.

The cultivated areas show the largest Sentinel-1 VH backscatter dynamics, caused by the harvest cycles; the same holds for the NDVI signal. The SAVI is somewhat comparable to that of the grassland class, with lower seasonal dynamics. The impact of the severe 2015/2016 drought is visible between 2016 and 2017, which could be a delayed effect of the drought on vegetation growth.

The comparison of the temporal signatures of the four parameters for the classes slangbos, grassland, woodland and cultivated areas indicates that the Sentinel-2-derived SAVI and the Sentinel-1 VH backscatter have the highest potential for separating the slangbos class from the others. Figure 4 compares the NDVI with the SAVI and the Sentinel-1 VH backscatter with the VV coherence for the four classes. While the NDVI shows no separability between slangbos, grasslands and cultivated areas, the SAVI discriminates these classes more effectively. Likewise, the Sentinel-1 VH backscatter time series of slangbos is more clearly separated from the other classes than the Sentinel-1 VV coherence.

**Figure 4.** Separability of the class slangbos from other land cover classes for the variables Sentinel-1 VH backscatter, Sentinel-1 VV coherence, Sentinel-2 NDVI and SAVI time series.

#### *3.2. Classifying Slangbos using Random Forest*

#### 3.2.1. Variable Importance

Variable importance was calculated for the crop years between 2015 and 2018 (Figure 5). The results show that the Sentinel-2-derived SAVI has the highest importance for the classification algorithm (see Section 3.1, Figure 4). This is especially true for the crop year 2015/16. However, this crop year needs to be interpreted with caution, as Sentinel-2 data are only available from early 2016 and thus do not cover the entire crop year. During the same crop year, the NDVI and the cross-polarized Sentinel-1 VH data also contribute substantially to the classification result. The Sentinel-1 VV coherence has the lowest variable importance in all observed crop years. In the crop years 2016/17 and 2017/18, Sentinel-1 VH was less important for the classification algorithm than the Sentinel-2-derived NDVI. These importances must be interpreted with care, as a larger feature subset (e.g., the denser Sentinel-1 time series compared to the cloud-free Sentinel-2 acquisitions) results in a relatively lower importance of the individual features in that subset. Nevertheless, the optical indices and cross-polarized SAR still account for the bulk of the model performance. Comparisons between years are further influenced by the number of predictors used for modeling, e.g., the crop year 2017/2018 shows the lowest importance, as the highest number of predictors was used.

#### 3.2.2. Spatial Cross-Validation (SpCV)

The spatial cross-validation was carried out for each crop year between 2015 and 2018. Figure 6 shows the accuracy metrics of the binary classification into slangbos and non-slangbos. The overall accuracy exceeded 90% for the entire period. Precision, i.e., the share of pixels classified as slangbos that are actually slangbos, was around 80%. Recall, i.e., the share of slangbos pixels that were detected, was even higher, meaning that only 13% to 16% of the slangbos was missed during classification. Slightly lower precision was found for the crop years 2016/17 and 2017/18.

**Figure 5.** Variable importance for the input parameters Sentinel-1 VV coherence (CO), cross-polarized Sentinel-1 backscatter (VH) as well as Sentinel-2 derived NDVI and SAVI.

**Figure 6.** Classification accuracy metrics derived from spatial cross-validation for each of the crop years between 2015 and 2018.

#### 3.2.3. Slangbos Probability Measures

The probability measure is an output of the random forest algorithm and describes the probability that an individual pixel is assigned to the class slangbos, here for the period between 2017 and 2018 [79] (Figure 7, left).

The figure provides detailed insight into the characteristics of agricultural fields potentially infested by slangbos. The dark field in the center of the image is surrounded by fields with high probabilities of slangbos encroachment, which are also visible as brownish areas in the optical Sentinel-2 image (Figure 7, right). Comparison with the Sentinel-2 image illustrates how the probability index reflects the heterogeneity within each field. This information might be used to identify areas to prioritize for slangbos-clearing investigations within a slangbos encroachment monitoring system.

**Figure 7.** An example of the probability measure mapping slangbos encroachment in the project area. The comparison with an optical image illustrates how the probability index reflects the heterogeneity within fields. Left: probability measure for the assignment of individual pixels to the class slangbos between 2017 and 2018. Right: Sentinel-2 Short-Wave Infrared (SWIR) image from 30.08.2017 (RGB = bands 12-8A-4). (Field boundaries source: [39]; contains modified Copernicus Sentinel data [2017–2018]).

#### *3.3. Mapping*

#### 3.3.1. Regional Scale Analysis

Figure 8 shows the distribution of the areas identified as affected by slangbos encroachment across the entire study area between 2015 and 2020. The presented areas were not used for model training. Red areas are dominant and relatively homogeneously distributed across the study area; they indicate that slangbos was prevalent at the beginning of the time series in 2015/2016. Blue areas are spread between the southern and the eastern parts of the study area. These patches indicate slangbos and shrub cover during the second half of the time series, after 2017, and are likely areas that were invaded by slangbos or other shrubs during that period. Few areas are classified as green, indicating slangbos found only in 2017; these are often adjacent to orange areas, which indicate slangbos or shrub cover already in 2015. These regions might be areas in which slangbos and shrub cover were cleared, either by hand or by fire.
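The interpretation of the colour composite can be made explicit with a small NumPy sketch. It assumes that each crop-year layer is a binary (0/1) slangbos map, so intermediate hues such as the orange mentioned above, which arise from non-binary values, are not covered. The colour table follows from the layer assignment in the Figure 8 caption (R: 2015/16, G: 2017/18, B: 2019/20). The function names are hypothetical.

```python
import numpy as np

# Presence (1) / absence (0) of slangbos in (R: 2015/16, G: 2017/18, B: 2019/20)
COLOUR_MEANING = {
    (1, 0, 0): ("red", "only 2015/16"),
    (0, 1, 0): ("green", "only 2017/18"),
    (0, 0, 1): ("blue", "only 2019/20"),
    (1, 1, 0): ("yellow", "2015/16 and 2017/18, absent in 2019/20"),
    (0, 1, 1): ("cyan", "from 2017/18 onwards"),
    (1, 0, 1): ("magenta", "2015/16 and 2019/20, absent in 2017/18"),
    (1, 1, 1): ("white", "entire period"),
    (0, 0, 0): ("black", "never"),
}


def composite(map_2015_16, map_2017_18, map_2019_20):
    """Stack three binary slangbos maps into a (rows, cols, 3) RGB composite."""
    return np.stack([map_2015_16, map_2017_18, map_2019_20], axis=-1).astype(np.uint8)


def describe(pixel):
    """Colour name and temporal interpretation of one composite pixel."""
    return COLOUR_MEANING[tuple(int(v) for v in pixel)]


# Hypothetical 2 x 2 example
r = np.array([[1, 1], [0, 0]])
g = np.array([[1, 0], [1, 0]])
b = np.array([[1, 0], [1, 1]])
rgb = composite(r, g, b)
for row in rgb:
    for px in row:
        print(describe(px))
```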

Figure 9 highlights four areas, showing the classification result within the rangeland areas to describe the spatial pattern in more detail. Area 1 is a heavily encroached site. The local partners from LandCare confirmed that this site was cleared during our observation period; in particular, the red areas in the rangelands were cleared between 2015 and 2019. In contrast, regrowth in recent years (since 2017) is found in other areas, which are shown in cyan. In the center, some white areas are visible, indicating slangbos infection during the entire period. In Area 2, a fire occurred in September 2017, which resulted in a massive loss of shrub cover (green areas) [80]. On the eastern part of the plateau, however, shrub encroachment on native rangelands is found (red areas). Area 3 is a large managed site where shrub clearing also took place during the second half of the observed period. Area 4 is characterized by intensive shrub encroachment in the rangeland sites since 2017/2018. The classification algorithm was able to detect even these small areas.

**Figure 8.** Spatial distribution of the regions affected by slangbos encroachment for the entire study area between 2015 and 2020. This map represents each crop year in a specific layer of the RGB composite (R: 2015/16, G: 2017/18, B: 2019/20). The numbered boxes show the zoom-in for a detailed analysis within the next sections. (Roads: [35], Map data source: Esri, DigitalGlobe, GeoEye, Earthstar Geographics, CNES/Airbus DS, USDA, USGS, AeroGRID, IGN, and the GIS User Community).

**Figure 9.** Subset of the boxes 1 to 4 (Figure 8) showing slangbos infected areas in the rangelands (R: 2015/16, G: 2017/18, B: 2019/20). (Map data source: Esri, DigitalGlobe, GeoEye, Earthstar Geographics, CNES/Airbus DS, USDA, USGS, AeroGRID, IGN, and the GIS User Community).

#### 3.3.2. Field Boundary Scale Analysis

Figure 10 highlights four areas within the cropland boundaries provided by the DALRRD [39]. The region inside the grey dashed line in Area 5 is identical to the subset shown in Figure 7, which was used to illustrate the classification results of the model. This region shows heavily encroached pastures, some of which were found to be fallow. Large areas are shown in white, indicating that slangbos were growing on the rangelands during the entire time period. Area 6 illustrates areas that were rapidly encroached by slangbos in 2017/2018 (turquoise and blue). Green areas are likely caused by misclassifications, as they would imply that slangbos occurred only in 2017/18 and then suddenly disappeared. However, man-made clearing or fires might also cause this spatial pattern. Area 7 represents a vast shrub control site, where large areas were infected by slangbos during the entire time (white areas); some areas show the effect of clearing actions after 2015/16 (red areas). Area 8 shows a diverse spatial pattern of slangbos encroachment dynamics. Some areas indicate slangbos infections during the entire period. Large areas show slangbos clearing activities, which are shown in red (cleared in 2015/16) and yellow (cleared after 2017/18). Small areas indicate slangbos infections during the later stage of the time period (turquoise); these might be fields that were cleared a few years earlier and were already affected by slangbos encroachment.

**Figure 10.** Subset of the boxes 5 to 8 (Figure 8) showing slangbos infected areas in the field boundaries by crop years (R: 2015/16, G: 2017/18, B: 2019/20). The grey dashed line in subset 5 represents the region shown in Figure 7. (Map data source: Esri, DigitalGlobe, GeoEye, Earthstar Geographics, CNES/Airbus DS, USDA, USGS, AeroGRID, IGN, and the GIS User Community).

The crop statistics were used to identify which crop type classes are most affected by slangbos encroachment. Figure 11 shows the relative area covered by slangbos on cropland, fallow land and pastures between 2015 and 2018. Note that only the period for which crop statistics were available is covered: statistics for more recent years were not accessible through the DALRRD, as these data were still classified for internal use only at the time of writing.
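The relative-area statistic in Figure 11 can be sketched as the per-class share of slangbos pixels. This minimal Python example assumes one crop year at a time, equal-area pixels (so pixel counts stand in for area), and a per-pixel land-type label taken from the crop statistics; the function name and the toy pixels are hypothetical and are not the study's data.

```python
from collections import Counter


def relative_slangbos_area(land_type, slangbos):
    """Percentage of each land type's pixels classified as slangbos.

    land_type: sequence of class labels per pixel (e.g. "pasture").
    slangbos:  sequence of booleans per pixel (True = slangbos).
    """
    total = Counter(land_type)
    affected = Counter(lt for lt, is_slangbos in zip(land_type, slangbos) if is_slangbos)
    # Counter returns 0 for classes without any slangbos pixel.
    return {lt: 100.0 * affected[lt] / n for lt, n in total.items()}


# Toy pixels for one crop year (illustration only).
land = ["pasture"] * 4 + ["fallow"] * 4 + ["cropland"] * 4
mask = [True, True, True, False,
        True, False, False, False,
        False, False, False, False]
print(relative_slangbos_area(land, mask))
# {'pasture': 75.0, 'fallow': 25.0, 'cropland': 0.0}
```

Repeating the call for each crop year yields one bar group per year, as in Figure 11.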

**Figure 11.** Relative area (in %) statistics on slangbos encroachment of different agricultural land types (cropland, fallow land and pastures).

The statistics on slangbos encroachment of different agricultural land types clearly demonstrate that pastures are highly affected by slangbos encroachment. Fallow areas also show some susceptibility to slangbos infection. Croplands show almost no sign of slangbos encroachment, which is attributed to intensive management (i.e., plowing) throughout harvest cycles. These findings indicate the reliability of the classification approach, as we did not expect slangbos encroachment on croplands due to the intensive management.

#### **4. Discussion**

This study focused on the classification of slangbos encroachment on agricultural land, using ESA's Copernicus Sentinel-1 and Sentinel-2 time series between 2015 and 2020 for a test area in the Free State Province, South Africa. Classification accuracies of over 80% indicate a solid approach for slangbos encroachment mapping using optical and radar time series from ESA's Sentinel missions. The random forest classifier served as the framework for the multi-temporal and multi-sensor classification.

The paper investigated slangbos encroachment in a comparatively small study area of around 4500 km2, where slangbos is known to be the main encroacher. The agricultural statistics revealed that fallow areas increased dramatically after 2015 and had doubled by the end of 2018. In 2018, fallow areas covered more than 50% of the agricultural area, with the remaining area almost evenly distributed between croplands and pastures. This abandonment of agricultural land is a known phenomenon of land-use change in South Africa, driven by climate as well as socio-economic factors [81]. The statistics on slangbos encroachment of different agricultural land types revealed that this process is more recent in abandoned crop areas than in regions near settlements [82].

Upcoming research activities might focus on larger areas where other bush communities or shrub types are the main invaders. One example is black wattle (*Acacia mearnsii*), which also exerts great pressure on grassland communities in South Africa [83,84]. Future studies could therefore build upon this methodology to discriminate between different shrub communities in regions where a mixture of encroachers is present. If the goal is to classify different shrub communities, the presented binary approach has limitations in separating different encroachers, and additional data (e.g., hyperspectral [85]) might be required to train more suitable machine-learning models.

However, classifying different shrub communities is likely to introduce other issues, such as spectral and scattering uncertainties attributable to the canopy spacing of the individual plants, which need to be considered in future studies. Depending on the spatial resolution of the data, the influence of grass and soil between shrubs needs to be analyzed in more detail, as it has a major impact on reflectance and backscatter. Further research will increase knowledge of the scattering mechanisms of slangbos and other bush communities in contrast to grassland. For the classification of slangbos, the Sentinel-2-derived SAVI and the Sentinel-1 VH backscatter are likely to have higher potential than the NDVI and the Sentinel-1 VV coherence. In the case of the SAVI, this might be attributed to the fact that slangbos patches grow more sparsely and thus benefit from the bare-soil adjustment in the SAVI, which reduces the influence of soil brightness compared with the NDVI [86]. As droughts might favor bush encroachment [87,88], it is highly important to integrate information on precipitation dynamics in future studies. Here, we used coarse-resolution CHIRPS data to analyze yearly precipitation dynamics. Since rainfall events might occur locally, information from climate stations or data products with higher spatial resolution could be of great value for upcoming investigations.
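The soil-adjustment argument can be made concrete with the standard NDVI and SAVI formulas. The sketch below assumes the commonly used soil factor L = 0.5 and generic NIR/red surface reflectances (for Sentinel-2 these would typically be band 8 or 8A and band 4, but the text does not state which bands or which L were used). The toy reflectances keep the same NIR–red contrast while shifting both bands, as a different soil background might; they are illustrative, not measured values.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    """Soil Adjusted Vegetation Index; soil_factor is the L parameter (0.5 assumed)."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


# Toy reflectances with the same NIR-red contrast (0.1) but different background brightness.
bright = (0.30, 0.20)
dark = (0.20, 0.10)
for label, (nir, red) in (("bright background", bright), ("dark background", dark)):
    print(f"{label}: NDVI={ndvi(nir, red):.3f}, SAVI={savi(nir, red):.3f}")
```

With these toy numbers the NDVI shifts by about 0.13 between the two backgrounds, while the SAVI shifts by less than 0.04.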

To make the model more flexible and efficient, and applicable to other or larger regions, the training input should be limited to the most important model features. In this study, more than 500 features were used as model training input, around 100 for each individual crop year. This may not be practical when transferring the methodology to other, larger regions. Based on the known variable importance, Sentinel-1 VH backscatter and Sentinel-2 SAVI and NDVI time series might be used as the only input variables to improve model performance. Methods for interpretable machine learning and feature selection are desirable for future research, making the models comprehensible and transparent. In addition, statistical metrics derived from hypertemporal EO data might be a feasible way to analyze time series in a cost-effective and descriptive manner. Simple descriptive statistics, such as the median, standard deviation or quantiles, in conjunction with regression functions or temporal filters, might improve model performance as well as classification results.
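A minimal sketch of the descriptive-statistics idea: each variable's time series for one pixel and crop year is reduced to a handful of summary values, so a few features per variable replace the roughly 100 per-crop-year inputs. The function name, the feature set and the toy values are assumptions for illustration; regression functions and temporal filters, also mentioned above, are not included.

```python
from __future__ import annotations

import statistics


def temporal_features(series: list[float], prefix: str) -> dict[str, float]:
    """Summarize one variable's time series (one pixel, one crop year)."""
    q25, _, q75 = statistics.quantiles(series, n=4)  # quartile cut points
    return {
        f"{prefix}_median": statistics.median(series),
        f"{prefix}_std": statistics.stdev(series),
        f"{prefix}_q25": q25,
        f"{prefix}_q75": q75,
        f"{prefix}_range": max(series) - min(series),
    }


# Toy series for one pixel (values invented for illustration).
vh_db = [-18.0, -17.5, -17.0, -16.5, -16.0, -15.5]  # Sentinel-1 VH backscatter in dB
savi_series = [0.10, 0.20, 0.30, 0.40, 0.50]         # Sentinel-2 SAVI
features = {**temporal_features(vh_db, "vh"), **temporal_features(savi_series, "savi")}
print(len(features), features["vh_median"], features["savi_median"])  # 10 -16.75 0.3
```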

The availability of agricultural statistics for the entire observation period is crucial for the identification of classes that were infected by slangbos encroachment. In this study, the statistics were available between 2014 and 2018 only. Hence, statistics for two crop years (2019 and 2020) were missing.

Future investigations might use different measures to assess accuracy. In this study, we measured accuracy based on a binary classification (slangbos and non-slangbos), which introduces issues related to the sample sizes of the two classes. The overall accuracy might not be reliably interpretable in this case, whereas recall and precision are more informative measures, because the overall accuracy is sensitive to the imbalance in sample size between the slangbos and non-slangbos classes [89]. Recall reflects how many slangbos pixels were missed (a low recall means many missed pixels), while precision reflects how many slangbos detections were false (a low precision means many false detections). Both measures can be low even when the overall accuracy is high, as they are based only on the true positive, false negative and false positive counts. The true negative count, by contrast, is extremely large, as is common in most remote sensing classifications, since it includes the pixels of all other classes, e.g., water, urban, and grassland.

Upcoming investigations should emphasize the potential of mapping slangbos encroachment using cloud-based solutions (e.g., Google Earth Engine) to minimize data processing times for users. This might result in a bush encroachment monitoring system in which products are directly accessible to users without any data handling. Such a monitoring system could evolve into an early warning system using near-real-time classification approaches, advance current methods for identifying infected areas, and help plan optimal clearing strategies that support sustainable land management.
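To illustrate why the overall accuracy of a binary slangbos map can look high while precision and recall stay low under class imbalance, the following sketch computes the three measures from a binary confusion matrix. The function name and the counts (1,000 slangbos and 99,000 non-slangbos reference pixels) are hypothetical toy values, not results from this study.

```python
from __future__ import annotations


def binary_scores(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Overall accuracy, precision and recall for a two-class (slangbos / non-slangbos) map."""
    return {
        "overall_accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),  # 1 - precision = share of false detections
        "recall": tp / (tp + fn),     # 1 - recall = share of missed slangbos pixels
    }


# Hypothetical counts: 1,000 slangbos and 99,000 non-slangbos reference pixels.
print(binary_scores(tp=500, fp=500, fn=500, tn=98_500))
# {'overall_accuracy': 0.99, 'precision': 0.5, 'recall': 0.5}
```

Half of the slangbos pixels are missed and half of the detections are false, yet the large true negative count pushes the overall accuracy to 99%.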

#### **5. Conclusions**

The objective of this paper was to monitor slangbos encroachment between 2015 and 2020 in a test region in the Free State Province, South Africa. The slangbos classification was carried out using a synergetic combination of Sentinel-1 and Sentinel-2 time series within a machine-learning framework, applying a random forest classifier. Field inventory and high-resolution image analyses, as well as spatial crop statistics and verified slangbos areas provided by the DALRRD, were used for training and spatial cross-validation.

The time series analysis of the Sentinel-1 and Sentinel-2 data has shown that the Sentinel-1 VH (cross-polarization) backscatter and the Sentinel-2 SAVI (Soil Adjusted Vegetation Index) provide the highest separability between the shrub class and other land cover classes. Moreover, random forest permutation-based feature importance showed that these parameters contribute most to the classifiers when accounting for the other variables in the model. The spatial interpretation revealed that slangbos-infected areas are well captured for the different crop years between 2015 and 2020. Pastures are particularly prone to slangbos encroachment, whereas cultivated areas are less affected. Even small patches of slangbos growth and the resulting heterogeneity within fields could be identified with the Sentinel-1 and Sentinel-2 data. This knowledge is an essential information source for a slangbos encroachment monitoring system to identify areas that need to be prioritized for slangbos-clearing investigations. The classification accuracy was estimated via spatial cross-validation and resulted in an overall accuracy of around 90% for each crop year, with a positive predictive value (precision) of around 80%. These accuracies indicate a large potential for transferring the approach to other regions for monitoring shrub encroachment.

The study has shown the potential of using high-resolution optical and radar Earth observation time series to classify slangbos encroachment on pastures in the Free State Province of South Africa. Future studies might utilize these findings in other regions of shrub and bush encroachment.

**Author Contributions:** Conceptualization, M.U., K.S., T.M., C.D., and C.S.; methodology, M.U., K.S., and C.D.; data analysis and interpretation, all authors; validation, M.U., K.S., and T.M.; writing original draft preparation, M.U.; writing—review and editing, all authors; visualization, M.U., and K.S.; project administration, J.B., U.G., and C.S.; funding acquisition, J.B., U.G., and C.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** The authors received funding from the German Federal Ministry of Education and Research (BMBF) in the framework of the Science Partnerships for the Assessment of Complex Earth System Processes (SPACES2) under the grant 01LL1701 (South African Land Degradation Monitor (SALDi)).

**Acknowledgments:** The authors would like to thank the Department of Agriculture, Land Reform and Rural Development (DALRRD) for providing spatial information for the different crop types between 2014 and 2018 for this study. We acknowledge the support of Makzine Ranthimo, who unfortunately passed away in May 2021. She was instrumental as a very knowledgeable and warmhearted contact person for our field studies in the Ladybrand region.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Spatial Heterogeneity of Vegetation Response to Mining Activities in Resource Regions of Northwestern China**

### **Hanting Li <sup>1</sup>, Miaomiao Xie <sup>1,2,\*</sup>, Huihui Wang <sup>1</sup>, Shaoling Li <sup>1</sup> and Meng Xu <sup>1</sup>**


Received: 4 September 2020; Accepted: 2 October 2020; Published: 6 October 2020

**Abstract:** Aggregated mining development has direct and indirect impacts on vegetation changes. These impacts vary spatially because of the complex influence of multiple mines, a common issue in resource regions. To estimate the spatial heterogeneity of vegetation response to mining activities, we coupled vegetation changes and mining development through a geographically weighted regression (GWR) model for three cumulative periods between 1999 and 2018 in integrated resource regions of northwestern China. Vegetation changes were monitored by Sen's slope and the Mann–Kendall test based on a total of 72 Landsat images. The spatial distribution of mining development was quantified using four land-use maps from 2000, 2005, 2010, and 2017. The results showed that 80% of vegetation in the study area experienced different degrees of degradation, which was more serious in the overlapping areas of multiple mines and in mining areas. The scope of influence of single mines on vegetation shrank by about 48%, and the mean coefficients increased by 20% closer to mining areas. The scope of influence of multiple mines on vegetation gradually expanded to 86% from the outer edge to the inner overlapping areas of mining areas, where the mean coefficients increased by 92%. The correlation between elevation and vegetation changes varied according to the average elevation of the total mining areas. Ultimately, ecological remediation should be planned systematically, taking into account local conditions and mining consequences.

**Keywords:** spatial heterogeneity; vegetation trends; mining development; geographically weighted regression (GWR); Sen's slope; Mann-Kendall; arid and semi-arid areas

#### **1. Introduction**

Vegetation, which dominates terrestrial ecosystems, connects the material cycling and energy flow of the biosphere [1] and plays a critical role in supporting ecosystem services and functions [2,3]. Vegetation changes have thus become an indispensable indicator in global climate change research and regional eco-environmental assessment [4,5]. Changes in natural conditions and intensive human activities affect ecological elements and processes and alter the regional environment [6]. As an intensive human activity, mining has an impact on 11 of the 17 United Nations Sustainable Development Goals (SDGs) [7] and is a constraint on achieving sustainable development [8]. Mining activities, especially extractive ones, directly destroy vegetation and indirectly lead to environmental problems, including air and water pollution [9], heavy-metal pollution [10], groundwater loss [11], and soil erosion and degradation [12]. These problems profoundly change the conditions for vegetation growth, in turn damaging vegetation cover over a disproportionately broad range and producing spatial differences in vegetation changes. Vegetation changes, which represent local ecosystem health, are severely disturbed by mining activities [13]. Research on the effect of mining activities on vegetation is essential for further ecological construction and for achieving the SDGs.

Analyzing the mechanisms by which mining activities affect vegetation growth in mining areas provides significant insight for ecological coal mining [14]. Researchers have made many findings through field surveys and experiments focusing on soil parameters [15], microorganisms [16], root environments [17], toxicological effects [18,19], colony symbiosis and photosynthesis [20], heavy-metal pollution and enrichment in vegetation [21,22], the extinction of major dominant species [23], and biodiversity loss [24]. Related studies have revealed that the ways in which mining affects vegetation growth are diverse on a local scale and more complicated on a regional scale [25]. However, mining development in resource regions is not a single sporadic mine pit but a complex and systematic industrial chain [26]. This chain involves a fully integrated process and establishes diversified industrial branches, from mining excavation, transportation, preprocessing, and deep processing to material consumption and utilization [26,27]. Successive impacts accumulate continuously as one or more activities aggregate on receptors [28,29]. Differences in the degree of spatial accumulation over time and space cause different vegetation types to respond differently to mining development on a regional scale, resulting in significant spatial heterogeneity. Understanding how mining impacts accumulate and change over time is therefore the key issue for assessing and monitoring vegetation response to mining activities.

The regional ecological impact of mining development can be revealed through large-scale observation [30]. Recent studies have shown that coal mining is an important driving factor of serious regional vegetation degradation, especially on China's Mongolian Plateau and in alpine areas [29–31]. Vegetation disturbance caused by mining is evident on a large scale [30] and much more significant in arid and semiarid areas [25,31]. The combined effect of climate conditions and ecological restoration activities also makes vegetation changes more volatile and spatially variable [30,32]. At the regional scale, however, the relationship between mining development and vegetation changes as mining development aggregates, and the typical regions where mining activities have a more significant influence, are still not well understood. Establishing a mathematical model coupling vegetation trends and human activities is essential in a complex system shaped by both natural conditions and human activities [33].

Spatial analysis offers an advantage for understanding the variation in the impact of mining on vegetation [34]. Traditional multivariate statistical analysis and simple spatial analysis, such as ordinary least squares (OLS) models, usually assume that the relationships between variables are constant across the entire study area and therefore have difficulty reflecting spatial variation [35]. Geographically weighted regression (GWR) constructs a local regression equation at each geographic location to quantitatively characterize spatially varying relationships, thereby addressing the problems of spatial non-stationarity, heterogeneity, and autocorrelation [36]. The local coefficients computed in the GWR model quantitatively express the spatial relationships at each location. GWR models are widely used in urban landscape pattern analysis [35,37,38], PM2.5 concentration estimation [39], carbon emission studies [40], and ecosystem services [41,42]. Sawut et al. [43] also estimated the arsenic (As) content of soil in an open-pit coal mine on the basis of GWR.
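The core idea of GWR, fitting a separate weighted regression at every location, can be sketched for a single predictor. This pure-Python example is a simplification under stated assumptions: a Gaussian distance-decay kernel with a fixed, hand-picked bandwidth, one explanatory variable, and synthetic data whose relationship flips sign between a western and an eastern half. Full GWR implementations typically also calibrate the bandwidth (e.g., by cross-validation or AICc) and support several predictors; the function name is hypothetical.

```python
import math


def gwr_local_fits(locations, x, y, bandwidth):
    """Return (intercept, slope) from a Gaussian-weighted least-squares fit at each location."""
    fits = []
    for here in locations:
        # Gaussian kernel: nearby observations get weights near 1, distant ones near 0.
        w = [math.exp(-0.5 * (math.dist(here, other) / bandwidth) ** 2) for other in locations]
        sw = sum(w)
        x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
        sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
        slope = sxy / sxx
        fits.append((y_bar - slope * x_bar, slope))
    return fits


# Synthetic transect of 10 units: y = 2x in the west (first five), y = -x in the east.
locations = [(float(k), 0.0) for k in range(10)]
x = [1, 3, 2, 5, 4, 1, 3, 2, 5, 4]
y = [2 * xi if k < 5 else -xi for k, xi in enumerate(x)]
fits = gwr_local_fits(locations, x, y, bandwidth=1.0)
print(f"west end slope {fits[0][1]:+.2f}, east end slope {fits[-1][1]:+.2f}")
```

A single global OLS slope would average the two opposing relationships, whereas the local slopes recover roughly +2 in the west and roughly -1 in the east.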

Arid environments occupy more than 47% of Earth's landmass and continue to expand worldwide [44]. Exploitation of mineral resources has had extensive environmental and social consequences [45]. China is the leading country in energy production and consumption [46]. More than 70% of its coal reserves are located in arid, semiarid, and ecologically fragile regions, where they are exploited intensively [23]. Analyzing the relationship between vegetation and mining development provides practical guidance and reference for the development of mineral resources and for ecological construction under the Belt and Road Initiative.

As a representative resource-based city of China, Wuhai has not only maintained coal exploitation for decades but is also an important ecological function zone in Inner Mongolia. In the context of simultaneous ecological destruction and construction, choosing Wuhai and its surroundings as the research area is of great significance for regional ecological security and harmonious development. The purpose of this article is to determine the spatial variability of the impact of mining on vegetation changes. There were two specific objectives: (1) to identify the mining development pattern and associated vegetation dynamics in different periods, and (2) to explore the spatial variability of vegetation response to mining development.

#### **2. Study Area and Data Sources**

#### *2.1. Study Area*

The study area (106.36°E–107.05°E, 39.15°N–39.52°N) comprises the whole city of Wuhai and parts of Alxa League and Ordos, as defined in the planning (2010–2030) of Wuhai and its surrounding areas. The study area is located in the middle of Inner Mongolia, contains six districts (Figure 1a), and is surrounded by three deserts: the Uulan Buh, the Kubuqi, and the Maowusu [47,48]. The north–south-oriented Yellow River runs through the whole city and forms irrigation districts of about 175 km² with a narrow river-beach wetland and an agricultural oasis [47]. Topographically, the study area is low-lying in the northwest and high-lying in the middle and east (Figure 1b). The study area belongs to the mid-latitude temperate continental climate zone, a region affected by the East Asian monsoon belt [49]. Annual precipitation is 160 mm, and annual evaporation is 20 times the annual rainfall [47]. The main vegetation types in the study area are grassland and shrubland. The combination of the Yellow River and the complex natural environment gives the entire region a unique desertification ecosystem, including national wetland parks and an extremely precious plant, *Tetraena mongolica* [48].

**Figure 1.** (**a**,**b**) Location, administrative divisions, and land-use and land-cover map (**a**), and topography (**b**) of the study area. The land-use and land-cover map represents 2017 and was provided by the Institute of Geographic Sciences and Natural Resources Research in China. Note: districts are numbered as (1) Uulan Buh, (2) Mengxi, (3) Wuda, (4) Haibowan, (5) Etuoke Banner, and (6) Hainan.

Wuhai is a typical resource-based city in which mining development comprises many activities, including industrial bases, surface mining, and waste dumping, which play a dominant role in socio-economic development [49]. Industrial bases mainly consist of coal washing, coal storage, and primary and deep processing sites, while surface mining and waste dumping sites are the main sites for mineral extraction and disposal [48]. Three industrial bases are distributed across the study area: the Wuda industrial base in the northwest, the Hainan industrial base in the midland, and the Mengxi industrial base in the Mengxi district. The mining areas belong to the Shendong coalfield in Inner Mongolia and are adjacent to the Ningdong Energy and Industrial Base, one of China's largest coal bases [23]. With the progress of development, the transformation of declining industries, and the deepening of socio-economic reforms, heavily polluting mining, coal, and chemical enterprises with an extensive development model moved into this region from the developed eastern part of China over the course of 30 years [49]. Under the pressure of the inherent conflict between socio-economic development and ecological protection, it is increasingly urgent to recognize the internal relationship between ecological degradation and mining development.

#### *2.2. Data Sources*

#### 2.2.1. Landsat Data and Mining Maps

Normalized difference vegetation index (NDVI) values of all clear-sky Landsat images acquired during the growing seasons (April to October) of 1999–2018 were used to composite an interannual maximum-value series for detecting vegetation trends. The growing seasons include the vegetative and reproductive phases of vegetation growth [50]; in arid and semiarid areas, the maximum NDVI value represents the best state of vegetation in a year. All Landsat data were obtained from the United States Geological Survey (Table 1). Land-use maps were used to present the spatial distribution of mining activities and to calculate the distance from vegetation areas to mining areas. The mining maps were extracted from the land-use maps for 2000, 2005, 2010, and 2017 provided by the Institute of Geographic Sciences and Natural Resources Research. All maps were carefully interpreted against the corresponding historical Google Earth images. Topographic data at 30 m spatial resolution from the ASTER GDEM 2 digital elevation model (http://www.gscloud.cn/) were used to reveal the relationships between vegetation dynamics and terrain features. All data were converted into a common coordinate system (WGS1984, UTM Zone 49N), and raster data were resampled to 1000 m × 1000 m cells.
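The annual maximum-value compositing described above can be sketched per pixel as follows. The function names, the input format (acquisition date plus NIR and red reflectance for each clear-sky scene) and the toy values are assumptions; cloud screening and the sensor-specific Landsat band assignments are taken as already handled.

```python
from collections import defaultdict
from datetime import date

GROWING_SEASON_MONTHS = range(4, 11)  # April (4) to October (10), inclusive


def ndvi(nir: float, red: float) -> float:
    return (nir - red) / (nir + red)


def annual_max_ndvi(scenes):
    """Per-year maximum NDVI of one pixel from clear-sky scenes in the growing season.

    scenes: iterable of (acquisition_date, nir, red) tuples (assumed input format).
    """
    best = defaultdict(lambda: float("-inf"))
    for day, nir, red in scenes:
        if day.month in GROWING_SEASON_MONTHS:
            best[day.year] = max(best[day.year], ndvi(nir, red))
    return dict(sorted(best.items()))


scenes = [
    (date(2000, 3, 15), 0.50, 0.05),  # March: outside the growing season, ignored
    (date(2000, 7, 1), 0.30, 0.10),
    (date(2000, 9, 1), 0.25, 0.15),
    (date(2001, 6, 1), 0.28, 0.12),
]
print(annual_max_ndvi(scenes))  # about {2000: 0.5, 2001: 0.4}
```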


**Table 1.** Sources of remote sensing data.

#### 2.2.2. Boundary Data in Vector Format and Climate Dataset

The boundary of the research area was set according to the coal industry planning (2010–2030) of Wuhai and its surrounding areas, issued by the government of the Inner Mongolia Autonomous Region. Basic geographic information, including regional boundaries, major roads, and river basins, was provided by the National Geomatics Center of China (http://218.244.250.94:9003/English/html/1/). The boundary of the conservation zone in the study area was drawn on the basis of the Western Ordos national nature reserve [51]. Observed annual precipitation and average temperature datasets were downloaded from the National Meteorological Information Center of China (http://data.cma.cn/en) to describe the impact of climate conditions on vegetation changes. This dataset, comprising monthly observations, was obtained from five meteorological reference stations around the research area for 1999–2018.

#### **3. Methodology**

The purpose of the article was to analyze the relationship between vegetation changes and mining development on the basis of remote sensing data and the GWR model. Vegetation changes were described by interannual NDVI trends (1999–2018), and the spatial distribution of mining activities was obtained from four land-use maps (2000, 2005, 2010, and 2017). Because the potential influence of mining activities depends on distance [21,52], all data were divided into 1 km units, and the distance from vegetation units to mining units was calculated as the Euclidean distance in ArcGIS 10.2. Two distance measures were analyzed separately: the minimal distance, emphasizing the ecological impact of a single mine, and the summary distance, emphasizing the regional impact of mining on vegetation. The minimal distance was the shortest of the distances between the center point of a vegetation unit and the center points of the mining units. The summary distance was the sum of these center-point distances between a vegetation unit and all mining units. Topography is a limiting factor for vegetation changes through geographical conditions such as water and radiation balance; elevation was therefore regarded as an important factor in the analysis of vegetation response to mining activities.
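The two distance measures can be sketched directly from the definitions above, using the center points of the 1 km units. This is an illustrative Python version, not the ArcGIS 10.2 workflow used in the study; the function name and the coordinates (in km) are hypothetical.

```python
import math


def distance_metrics(vegetation_centres, mining_centres):
    """(minimal, summary) Euclidean distance from each vegetation unit to all mining units."""
    result = []
    for v in vegetation_centres:
        d = [math.dist(v, m) for m in mining_centres]
        # Minimal distance: nearest single mine; summary distance: all mines combined.
        result.append((min(d), sum(d)))
    return result


# One vegetation unit at the origin and two mining units (coordinates in km).
print(distance_metrics([(0.0, 0.0)], [(3.0, 4.0), (6.0, 8.0)]))  # [(5.0, 15.0)]
```

The summary distance grows with every additional mine, which is why it captures the regional, aggregated influence rather than the effect of the nearest mine alone.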

The methodology framework was divided into three steps (Figure 2). The first step was to identify vegetation dynamics. Vegetation changes were divided into three stages, 1999–2005, 1999–2010, and 1999–2018, corresponding to the cumulative effect of mining development in three periods; the starting year was set to 1999 to ensure the initial stability of the NDVI sequence. The second step was to present the spatial distribution of mining development and to calculate the minimal and summary distances from vegetation units to mining units in each stage. The third step was to quantify, with the GWR model, the spatial relationships between vegetation changes and the two kinds of distance, elevation, and the combination of distance and elevation. Areas where vegetation improved during 1999–2018 were removed to highlight the cumulative effects of mining development. Detailed descriptions are provided in the following sections.

**Figure 2.** Framework of data processing flow.

#### *3.1. Trend Analysis of Vegetation Changes*

NDVI is an indispensable indicator for mapping green biomass and describing vegetation dynamics because it is closely related to biophysical and biochemical variables [53]. The calculation method was detailed in a study by Maneja et al. [54,55]. All NDVI series were composited using the maximum value of the growing season in each year to eliminate interference caused by vegetation changes, clouds, and the atmosphere. Sen's slope is calculated as the median of the rates of change between all pairs of points in the sequence, which accurately expresses the trend and reduces noise interference [56,57]. The Mann–Kendall trend test is a quick and effective method for detecting the significance of a trend, with the advantages of not requiring the data to follow a particular distribution and being insensitive to outliers [58]. Sen's slope estimator was first used to detect the direction and magnitude of vegetation changes, and the Mann–Kendall trend test was then applied to quantify the significance level. Vegetation trends could therefore be estimated from the NDVI series by combining Sen's slope and the Mann–Kendall trend test.

Sen's slope equation is shown in Equation (1) [57]:

$$\theta\_{\text{slope}} = \operatorname{Median}\left(\frac{\text{NDVI}\_j - \text{NDVI}\_i}{j - i}\right), \quad \forall\, j > i \tag{1}$$
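Equation (1) can be illustrated with a short sketch for a single pixel. It assumes the series has no missing years, so that list positions stand in for the years *i* and *j*; the function name `sens_slope` and the example values are illustrative only.

```python
from itertools import combinations
from statistics import median


def sens_slope(ndvi):
    """Sen's slope (Equation 1): median of (NDVI_j - NDVI_i) / (j - i) over all pairs j > i.

    `ndvi` is assumed to be one pixel's annual maximum-NDVI series with no gaps,
    so list positions stand in for the monitoring years.
    """
    slopes = [
        (ndvi[j] - ndvi[i]) / (j - i)
        for i, j in combinations(range(len(ndvi)), 2)  # combinations yields i < j
    ]
    return median(slopes)


# Made-up series: a steady rise of 0.02 per year with one outlier year.
series = [0.20, 0.22, 0.24, 0.90, 0.28, 0.30]
print(round(sens_slope(series), 3))  # 0.02: the outlier barely affects the median
```

Because the estimate is a median of pairwise slopes, the single outlying year does not pull the trend away from 0.02, which is the robustness the text refers to.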

The Mann–Kendall test is shown by test statistic *S* in Equation (2) [59,60]:

$$S = \sum\_{i=1}^{n-1} \sum\_{j=i+1}^{n} \text{sign}(\text{NDVI}\_j - \text{NDVI}\_i) \tag{2}$$

where the sign function sign(*NDVI<sub>j</sub>* − *NDVI<sub>i</sub>*) is defined as:

$$\text{sign}(\text{NDVI}\_j - \text{NDVI}\_i) = \begin{cases} 1 & \text{NDVI}\_j - \text{NDVI}\_i > 0 \\ 0 & \text{NDVI}\_j - \text{NDVI}\_i = 0 \\ -1 & \text{NDVI}\_j - \text{NDVI}\_i < 0 \end{cases} \tag{3}$$

The test statistic *Z* is defined as:

$$Z = \begin{cases} \left(S - 1\right) / \sqrt{V(S)} & S > 0\\ 0 & S = 0\\ \left(S + 1\right) / \sqrt{V(S)} & S < 0 \end{cases} \tag{4}$$

where the variance *V*(*S*) is:

$$V(S) = n(n-1)(2n+5)/18\tag{5}$$

where θslope is the annual rate of change of the NDVI trend at the pixel scale; *NDVI<sub>i</sub>* and *NDVI<sub>j</sub>* are the maximal NDVI values in monitoring years *i* and *j*, respectively; *n* is the length of the time series; and *V*(*S*) is the variance of *S*. A positive θslope indicates an upward vegetation trend, and a negative value indicates a downward trend. The appropriate significance test depends on the series length *n*: when *n* < 10, a two-sided test on the statistic *S* was used directly to indicate a slight upward or downward trend; when *n* ≥ 10, *S* is approximately normally distributed, and the standardized statistic *Z* follows the standard normal distribution. At a significance level of α = 0.05, a trend was considered significant when |*Z*| ≥ 1.96. Combining trend direction and significance yielded four classes: θslope ≥ 0 and |*Z*| ≥ 1.96, significant improvement; θslope ≥ 0 and |*Z*| < 1.96, slight improvement; θslope ≤ 0 and |*Z*| ≥ 1.96, significant degradation; and θslope ≤ 0 and |*Z*| < 1.96, slight degradation.
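Equations (2)–(5) and the four-class scheme can be sketched as follows. This is an illustration under stated assumptions, not the authors' processing chain: the variance omits any correction for tied values (as Equation (5) does), the normal approximation is applied regardless of *n* (whereas the text relies on *S* directly when *n* < 10), and a slope of exactly 0 is counted as improvement because the `>= 0` branch is checked first. The function names are hypothetical.

```python
import math


def mann_kendall_z(ndvi):
    """Return S (Eq. 2-3), V(S) (Eq. 5, no tie correction) and Z (Eq. 4)."""
    n = len(ndvi)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = ndvi[j] - ndvi[i]
            s += (diff > 0) - (diff < 0)  # sign() from Equation (3)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, var_s, z


def classify(theta_slope, z, z_crit=1.96):
    """Four trend classes from the slope sign and |Z| at alpha = 0.05."""
    significant = abs(z) >= z_crit
    if theta_slope >= 0:  # a slope of exactly 0 falls here (assumption for the tie)
        return "significant improvement" if significant else "slight improvement"
    return "significant degradation" if significant else "slight degradation"


s, var_s, z = mann_kendall_z(list(range(10)))  # strictly increasing series, n = 10
print(s, var_s, round(z, 2), classify(0.01, z))  # 45 125.0 3.94 significant improvement
```

For a strictly monotonic series of length 10, every pair contributes +1, so *S* = 45 and *V*(*S*) = 10·9·25/18 = 125, giving *Z* = 44/√125 ≈ 3.94, well above 1.96.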

#### *3.2. Relationship between Vegetation Changes and Mining Development in GWR Model*

The GWR model was used to examine the relationship between mining development and vegetation changes and its spatial variability. OLS is a global regression model whose parameter estimates are constant across the entire study area. GWR improves on this by handling non-stationary spatial relationships and spatial autocorrelation across space: it estimates local parameters and thereby maps the geographic variability in the association between the response and the predictors [34,61]. As a local exploratory technique, it yields a set of local parameters for each unit that can be mapped, estimated, and analyzed, providing new insight into relationships within a moving window while retaining the global correlation of the variables in a single modeling framework [62]. The GWR model is expressed in Equation (6) [34]:

$$\log y = \beta\_0(\mu\_j, \nu\_j) + \sum\_{i=1}^k \beta\_i(\mu\_j, \nu\_j) \chi\_{ij} + \varepsilon\_j \tag{6}$$

where (μ<sub>j</sub>, υ<sub>j</sub>) denotes the spatial coordinates of sample point *j*; β<sub>0</sub>(μ<sub>j</sub>, υ<sub>j</sub>) is the intercept at location *j*; β<sub>i</sub>(μ<sub>j</sub>, υ<sub>j</sub>) is the local estimated coefficient of independent variable χ<sub>ij</sub>; *k* is the number of independent variables; and ε<sub>j</sub> is the error term.

In GWR, local parameters were estimated with a spatial weight matrix derived from a distance-decay weighting function. The extent of this function is controlled by the kernel bandwidth, which determines the scope of spatial dependence, that is, the number of neighboring points included in each local regression. The optimal bandwidth was determined by the Akaike information criterion (AIC). Further details of GWR are given in Alahmadi et al. [62]. The GWR analysis was performed with the GWR tool in ArcGIS 10.2. All data were normalized with the min–max method before the regressions.
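The local estimation in Equation (6) amounts to a separate weighted least-squares fit at every sample point. The sketch below assumes an adaptive bisquare kernel whose bandwidth is the distance to the *k*-th nearest neighbor, and it fixes *k* instead of selecting it by AIC; it is not the ArcGIS 10.2 implementation the authors used, and the function names are hypothetical.

```python
import numpy as np


def min_max(a):
    """Scale each column of `a` to [0, 1] (min-max normalization)."""
    a = np.asarray(a, dtype=float)
    lo, hi = a.min(axis=0), a.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    return (a - lo) / span


def gwr_local_coefficients(coords, X, y, k):
    """Estimate beta_0(u_j, v_j) and beta_i(u_j, v_j) at every point j.

    Assumptions: adaptive bisquare kernel, bandwidth = distance to the k-th
    nearest neighbour, distinct coordinates, and a fixed k (no AIC search).
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])  # add intercept
    betas = np.empty((len(y), X.shape[1]))
    for j in range(len(y)):
        d = np.linalg.norm(coords - coords[j], axis=1)
        h = np.sort(d)[k]  # index 0 is point j itself (distance 0)
        w = np.where(d < h, (1 - (d / h) ** 2) ** 2, 0.0)  # bisquare weights
        xtw = X.T * w  # same as X.T @ diag(w)
        betas[j] = np.linalg.solve(xtw @ X, xtw @ y)
    return betas


# Synthetic check: y = 2 + 3x everywhere on a 5 x 5 grid (not data from the paper).
gx, gy = np.meshgrid(np.arange(5), np.arange(5))
coords = np.column_stack([gx.ravel(), gy.ravel()])
x = np.random.default_rng(0).random(25)
betas = gwr_local_coefficients(coords, x[:, None], 2 + 3 * x, k=10)
print(np.allclose(betas, [2.0, 3.0]))  # True: a stationary relationship gives constant local betas
```

With a spatially stationary relationship every local fit returns the same coefficients; spatial heterogeneity of the kind mapped in Figures 5–7 shows up as coefficients that vary from point to point.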

Multicollinearity among the explanatory variables was checked with the variance inflation factor (VIF) of the OLS model [63]; all values were below 7.5, indicating slight or no collinearity among the explanatory variables. The performance of the GWR and OLS models was compared using AIC and R<sup>2</sup>, which together were used to assess the predictive capacity of each model. A higher R<sup>2</sup> indicates that the independent variables explain more of the variation in the dependent variable. AIC measures how well a model fits the observed data while penalizing model complexity; lower values indicate a better description of the observed data.
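The VIF check can be sketched with NumPy: each explanatory variable is regressed on all the others, and VIF = 1/(1 − R<sup>2</sup>). The 7.5 threshold is the one stated in the text; the function name and the synthetic data are illustrative assumptions.

```python
import numpy as np


def vif(X):
    """VIF_k = 1 / (1 - R_k^2), with R_k^2 from regressing column k on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    result = np.empty(p)
    for k in range(p):
        target = X[:, k]
        others = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        result[k] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return result


# Synthetic example: the third column nearly duplicates the first.
rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + 0.05 * rng.normal(size=200)])
print(vif(X).round(1))  # columns 0 and 2 far exceed 7.5; column 1 stays near 1
```

In this example, the near-duplicate pair would be flagged, whereas the independent column would pass the < 7.5 criterion used in the study.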

#### **4. Results**

According to Sen's slope and the Mann–Kendall test, vegetation was considerably degraded as mining development rapidly expanded. The GWR results revealed significant spatial differences in the relationships of minimal distance, summary distance, and elevation with vegetation changes.

#### *4.1. Temporal Trends and Spatial Distribution in NDVI*

Across the whole study area, vegetation first improved and then declined rapidly. According to the Sen's slope trend analysis, the proportion of significantly degraded area increased (Figure 3). In the initial stages, clear vegetation improvements covered the majority of the study area (74%), with significant improvement concentrated in the south (17%), where NDVI increased by nearly 100% (Figure 3b). Initially degraded areas were mainly distributed in the mining areas and the eastern mountainous areas, and 85% of them shifted from degradation to improvement during 1999–2010. Favorable growth conditions drove an upward vegetation trend in 1999–2010. In 1999–2018, however, the overall vegetation trend deteriorated, and degraded areas accounted for more than 80% of the total study area (Figure 3c). Severely degraded areas (27%) were distributed in the north of the study area and in the mining areas. Most of the significant improvements in the south were lost, and continuous degradation occurred in the western areas. The few improvements, concentrated around the central town, may result from ecological construction.

**Figure 3.** (**a**–**c**) Spatial distribution of vegetation changes by Sen's slope and Mann–Kendall method. (**a**) 1999–2005; (**b**) 1999–2010; (**c**) 1999–2018.

#### *4.2. Spatiotemporal Distribution of Mining Development*

Mining development expanded rapidly over the past 20 years and formed a connected spatial pattern across three major industrial bases. As shown in Figure 4, the total area of the industrial bases was 9.57 km<sup>2</sup> in 2000 and 184.98 km<sup>2</sup> in 2017. Open pits expanded gradually at a uniform rate of 9.17 km<sup>2</sup>/a around the core industrial base, forming a narrow pattern along the valley terrain in the middle of the study area and an aggregated pattern in the northwest. Waste dumps were interspersed with the open pits and expanded from 2.85 km<sup>2</sup> in 2000 to 69.36 km<sup>2</sup> in 2017. The area of mining activities expanded from 55.83 km<sup>2</sup> in 2000 to 453.78 km<sup>2</sup> in 2017, accounting for more than one-tenth of the research area, with a production scale 9 times that of 2000. The average expansion rates were 16%, 32%, and 7% per year in 2000–2005, 2005–2010, and 2010–2017, respectively; expansion was fastest in 2005–2010, when the market price of coal was at historic highs.
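The text does not state whether the average expansion rates are simple (linear) or compound averages. The sketch below shows both conventions applied to the reported 2000 and 2017 areas of mining activities; the function names are hypothetical, and the per-period figures cannot be recomputed here because the intermediate areas are not given in the text.

```python
def linear_rate(start, end, years):
    """Average absolute growth per year, e.g. km2/a."""
    return (end - start) / years


def compound_rate(start, end, years):
    """Average annual growth rate assuming compound (exponential) growth."""
    return (end / start) ** (1 / years) - 1


# Areas of mining activities reported in the text (km2), 2000 -> 2017.
start, end, years = 55.83, 453.78, 2017 - 2000
print(f"{linear_rate(start, end, years):.2f} km2/a")  # 23.41 km2/a
print(f"{compound_rate(start, end, years):.1%} per year")  # 13.1% per year
```

The two conventions answer different questions: the linear rate gives the mean area added each year, while the compound rate gives the constant yearly percentage growth that would lead from the 2000 area to the 2017 area.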

**Figure 4.** (**a**–**d**) Components and spatial distribution of mining activities in study area.

#### *4.3. Influence of Mining Development and Elevation on Vegetation Changes*

#### 4.3.1. Correlation between Minimal Distance, Summary Distance, and Vegetation Changes

Figure 5 clearly shows the spatial variation in the relationship between distance and vegetation changes. In both the minimal- and summary-distance models, a positive coefficient indicates that the vegetation trend becomes more positive as the distance to mining areas increases. The depth of color expresses the magnitude of the correlation coefficient and how well the variables fit locally.

For minimal distance, the impact of single mines on vegetation shrank in extent but remained dominant around mining areas. The areas with positive coefficients (above 0.01) were 1418, 969, 866, and 733 km<sup>2</sup>, respectively, a continuous decline of 48.31%. The mean coefficient (above 0.02) grew continuously by 20% (0.025, 0.026, 0.03, and 0.033, respectively). Positive coefficients were significantly higher around mining areas than elsewhere, and a clear shift from negative to positive correlation occurred consistently in the mining areas (Figure 5a–c). The spatial pattern of areas with higher positive correlation (above 0.01) was consistent with the spatial pattern of mining development, especially in the middle of the study area after areas with an NDVI slope of >0 were removed.

For summary distance, as mining development agglomerated, the impact of multiple mines gradually shifted from the outside inward and became increasingly concentrated in the areas where mining activities overlapped. The areas with positive correlation (above 0.01) were 1129, 704, 587, and 588 km<sup>2</sup>, respectively, showing a downward trend. Areas with higher coefficients (above 0.02) declined in the east but increased in the west, covering 186, 0, 393, and 373 km<sup>2</sup>. The average coefficient increased continuously by 92%, from 0.026 to 0.050. Increases in both the area and the coefficient of positive correlation emerged and were concentrated in the western overlapping areas of the three industrial bases. This was even more evident after areas with an NDVI slope of >0 were removed: the area of positive correlation increased, and the mean coefficient reached 0.056. The combined impact of the coal bases therefore has a greater influence on regional vegetation changes than the impact of single mines.
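The relative changes quoted in this section follow from the first and last values of each series. A minimal sketch using the values reported in the text (the function name is illustrative):

```python
def relative_change(first, last):
    """Relative change from the first to the last period, as a fraction."""
    return (last - first) / first


# Values reported in Section 4.3.1.
min_dist_area = [1418, 969, 866, 733]  # km2 with positive coefficients (> 0.01), minimal distance
sum_dist_mean = (0.026, 0.050)          # mean coefficient, summary distance
print(f"{relative_change(min_dist_area[0], min_dist_area[-1]):.2%}")  # -48.31%
print(f"{relative_change(*sum_dist_mean):.0%}")                       # 92%
```

Only the endpoints enter the calculation, so the figures describe the net change between the first and last periods rather than a fitted trend.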

**Figure 5.** (**a**–**h**) Spatial patterns of correlation coefficients between distance and vegetation changes. (**a**–**d**) Minimal and (**e**–**h**) summary distance in 1999–2018.

#### 4.3.2. Correlation between Elevation and Vegetation Changes

The relationship between vegetation changes and elevation showed significant spatial differences (Figure 6). A positive coefficient indicates that vegetation degradation eased as elevation increased, whereas a negative coefficient means that vegetation degradation worsened as elevation increased.

The spatial relationship between elevation and vegetation changes varied with respect to the average elevation of all mining areas: the correlation was negative in higher-elevation areas and positive in low-elevation areas, and the two were approximately separated by that average elevation. Most areas with negative correlation were distributed in the middle and southwest of the study area, with proportions of 56%, 62%, 47%, and 38%, respectively, showing an overall decrease. As mining activities continuously expanded, low-elevation areas in the south gradually changed from negative to positive correlation, indicating that vegetation degradation at low elevations became more serious as elevation decreased. This may partly explain the disappearance of the extremely significant improvement areas in the south as the Hainan industrial base agglomerated. Three positively correlated aggregation areas of about 362 km<sup>2</sup> were located in the middle, north, and south around the Yellow River and the constructed areas. The boundary between positive and negative correlations (−0.03 to 0 and 0 to 0.03) coincided with the 1200–1230 m contour line, as shown in Figure 6c,d.

**Figure 6.** (**a**–**d**) Spatial patterns of correlation coefficients between elevation and vegetation changes in 1999–2018.

#### 4.3.3. Correlation between Minimal Distance, Summary Distance, Elevation, and Vegetation Changes

After elevation was combined with the distance variables, the spatial patterns in the maps of distance factors and vegetation changes remained largely stable (Figure 7).

For minimal distance, combining minimal distance with elevation caused many positively correlated areas to disappear and negatively correlated areas to expand, especially in the western areas (Figure 7a–d). This disappearance illustrates that the ecological impact of a single mine on vegetation, and its pathway of action, were not closely related to elevation. As the minimal distance to mining areas increased, elevation and distance had disproportionately opposite effects on vegetation changes at different elevation levels, consistent with Sections 4.3.1 and 4.3.2.

For summary distance, the spatial distribution of positively correlated areas was relatively stable and did not differ significantly whether or not elevation was included in the relationship between summary distance and vegetation changes. Areas with positive coefficients continued to grow, reaching a proportion of up to 86.36%, and the number of higher positive coefficients (above 0.02) in 1999–2018 increased by 17% compared with Figure 5e–h. Consequently, the combined impact of summary distance on vegetation changes was relatively stable and was determined not by elevation but by the spatial pattern of mining development. Moreover, after areas with an NDVI slope of >0 were removed, positive coefficients clearly dominated, and the negative areas, less than 14% of the total, were distributed in the middle of the study area. The combined ecological influence of summary distance was continuously strengthened as mining activities became more systematic.

**Figure 7.** (**a**–**h**) Spatial patterns of correlation coefficients between distance and vegetation changes. (**a**–**d**) Minimal distance and elevation, and (**e**–**h**) summary distance and elevation in 1999–2018.

#### **5. Discussion**

#### *5.1. Importance of Applying GWR in Studying Spatial Heterogeneity of Vegetation*

Comparing the GWR and OLS models revealed significant spatial heterogeneity in the relationship between mining development and vegetation changes (Tables 2 and 3). The adjusted R<sup>2</sup> of the GWR model ranged from 0.19 to 0.62, higher than that of the OLS model (all below 0.1). This showed that the GWR model explained the impact of mining on vegetation changes much better. We therefore concluded that the relationship between mining development and vegetation changes could not be described by a single global linear relationship but showed strong spatial heterogeneity. The R<sup>2</sup> values of GWR also increased gradually with the expansion and aggregation of mining development, indicating that mining development had an increasingly significant impact on vegetation changes.

This raises the question of why vegetation changes in resource regions show significant spatial heterogeneity. Numerous studies have suggested that the vegetation greening rate is elevation-dependent, owing to different sensitivities to precipitation and temperature changes in arid and semiarid regions [64–66]. In mining areas, Liu et al. [67] found a significant positive correlation between NDVI and elevation, while Li et al. [68] found that the area covered by medium and high vegetation gradually decreased as elevation increased. We concluded that vegetation changes were closely related to the average elevation of mining activities and that higher degradation rates occurred away from this average elevation: at low elevations, owing to gravitational convergence in the basin, and at high elevations, owing to fragile growing conditions and high sensitivity to external disturbance.


**Table 2.** Comparison of adjusted R<sup>2</sup> of geographically weighted regression (GWR) and ordinary least squares (OLS) models.

Adjusted R<sup>2</sup> G: adjusted R<sup>2</sup> of GWR; adjusted R<sup>2</sup> O: adjusted R<sup>2</sup> of OLS; 1999–2018R: 1999–2018 after removing areas with a normalized difference vegetation index (NDVI) slope of >0.


**Table 3.** Comparison of Akaike information criterion (AIC) from GWR and OLS models.

Extensive human activities, including ecological management, are another driver of vegetation changes. Areas with negative correlation, where vegetation improved as the summary distance to mining areas decreased (Figure 7h), are mainly distributed in natural protected areas with strong ecological management, even though these areas lie at the shortest summary distance and, according to our assumption, should be the most degraded. The effect of ecological protection can also be seen in the relationship between vegetation changes and elevation (Figure 6d), for example in the eastern, southeastern, and northeastern areas of positive correlation. The significant vegetation improvements were mainly distributed around the central town because of ecological-engineering activities. Ecological activities are important measures for maintaining and improving vegetation growth and have become important factors increasing the spatial heterogeneity of vegetation changes. Furthermore, all natural protected areas were disproportionately degraded, so existing protected areas must be strengthened and the cumulative regionwide effects of mining activities must be mitigated [69–71].

#### *5.2. Effect of Distance on Vegetation Disturbance in Mining Areas*

The results suggested that the ecological impact of a single mine was continuously strengthened around the mining areas. A nationwide survey in China concluded that the distance of environmental impact from mining sites varied from a few hundred meters to 10 km [72]. Previous studies have shown that the range of ecological disturbance of a single mine is mostly between 1000 and 3200 m [73,74]. In areas with poor vegetation growth conditions, the range of mining disturbance is much greater [74]. The average distance of vegetation disturbance around large mining sites is greater than that around small ones [75]. Empirical evidence from the Mongolian Plateau shows that the disturbance range of large coal extraction areas has increased to over 5 km [75], where the ecosystem could maintain stable, sustainable, and rich ecosystem services [76]. The aggregated development of mining activities extends the range of ecological disturbance of individual mines.

For the combined impact of mining activities, the distance threshold of mining disturbance is difficult to determine. Our results showed that its impact on vegetation changes was determined by the overall spatial pattern of current and future mining development. Some scholars have made preliminary explorations of this question. Cheng et al. [77] suggested that, in large coal bases, the sources and distribution of heavy-metal pollution show significant spatial heterogeneity, with a hot spot in the area where multiple coal mines overlap, and that pollutants disperse more widely than from single mines. Moreover, the extent of heavy-metal pollution far exceeds that of a single mine, reaching more than 15 km. As mining development aggregated, a centralization effect of mining activities emerged and was continuously enhanced and stabilized.

#### *5.3. Response of Vegetation Changes to Climate Conditions*

A warming–wetting trend in the study area provided excellent conditions for vegetation growth, and the increase in precipitation varied spatially over the last two decades (Figure 8). Precipitation in the north of the study area was much lower than in the south, where precipitation growth peaked at about 7 mm/a on average during 1999–2010. This largely explains the overall vegetation improvement in the study area, especially in the south, where almost all significant improvements emerged in 1999–2010. However, vegetation degradation was relatively severe in 1999–2018, despite the significant improvement in the previous period and the continuous increase in precipitation throughout. This suggests that the impact of mining development on vegetation far outweighed the positive effect of climate factors during 2010–2018.

**Figure 8.** (**a**–**c**) Spatial distribution of precipitation slope by Kriging interpolation. Precipitation data in 1999–2018 were acquired from five meteorological reference stations: Hanjinqi, Linhe, Huinong, Etuokeqi, and Taole. Precipitation slope was obtained by linear regression based on interannual precipitation data, and then imported into ArcGIS to obtain changes in the study area by Kriging interpolation.
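The per-station precipitation slope described in the Figure 8 caption is an ordinary least-squares trend of annual precipitation against year. The sketch below assumes a single station with a complete annual record; the series is synthetic (not the station data used in the paper), the function name is hypothetical, and the subsequent Kriging interpolation in ArcGIS is not reproduced.

```python
import numpy as np


def linear_trend(years, values):
    """Ordinary least-squares slope of `values` against `years` (mm/a for annual precipitation)."""
    slope, _intercept = np.polyfit(years, values, deg=1)
    return slope


# Hypothetical station series: a 7 mm/a trend plus random noise (illustrative only).
years = np.arange(1999, 2019)
rng = np.random.default_rng(42)
precip = 250 + 7.0 * (years - 1999) + rng.normal(0, 20, size=years.size)
print(round(linear_trend(years, precip), 1))
```

One slope per station would then serve as the input points for spatial interpolation.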

In the context of global warming, the research area experienced notable climate change. Before 2010, temperatures were relatively stable or even slightly decreasing; thereafter, the average temperature rose rapidly by about 1.5 °C. On the one hand, higher temperatures promote the spring germination and growth of vegetation; on the other hand, they increase plant transpiration and evaporation in summer, which limits vegetation growth in arid and semiarid regions. Ma et al. [78] reported that evapotranspiration variation in the northwestern Loess Plateau was consistent with changes in vegetation coverage, with a marginally increasing trend of about 0–5 mm/a during 2000–2010 and no significant increasing trend during 2011–2015. Numerous studies have also shown that climate warming is one of the main drivers of greening in northern China, enhancing photosynthesis and increasing vegetation activity [66,79–81].

In summary, climatic conditions were very favorable for vegetation growth during 1999–2018. It is therefore highly unlikely that climate change caused the vegetation degradation in this water-scarce area with high evapotranspiration; instead, it promoted vegetation growth.

#### *5.4. Limitations*

Although reliable and extensive data were used in this research, some uncertainties and limitations were inevitable. The interannual series of maximal NDVI can characterize the dynamic changes of regional vegetation, but missing data in some years and shifts in image acquisition dates within the growing season may have affected the accuracy of the assessment of vegetation changes. More accurate time-series data and error analysis of the calculations should be implemented in future studies. In addition, future research should examine the spatial differences between areas dominated by single-mine impacts and those dominated by regional mining impacts, as well as the conversion between these dominant areas in resource regions. Vegetation changes in resource regions reflect the combined effects of factors such as vegetation type, climate conditions, groundwater depth, mining activities, and ecological management [30]. Factors not included here may therefore have introduced uncertainty into the analysis of effects on vegetation. The contribution of these other factors to vegetation changes is notable and should be investigated further in future research.

#### **6. Conclusions**

Spatial heterogeneity is a great challenge in exploring the correlation between vegetation changes and mining development. Using spatial correlation based on the GWR model, three dominant factors (minimal distance, summary distance, and elevation) were identified to quantify the correlation between vegetation changes and mining development across time and space. Our analysis indicated that the incremental and combined expansion of mining activities could reverse the upward trend of regional vegetation, leading to degradation in 86% of the study area. Vegetation first increased and then declined as mining development aggregated. The area influenced by single mines shrank by about 48% and became concentrated closer to mining areas, while the mean coefficients increased by 20%. The influence of multiple mines on vegetation gradually expanded, to 86%, from the outer edge to the inner overlapping areas of mining areas, where the mean coefficients increased by 92%. The elevation dependence of vegetation changes varied with the average elevation of all mining areas and played an important role in the spatial heterogeneity of the mining impact on vegetation. Ecological measures should be implemented according to local conditions to achieve sustainable vegetation ecology.

**Author Contributions:** Conceptualization, H.L. and M.X. (Miaomiao Xie); methodology, H.L.; software, H.L., H.W. and S.L.; validation, H.L., H.W. and S.L.; formal analysis, H.L.; investigation, M.X. (Meng Xu); resources, H.W., S.L. and M.X. (Meng Xu); data curation, H.L.; writing—original draft, H.L.; writing—review and editing, M.X. (Miaomiao Xie), H.W. and M.X. (Meng Xu); visualization, H.W. and S.L.; supervision, S.L. and M.X. (Meng Xu); project administration, M.X. (Miaomiao Xie); funding acquisition, M.X. (Miaomiao Xie). All authors have read and agreed to the published version of the manuscript.

**Funding:** This work is supported by the National Key R&D Program of the Ministry of Science and Technology of China (Grant number 2017YFC0504401).

**Acknowledgments:** The authors thank the editors and four anonymous reviewers for their suggestions. We also thank Professor Qing Chang and Professor Yanxu Liu for helping to improve the study.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Vegetation Resilience under Increasing Drought Conditions in Northern Tanzania**

**Steye L. Verhoeve <sup>1,\*</sup>, Tamara Keijzer <sup>2</sup>, Rehema Kaitila <sup>3</sup>, Juma Wickama <sup>4</sup> and Geert Sterk <sup>1</sup>**


**Abstract:** East Africa comprises many semi-arid lands that are characterized by insufficient rainfall and the frequent occurrence of droughts. Drought, overgrazing and other impacts due to human activity may cause a decline in vegetation cover, which may result in land degradation. This study aimed to assess drought occurrence, vegetation cover changes and vegetation resilience in the Monduli and Longido districts in northern Tanzania. Satellite-derived data of rainfall, temperature and vegetation cover were used. Monthly precipitation (CenTrends v1.0 extended with CHIRPS2.0) and monthly mean temperatures (CRU TS4.03) were collected for the period of 1940–2020. Eight-day maximum value composite data of the normalized difference vegetation index (NDVI) (NOAA CDR—AVHRR) were obtained for the period of 1981–2020. Based on the meteorological data, trends in rainfall, temperature and drought were determined. The NDVI data were used to determine changes in vegetation cover and vegetation resilience related to the occurrence of drought. Rainfall did not significantly change over the period of 1940–2020, but mean monthly temperatures increased by 1.06 °C. The higher temperatures resulted in more frequent and prolonged droughts due to higher potential evapotranspiration rates. Vegetation cover declined by 9.7% between 1981 and 2020, which is less than reported in several other studies, and this decline was most likely caused by the enhanced droughts. Vegetation resilience, on the other hand, is still high, meaning that a dry season or year resulted in lower vegetation cover, but a quick recovery was observed during the next normal or above-normal rainy season. It is concluded that despite the overall decline in vegetation cover, the changes have not been as dramatic as earlier reported, and that vegetation resilience is good in the study area. However, climate change predictions for the area suggest the occurrence of more droughts, which might lead to further vegetation cover decline and possibly a shift in vegetation species to more drought-tolerant species.

**Keywords:** drought impacts; NDVI; drought adaptation; drought index; vegetation resilience; drought vulnerability; standardized precipitation evapotranspiration index; AVHRR; land degradation

#### **1. Introduction**

Vegetation is the main component of terrestrial ecosystems and is directly linked to many ecosystem services, such as food production, soil retention, climate regulation, water purification and disease management [1]. The value of these services could decline or disappear as pressure on vegetation resources increases. Not only natural influences, such as wildlife grazing and the weather, but also anthropogenic pressures can negatively affect vegetation productivity [2].

Land degradation is defined as changes in land use from productive to unproductive due to natural or human-made factors [3]. Land degradation is one of the world's major socio-economic and environmental problems, affecting two-fifths of humanity [4]. Agricultural expansion has led to severe land degradation all over the world, particularly when accompanied by high water consumption and the conversion of natural landscapes into cultivated lands [5–7]. Land degradation undermines the land's productivity and contributes to the degradation of ecosystem services. It disproportionately affects the poor and is sometimes the decisive factor that causes poverty and social conflict [8,9]. The loss of productive land is part of a vicious circle for many rural people in developing countries, in which land degradation can be both the cause and the effect of poverty [10].

**Citation:** Verhoeve, S.L.; Keijzer, T.; Kaitila, R.; Wickama, J.; Sterk, G. Vegetation Resilience under Increasing Drought Conditions in Northern Tanzania. *Remote Sens.* **2021**, *13*, 4592. https://doi.org/10.3390/rs13224592

Academic Editor: Elias Symeonakis

Received: 14 September 2021 Accepted: 22 October 2021 Published: 15 November 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Semi-arid ecosystems in particular are under pressure from the rising demand for natural resources and an increase in weather extremes, caused by an increasing human population and by climate change, respectively. More frequent and severe droughts have been forecast for the 21st century, particularly in the mid-latitudes [11]. Increases in drought occurrence are driven by a decrease in precipitation and/or an increase in evapotranspiration due to higher daily temperatures [12]. As water availability is the main driver of vegetation distribution and productivity in arid and semi-arid regions [13], droughts pose a serious risk to the livelihoods of many people [14].

Previous studies show that Eastern Africa has been experiencing rising temperatures and more frequent droughts, which have continued in the 21st century [15]. In this region, a browning trend of the vegetation has occurred over the past 40 years [16,17], which is among the most notable vegetation browning in the world. Some reports also show a decline in vegetation productivity and an increase in land degradation [1,3,18]. On the other hand, other remote-sensing-based studies have shown that areas in East Africa have experienced fluctuations in vegetation cover, which were largely driven by variations in soil moisture [19]. This can be explained by the quick response of vegetation in arid and semi-arid biomes to rainfall fluctuations. Plant species have developed mechanisms that allow them to adapt rapidly to changing water availability and to withstand water deficits [20]. These mechanisms suggest that vegetation health can revive strongly during periods of water abundance [21]. However, how vegetation responds to drought on different time scales remains largely unknown, because species differ in their response times and vulnerability to drought. Such knowledge would allow the severity of degradation to be assessed and the importance of applying mitigation measures to be estimated.

The semi-arid zone in northern Tanzania is an example of an area that suffers from increasing droughts and enhanced soil degradation [22]. The area close to Lake Manyara, covered by the Monduli and Longido districts, consists primarily of savanna and rangeland, which is widely used for livestock grazing by the local Masai herders. According to Wynants et al. [23], 2.0% of this area is degraded, while the soil erosion risk increased seriously from 1988 to 2016. Masai herders in the area complain of more frequent droughts and a lack of sufficient grazing resources. Part of the problem faced by the Masai is an increase in livestock numbers, which puts more pressure on grazing resources [18,24,25]. Using Landsat satellite imagery from Google Earth Engine, Verhoeve [24] studied land use/cover changes in the Monduli and Longido districts over the period of 1985–2018. The results showed widely fluctuating land use/cover classes over time, but revealed neither significant changes in land cover nor evidence of large-scale vegetation degradation. These results contradict previous studies that showed overall degrading vegetation cover in the study area [17,23,25,26]. According to Verhoeve [24], vegetation cover is largely influenced by the amount of precipitation and the occurrence of drought. However, that study also showed that vegetation resilience in the study area is high, with good recovery of vegetation cover following a drought year. Similar observations were made for Sahelian West Africa, where the vegetation recovered following the devastating droughts of the late 1970s and early 1980s [21].

In this study we used satellite-derived hydrometeorological data for drought analysis and the normalized difference vegetation index (NDVI) as a proxy for vegetation response to drought. The main objective was to investigate the resilience of vegetation to drought over different time scales in northern Tanzania during 1981–2020. To achieve this objective, we first investigated the long-term hydrometeorological data and the occurrence of drought in the region. Next, we analyzed the long-term interannual NDVI trends in the region. Finally, we determined the short-term effects of drought on vegetation health and recovery at different time periods.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The study area comprises the districts of Monduli and Longido in Arusha Region, northern Tanzania (Figure 1) [25,27]. It covers approximately 16,000 km<sup>2</sup> and has some 282,000 inhabitants [28]. Most of the area lies in the East African Rift Valley, bordered in the west by a high (~1300 m a.s.l.) escarpment. The valley floor is about 300 m lower in the western part and gradually rises towards the east, where there is no clear escarpment. Several small mountains, mostly volcano remnants, are scattered throughout the Rift Valley. The area is important for wildlife conservation, as it includes or borders the Lake Manyara, Arusha, Tarangire, Mount Kilimanjaro and Serengeti National Parks, as well as the Ngorongoro Conservation Area.

The climate of the study area is semi-arid, with four climatic seasons [29]. In general, the short rains season from November to January (NDJ) is followed by a short dry season (February) until the long rains start, which typically occur from March to May (MAM). From June to October (JJASO), a long dry season with cooler temperatures can be identified. Because of high interannual variability, the NDJ season often continues into February. In wet years, the NDJ and MAM seasons often merge into an almost continuous rainy season. The annual rainfall is between 450 and 1200 mm, averaging around 750 mm. The lower-lying areas receive a mean annual rainfall of about 650 mm, whereas in the higher parts the annual rainfall ranges from 1000 mm to 1200 mm on average [30,31]. The average temperature is between 20 and 25 °C, with a minimum of 11 °C in July to September and a maximum of 31 °C in January and February [32,33].

The physical characteristics of the area, such as its morphology, geology and soils, are strongly influenced by tectonic activities and volcanism [3]. These characteristics have influenced the rainfall distribution, vegetation types and wildlife of the area. The Monduli district is part of the Lake Manyara catchment. Lake Manyara, part of an endorheic basin, is the southernmost lake within the eastern arm of the East African Rift System. The lake is shallow and saline, and is situated at 960 m a.s.l.

The districts contain multiple large volcanic mountains, both active and inactive. These mountains stand out in the otherwise flat landscape and often receive higher rainfall on or near their slopes. Apart from the forests on the mountain slopes, savanna is the major land cover type. Savannas generally occupy the transition zone between tropical rainforests and deserts; in this area, that transition runs from the forests of the Monduli mountains, Mt. Meru and the Ngorongoro Conservation Area to the drier, more arid regions of Simanjiro District and Dodoma Region in the south. The Masai have managed the savannas extensively through fire and livestock grazing, suppressing the growth of bushes and trees [34,35].

**Figure 1.** Study area: the Monduli and Longido districts within Arusha Region, northern Tanzania. Source: [27].

#### *2.2. Data*

The NDVI is an indicator of the vitality and density of vegetation of a remote sensing image pixel [36]. It is regarded as a reliable indicator for land cover conditions and variations, and over the years it has been widely used for vegetation monitoring [37]. The NDVI produced from historical satellite image archives captures long-term changes in vegetation health and density, enabling the measurement of responses to climate variability [38].
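For reference, the NDVI is computed per pixel from red and near-infrared (NIR) reflectance as NDVI = (NIR − Red)/(NIR + Red). The NOAA CDR product used here already provides NDVI, so the minimal Python sketch below is illustrative only; the function name `ndvi` is hypothetical.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference vegetation index, (NIR - Red) / (NIR + Red).

    Works element-wise on scalars or arrays; returns NaN where NIR + Red == 0.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


# A densely vegetated pixel (high NIR) versus a pixel with equal NIR and red reflectance
print(ndvi([0.30, 0.25], [0.10, 0.25]))
```

Values range from −1 to 1. Higher values indicate denser, healthier vegetation.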

For this study we used the NOAA Climate Data Record (CDR) of AVHRR NDVI, Version 5. This dataset contains daily measurements of surface vegetation cover, gridded at a resolution of 0.05° and computed globally over land surfaces [39]. The AVHRR provides long-term data (1981 to the present) at a moderate spatial resolution. Other datasets offer higher spatial resolution but were unsuitable because their records for the study area start later. The online platform Google Earth Engine (GEE) was used to extract the daily NDVI values of the study area. GEE is a high-performance cloud-based platform that gives access to a vast and growing amount of earth observation data and provides the processing power necessary to analyze the data [40]. The daily NDVI was used to compute 8-day maximum value composites to filter out cloud irregularities, and the mean NDVI of the study area was then derived from these composites.
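The compositing step can be sketched in NumPy as follows. This is not the Google Earth Engine code used in the study. The sketch assumes that the daily NDVI grids have been exported as an array of shape (days, rows, columns), with NaN for cloudy or missing pixels. It also assumes that the 8-day windows are counted from the first day of the array, and that negative NDVI values may optionally be masked before compositing. The function name is hypothetical.

```python
import warnings

import numpy as np


def mvc_area_mean(daily, window=8, mask_negative=True):
    """Per-pixel 8-day maximum value composites, averaged over the study area.

    daily: array (n_days, ny, nx) of NDVI, NaN for cloudy/missing pixels.
    Returns one area-mean NDVI per complete window; a trailing partial
    window is dropped.
    """
    daily = np.asarray(daily, dtype=float)
    if mask_negative:
        daily = np.where(daily < 0, np.nan, daily)  # treat negative NDVI as invalid
    n_win = daily.shape[0] // window
    blocks = daily[: n_win * window].reshape(n_win, window, *daily.shape[1:])
    with warnings.catch_warnings():
        # all-NaN pixels/windows yield NaN plus a RuntimeWarning; silence the warning
        warnings.simplefilter("ignore", category=RuntimeWarning)
        composites = np.nanmax(blocks, axis=1)      # (n_win, ny, nx): max per pixel
        return np.nanmean(composites, axis=(1, 2))  # (n_win,): mean over the area
```

Taking the per-pixel maximum within each window suppresses cloud-contaminated (low) values before the spatial average is formed.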

The NDVI time series runs from 24 June 1981, the start of the AVHRR mission, until 24 June 2020. In 1988, a series of 51 negative NDVI values was recorded. These values coincide with the start of service of a new AVHRR satellite (NOAA-11) and were therefore excluded from further analysis. No data are available from week 36 of 1994 to week 3 of 1995 due to sensor malfunction [41,42].

The hydrometeorological data comprised the monthly precipitation and temperature of the study area. Only limited in situ data were available, as the study area is poorly gauged; however, reanalysis and satellite-based techniques can provide continuous hydrometeorological data. Monthly precipitation was obtained from CenTrends v1.0 extended with the CHIRPS-2.0 dataset. The CenTrends dataset was developed specifically for East Africa to overcome precipitation data gaps and to enable the analysis of seasonal and decadal fluctuations within a centennial context [43]; it is available from 1900. CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data) is a state-of-the-art observational daily precipitation dataset for East Africa. Because CHIRPS additionally uses infrared satellite data, it is available only from 1981. CenTrends and CHIRPS are not independent, as they are based on a similar assimilation technique and underlying observational data for their overlap period. They are highly correlated (0.95), which justifies extending the CenTrends dataset with monthly averaged data from CHIRPS [44]. The combination provides information on both past and present trends. The combined dataset was obtained via the KNMI Climate Explorer and has a spatial resolution of 0.2°.
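Below is a minimal sketch of the extension step. It assumes that both products have already been averaged over the study area and aligned on a common monthly time axis, with NaN where a product has no data. The actual merge was performed by the KNMI Climate Explorer. In this sketch, CenTrends is simply kept where available and its gaps are filled with CHIRPS, and Pearson's r over the overlap is reported as a consistency check. Names are hypothetical.

```python
import numpy as np


def extend_centrends(centrends, chirps):
    """Fill gaps in CenTrends with CHIRPS; report Pearson r over the overlap.

    Both inputs: monthly area-mean precipitation (mm) on the same time axis,
    NaN where a product has no data.
    """
    cen = np.asarray(centrends, dtype=float)
    chp = np.asarray(chirps, dtype=float)
    overlap = np.isfinite(cen) & np.isfinite(chp)
    # correlation is only meaningful with more than two overlapping months
    r = np.corrcoef(cen[overlap], chp[overlap])[0, 1] if overlap.sum() > 2 else np.nan
    merged = np.where(np.isfinite(cen), cen, chp)  # CenTrends first, CHIRPS fills the rest
    return merged, r
```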

The temperature data used in this study were obtained from the CRU TS4.03 monthly mean temperature dataset. This dataset uses observations interpolated onto 0.5° latitude/longitude grid cells, combined with existing climatology, to obtain absolute monthly values [45]. The CRU data were validated against the nearest available in situ data from a meteorological station. Because the in situ data were available only as mean monthly maximums, the CRU TS4.03 mean monthly maximum temperature (CRUmax) was used for the validation. The correlation between the datasets was tested with Pearson's *r*, because both datasets were continuous and normally distributed (Shapiro–Wilk normality test: W = 0.97 and W = 0.95 for the in situ and CRUmax datasets, respectively, at a significance level of 0.05). For 1979 to 2018, the mean monthly maximum temperatures measured at Arusha and CRUmax correspond closely (r = 0.96, R<sup>2</sup> = 0.91) but show relatively large deviations in magnitude (RMSE = 1.84 °C). The strong correlation indicates a good representation of the annual temperature cycle. CRUmax was found to overestimate temperatures below 30.62 °C and to underestimate those above this value.
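The validation statistics above (Shapiro–Wilk W, Pearson's r, R², RMSE) can be computed with SciPy along these lines. The sketch assumes paired, time-aligned monthly series and is not the authors' script. The mean bias is an extra diagnostic, added here, for over- or underestimation. Names are hypothetical.

```python
import numpy as np
from scipy import stats


def validate(station, gridded):
    """Compare a gridded series (e.g. CRUmax) with station data, month by month."""
    s = np.asarray(station, dtype=float)
    g = np.asarray(gridded, dtype=float)
    ok = np.isfinite(s) & np.isfinite(g)
    s, g = s[ok], g[ok]
    w_station, _ = stats.shapiro(s)  # Shapiro-Wilk W (normality check)
    w_gridded, _ = stats.shapiro(g)
    r, p = stats.pearsonr(s, g)
    return {
        "W_station": float(w_station),
        "W_gridded": float(w_gridded),
        "r": float(r),
        "p": float(p),
        "R2": float(r) ** 2,
        "RMSE": float(np.sqrt(np.mean((g - s) ** 2))),
        "mean_bias": float(np.mean(g - s)),  # > 0 means the grid overestimates
    }
```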

As both the CHIRPS and the CRU TS4.03 datasets are delivered as grids, the means of the monthly precipitation and the temperature of all grid cells were calculated for the area between coordinates ~2–4°S, ~35–37°E, which includes the study area. Drought and trends in precipitation and temperature were studied from 1940–2020, which is twice the temporal range of the NDVI data.

#### *2.3. Trend Analysis*

The health of vegetation, and hence the NDVI value, depends on many anthropogenic and natural factors, the most important natural factors being temperature and precipitation [46]. Therefore, the variations in NDVI and hydrometeorological data over time were determined for the two rainy seasons (NDJ and MAM) and the hydrological year (September–August). All datasets showed a non-normal distribution over the research period (Shapiro–Wilk normality test, *p* < 0.01). Therefore, the Mann–Kendall (MK) test was used to determine the direction and significance of trends. The MK test is a non-parametric, rank-based method that is widely used to assess the presence of trends in time series of climatic, environmental or hydrological data [47–50]. It yields Kendall's rank correlation τ (tau) and its significance (*p*-value). The magnitude of the trend was computed with Sen's slope, which estimates both the slope and the intercept of a linear rate of change [51].
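Below is a compact implementation of the Mann–Kendall test, with the usual tie correction and continuity correction, and of Sen's slope. It is a sketch, not necessarily the implementation used in the study. In particular, it applies no correction for serial autocorrelation. The intercept follows the median(y) − slope·median(x) convention, which is an assumption here.

```python
import numpy as np
from scipy.stats import norm


def mann_kendall_sen(y, x=None):
    """Mann-Kendall trend test plus Sen's slope; x must be strictly increasing."""
    y = np.asarray(y, dtype=float)
    x = np.arange(y.size, dtype=float) if x is None else np.asarray(x, dtype=float)
    ok = np.isfinite(y)
    x, y = x[ok], y[ok]
    n = y.size
    i, j = np.triu_indices(n, k=1)                 # all pairs with i < j
    s = np.sign(y[j] - y[i]).sum()                 # Mann-Kendall S statistic
    _, t = np.unique(y, return_counts=True)        # sizes of tied groups
    var_s = (n * (n - 1) * (2 * n + 5) - np.sum(t * (t - 1) * (2 * t + 5))) / 18.0
    z = 0.0 if s == 0 else (s - np.sign(s)) / np.sqrt(var_s)  # continuity correction
    p = 2.0 * (1.0 - norm.cdf(abs(z)))             # two-sided p-value
    tau = s / (n * (n - 1) / 2.0)                  # Kendall's tau (tau-a)
    slope = np.median((y[j] - y[i]) / (x[j] - x[i]))  # Sen's slope
    intercept = np.median(y) - slope * np.median(x)
    return {"tau": float(tau), "p": float(p), "slope": float(slope),
            "intercept": float(intercept)}
```

For long series (e.g. some 1,700 8-day composites), the pairwise arrays hold about 1.5 million elements, which is still manageable in memory.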

#### *2.4. Drought Analysis*

Due to the lack of data on stream flow, groundwater and soil moisture in the study area, only the occurrence of meteorological drought was investigated, using the standardized precipitation evapotranspiration index (SPEI). The SPEI is an extension of the standardized precipitation index (SPI), which uses only precipitation anomalies to determine drought [52]. Whereas the SPI uses only precipitation to determine hydrological anomalies, the SPEI uses both precipitation (P) and potential evapotranspiration (PET). It therefore takes into account the impact of changing temperatures on water demand. This is important, as evapotranspiration influences soil moisture variability and therefore vegetation water content [53]. In this study, PET was calculated using the Thornthwaite equation [54], available in the SPEI package of the R language, which requires only mean temperature and latitude as input values. Other equations (e.g., Hargreaves or Penman) require variables for which no data were available for the study area. The mean temperature data of CRU TS4.03 were used. Because the SPEI focuses on anomalies, the CRU data are assumed to be suitable, given their high correlation with in situ data, despite the reported overestimation. As the study area lies between 2°S and 4°S, a latitude of 3°S (−3°) was used in the Thornthwaite equation.
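The Thornthwaite equation can be sketched as below. This is not the code of the R `SPEI` package. It uses the classic unadjusted formula, which is valid for monthly mean temperatures between 0 and 26.5 °C and so covers the 20–25 °C range of the study area. Mid-month day lengths come from the FAO-56 astronomical formulas, assuming a non-leap year. The heat index is computed from a 12-month temperature climatology. All names are hypothetical.

```python
import numpy as np

# Mid-month day of year and month lengths for a non-leap year (assumption)
MID_MONTH_DOY = np.array([15, 46, 74, 105, 135, 166, 196, 227, 258, 288, 319, 349])
DAYS_IN_MONTH = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])


def daylength_hours(lat_deg, doy):
    """Astronomical day length (h) from latitude and day of year, FAO-56 style."""
    phi = np.deg2rad(lat_deg)
    decl = 0.409 * np.sin(2 * np.pi * doy / 365 - 1.39)  # solar declination (rad)
    cos_ws = np.clip(-np.tan(phi) * np.tan(decl), -1.0, 1.0)
    return 24.0 / np.pi * np.arccos(cos_ws)              # sunset hour angle -> hours


def thornthwaite_pet(t_clim, lat_deg):
    """Monthly PET (mm) from 12 monthly mean temperatures (°C), unadjusted Thornthwaite.

    Valid for 0 < T < 26.5 °C; the high-temperature branch is not implemented.
    """
    t = np.clip(np.asarray(t_clim, dtype=float), 0.0, None)  # T <= 0 gives no PET
    heat_index = np.sum((t / 5.0) ** 1.514)                   # annual heat index I
    a = (6.75e-7 * heat_index**3 - 7.71e-5 * heat_index**2
         + 1.792e-2 * heat_index + 0.49239)
    day_len = daylength_hours(lat_deg, MID_MONTH_DOY)
    return 16.0 * (day_len / 12.0) * (DAYS_IN_MONTH / 30.0) * (10.0 * t / heat_index) ** a


pet = thornthwaite_pet([22.0] * 12, lat_deg=-3.0)  # constant 22 °C, 3°S
```

Near the equator, day length stays close to 12 h all year, so the monthly PET differences here come mostly from month length.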

The SPEI measures P-PET anomalies based on a comparison of observations for a period of interest (e.g., 1, 3, 6, 12 and 48 months) with the long-term historical record of that period. It requires monthly data, preferably continuous and for 30 years or longer. For each month a SPEI value is calculated using the month itself and a previous number of months, which are together equal to the period of interest. For instance: when calculating the SPEI of March with a period of interest of 3 months, the cumulative P-PET of January, February and March is used. This value is then compared with the long-term record of cumulative January–March P-PET. The period of interest of the SPEI represents typical time scales for water deficits to affect different types of water sources. For example, the 1- or 3-month SPEI represents short droughts and indicates immediate impacts, such as reduced soil moisture, while the 12- or 24-month SPEI represents long droughts, causing, for instance, changes in reservoir storages [52]. In this study the 3-month SPEI (SPEI-3) was used to represent short-term droughts, while the 12-month SPEI (SPEI-12) values were used to represent annual (medium-term) and multi-annual (long-term) droughts. Furthermore, the SPEI-3 was used to indicate dry/wet seasons and the SPEI-12 was used to indicate dry/wet years within the study period.
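The accumulation over the period of interest is a running sum of the monthly climatic water balance D = P − PET. With k = 3, the value for March is the January–March sum, as in the example above. A minimal sketch, with a hypothetical function name:

```python
import numpy as np


def accumulate(balance, k):
    """Running k-month sums of the climatic water balance D = P - PET.

    The value at month m is the sum of months m-k+1 .. m; the first k-1
    entries are NaN because the window is incomplete.
    """
    d = np.asarray(balance, dtype=float)
    out = np.full(d.shape, np.nan)
    c = np.cumsum(np.insert(d, 0, 0.0))  # c[m] = sum of the first m months
    out[k - 1:] = c[k:] - c[:-k]
    return out


d3 = accumulate([1.0, 2.0, 3.0, 4.0], k=3)  # Jan..Apr -> NaN, NaN, Jan-Mar, Feb-Apr
```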

To calculate the SPEI, the P-PET record is fitted to a probability distribution (log-logistic) function. It is then transformed into a normal distribution with a mean of zero and a variance of one. The result is the SPEI, which represents the number of standard deviations from the mean. Positive SPEI values indicate anomalously wet periods, and negative values indicate dry periods [52,53,55]. The magnitude of the SPEI gives a probabilistic measure of drought/wetness intensity. For instance, an SPEI-3 equal to −2 in January–March of a certain year means that the cumulative January–March P-PET of that year is 2 standard deviations smaller than the long-term average of cumulative January–March P-PET. Events were defined according to the drought intensity classes of McKee et al. [52]: a drought event was classified when the index was below −1, and a wet event when the index was higher than 1 (Table 1). Some studies suggest a range of −0.5 to 0.5 for normal conditions [56,57]; however, because the vegetation is adapted to semi-arid conditions, the effects of a mild drought on the vegetation are assumed to be negligible.
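The fitting and standardization steps can be sketched as follows. The sketch uses the probability-weighted-moment estimators of the three-parameter log-logistic distribution commonly used for the SPEI, with plotting positions (i − 0.35)/N. It makes two assumptions. First, the R `SPEI` package may use a different PWM estimator. Second, that package by default fits each calendar month separately, so `fit_loglogistic` would be applied to, for example, all January SPEI-3 accumulations. Names are hypothetical.

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import norm


def fit_loglogistic(d):
    """Three-parameter log-logistic fit (alpha, beta, gamma) by probability-weighted moments."""
    x = np.sort(np.asarray(d, dtype=float))
    n = x.size
    f = (np.arange(1, n + 1) - 0.35) / n                  # plotting positions
    w0, w1, w2 = (np.mean((1.0 - f) ** s * x) for s in range(3))
    beta = (2 * w1 - w0) / (6 * w1 - w0 - 6 * w2)         # shape (must exceed 1)
    g = gamma(1 + 1 / beta) * gamma(1 - 1 / beta)
    alpha = (w0 - 2 * w1) * beta / g                      # scale
    origin = w0 - alpha * g                               # location (gamma)
    return alpha, beta, origin


def spei_from_fit(d, params):
    """Log-logistic CDF of D, then the standard-normal quantile (the SPEI value)."""
    alpha, beta, origin = params
    z = np.maximum(np.asarray(d, dtype=float) - origin, 1e-12)  # CDF ~ 0 at/below origin
    cdf = 1.0 / (1.0 + (alpha / z) ** beta)
    return norm.ppf(np.clip(cdf, 1e-6, 1 - 1e-6))         # keep quantiles finite
```

Because the CDF values of a well-fitted sample are roughly uniform, their normal quantiles have a mean near 0 and a standard deviation near 1. This is the property the SPEI classes rely on.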

Both seasons (NDJ and MAM) and years were classified as dry, wet or normal using the SPEI. Here, the SPEI-3 of January and May were considered for, respectively, the NDJ and MAM seasons. To determine whether a hydrological year was wet, dry or normal, the SPEI-12 of August was used as this value is based on the previous September–August P-PET values.
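The season and year labels can be sketched as a simple thresholding of the SPEI values named above: January and May SPEI-3, and August SPEI-12, with |SPEI| > 1 as the threshold. Here, values of exactly ±1 are treated as normal, which is an assumption. Function names are hypothetical.

```python
import numpy as np


def classify(spei, threshold=1.0):
    """'dry' if SPEI < -threshold, 'wet' if SPEI > threshold, otherwise 'normal'."""
    s = np.asarray(spei, dtype=float)
    return np.where(s < -threshold, "dry", np.where(s > threshold, "wet", "normal"))


def label_hydrological_year(spei3_jan, spei3_may, spei12_aug):
    """Season and year labels for one hydrological year (September-August)."""
    return {
        "NDJ": classify(spei3_jan).item(),    # January SPEI-3 covers Nov-Jan
        "MAM": classify(spei3_may).item(),    # May SPEI-3 covers Mar-May
        "year": classify(spei12_aug).item(),  # August SPEI-12 covers Sep-Aug
    }
```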

**Table 1.** SPEI-based classification of drought. Based on temperature and precipitation during a given moment at a specified latitude, the SPEI gives anomalies of that given moment compared to other years. Source: [52].


#### *2.5. Vegetation Resilience*

The resilience of vegetation was tested for multi-annual (long-term), annual (medium-term) and seasonal (short-term) responses to drought. The 8-day maximum value composited NDVI and the SPEI values were used to investigate the effects of drought on vegetation cover and resilience.

The long-term effect of changing climate conditions on vegetation cover is represented by changes in the NDVI values over the years. Such changes over time in the NDVI values represent a change in vegetation cover and vegetation health [58]. This was tested with the MK test of 8-day maximum value composited NDVI from 1981 to 2020 and compared to the long-term precipitation and temperature in the region. With the use of Sen's slope [51], the linear rate of change was calculated.

The annual (medium-term) dynamics in vegetation cover and its response to drought were quantified by separating the years into classes based on hydrological conditions. Dry, normal and wet years were classified using the SPEI-12. Additionally, the year following a dry year was classified as a "recovery year", regardless of its own hydrological conditions. The data of each of these four classes were then fitted using local polynomial regression (LOESS) to provide a smooth curve through the data points. The NDVI response throughout the seasons was then compared across these four hydrological conditions. In a second step, vegetation resilience over time was evaluated. For this purpose, three time periods were selected: 1991–2000, 2001–2010 and 2011–2020. If resilience had not been affected over the long term, similar intra-annual NDVI values would be expected during dry years in each time period. The time periods were chosen to have an approximately equal number of years classified as dry (SPEI-12 in August < −1). The years 1981–1990 were left out because no drought occurred during these years.
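The labelling of hydrological years, including recovery years, might look like the sketch below. It assumes one August SPEI-12 value per hydrological year, in chronological order. It also assumes that a year following a dry year is labelled "recovery" even if it is itself dry, with the preceding year's dryness judged from its own SPEI-12. The text does not say how consecutive dry years were handled. The LOESS curves could then be fitted per class, for example with `statsmodels.nonparametric.smoothers_lowess.lowess`; that step is not shown.

```python
def label_years(spei12_aug, threshold=1.0):
    """Dry / wet / normal / recovery labels for consecutive hydrological years."""
    labels = []
    for i, value in enumerate(spei12_aug):
        if i > 0 and spei12_aug[i - 1] < -threshold:
            labels.append("recovery")  # follows a dry year, whatever its own SPEI
        elif value < -threshold:
            labels.append("dry")
        elif value > threshold:
            labels.append("wet")
        else:
            labels.append("normal")
    return labels
```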

Finally, the seasonal (short-term) effect of drought on vegetation cover and resilience was evaluated. The dynamics of vegetation response to seasonal droughts were tested by comparing intra-annual NDVI trends during dry and non-dry periods. It was assumed that if the vegetation has adapted to the semi-arid environment, it will withstand droughts by reviving as soon as conditions permit [59]. This would mean that the regrowth of the vegetation does not depend on the severity of a drought, as indicated by the SPEI values. Hence, the NDVI values of the subsequent non-dry season were expected not to deviate substantially from those of other normal (non-dry) years. Seasonal resilience was tested by comparing the sinusoidal curve of the intra-annual NDVI pattern of dry years with that of the following normal or wet year. Moreover, the effect of the timing of a drought during a dry year was assessed. The timing of the drought was determined with the SPEI-3: a drought was identified as an SPEI-3 smaller than −1 during the first or the second rainy season (NDJ or MAM, respectively). Four situations were compared: both seasons dry (two occurrences), only the first season dry or only the second season dry (four occurrences each), and both seasons normal or wet (thirteen occurrences). The curves of the NDVI for these different drought timings show the response of vegetation to seasonal drought, and thus provide a proxy of vegetation resilience.
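The four drought-timing situations map onto the January (NDJ) and May (MAM) SPEI-3 values as follows. This is a minimal sketch with a hypothetical function name and hypothetical input values.

```python
from collections import Counter


def drought_timing(spei3_jan, spei3_may, threshold=1.0):
    """Classify a hydrological year by which rainy season(s) were dry (SPEI-3 < -threshold)."""
    ndj_dry = spei3_jan < -threshold
    mam_dry = spei3_may < -threshold
    if ndj_dry and mam_dry:
        return "both seasons dry"
    if ndj_dry:
        return "dry NDJ, non-dry MAM"
    if mam_dry:
        return "non-dry NDJ, dry MAM"
    return "both seasons non-dry"


# Hypothetical (January, May) SPEI-3 pairs, one per hydrological year
pairs = [(-1.3, -1.1), (0.4, -1.6), (0.2, 0.9)]
print(Counter(drought_timing(j, m) for j, m in pairs))
```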

#### **3. Results**

#### *3.1. Rainfall and Temperature*

Variations in precipitation during the NDJ and MAM rainy seasons as well as the hydrological year are shown in Figure 2. The rainfall in the study area is characterized by a high but mainly homogeneous variation in seasonal rainfall (Figure 2A,B) and annual rainfall (Figure 2C). Overall, the NDJ season has a higher variation than the MAM season (CV<sub>NDJ</sub> = 0.49; CV<sub>MAM</sub> = 0.32), but on average rainfall in the NDJ season is lower than in the MAM season (226 mm vs. 327 mm) (Table 2).

The precipitation has a slightly decreasing trend in the MAM season and annual rainfall, and a small increasing trend in NDJ rainfall (Figure 2). However, all trends are insignificant according to the MK test (Table 3), which means it cannot be concluded that the amount of rainfall has changed in the study area over the period of 1940–2020. Temperature on the other hand does show a significant (α = 0.05) increasing trend (Figure 3). This positive trend was observed both for the yearly averaged temperature as well as for the seasonally averaged temperatures (NDJ and MAM) (Table 3). Deviations from this trend in the form of relatively warm (e.g., 1951–1952) and cold (e.g., 1967) years are also visible. Overall, the yearly average temperature increased by 1.06 ◦C between 1940 and 2020.

**Table 2.** Statistics of seasonal and annual precipitation and temperature during the period of 1940–2020 in northern Tanzania. Trend based on Sen's slope. During the warmer NDJ season there is, on average, less precipitation with a higher level of variance compared to the MAM season.


1, coefficient of variation.

**Table 3.** Seasonal and annual Mann–Kendall test trend results of temperature and precipitation over the period of 1940–2020 in northern Tanzania. At α = 0.05, temperature is significantly increasing, whereas the precipitation trends are not significant.


**Figure 2.** Total amounts of rainfall in northern Tanzania during the period of 1940–2020 based on the CenTrends v1.0 dataset extended with CHIRPS-2.0. (**A**) The NDJ (November, December and January) rainy season, (**B**) the MAM (March, April and May) rainy season and (**C**) the hydrological year (Sept–Aug). Linear trend lines (black) are based on Sen's slope and are not significant at α = 0.05. The NDJ season has a higher variability than the MAM season (CV = 0.49 and 0.32, respectively), but a lower seasonal mean (226 and 327 mm, respectively).

**Figure 3.** Average annual temperature in northern Tanzania during the period of 1940–2020 based on monthly mean data of the CRU TS4.03 dataset. Linear trend (black) based on Sen's slope. Significant at α = 0.05, increase of 0.13 °C/decade.

#### *3.2. Drought Occurrence*

The SPEI was calculated for periods of 3 months (SPEI-3), representing short-term dry or wet seasons, and 12 months (SPEI-12), representing medium-term drought/wet years. Figure 4 shows SPEI-3 and SPEI-12 time series over the time period of 1940–2020. As expected, fewer drought events are identified with the SPEI-12 compared to the SPEI-3 time series. Multiple smaller events identified by the SPEI-3 can either be flattened out or cumulated into one event of the SPEI-12. The latter effect is known as pooling [60]. Relatively wet and dry years can be distinguished, with either largely positive SPEI-12 values (e.g., the 1960s) or negative SPEI-12 values (e.g., the period of 1999–2006).

Both SPEI time series show a significant decreasing trend (SPEI-3: tau = −0.157, *p*-value = 4.1 × 10<sup>−13</sup>; SPEI-12: tau = −0.148, *p*-value = 7.4 × 10<sup>−12</sup>). This means that a drying trend is present in the study area, which, given the non-significant changes in rainfall, is mainly the effect of the increasing temperature. Higher temperatures result in higher potential evapotranspiration values, which lead to overall more negative SPEI values. The drying trend since 1940 is also reflected by the relatively large area below zero (=dry) compared to the area above zero (=wet) in recent decades (1990–2020).

Zooming into the time period of the NDVI data (1981–2020), the 1980s show hardly any long or extreme dry and wet events. The 1990s and 2000s are characterized by more frequent and longer droughts. The period of 2001–2010 in particular suffered from extended and severe droughts, with some SPEI values going below −2 (extreme drought). The differences in drought severity between decades are reflected by the occurrence of low SPEI-3 and SPEI-12 values presented in Table 4. Assuming a normal distribution of the SPEI, values of SPEI < −1 and of SPEI > 1 would each be expected 15.9% of the time, and the mean would be 0. However, every decade since 1980 has had fewer wet events than expected, and since 1991 more dry events have occurred than expected. The 2001–2010 decade was the driest period, with a high occurrence of droughts and a low mean SPEI value, while the 2011–2020 decade experienced droughts similar to those of the 1991–2000 decade. During the 1981–1990 decade, the study area experienced a low number of moderate to extreme events, which is also reflected by the steady rainfall values over this time period (Figure 2). Before 1980, several periods of serious drought occurred, for instance during 1953–1956 and 1975–1977 (Figure 4).
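The 15.9% reference value is the standard normal probability P(Z < −1), which by symmetry equals P(Z > 1). The sketch below computes it with SciPy and tallies the per-decade shares of dry and wet values. It assumes decades are grouped as 1981–1990, 1991–2000 and so on, and it is not the authors' code.

```python
import numpy as np
from scipy.stats import norm

# Share of a standard normal variable below -1 (and, by symmetry, above +1)
EXPECTED_SHARE = norm.cdf(-1.0)  # about 0.159, i.e. 15.9%


def decade_shares(years, spei, threshold=1.0):
    """Percent of SPEI values < -threshold and > threshold, plus mean SPEI, per decade.

    Decades are grouped as 1981-1990, 1991-2000, ... (assumption).
    """
    years = np.asarray(years)
    spei = np.asarray(spei, dtype=float)
    ok = np.isfinite(spei)
    years, spei = years[ok], spei[ok]
    starts = (years - 1) // 10 * 10 + 1  # 1981..1990 -> 1981, 1991..2000 -> 1991
    out = {}
    for start in np.unique(starts):
        v = spei[starts == start]
        out[int(start)] = (
            float(100.0 * np.mean(v < -threshold)),  # % dry values
            float(100.0 * np.mean(v > threshold)),   # % wet values
            float(np.mean(v)),                       # mean SPEI
        )
    return out
```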

**Table 4.** Percentage of total SPEI values per decade and mean SPEI values per decade. By definition, the SPEI has a mean of 0 over the entire research period, and moderate to extreme dry seasons (SPEI < −1) and wet seasons (SPEI > 1) should each occur about 15.9% of the time (Table 1). During the four most recent decades this is not the case.


**Figure 4.** SPEI-3 (**A**) and SPEI-12 (**B**) values of 1940–2020. Blue indicates relatively wet conditions, while red indicates relatively dry conditions in the indicated (3 or 12) antecedent time period in months. Significant (α = 0.05) negative trends were found in both the SPEI-3 and SPEI-12.

#### *3.3. Vegetation Cover and Resilience*

The effects of droughts and seasonal rainfall on the resilience of vegetation in the area were assessed by comparing the NDVI time series with the SPEI-3 and SPEI-12 values. Trend analysis was applied to the long-term NDVI time series and a visual comparison was applied to the intra-annual variations.

The NDVI values over the years (1981–2020) show a seasonal pattern, with relatively high NDVI values in the period of the two wet seasons, and low values during the dry season (Figure 5). The higher NDVI values indicate healthy vegetation with a high cover, while the lower values indicate bare soil or low vegetation cover. Figure 5 also shows that the NDVI values are generally lower during dry periods, as indicated by the negative SPEI-3 values.

**Figure 5.** Interannual normalized difference vegetation index (NDVI) series, composed of 8-day maximum value composited NDVI. The NDVI values were classified according to SPEI-3 values, which indicates drought (strong negative SPEI-3 values) or wet (strong positive SPEI-3 values) conditions. A trend line (black) was fitted through all data based on Sen's slope (significant at α = 0.05).

The long-term NDVI has a significant downward trend (tau = −0.0623, *p*-value = 8.38 × 10<sup>−5</sup>, Sen's slope = −1.02 × 10<sup>−5</sup>). This downward trend results in a decrease of 0.017 NDVI points (from 0.175 to 0.158) between 1981 and 2020. Therefore, over the period of analysis, on average, vegetation cover in the study area has declined. Several possible reasons for this vegetation cover decline can be given. The first is the conversion of grazing land to arable land for crop production [23,24]. The second possibility is land degradation due to overgrazing in the area [23,25]. The last reason could be the increased temperatures and drought as exemplified by the SPEI values (Figure 4).

Figure 6 shows the same 8-day NDVI values as Figure 5, but plotted against the hydrological year (Sept–Aug). The polynomial regression lines of NDVI values for dry (red; SPEI-12 < −1), wet (blue; SPEI-12 > 1) and normal years (black; −1 < SPEI-12 < 1) are also plotted. An additional regression line (yellow) shows the NDVI values for a year immediately following a drought year. As expected, wet years have higher NDVI values than normal years, and thus better vegetation cover. During dry years, the NDVI values are substantially lower than in normal years, indicating less vegetation cover or less healthy vegetation. However, in the years following a drought year, the NDVI values return to normal, which indicates a high resilience of vegetation in the study area. Apparently, drought affects the vegetation only temporarily, and during the period of study (1981–2020) it has not led to dramatic vegetation degradation, apart from the slightly negative long-term trend that was detected (Figure 5).

The intra-annual NDVI time series has been split into three decades: 1991–2000, 2001–2010 and 2011–2020 (Figure 7). In each decade, two or three years were classified as dry years (SPEI-12 < −1). The polynomial regression curves show that in the first (Figure 7A) and last decade (Figure 7C) the NDVI values in a year following a drought year quickly return to normal values. In the decade 2001–2010, which experienced the most droughts, the NDVI values following a drought year also return to nearly the same levels as the long-term normal. The normal-year NDVI values in this decade are higher than in the other two decades, which is surprising given the more severe drought conditions in this decade (Table 4). Apparently, the rainfall was well distributed during the normal years in the period of 2001–2010, which resulted in good vegetation growth during those normal years.

**Figure 6.** Intra-annual NDVI series, based on 8-day maximum value composited NDVI from 1981 to 2020. Based on the SPEI-12 of the hydrological year, the lines represent the NDVI values belonging to normal (black), dry (red) or the year subsequent to a dry hydrological year (yellow) with the use of locally estimated scatterplot smoothing (LOESS).

**Figure 7.** Intra-annual decadal NDVI series, composed of 8-day maximum value composited NDVI from (**A**) 1991–2000, (**B**) 2001–2010 and (**C**) 2011–2020. The lines represent the NDVI values belonging to normal (black), dry (red) or the year subsequent to a dry hydrological year (yellow) with the use of locally estimated scatterplot smoothing (LOESS).

In the last analysis, the impacts of seasonal droughts on NDVI development were evaluated. The short-term effects of drought were tested with the use of the SPEI-3 for the NDJ and MAM seasons. The NDJ season was classified as dry when the SPEI-3 of January was below −1. The MAM rainy season was considered dry when the SPEI-3 of May was below −1.

The timing of the drought during the hydrological year has an effect on the pattern and magnitude of the NDVI values (Figure 8). During a year in which both the short (NDJ) and the long (MAM) rainy seasons are normal, the NDVI reaches its peak of ~0.20 in early March (dark-green curve in Figure 8). On the other hand, the occurrence of drought during the NDJ, MAM or both seasons impacts the development of the NDVI over time. If the NDJ rainy season is normal, the NDVI will pass the curve of two normal seasons at first but will decline more quickly during a dry MAM season (yellow curve). A year with a dry NDJ season but a normal MAM season (blue curve) shows a delay in the development of the NDVI but reaches peak values similar to those of a normal year. These peak values are reached about 1.5–2 months later than in a normal year, before subsequently declining again. This either indicates that the vegetation requires some time to recover from the dry NDJ season, or that vegetation cover in a normal year decreases more quickly due to heavy grazing, which can start much earlier in a good rainfall year.

The duration of increased NDVI levels is similar if one of the rainy seasons is a dry season. However, if both seasons are classified as dry (purple curve), this time period is shorter. The purple curve shows that despite both seasons being classified as dry, the MAM season still has enough rainfall to enable vegetation growth, albeit not as good as during a normal or wet MAM season. The minimum amount of rainfall in the MAM season is ~150 mm (Figure 2B), and in the two years that comprise the purple curve in Figure 8 the MAM precipitation was 210 mm (2004) and 235 mm (2017). This explains the relatively good vegetation cover in the MAM rainy season that on average has 78 mm more precipitation than the NDJ season (Table 2).

**Figure 8.** Seasonal NDVI response to different drought regimes between 1991 and 2020. The dark-green line represents a normal season, both during the NDJ and the MAM. The purple line represents two dry seasons, blue represents a dry NDJ followed by a normal MAM season and yellow represents a normal NDJ followed by a dry MAM season.

#### **4. Discussion**

#### *4.1. Rainfall and Temperature*

The total annual rainfall and the rainfall during the NDJ and MAM rainy seasons in the study area did not change significantly (α = 0.05) over 1940–2020 (Table 3). These results contradict earlier research indicating that East African rainfall has decreased since the early 1980s because of lower MAM rainfall [61–63]. That decline has been attributed to the rapid warming of the Indian Ocean, which strengthens convection and rainfall over the ocean and reduces rainfall in East Africa. Our data for the Monduli and Longido districts do not confirm those findings, as all of the rainfall trends are insignificant.
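This section does not state which test was used to judge trend significance at α = 0.05. The sketch below assumes the non-parametric Mann-Kendall test, a common choice for hydro-climatic series. It is a minimal version without the correction for tied values and is not necessarily the authors' procedure.

```python
import math
from statistics import NormalDist


def mann_kendall(values: list[float], alpha: float = 0.05) -> tuple[int, float, float, bool]:
    """Two-sided Mann-Kendall trend test (assumed method, no tie correction).

    Returns (S statistic, Z score, p-value, significant at alpha).
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    # S = number of increasing pairs minus number of decreasing pairs (i < j).
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the null hypothesis of no trend (ties ignored).
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity-corrected standard normal score.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p, p < alpha
```

Applied to a series of annual, NDJ, or MAM rainfall totals, a result whose last element is `False` corresponds to an insignificant trend of the kind reported in Table 3.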

The rainfall data of the study area (Figure 2) show high interannual variation. As in other semi-arid regions, annual rainfall rarely comes close to the mean, and many years had much lower or much higher totals. The seasonal rainfall shows the same behavior. In most years the NDJ rainfall was below average, and only in a few years was it well above average (e.g., 1962, 1998 and 2007), which produces a positively skewed distribution (skew = 1.33). The MAM rainfall was less variable than the NDJ rainfall and more evenly distributed around the mean (skew = 0.56). Below-average rainfall in a hydrological year (Sept–Aug) is usually caused by a lack of rainfall in one of the rainy seasons. In only 12 of the 80 years were both rainy seasons more than 25% below average, whereas this occurred in 28 of the 80 years for the NDJ season alone and in 21 years for the MAM season alone. The MAM season contributed between 24 and 76% of the rainfall in a hydrological year (50% on average), and the NDJ season between 14 and 64% (37% on average).
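The statistics in this paragraph can be reproduced from seasonal totals with a few helpers. The sketch below assumes the moment-based (Fisher-Pearson) skewness g1, because the estimator is not stated. Sample-size-adjusted estimators give slightly different values. The function names are illustrative.

```python
def skewness(values: list[float]) -> float:
    """Moment-based (Fisher-Pearson) skewness g1 = m3 / m2**1.5 (assumed estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant series")
    return m3 / m2 ** 1.5


def years_well_below_average(totals: list[float], fraction: float = 0.25) -> int:
    """Count years whose total is more than `fraction` below the long-term mean."""
    mean = sum(totals) / len(totals)
    return sum(1 for t in totals if t < (1 - fraction) * mean)


def years_both_seasons_low(ndj: list[float], mam: list[float], fraction: float = 0.25) -> int:
    """Count years in which NDJ and MAM rainfall are both more than `fraction` below their means."""
    ndj_limit = (1 - fraction) * sum(ndj) / len(ndj)
    mam_limit = (1 - fraction) * sum(mam) / len(mam)
    return sum(1 for a, b in zip(ndj, mam) if a < ndj_limit and b < mam_limit)


def season_share(season_total: float, hydro_year_total: float) -> float:
    """Percentage contribution of one rainy season to its hydrological year (Sept-Aug)."""
    return 100.0 * season_total / hydro_year_total
```

A long right tail, meaning a few very wet years among many dry ones, gives a positive skewness, as reported for the NDJ season.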

Unlike rainfall, the temperature in the study area increased significantly (α = 0.05), by 1.06 °C over 1940–2020. This highly significant increasing trend can only be caused by global warming [64]. According to [65], warming in East Africa started in the early 1980s, but our data series (Figure 3) shows that a more or less steady temperature increase was already under way in 1940. Only the 1960s were relatively cool; since then, temperature has again risen steadily, by approximately 0.12 °C per decade. This warming may have produced a more erratic rainfall pattern with higher rainfall intensities due to stronger convection [66]. However, the rainfall data used in this study contain no information on rainfall character, so it cannot be confirmed that rainfall has actually become more extreme.
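The warming is reported both as a total change and as a rate per decade. If the trend is a straight-line ordinary least-squares fit (an assumption, since the fitting method is not stated here), the two quantities relate as shown in this minimal sketch:

```python
def linear_trend(years: list[int], values: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit: values = slope * year + intercept."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def trend_summary(years: list[int], values: list[float]) -> dict[str, float]:
    """Express the fitted slope as a rate per decade and as a total change over the record."""
    slope, _intercept = linear_trend(years, values)
    return {
        "per_decade": 10 * slope,
        "total_change": slope * (years[-1] - years[0]),
    }
```

For example, a linear change of 1.06 °C over the 80 years from 1940 to 2020 corresponds to about 0.13 °C per decade.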

#### *4.2. Drought*

Drought occurrence in the study area was analyzed using the SPEI-3 (short-term droughts) and the SPEI-12 (long-term droughts). Both SPEI time series (Figure 4) show a significant (α = 0.05) decreasing trend, indicating that drought has become more severe in recent decades in the Monduli and Longido districts. Because rainfall did not change significantly, the intensified drought can only be the result of warming in the area. Rising temperature increases potential evapotranspiration [54], which leads to stronger desiccation of the land and thus more drought stress in semi-arid areas such as the Monduli and Longido districts. Since 1993, six long-term droughts (SPEI-12 < −1) have occurred, compared with only three in the 53 years before 1993. Short-term droughts (SPEI-3 < −1) show a similar pattern. A notable period with little drought lasted from the late 1970s until the early 1990s (Figure 4 and Table 4).
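Counting droughts such as "six long-term droughts since 1993" requires an event definition. The sketch below assumes that one event is a run of consecutive months with SPEI below −1, which the text does not state explicitly. The `(year, month)` labels and function names are illustrative.

```python
def drought_events(
    labels: list[tuple[int, int]], spei: list[float], threshold: float = -1.0
) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Group consecutive months with SPEI below `threshold` into events.

    labels: (year, month) tuples aligned with the SPEI values.
    Returns a (first_month, last_month) pair for each event.
    """
    events = []
    start = None  # index of the first month of the current event, if any
    for i, value in enumerate(spei):
        if value < threshold:
            if start is None:
                start = i
        elif start is not None:
            events.append((labels[start], labels[i - 1]))
            start = None
    if start is not None:  # an event still running at the end of the series
        events.append((labels[start], labels[-1]))
    return events


def count_events_from(events, first_year: int) -> int:
    """Count events that start in or after `first_year`."""
    return sum(1 for (start, _end) in events if start[0] >= first_year)
```

Running this separately on the SPEI-12 and SPEI-3 series and comparing `count_events_from(events, 1993)` with the earlier total reproduces the type of before/after comparison made above.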

It is difficult to compare our results with those of other studies, because study designs may differ in timescale, datasets, research period, or study area [67]. The importance of the study area is underlined by [68,69], which studied the Greater Horn of Africa and found large spatiotemporal variations in trends in precipitation and temperature between 1980 and 2010 [68] and in SPEI trends between 1964 and 2015 [69]. A recent study of the entire Lake Manyara catchment, including the Monduli and Longido districts, showed a drying trend over the past century [22]. Furthermore, [22] reported a general increase in decadal drought characteristics (duration, severity, and frequency) from the 1930s to the present, with the exception of the wet 1980s.

The observed increase in drought agrees with the accounts of local people, who state that it is drier and warmer nowadays than 25–30 years ago [25]. The drying trend raises concerns for the future of the Monduli and Longido districts. Conway et al. [70] analyzed the output of 34 climate models that simulated future precipitation and temperature in Tanzania. These results showed wide spatiotemporal variation in future precipitation within Tanzania. They indicate that the number of rainy days will decrease and that the intensity of events will increase, which suggests more variable rainfall with a higher chance of droughts or floods in the future. In contrast to rainfall, the climate models project temperature increases of 0.8–1.8 °C by 2040 and 1.6–5.0 °C by 2090 (relative to 1976–2005), evenly distributed across Tanzania. Thus, while future changes in precipitation are uncertain, it can be stated that temperatures will continue to rise, which will further increase the drought risk in the study area through higher potential evapotranspiration rates.

#### *4.3. Vegetation Trends and Resilience*

Based on the NDVI time series, a significant (α = 0.05) decline in vegetation cover was observed between 1981 and 2020 (Figure 5). The fitted trendline indicates that the average NDVI declined by 9.7%, from 0.175 in 1981 to 0.157 in 2020. This decline is less dramatic than the decline reported in other studies of East Africa [1,23,71], which generally show 1.5 to 3 times more vegetation reduction than our results. It is not clear why those studies arrive at higher rates of vegetation-cover decline for the same study area. One reason could be that our analysis is based on near-continuous NDVI values, whereas most other studies use NDVI values from fewer points in time. As Figure 5 clearly shows, the timing of the satellite imagery used for NDVI calculation can produce rather different NDVI values. This applies both to the choice of year (dry versus wet) and to the timing of the image within that year (dry season versus wet season).

The decline in vegetation cover observed here could have several causes. Parts of the study area have been converted from grassland to arable land, which on average has lower vegetation cover [23,24]. Vegetation degradation due to overgrazing, leading to bare soil, might also contribute to the lower present-day vegetation cover [23,25,32]. Finally, the increase in drought severity and frequency (Figure 4) might also lower vegetation cover in general. Drought not only reduces the amount of green vegetation but also affects the condition of the growing plants, which is reflected in a lower NDVI [72]. The available data and analyses do not allow us to conclude which of these causes is the most important, and all of them may well play a role in the declining vegetation in the study area. However, given the significant increase in drought occurrence in the study area (Figure 4), drought is believed to be the main cause of the declining vegetation cover.

Vegetation resilience of the study area can be characterized as high. When a year was dry the NDVI values were lower than in a normal year, meaning less vegetation cover, but the next year the vegetation always recovered to normal year values (Figure 6). The recovery of the vegetation following a drought year did not change over time (Figure 7). In the 2011–2020 decade the resilience was not different from the 1991–2000 and 2001–2010 decades. At the seasonal scale, vegetation resilience appeared to be good. A dry season resulted in lower vegetation cover, but the vegetation came back quickly during the next normal or wet season and reached similar NDVI values (Figure 8).
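The resilience argument compares NDVI in the year after a drought with NDVI in normal years. One hypothetical way to condense that comparison into a single number is a recovery ratio. This is not a metric defined in this paper. The function and argument names are illustrative, and the NDVI inputs could equally be annual or seasonal means.

```python
from statistics import mean


def recovery_ratios(
    ndvi_by_year: dict[int, float],
    drought_years: list[int],
    normal_years: list[int],
) -> dict[int, float]:
    """Hypothetical resilience measure: post-drought NDVI relative to normal-year NDVI.

    For each drought year, divide the NDVI of the following year by the mean
    NDVI of the normal years. A ratio near 1 means full recovery.
    Drought years without a following year in the data are skipped.
    """
    reference = mean(ndvi_by_year[y] for y in normal_years)
    ratios = {}
    for year in drought_years:
        following = year + 1
        if following in ndvi_by_year:
            ratios[year] = ndvi_by_year[following] / reference
    return ratios
```

Grouping these ratios by decade (1991–2000, 2001–2010, 2011–2020) would be one way to check whether recovery changed over time, as Figure 7 does graphically.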

These results show that, despite a general decline in vegetation cover over 1981–2020, vegetation resilience remained good in the Monduli and Longido districts. Drought had an immediate impact on vegetation cover, but once the rains returned at normal or above-normal levels in a new season, the vegetation quickly responded and returned to normal levels. However, NDVI observations do not reveal possible changes in species composition. Pressures on the grazing system, such as drought and overgrazing, may change the species that grow in the study area. Drought-tolerant species could replace other species, while continuous preferential grazing of grass species may result in the spread of less favorable plant species that livestock do not eat, which is considered a negative change [73,74]. The Masai herders in the study area did complain about the lower quality and reduced availability of grass resources, and they named drought and high livestock numbers as the main causes of the decline [24,25].

#### **5. Conclusions**

This study used remote-sensing-based datasets of meteorology and vegetation cover to analyze vegetation resilience in the Monduli and Longido districts of northern Tanzania. The meteorological analysis shows that rainfall in the Monduli and Longido districts did not change significantly during the study period (1940–2020), but that temperatures increased by 1.06 °C over the same period. The rising temperature resulted in higher potential evapotranspiration rates, which significantly increased drought occurrence and frequency. Since the early 1990s, serious droughts have become both more frequent and longer, and given the climate projections for Tanzania, this trend can be expected to continue.

Vegetation cover in the two districts declined significantly, by 9.7%, over 1981–2020. This decline could have several causes, but the increase in drought most likely played an important role. Other factors, such as overgrazing by livestock, land-use conversion, and species changes, may also have contributed; local Masai herders have reported all of these. Despite the overall decline in vegetation cover and more severe drought conditions, the resilience of the vegetation was high. A drought year or season reduced vegetation cover and health, as indicated by lower NDVI values, but the vegetation recovered quickly during the following rainy season once rainfall returned to normal or above-normal levels.

Finally, we conclude that despite the overall decline in vegetation cover, the changes have not been as dramatic as previously reported, and vegetation resilience in the study area is still good. Climate change projections for the area suggest more frequent drought, which could cause a further decline in vegetation cover. In addition, the vegetation could shift toward more drought-tolerant species, which may leave fewer grazing resources for the local Masai herdsmen.

**Author Contributions:** Conceptualization, S.L.V. and G.S.; methodology, S.L.V., T.K. and G.S.; formal analysis, S.L.V. and T.K.; resources, S.L.V., T.K., R.K. and J.W.; writing—original draft preparation, S.L.V., T.K. and G.S.; writing—review and editing, S.L.V., T.K., R.K., J.W. and G.S.; visualization, S.L.V. and T.K.; supervision, G.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are openly available on GitHub/Zenodo at https://doi.org/10.5281/zenodo.4635298 (accessed on 21 October 2021).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Use of A MODIS Satellite-Based Aridity Index to Monitor Drought Conditions in Mongolia from 2001 to 2013**

**Reiji Kimura <sup>1,\*</sup> and Masao Moriyama <sup>2</sup>**


**Abstract:** The 4D disasters (desertification, drought, dust, and *dzud*, a Mongolian term for severe winter weather) have recently been increasing in Mongolia, and their impacts on human livelihoods have likewise increased. The combination of drought and *dzud* has caused the loss of livestock on which nomadic herdsmen depend for their well-being. Understanding the spatiotemporal patterns of drought and predicting drought conditions are important goals of scientific research in Mongolia. This study examined the trends of the normalized difference vegetation index (NDVI) and the satellite-based aridity index (SbAI) to determine why the land surface of Mongolia has recently (2001–2013) become drier across a range of aridity indices (AIs). The main reasons were that the maximum NDVI (NDVImax) was lower than the NDVImax typically found in other arid regions of the world, and that the SbAI throughout the year was large (dry), although the SbAI in summer was comparatively small (wet). Under current conditions, the capacity of the land surface to retain water throughout the year caused a large SbAI, because rainfall in Mongolia is concentrated in summer and the condition of the grasslands reflects summer rainfall in addition to grazing pressure. We then proposed a method to monitor land-surface dryness or drought using only satellite data. The correct identification of drought was higher for the SbAI. Drought is more strongly correlated with soil moisture anomalies, and thus the annual averaged SbAI might be appropriate for monitoring drought during seasons. Degraded land area, defined as annual NDVImax < 0.2 and annual averaged SbAI > 0.025, has decreased. Degraded land area was large in the major drought years of Mongolia.

**Keywords:** aridity index; drought; land degradation; remote sensing; satellite-based aridity index

#### **1. Introduction**

In recent years, global warming has caused rising temperatures and decreasing precipitation in drylands at high latitudes [1–4]. Increasing environmental stress associated with human activities, concurrent with climate change, may spread the damage caused by desertification, drought, and dust [3,5].

In Mongolia, damage caused by cold and snow is called "*dzud*", a natural disaster that can lead to significant livestock mortality and economic damage [6,7]. The authors in [7] have called desertification, drought, dust, and *dzud* the 4D-related hazards. The impact of *dzud* in winter is strongly affected by drought conditions (low pasture production) during the preceding summer. For example, *dzud* occurred from October 2009 to March 2010 as a result of drought during summer 2009 [7]. In Mongolia, about 30% of the workforce is engaged in raising livestock, and the 4D hazards, together with global warming, pose a serious threat to their livelihoods. The authors in [8] have indicated that about 60% of the decline in vegetation in Mongolia from 1988 to 2008 can be attributed to decreases in rainfall and increases in temperature.

**Citation:** Kimura, R.; Moriyama, M. Use of A MODIS Satellite-Based Aridity Index to Monitor Drought Conditions in Mongolia from 2001 to 2013. *Remote Sens.* **2021**, *13*, 2561. https://doi.org/10.3390/rs13132561

Academic Editor: Elias Symeonakis

Received: 27 May 2021 Accepted: 28 June 2021 Published: 30 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The Aridity Index (AI) is a useful metric of dryness or wetness in arid regions. It is defined as the ratio of annual precipitation (Pr) to annual potential evaporation (Ep) and is therefore a water-balance-based climatic index. The total area of arid regions, determined from meteorological data collected from 1951 to 1980, was 41% of the terrestrial land surface, including Antarctica [9,10]. The corresponding percentage was 37% for 1981–2010 [11], and [12] have estimated it at 39.5% for 2001–2013. These estimates suggest that the total area of arid regions has not increased since 1950 and may even have decreased slightly.
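The AI definition above translates directly into code. The class boundaries used here are the widely used UNEP thresholds (0.05, 0.2, 0.5, 0.65). How exact boundary values are assigned is an assumption of this sketch, and the function names are illustrative.

```python
def aridity_index(annual_precip_mm: float, annual_pet_mm: float) -> float:
    """AI = Pr / Ep: annual precipitation divided by annual potential evaporation."""
    return annual_precip_mm / annual_pet_mm


def dryland_class(ai: float) -> str:
    """Map an AI value to a UNEP dryland class (boundary handling is an assumption)."""
    if ai < 0.05:
        return "hyper-arid"
    if ai < 0.2:
        return "arid"
    if ai < 0.5:
        return "semi-arid"
    if ai < 0.65:
        return "dry sub-humid"
    return "humid"
```

Usage: `dryland_class(aridity_index(250.0, 1000.0))` evaluates an AI of 0.25, which falls in the semi-arid class.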

The most widely used index for estimating aridity is not the AI but the Palmer Drought Severity Index (PDSI) [13]. The PDSI is calculated from meteorological data and takes into account runoff, water supply, and the water-retention capacity of the soil [14]. Several studies have estimated the PDSI in Mongolia [15–17], and all of them concluded that the PDSI decreased (i.e., drought became more severe) from 2000 to 2010. However, the PDSI has some disadvantages; for example, it does not account for changes in the spatial distributions of soil, vegetation, and hydrological processes [18].

With high resolution and frequency, satellite data offer advantages in monitoring environmental conditions in arid regions [19,20]. For example, the Moderate Resolution Imaging Spectroradiometer (MODIS) and Copernicus Missions (specifically Sentinel 1 or 2) have provided data since 2000 and 2014, respectively. Since lengthy cloudless periods are common in arid regions, much of the MODIS or Sentinel data are usable for global analyses.

Some drought indices are based on remote sensing [21,22]. Spectral reflectance has been widely used to calculate indices such as the Normalized Difference Vegetation Index (NDVI) and the normalized difference water index (NDWI), because the calculations are simple [23]. The authors in [24] found that the NDVI performed best in assessing land degradation compared with other reflectance-based indices such as the soil-adjusted vegetation index (SAVI). They also found that thermal indices based on land surface temperature (LST) were the most influential variables for land degradation assessment. The authors in [25] have also suggested that a thermal index based on the day–night LST difference is a much more useful indicator of water deficit. MODIS has provided daytime and nighttime LST observations over the same locations every day since 2000, which has made it possible to calculate such a thermal index.
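The reflectance indices mentioned above come from simple band arithmetic. In this minimal sketch, `nir` and `red` are surface reflectances, such as MODIS bands 2 and 1 or AVHRR channels 2 and 1. For SAVI, the soil factor L = 0.5 is the usual default. NDWI is left out because more than one formulation is in use.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    """Soil-adjusted vegetation index: (1 + L)(NIR - Red) / (NIR + Red + L)."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)
```

Over sparse vegetation, adding the soil factor in the denominator damps the influence of bright soil background. This is why SAVI is often compared with NDVI in drylands.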

The authors in [26] proposed a satellite-based aridity index (SbAI) that uses the day–night LST difference, and the SbAI has since been validated and applied [12,26–29]. For example, years of major drought in China corresponded to years with large increases in identified degraded land area [28]. In addition, a comparison of the SbAI with the AI within Turc space (a water-balance framework delimited by the water-limited and energy-limited lines) identified 15 categories in five zones: a stable zone, a zone transitioning toward dryness, a zone transitioning toward wetness, a dry zone, and a moist zone [30]. The authors of [30] have shown that actual aridity was intensifying across most of Mongolia, although the climatic AI ranged from arid to semi-arid. The authors in [31] have demonstrated the relationship of the NDVI with precipitation and temperature in semi-arid regions and showed that most sites displayed a seasonal reversal, associated with transitions from water-limited to energy-limited conditions during wet winters.

Considering these past findings, the goal of this study was to examine the trends of NDVI and SbAI to determine why the land surface of Mongolia has recently become drier while the AI has ranged from arid to semi-arid and to propose a method to monitor the land-surface dryness or drought directly, using only satellite data.

#### **2. Methods**

#### *2.1. Target Area and Analysis Period*

Mongolia is a landlocked country bordered by Russia and China. Its total land area is 1,564,116 km2. The country is ringed by high mountains and lies on a highland more than 1500 m above sea level. It has four distinct seasons, large temperature variations, and little rainfall. The climate varies widely with both altitude and latitude. The annual mean temperature ranges from −8 °C to 6 °C, and the annual mean precipitation from 50 mm to 400 mm [6].

The AI, SbAI, and NDVI in Mongolia were calculated for latitudes of 41–53° N and longitudes of 87–120° E (Figure 1). Except for the woodlands in northern Mongolia, most of the land surface in Mongolia has been classified as typical grassland and bare soil, including the Gobi Desert in southern Mongolia [32]. The red circles in Figure 1 indicate the SYNOP (surface synoptic observations) meteorological stations (from north to south: Ulaanbaatar, Mandalgovi, and Tsogt-Ovoo), which are located in grassland, at the boundary between grassland and bare soil, and in bare soil, respectively.

**Figure 1.** Land-use classification in Mongolia. Dots indicate the SYNOP meteorological stations (Ulaanbaatar, Mandalgovi, and Tsogt-Ovoo from north to south).

The time interval used to analyze the relationship between the AI and SbAI was 2001–2013. This period was chosen because precipitation data from the Global Precipitation Climatology Center's (GPCC) full data product (V7) are available throughout that time [12]. The GPCC has calculated precipitation for all global land areas during the target period through objective analysis of climatological average rainfall at the rain-gauge stations in its database.

The goal of this study is to examine the trends of the NDVI and SbAI to determine why the land surface of Mongolia has recently (2001–2013) become drier. To establish the annual changes before and after 2001–2013, however, we calculated the NDVI from 1981 to 2000 using Advanced Very-High-Resolution Radiometer (AVHRR) data and from 2001 to 2020 using Moderate-Resolution Imaging Spectroradiometer (MODIS) data. We calculated the SbAI from 2001 to 2020 using MODIS data.

#### *2.2. Data*

The AI and SbAI data used to analyze their relationship from 2001 to 2013 were taken from [30] and have a horizontal resolution of approximately 1° in both longitude and latitude.

We calculated the daily SbAI and NDVI from 2001 to 2020 using the Terra/MODIS data products MOD09CMG (surface reflectance) and MOD11C1 (land surface temperature, LST) (https://modis-land.gsfc.nasa.gov/MODLAND\_grid.html) (accessed on 27 May 2021). Both products have a spatial resolution of 0.05°. We used the "Collection 6" land product subsets web service to access and download the data [33].

We calculated daily NDVIs from 1981 to 2000 using the AVHRR surface reflectance (Channels 1 and 2) at 0.05° resolution (https://www.ncdc.noaa.gov/cdr/terrestrial/avhrr-surface-reflectance) (accessed on 27 May 2021). To ensure long-term continuity between the AVHRR and MODIS NDVIs, we compared the two NDVIs in 2000 and obtained the following relationship, with a root mean square error (RMSE) of 0.09 (Figure 2):

$$\mathrm{NDVI}_{\mathrm{MODIS}} = 0.998\,\mathrm{NDVI}_{\mathrm{AVHRR}} + 1.975 \times 10^{-5} \tag{1}$$

**Figure 2.** Relationship between the MODIS-NDVI and AVHRR-NDVI. Difference in color indicates the relative frequency.

In this study, Equation (1) was used to correct the NDVI<sub>AVHRR</sub> values from 1981 to 2000.
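Applying Equation (1) amounts to a linear rescaling of each AVHRR NDVI value. The sketch below also includes an RMSE helper of the kind used to check agreement between the two sensors in 2000. The function names are illustrative.

```python
import math

SLOPE = 0.998  # coefficients of Equation (1)
INTERCEPT = 1.975e-5


def avhrr_to_modis(ndvi_avhrr: float) -> float:
    """Harmonise an AVHRR NDVI value to the MODIS scale using Equation (1)."""
    return SLOPE * ndvi_avhrr + INTERCEPT


def correct_series(ndvi_avhrr_values: list[float]) -> list[float]:
    """Apply Equation (1) to a whole AVHRR NDVI series (e.g. 1981-2000)."""
    return [avhrr_to_modis(v) for v in ndvi_avhrr_values]


def rmse(predicted: list[float], observed: list[float]) -> float:
    """Root mean square error between paired values."""
    if len(predicted) != len(observed):
        raise ValueError("inputs must have the same length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))
```

Because the slope is almost 1 and the intercept is tiny, the correction hardly changes the values. The reported RMSE of 0.09 therefore reflects scatter between the sensors rather than a systematic offset.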

A global land-cover map (GLCM) was used to characterize the distribution of land use in Mongolia (https://db.cger.nies.go.jp/dataset/landuse/en/) (accessed on 27 May 2021) (Figure 1). The GLCM is a raster image of Earth with a latitude–longitude resolution of 30 arc-seconds that classifies land cover into seven categories [34].

Annual rainfall data for 2001–2020 at Ulaanbaatar, Mandalgovi, and Tsogt-Ovoo (Figure 1) were downloaded from the Japan Meteorological Agency (JMA) website (http://www.data.jma.go.jp/gmd/cpd/monitor/climatview/frame.php) (accessed on 27 May 2021). Where data were missing, the gaps were filled using the Climatic Research Unit (CRU) Time Series monthly high-resolution gridded climate dataset (https://crudata.uea.ac.uk/cru/data/hrg/) (accessed on 27 May 2021).
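A minimal sketch of the gap-filling step follows. It assumes direct substitution of the gridded monthly value for each missing station month, with no bias adjustment, which the text does not specify. The data layout, keyed by `(year, month)`, and the function names are hypothetical.

```python
from typing import Optional


def fill_missing(
    station: dict[tuple[int, int], Optional[float]],
    gridded: dict[tuple[int, int], float],
) -> dict[tuple[int, int], float]:
    """Replace missing station values (None) with the gridded value for the same month."""
    filled = {}
    for key, value in station.items():
        if value is None:
            if key not in gridded:
                raise KeyError(f"no gridded value for {key}")
            filled[key] = gridded[key]
        else:
            filled[key] = value
    return filled


def annual_totals(monthly: dict[tuple[int, int], float]) -> dict[int, float]:
    """Sum gap-filled monthly rainfall into annual totals."""
    totals: dict[int, float] = {}
    for (year, _month), value in monthly.items():
        totals[year] = totals.get(year, 0.0) + value
    return totals
```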

#### *2.3. Analytical Methods*

The SbAI can be physically interpreted as a measure of reciprocal heat capacity, which can be estimated from the ratio of the amplitude of the day–night difference in land surface temperature (LST) to the incident net solar radiation [26]. For a dry surface, the SbAI is large because the day–night LST difference (Δ*T*<sub>s</sub>) is large: the low water content of the land surface gives it a low heat capacity. For the analysis in this study, we used an annual average SbAI, calculated by averaging the daily SbAIs, because the AI against which we compared it was also an annual average [27].
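The definition above can be read, in simplified form, as a day–night LST difference divided by net solar radiation, averaged over the days with usable data. This is a sketch under that reading only. The exact formulation, including the radiation term and its units, follows [26] and is assumed here, and the parameter names are hypothetical.

```python
from typing import Optional


def daily_sbai(lst_day_k: float, lst_night_k: float, net_solar: float) -> float:
    """Simplified SbAI: day-night LST difference divided by incident net solar radiation.

    A larger day-night difference (lower heat capacity, drier surface) gives a larger SbAI.
    """
    return (lst_day_k - lst_night_k) / net_solar


def annual_mean_sbai(daily_values: list[Optional[float]]) -> Optional[float]:
    """Average the valid daily SbAI values; None marks days without usable LST (e.g. clouds)."""
    valid = [v for v in daily_values if v is not None]
    return sum(valid) / len(valid) if valid else None
```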

In the analysis, we used the maximum NDVI of each year, because vegetation amounts in arid regions such as Mongolia are low and annual averages lose much of the information carried by the NDVI [27,35]. In this study, the maximum NDVI occurred in August, and we used it as a measure of the potential for vegetation growth, because vegetation was strongly affected by the amount of precipitation before August (85% of the annual rainfall in Mongolia falls from April to July) [8,36–38].
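Extracting the annual maximum NDVI, together with the date on which it occurs, can be sketched as follows. The sketch assumes a daily series keyed by date, with `None` for missing or cloud-contaminated days. The function name is illustrative.

```python
from datetime import date
from typing import Optional


def annual_max_ndvi(daily: dict[date, Optional[float]]) -> dict[int, tuple[date, float]]:
    """Return, for each year, the date and value of the highest valid daily NDVI."""
    best: dict[int, tuple[date, float]] = {}
    for day, value in sorted(daily.items()):
        if value is None:  # skip missing or cloud-contaminated observations
            continue
        if day.year not in best or value > best[day.year][1]:
            best[day.year] = (day, value)
    return best
```

Keeping the date makes it easy to check that the maximum falls in August, as reported above.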

The authors in [12,30] have used the SbAI to classify arid regions according to their actual degree of aridity (Table 1).


**Table 1.** Classification of arid regions using the SbAI and AI.

The range of the AI in each region, as defined by [9–11,39], is given in parentheses. The authors in [30] have subdivided these regions into 15 categories based on the SbAI and AI values of the four dryland regions listed above (indicated by the double-headed red arrows along the *Y* and *X* axes in Figure 3). The stable zone (green), which includes points classified into the same dryland region by both the SbAI and the AI, comprises categories 1, 2, 3, and 4. The transition zone toward dryness (red) comprises categories 5 and 6, and the transition zone toward wetness (blue) comprises categories 7, 8, and 9. Zones 10, 11, and 12 lie in the dry zone (SbAI > 0.025), where dryness is increasing, and zones 13, 14, and 15 lie in the moist zone (SbAI < 0.015), where conditions are becoming wetter (Figure 3).

**Figure 3.** Relationship between the AI and SbAI, averaged over 2001–2013, and the 15 arid region categories. The red, double-headed arrows along the axes indicate the range of values of the indices that indicate hyper-arid, arid, semi-arid, and dry sub-humid regions. The blue dots with standard deviations indicate the actual ranges of the AIs and SbAIs in Mongolia (modified from [30]).

The authors in [27] examined the global distribution of areas with an annual maximum NDVI < 0.2 and an annual averaged SbAI > 0.025 and defined areas meeting both criteria as degraded land, which includes existing desert and land with permanent or temporary dust erodibility. We examined the yearly variation of degraded land area in Mongolia between 2001 and 2020 and discussed it in relation to drought.
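The degraded-land criterion of [27] combines two thresholds, and the degraded area follows from summing grid-cell areas. This sketch assumes a spherical Earth and a 0.05° grid, and approximates each cell's area from its centre latitude. The function names and cell layout are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def is_degraded(ndvi_max: float, sbai_annual: float) -> bool:
    """Degraded-land criterion: annual NDVImax < 0.2 and annual mean SbAI > 0.025."""
    return ndvi_max < 0.2 and sbai_annual > 0.025


def cell_area_km2(lat_center_deg: float, res_deg: float = 0.05) -> float:
    """Approximate area of a res_deg x res_deg cell centred at the given latitude."""
    d = math.radians(res_deg)
    return EARTH_RADIUS_KM ** 2 * d * d * math.cos(math.radians(lat_center_deg))


def degraded_area_km2(cells: list[tuple[float, float, float]]) -> float:
    """Total degraded area; each cell is (centre latitude, NDVImax, annual mean SbAI)."""
    return sum(
        cell_area_km2(lat) for lat, ndvi_max, sbai in cells if is_degraded(ndvi_max, sbai)
    )
```

Running `degraded_area_km2` once per year gives the kind of yearly time series that can then be compared with known drought years.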

#### **3. Results**

#### *3.1. Distribution of Averaged AI in Mongolia from 2001 to 2013*

The distribution of the AI averaged over 2001–2013 indicated that northern Mongolia was dry sub-humid (DSH), north-central Mongolia was semi-arid (SAr), and southern Mongolia was arid (Ar) (Figure 4). The boundary between the SAr and Ar regions lies at a latitude of 47° N. The AI distribution corresponded to the distribution of land cover: northern Mongolia is woodland, north-central Mongolia is grassland, and southern Mongolia is bare soil (Figure 1).

**Figure 4.** Spatial distribution of annual AI averaged during 2001–2013 in Mongolia with a resolution of 1° latitude × 1° longitude.

However, when the SbAIs were plotted against the AIs (blue dots with standard deviations in Figure 3), the areas that should have been in zones 2 and 3 were in zones 10 and 11 (Figure 5). This result indicates that the actual aridity in most of Mongolia was more severe than the climatic aridity. The authors in [40] have published a map of the distribution of aridity in Mongolia that shows a distribution of extremely strong to strong aridity, and middle to weak aridity similar to zones 10 and 11, respectively.

**Figure 5.** Spatial distribution of dry zone categories 10 and 11 during 2001–2013 in Mongolia with a resolution of 1° latitude × 1° longitude.

The authors in [30] have indicated that the yearly maximum of the NDVI (NDVImax) in zones above the stable zone in Figure 3 has decreased, and the NDVImax in zones below the stable zone in Figure 3 has increased. The values of the NDVImax are therefore strongly related to the differences between the 15 zones in Figure 3. The values of the NDVImax in zones 10, 11, 2, 3, and 5 are shown in Figure 3, and the distribution of the annual NDVImax, averaged over 2001 to 2013, is shown in Figure 6. The ranges of the NDVImax were 0.20 ± 0.11 and 0.51 ± 0.16 in zones 10 and 11, respectively (Figure 6), and were nearly consistent with the ranges of the NDVImax indicated in zones 10 and 5. That is, the smaller amount of vegetation may be one of the reasons why the land surface of Mongolia became drier from 2001 to 2013 across a range of AIs.

**Figure 6.** Spatial distribution of NDVImax values averaged during 2001–2013 in Mongolia with a resolution of 0.05° latitude × 0.05° longitude.

These results suggest the following characterizations of zones 10 and 11:


In Sections 3.2 and 3.3, we examine why the land surface of Mongolia became drier from 2001 to 2013 across a range of AIs by examining the trends of the NDVIs and SbAIs.

#### *3.2. Difference of Climatic Conditions Using AI*

We compared the AI distribution in Figure 4 to that from 1981–2010, calculated by [41], to identify the effects of climate change. Although a climatic trend toward aridity was apparent in some places (red circles), the distribution of AIs did not generally change (Figure 4).

Many studies have addressed precipitation trends in Mongolia [7,8,17,42]. The authors in [43] examined the trend of annual precipitation from 1982 to 2010 and found that precipitation over Mongolia had been decreasing since 1993 (especially in northern and central Mongolia [17]) and that annual rainfall during 1994–2010 was about 30 mm lower than during 1982–1993. The authors in [7,8] found similar results. A 30 mm decrease in annual rainfall has little effect on the AI-based climate classification of the Ar and SAr regions, because the AI ranges of those categories are wide: 0.05–0.2 and 0.2–0.5, respectively. The AI value itself may decrease because of lower rainfall and greater potential evaporation under warmer temperatures [44]. The implication is that, because each class spans a wide range of AI values, a map of the AI distribution does not reveal the climatic effects that have made Mongolia's land surface drier.
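A worked example with hypothetical round numbers illustrates why a 30 mm rainfall decrease does not move a semi-arid cell out of its class. The values used here are an annual precipitation of 250 mm and a potential evaporation of 1000 mm. These are not values from this study.

$$\mathrm{AI}_{\text{before}} = \frac{250\ \text{mm}}{1000\ \text{mm}} = 0.25, \qquad \mathrm{AI}_{\text{after}} = \frac{(250 - 30)\ \text{mm}}{1000\ \text{mm}} = 0.22$$

Both values lie within the semi-arid range of 0.2–0.5, so the class map is unchanged even though the AI itself has fallen by 12%.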

Monitoring the amount of vegetation should be an effective way to examine the effect of a 30 mm decrease in rainfall over Mongolia. We therefore examined the trend of the NDVImax in August over Mongolia from 1981 to 2020 (Figure 7), because the NDVImax in August is sensitive to the amount of rainfall during the preceding season [8,36,38]; in Mongolia, 85% of the annual rainfall falls from April to July [37]. Figure 7 shows that the NDVImax decreased from 1994 to 2009. As previously mentioned, the annual rainfall during 1994–2010 was about 30 mm lower than during 1982–1993, so the decreasing trend of the NDVImax up to 2009 was presumably due to decreased precipitation. Even at its 1994 peak, however, the NDVImax was only 0.39, less than the average value of 0.4 in zones 2 and 3.

**Figure 7.** Annual changes (1981–2020) of the NDVImax in August over Mongolia. Dashed lines show the trends from 1981–1994, 1994–2009, and 2009–2020.

In contrast, an increasing trend of the NDVImax after 2009 is clearly visible. Because few analyses of precipitation trends over Mongolia cover 2001 to 2020 ([45] covers 2000 to 2017; [23] covers 2000 to 2016), we re-examined those trends at Ulaanbaatar, Mandalgovi, and Tsogt-Ovoo (Figure 8). At all three locations, annual rainfall increased after 2009, and those trends corresponded to an increase of the NDVImax, which reached the average value of 0.4 in zones 2 and 3 (Figure 7). Based on an analysis of precipitation anomalies, the authors in [23] and [45] have also indicated that Mongolia was wetter from 2009 to 2017 than in 2000–2008.
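The three dashed trend lines in Figure 7 (1981–1994, 1994–2009, and 2009–2020) can be approximated by fitting a separate straight line to each period. The text does not state the fitting method, so the sketch below assumes ordinary least squares with break years shared by adjacent segments, and it uses a synthetic NDVImax series (not the study's data) that rises to 0.39 in 1994, declines to 2009, and then rises to 0.40.

```python
import numpy as np

def segment_trends(years, values, breaks):
    """Fit an ordinary least-squares line to each segment between consecutive
    break years (both ends inclusive) and return {(start, end): slope per year}."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    trends = {}
    for start, end in zip(breaks[:-1], breaks[1:]):
        mask = (years >= start) & (years <= end)
        slope, _intercept = np.polyfit(years[mask], values[mask], 1)
        trends[(start, end)] = slope
    return trends

# Synthetic August NDVImax series (hypothetical values).
years = np.arange(1981, 2021)
ndvi = np.where(years <= 1994, 0.338 + 0.004 * (years - 1981),
       np.where(years <= 2009, 0.390 - 0.003 * (years - 1994),
                0.345 + 0.005 * (years - 2009)))

for period, slope in segment_trends(years, ndvi, [1981, 1994, 2009, 2020]).items():
    print(period, f"{slope:+.4f} per year")
```

A formal alternative would be segmented regression with estimated breakpoints; fixing the break years, as here, simply mirrors the periods named in the figure caption.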

#### *3.3. Trends of NDVImax and SbAI in Zones 10 and 11 during 2001–2020*

We examined in detail the water retention of the land surface using the trends of the SbAI (the August average and the annual average) and the NDVImax in August. In zone 10, the NDVImax decreased slightly from 2001 to 2009 and increased markedly after 2009 (Figure 9a). The average SbAI in August varied inversely with the NDVImax, and the correlation between the two was high (*R*<sup>2</sup> = 0.64, *p* < 0.001). The averaged NDVImax from 2001 to 2010 was less than the limiting value of 0.2, below which land is considered to be degraded [27], but it increased after 2009. However, the NDVImax remained smaller than the average value of 0.26 in Ar regions (zone 2). When the NDVImax exceeded 0.24, the averaged SbAI in August was lower than the limiting value of 0.025, above which land is considered to be degraded [27], but it was higher than 0.025 in many years. The annual averaged SbAI exceeded 0.03 in many years, and thus the environment of zone 10 appeared to be stressed in terms of water retention by the land surface throughout the year.
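The *R*<sup>2</sup> values quoted for the NDVImax–SbAI relationship correspond, for a simple linear fit, to the square of Pearson's correlation coefficient. The sketch below computes that quantity and the sign of the fitted slope for a hypothetical series of yearly values (not the study's data); a negative slope is what "varied inversely" means here.

```python
import numpy as np

def r_squared(x, y):
    """R^2 of a simple linear fit of y on x, computed as Pearson's r squared."""
    r = np.corrcoef(np.asarray(x, dtype=float), np.asarray(y, dtype=float))[0, 1]
    return r ** 2

# Hypothetical yearly values for one zone: SbAI rises when NDVImax falls.
ndvi_max = np.array([0.18, 0.17, 0.19, 0.16, 0.18, 0.22, 0.24, 0.23, 0.25, 0.26])
sbai_aug = np.array([0.031, 0.033, 0.030, 0.034, 0.030, 0.027, 0.024, 0.026, 0.023, 0.022])

r2 = r_squared(ndvi_max, sbai_aug)
slope = np.polyfit(ndvi_max, sbai_aug, 1)[0]
print(f"R^2 = {r2:.2f}, slope = {slope:.3f}")
```

For the accompanying *p* value, `scipy.stats.linregress` returns `rvalue` and `pvalue` for the same fit; the sketch sticks to NumPy to stay dependency-light.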

**Figure 8.** Annual changes (2001–2020) of annual rainfall in (**a**) Ulaanbaatar, (**b**) Mandalgovi, and (**c**) Tsogt-Ovoo. Black and blue dashed lines represent the normal values and trends during 1981–2010.

In zone 11, the NDVImax increased slightly from 2001 to 2009 and increased markedly after 2009 (Figure 9b). The averaged SbAI in August varied inversely with the NDVImax, and the correlation between the two was high (*R*<sup>2</sup> = 0.54, *p* < 0.001). The NDVImax from 2001 to 2010 was lower than the general value of 0.54 in the SAr regions (zone 3), but it increased after 2009. The averaged SbAI in August from 2001 to 2009 was slightly lower than the limiting value of 0.025, above which land is considered to be degraded. After 2009, the averaged SbAI in August was substantially lower than 0.025, and it was even lower than 0.022, which is the upper bound for classification of a region as SAr (Figure 3). Although summers became wetter after 2009, water retention throughout the year was still low, as the annual averaged SbAI remained at 0.025–0.03.

The annual averaged SbAI was only weakly correlated with the NDVImax (*R*<sup>2</sup> = 0.31, *p* < 0.05) (Figure 9b). For example, the NDVImax was lower in 2017 than in 2019, but the annual averaged SbAI was lower (wetter) in 2017. The relationship between the NDVImax and SbAI was similar in 2010 and 2011. The authors in [45] indicated that water storage after the summer of 2010 was higher than in 2011. We infer that precipitation in seasons other than summer affected water retention throughout the year. Numerical simulations project that annual rainfall, especially winter rainfall, will increase over Mongolia from 2016 to 2035 and from 2081 to 2100 [37]. At present, it cannot definitively be concluded that the increasing trend of annual rainfall since 2009 (Figure 8) agrees with the simulation results. If precipitation increases enough that the annual averaged SbAI in zones 10 and 11 decreases to 0.025 and 0.022, respectively, the aridity in zones 10 and 11 will be close to climatically stable conditions.

**Figure 9.** Annual changes (2001–2020) of the NDVImax in August, averaged SbAI in August, and annual averaged SbAI. (**a**) zone 10, (**b**) zone 11. Dashed lines show the trends of respective indices.

The annual averaged SbAI during the years shown in Figure 9b is the averaged value in zone 11, and thus there were wet regions with SbAIs below 0.022. For example, the distribution of annual averaged SbAIs in 2013 (the lowest SbAIs from 2001 to 2020) revealed wet regions with SbAIs below 0.022 (colored orange in Figure 10). The authors in [46] have indicated that although a large proportion of Mongolia's rangelands are not providing their potential ecosystem services, few have crossed an irreversible threshold of ecological change caused by current levels of grazing pressure. For the sustainable development of stock farming, continuous monitoring should be conducted to conserve the relatively wet regions (colored orange in Figure 10) and to prevent land degradation in nearly degraded regions (colored green in Figure 10).

**Figure 10.** Spatial distribution of annual averaged SbAI values in zone 11 for 2003.

#### *3.4. Detection of Drought Using SbAI*

According to PDSI values, drought occurred frequently in all parts of Mongolia from 2000 to 2013 [7,16,37]. Since occurrences of *dzud* during the winter were strongly affected by drought conditions (low pasture production) during the preceding summer, understanding and predicting the characteristics of drought are of particular concern in Mongolia [15,37]. The authors in [15] have assessed drought frequency, duration, and severity over Mongolia from 2000 to 2010 using the PDSI and the standardized precipitation index (SPI). They have shown that droughts occurred in 2000, 2001, 2002, 2004, 2006, 2007, 2008, and 2009. Droughts therefore occurred in most years from 2000 to 2009 (red arrows in Figure 11). Figure 11 shows the yearly change in the NDVImax, the averaged SbAI in August, and the annual averaged SbAI over Mongolia; the broken lines are the average values for the drought years. Both of the SbAI values equaled or exceeded these broken line averages during 2001–2009, but they have fallen below the broken lines in many years since 2009.

**Figure 11.** Annual changes (2001–2020) in the NDVImax in August, averaged SbAI in August, and annual averaged SbAI over Mongolia. Red arrows indicate drought years.

Drought years can be detected simply with the following thresholds:

- NDVImax ≤ 0.33;
- averaged SbAI in August ≥ 0.025;
- annual averaged SbAI ≥ 0.030.

The proportion of years correctly identified as drought or non-drought was higher for the SbAI than for the NDVI during 2001–2009; for the averaged SbAI in August, the identification accuracy was 100%. The primary reason for this accuracy is that the droughts during 2001–2010 were summer droughts, which reduced water retention [15].
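The identification accuracy discussed above can be computed by applying one threshold per index and counting the years whose drought/no-drought flag agrees with a reference list. The sketch below uses the thresholds quoted in the text and the drought years from [15] that fall within 2001–2009; the yearly index values are hypothetical, chosen only to show how the score is formed, and are not the study's data.

```python
# Thresholds quoted in the text; each index is evaluated on its own.
THRESHOLDS = {
    "ndvi_max": lambda v: v <= 0.33,      # NDVImax in August
    "sbai_aug": lambda v: v >= 0.025,     # averaged SbAI in August
    "sbai_annual": lambda v: v >= 0.030,  # annual averaged SbAI
}

# Drought years reported by [15], restricted to 2001-2009.
DROUGHT_YEARS = {2001, 2002, 2004, 2006, 2007, 2008, 2009}

def detection_accuracy(values_by_year, drought_years, index):
    """Fraction of years whose drought/no-drought flag from one index
    agrees with the reference set of drought years."""
    is_drought = THRESHOLDS[index]
    hits = sum(is_drought(v) == (year in drought_years)
               for year, v in values_by_year.items())
    return hits / len(values_by_year)

# Hypothetical index values (not the study's data).
sbai_aug = {2001: 0.027, 2002: 0.028, 2003: 0.023, 2004: 0.026, 2005: 0.024,
            2006: 0.026, 2007: 0.029, 2008: 0.025, 2009: 0.027}
ndvi_max = {2001: 0.31, 2002: 0.30, 2003: 0.35, 2004: 0.34, 2005: 0.32,
            2006: 0.33, 2007: 0.30, 2008: 0.32, 2009: 0.31}

for name, series in (("sbai_aug", sbai_aug), ("ndvi_max", ndvi_max)):
    print(name, round(detection_accuracy(series, DROUGHT_YEARS, name), 3))
```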

The years 2014 and 2017 were also recent drought years [28,38,45,47], and they could be detected by the annual averaged SbAI. Drought is more strongly correlated with soil moisture anomalies [36,48], and thus the annual averaged SbAI might be appropriate for monitoring drought during seasons other than summer.

Degraded land area, defined as annual NDVImax < 0.2 and annual averaged SbAI > 0.025, has decreased (*R*<sup>2</sup> = 0.24, *p* < 0.05), especially since 2009 (Figure 12). Degraded land area was small in 2003, 2012, 2016, and 2018 but large in 2001, 2002, 2004, 2005, and 2009, which corresponded to the major drought years shown in Figure 11. Degraded land area can recover from one year to the next, as from 2017 to 2018. Because degraded land area was defined to include existing desert and land with both permanent and temporary dust erodibility [27], factors such as ecological processes and human impacts are also important in recovering degraded land [7].
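The degraded-land definition above is a per-pixel mask. To turn the mask into an area on a latitude/longitude grid, each cell must be weighted by its area, which shrinks toward the poles. The sketch below assumes a regular 0.05° grid (the resolution given for Figure 6) and a spherical Earth; the source does not describe how the areas in Figure 12 were computed, and the small raster is hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def cell_areas_km2(lat_centers, res_deg):
    """Area (km^2) of one grid cell in each row of a regular lat/lon grid:
    A = R^2 * dlon * (sin(lat_north) - sin(lat_south))."""
    d = np.radians(res_deg)
    lat = np.radians(np.asarray(lat_centers, dtype=float))
    return EARTH_RADIUS_KM**2 * d * (np.sin(lat + d / 2) - np.sin(lat - d / 2))

def degraded_area_km2(ndvi_max, sbai_annual, lat_centers, res_deg=0.05):
    """Total area of cells with NDVImax < 0.2 and annual averaged SbAI > 0.025."""
    degraded = (np.asarray(ndvi_max) < 0.2) & (np.asarray(sbai_annual) > 0.025)
    row_area = cell_areas_km2(lat_centers, res_deg)[:, np.newaxis]  # broadcast across columns
    return float(np.sum(np.where(degraded, row_area, 0.0)))

# Hypothetical 3 x 4 tile, rows ordered north to south.
lats = np.array([45.025, 44.975, 44.925])
ndvi = np.array([[0.15, 0.25, 0.18, 0.30],
                 [0.12, 0.19, 0.40, 0.10],
                 [0.22, 0.05, 0.17, 0.35]])
sbai = np.array([[0.030, 0.040, 0.020, 0.028],
                 [0.031, 0.026, 0.035, 0.024],
                 [0.029, 0.033, 0.027, 0.022]])

area = degraded_area_km2(ndvi, sbai, lats)
total = float(np.sum(cell_areas_km2(lats, 0.05)) * ndvi.shape[1])
print(f"degraded: {area:.1f} km^2 ({100 * area / total:.1f}% of the tile)")
```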

**Figure 12.** Annual changes of areas of degraded land and percentage of total land area in Mongolia. Dashed line represents the average extent of degraded land for 2001 to 2009.

The degraded land area defined in this way would be expected to lie in zones 1 and 10 in Figure 3. This method should therefore be useful for the general detection of very severe drought conditions with possible dust occurrence over Mongolia, particularly in zone 10 of Figure 5 (Figure 13). The spatial distribution of degraded land indicates that droughts have occurred frequently in south-eastern Mongolia, that is, in the Dundgovi, Omnogovi, and Dornogovi *aimags* (an *aimag* is the first-level administrative subdivision). The authors in [49] also reported high risks of *dzud* in these provinces using social data from 1944 to 1993.

**Figure 13.** Spatial distribution of degraded land from 2001 to 2020.

#### **4. Discussion**

This study examined the reasons why the land surface in Mongolia has recently become drier across a range of AIs by examining the trends of the NDVI and SbAI. We then proposed a method to monitor drought conditions using only satellite data. Among explanations for why the SbAIs have been large within Mongolia are the following:


In contrast, the SbAI decreases when annual rainfall, or rainfall during seasons other than summer, increases. If precipitation, including winter precipitation, increases enough that the annual averaged SbAI decreases, the aridity of Mongolia will approach climatically stable conditions, and droughts, which are correlated with soil moisture anomalies, will occur less frequently. Since 2009, the NDVImax in August over Mongolia has tended toward the average value of 0.4 in zones 2 and 3, and the frequency of drought years, in which SbAI values exceed the threshold, has also decreased.

For sustainable development in Mongolia, where 30% of the workforce is engaged in stock farming, continuous monitoring should be conducted to detect drought and prevent land degradation. The remote sensing techniques proposed in this study, in addition to other drought indices that make use of meteorological or satellite data, will facilitate this monitoring. We hope that the usefulness of our method will be confirmed by other researchers and in other arid countries, and that our method will serve as the basis for an improved system based on remote sensing techniques that will promote sustainable development in arid regions throughout the world.

#### **5. Conclusions**

The purpose of this study was to examine the trends of the NDVI and SbAI to determine why the land surface of Mongolia has recently become drier, that is, why actual aridity in most of Mongolia was more severe than climatic aridity when the SbAIs were plotted against the AIs. The main reasons were that the NDVImax was lower than in the other drylands of the world and that the SbAI was large throughout the year. Under current conditions, the land surface has a low capacity to retain water throughout the year, which keeps the SbAI large, because rainfall in Mongolia is concentrated in summer and the condition of grasslands reflects summer rainfall.

A method was proposed to monitor land-surface dryness or drought using satellite data. Drought was identified more accurately by the SbAI than by the NDVI. Drought is more strongly correlated with soil moisture anomalies, and thus the annual averaged SbAI might be appropriate for monitoring drought during seasons other than summer. Degraded land area, defined as annual NDVImax < 0.2 and annual averaged SbAI > 0.025, has decreased. Degraded land area was small in 2003, 2012, 2016, and 2018 but large in 2001, 2002, 2004, 2005, and 2009, which corresponded to the major drought years in Mongolia. However, it must be noted that degradation in Mongolia is caused not only by drought events but also by ecological processes and grazing pressure [7].

**Author Contributions:** Conceptualization, R.K.; methodology, R.K.; software, M.M.; validation, R.K. and M.M.; formal analysis, R.K.; resources, M.M.; data curation, R.K. and M.M.; writing—original draft preparation, R.K.; writing—review and editing, R.K.; visualization, R.K.; supervision, R.K.; project administration, R.K.; funding acquisition, R.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by a Grant-in-Aid for Scientific Research, grant number KAKENHI 19H04239.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Acknowledgments:** We are very grateful to the reviewers, who contributed significantly to the improvement of this paper.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


### *Article* **Vegetation Trends, Drought Severity and Land Use-Land Cover Change during the Growing Season in Semi-Arid Contexts**

**Felicia O. Akinyemi 1,2**


**Abstract:** Drought severity and impact assessments are necessary to effectively monitor droughts in semi-arid contexts. However, little is known about how land use-land cover (LULC), in terms of differences in annual sizes and configurations, influences drought effects. Coupling remote sensing and Geographic Information System techniques, drought evolution was assessed and mapped. During the growing season, drought severity and its effects on LULC were examined, including whether these differed between areas of land change and persistence. This study used areas of economic importance to Botswana as case studies. The Vegetation Condition Index, derived from Normalised Difference Vegetation Index time series for the growing seasons (2000–2018 in comparison to 2020–2021), was used to assess droughts for 17 constituencies (Botswana's fourth administrative level) in the Central District of Botswana. Further analyses by LULC types and land change highlighted the vulnerability of both human and natural systems to drought. Drought periods identified in the time series correspond to drought years declared by the Botswana government. Drought severity (extreme, severe, moderate and mild) and the percentage of land areas affected varied in both space and time. The growing seasons of 2002–2003, 2003–2004 and 2015–2016 were the most drought-stricken in the entire time series, coinciding with the El Niño Southern Oscillation (ENSO). The lower-than-normal vegetation productivity during these growing seasons was evident from the analysis. With above-normal vegetation productivity in the ongoing season (2020–2021), the results suggest a reversal of the negative vegetation trends observed in the preceding growing seasons. However, the extent of this reversal cannot be confidently ascertained while the season is still ongoing.
Relating drought severity and intensities to LULC and change in selected drought years revealed that most lands affected by extreme and severe drought were (in descending order) tree-covered areas (forests and woodlands), grassland/rangelands and croplands. These LULC types were the most affected where extreme drought coincided with declining vegetation productivity. The most impacted constituencies according to drought severity and the number of drought events were Mahalapye west (eight), Mahalapye east (seven) and Boteti west (seven). Other constituencies experienced between two and six drought events of varying durations throughout the time series. Since not all constituencies were affected similarly during declared droughts, studies such as this contribute to devising appropriate context-specific responses aimed at minimising drought impacts on social-ecological systems. The methodology used can be applied to other drylands where climatic and socioeconomic contexts are similar to those of Botswana.

**Keywords:** Normalised Difference Vegetation Index (NDVI); Vegetation Condition Index (VCI); drought; land use-land cover; remote sensing; Botswana

### **1. Introduction**

Drought as a slow-onset event is increasingly an environmental hazard due to its negative impacts on natural and human systems, including livelihoods [1–4]. Drought conditions are initiated by precipitation shortfall in comparison to the climatological normal in the focus context and amplified by concurrent heatwaves and extreme high-temperature events [5,6]. Impacts relate to insufficient availability of water to meet human and nature's needs, seasonal moisture deficit including soil moisture, and extensive evaporation resulting in a decline in vegetation greenness and health, plant mortality, reduced water levels in dams and other adverse ecological and/or socioeconomic conditions [7–10]. Studies have found, and forecast, increasing drought duration, severity and frequency in different world regions [11–16].

**Citation:** Akinyemi, F.O. Vegetation Trends, Drought Severity and Land Use-Land Cover Change during the Growing Season in Semi-Arid Contexts. *Remote Sens.* **2021**, *13*, 836. https://doi.org/10.3390/rs13050836

Academic Editor: Elias Symeonakis

Received: 8 January 2021 Accepted: 18 February 2021 Published: 24 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Drylands, which by their very nature are water deficient due to limited rainfall and water supplies [17], are particularly vulnerable to droughts. In African drylands, the occurrence of droughts is not unusual. Some studies have found increasing drought events in African drylands [4,18], whereas other studies project future increases in droughts and other high-temperature events [19,20]. In the arid regions of South Africa, extreme high-temperature events in parks are increasing in frequency [18]. In Botswana, where this study was conducted, there is public awareness of issues relating to drought owing to its negative impacts, particularly on agriculture, which is largely rainfed [21]. Studies have found observable changes of varying magnitudes in rainfall, temperature trends and drought over Botswana [21–24]. With future global warming, droughts are projected to increase in frequency and severity in this region based on regional climate model simulations [5,25]. In the face of climate variability and change, it is increasingly important to assess and monitor droughts because of the need to adapt and minimise impacts on the social-ecological systems in African drylands.

The need to monitor and assess droughts in drylands calls for the use of Remote Sensing (RS)-based data and methods to complement meteorological gauge data and climatological indices, owing to the drawbacks of using only ground-based data, such as the inadequate spatial and temporal coverage of meteorological stations. RS image data and vegetation indices are widely used in drought monitoring [26–29]. With the increasing availability of free satellite image datasets of good temporal and spatial resolution, RS affords rapid and cost-effective assessment and monitoring of droughts. Drought effects on vegetation and ecosystem services have been examined in the Bobirwa sub-district, Botswana, using RS-based indices [30]. Although drought is increasingly assessed in Botswana, the use of RS to examine droughts there is still very limited. Moreover, it is not clear how drought severity differs between land systems and thereby further exacerbates impacts in this region.

Using Normalised Difference Vegetation Index (NDVI) image time series datasets, this study contributes to the understanding of the spatial and temporal variations of droughts and their effects across LULC types and change. It integrated remote sensing with spatial statistics in a Geographic Information System (GIS) for the analysis and mapping of drought. Variability and vegetation trends over the 18 growing seasons from 2000–2001 to 2017–2018 were assessed and afterwards compared to the ongoing growing season (2020–2021). Thus, this study better captured the seasonality of vegetation growth when considering drought severity as this relates to rainfall in dryland contexts. The spatial and temporal evolution of drought severity was assessed and mapped for each year in the time series. We provided an improved methodology for the examination of drought evolution and severity by incorporating indices of LULC change. Transitions in areas of LULC change and persistence, i.e., areas where no changes occurred, are useful indices to understand the processes, especially anthropogenic ones, driving the observed trends in vegetation productivity and how these relate to drought severity. Most RS-based studies on drought have not evaluated the effects according to LULC types and change. For the examination of drought severity by LULC and change to be meaningful, we further considered the differences in annual LULC configurations and sizes. Considering that both configuration and size differ from year to year, we used the annual LULC time series for analysis in each year identified as drought-stricken. Moreover, drought impacts on land-based resources, on which many livelihoods depend, would be either exacerbated or ameliorated depending on how the land is used and managed. To better demonstrate RS capabilities for assessing drought severity at a finer, sub-national scale, the assessment was conducted in 17 constituencies (a constituency is the fourth administrative level in Botswana).

#### **2. Materials and Methods**

#### *2.1. Description of the Study Location*

Seventeen constituencies in the Central District of Botswana (CDB) in the eastern part of the country were used as case studies (Figure 1). The CDB is the largest of the nine districts of Botswana in terms of both population and geographic size (Table S1 details the population and geographic area of the 17 constituencies). The district had 576,064 inhabitants (29% of Botswana's population) at the 2011 census. Covering 26% of Botswana's land area, it spans approximately 147,730 km<sup>2</sup>. With a semi-arid, hot steppe climate (Köppen BSh classification), rains occur in the summer months, with peak rainfall in January (71–142 mm). Annual average rainfall at the constituencies ranged from 321 mm to 430 mm. Temperatures range from 32 to 39 °C and can occasionally exceed 40 °C [21]. As in most parts of Botswana, the annual evaporation rate of about 2000 mm year<sup>−1</sup> far exceeds rainfall (475–525 mm year<sup>−1</sup>) [4].

**Figure 1.** Study location: (**a**) Central District in eastern Botswana; (**b**) Land use-land cover for 2018; (**c**) Peak distribution of rainfall in the 17 constituencies examined in the study.

The district is of great economic importance to Botswana, with 23% (31,634 holdings) of all traditional agricultural holdings in Botswana [31]. Moreover, the majority of the mines in Botswana are located in the CDB, such as the Morupule coal mine and the diamond mines in Lerala, Orapa and Letlhakane.

#### *2.2. Data Sources*

Variability in vegetation condition and drought severity in the CDB was examined over 18 years using the 1 km Normalised Difference Vegetation Index (NDVI) dekadal (i.e., 10-day composite) image time series from the Copernicus Land Monitoring Service (https://land.copernicus.eu/global/products/ndvi, accessed on 8 January 2021). These images were made available through the European Union–African Union-funded project Monitoring for Environment and Security in Africa (MESA). MESA was implemented for the Southern African Development Community (SADC) region, which comprises 15 countries including Madagascar and the Democratic Republic of Congo. The dekadal NDVI datasets from October 2000 to 2014 were derived from SPOT VGT, and data from June 2014 to 2018 are from PROBA-V [32]. These long-term 1 km NDVI datasets from the two sensors have been pre-processed at the source to ensure compatibility and continuity [33].

The land cover datasets were from the European Space Agency Climate Change Initiative (ESA CCI-LC v.2.0.7) and the Copernicus Climate Change Service (C3S-LC Mv52, https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-land-cover?tab=form, accessed on 8 January 2021). These datasets are a consistent series of multi-sensor annual maps from 1992 to 2018 [34]. The land categories in the ESA-LC were aggregated into six categories for compatibility with previous studies in Botswana [35]: tree-covered areas, grassland, cropland, water bodies, artificial surfaces (settlement including infrastructure) and otherland (Table S2). Tree-covered areas comprise forests and woodlands. In Botswana, forested areas are defined as comprising multi-layered tree canopies with over 10% cover, a minimum size of 0.5 ha and heights of more than 5 m [36]. Grassland incorporated shrubland and other sparse vegetation, whereas the otherland category combines bareland, rock outcrops and dunes [35].
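Aggregating ESA CCI-LC classes into six categories is a lookup-table operation on the class codes. The study's actual mapping is in its Table S2, which is not reproduced here, so the grouping of codes below is a hypothetical, illustrative assignment that follows only the verbal description above (shrubland and sparse vegetation into grassland; bare areas into otherland). The tiny raster is also hypothetical.

```python
import numpy as np

# Hypothetical aggregation of ESA CCI-LC class codes (illustrative only;
# the study's mapping is given in its Table S2).
AGGREGATION = {
    1: ("Tree-covered areas", [50, 60, 61, 62, 70, 71, 72, 80, 81, 82, 90, 100, 160, 170]),
    2: ("Grassland", [110, 120, 121, 122, 130, 140, 150, 151, 152, 153, 180]),
    3: ("Cropland", [10, 11, 12, 20, 30, 40]),
    4: ("Water bodies", [210]),
    5: ("Artificial surfaces", [190]),
    6: ("Otherland", [200, 201, 202, 220]),
}

def build_lookup(aggregation, max_code=255, nodata=0):
    """Return an array lut such that lut[esa_code] is the aggregated class id."""
    lut = np.full(max_code + 1, nodata, dtype=np.uint8)
    for new_code, (_name, esa_codes) in aggregation.items():
        lut[esa_codes] = new_code
    return lut

LUT = build_lookup(AGGREGATION)
esa_map = np.array([[10, 60, 130], [190, 210, 200]], dtype=np.uint8)
print(LUT[esa_map])  # reclassify the whole raster by fancy indexing
```

Indexing a lookup array with the raster reclassifies every pixel in one vectorised step, which scales to full annual maps.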

#### *2.3. Methods*

#### 2.3.1. Indicators of Vegetation Variability

Indices used for measuring vegetation variability and drought severity are based on the NDVI. NDVI is widely used for assessing and monitoring vegetation greenness, net primary productivity, plant phenology and land degradation in natural and human systems [35]. NDVI is calculated as in Equation (1), where NIR and R are the reflectances in the near-infrared and visible red portions, respectively, of the electromagnetic spectrum. Photosynthetically active plants absorb more incident radiation in the visible red portion but reflect more in the near-infrared portion [37].

$$\text{NDVI} = \frac{\text{NIR} - \text{R}}{\text{NIR} + \text{R}} \tag{1}$$
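Equation (1) translates directly into an element-wise array operation. The Copernicus product used in this study is already distributed as NDVI, so the sketch below only illustrates the formula; it assumes reflectance arrays as input and returns NaN where NIR + R is zero, rather than raising a division error. The reflectance values are hypothetical.

```python
import numpy as np

def ndvi(nir, red):
    """Equation (1): NDVI = (NIR - R) / (NIR + R), element-wise.
    Pixels where NIR + R == 0 are returned as NaN."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

# Hypothetical reflectances: dense vegetation, sparse vegetation, no signal.
print(ndvi([0.45, 0.30, 0.0], [0.05, 0.20, 0.0]))
```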

Three indicators of vegetation variability utilised in this study were derived from NDVI—NDVI difference, NDVI anomaly and NDVI trends. Other derived metrics to gauge seasonal vegetation productivity include the NDVI mean, maximum and cumulative values computed for each month in the growing season within the time series.

#### NDVI Difference

To analyse the variability of vegetation during the growing season over the 18-year study period, the NDVI Difference (NDVIdiff) function implemented in the MESA Drought Monitoring Services (DMS) software was used. NDVIdiff is widely used to indicate vegetation state over a specific period by comparing vegetation productivity between two dekads or relative to the long-term average for the same period. This indicator highlights areas where vegetation is under stress as well as those performing well. For this study, seasonal NDVIdiff was calculated for every growing season (i.e., annually) as the difference between the first dekad (i.e., first 10-day period) of October (D1) in year *i* and the last dekad (i.e., last 10-day period) of March (D3) of the following year *i* + 1, in the time series data from 2000 to 2018 (Equation (2)):

$$\text{NDVI}_{\text{diff}\,(i,\,i+1)} = \text{NDVI}_{\text{D1},\,i} - \text{NDVI}_{\text{D3},\,i+1} \tag{2}$$
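Equation (2) can be evaluated season by season once the dekadal images are indexed by year, month and dekad. The sketch below follows the equation exactly as written (October D1 minus the following March D3), so with this sign a season that greens up yields negative values; it does not reproduce the MESA DMS software, and the `(year, month, dekad)` dictionary layout and the pixel values are hypothetical.

```python
import numpy as np

def seasonal_ndvi_diff(dekads, first_year, last_year):
    """Seasonal NDVI difference following Eq. (2): NDVI of the first dekad of
    October in year i minus NDVI of the third dekad of March in year i + 1.

    `dekads` maps (year, month, dekad) -> NDVI array (a hypothetical layout)."""
    diffs = {}
    for i in range(first_year, last_year):
        start = dekads.get((i, 10, 1))
        end = dekads.get((i + 1, 3, 3))
        if start is None or end is None:
            continue  # skip seasons with a missing dekad
        diffs[(i, i + 1)] = np.asarray(start, float) - np.asarray(end, float)
    return diffs

# Hypothetical two-pixel images; the 2001-2002 season lacks its March dekad.
dekads = {
    (2000, 10, 1): np.array([0.20, 0.25]),
    (2001, 3, 3): np.array([0.35, 0.22]),
    (2001, 10, 1): np.array([0.18, 0.21]),
}
print(seasonal_ndvi_diff(dekads, 2000, 2002))
```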

#### Standardised NDVI Anomalies

NDVI anomaly captures how vegetation productivity for a certain period deviates from the long-term average dynamics. It is calculated by subtracting the month's long-term average from the NDVI of the considered month and dividing the result by the monthly standard deviation for that period. By distinguishing areas that are normal from those with above- or below-normal vegetation productivity, NDVI anomaly helps to identify outliers, isolate the variability in the vegetation signal and consider the reviewed period within a meaningful historical context [26].
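The standardised anomaly described above is a z-score against the same calendar period in other years. The sketch below assumes a stack holding one NDVI layer per year for a single month, uses the sample standard deviation, and includes the target year in the long-term statistics; each of these is a choice the text does not specify. The three-year stack is hypothetical.

```python
import numpy as np

def standardized_anomaly(ndvi_stack, year_index):
    """(NDVI of one year - long-term mean) / long-term standard deviation.

    ndvi_stack has shape (n_years, ...) and holds the NDVI of the same calendar
    period in every year; statistics ignore NaNs and use ddof=1."""
    stack = np.asarray(ndvi_stack, dtype=float)
    mean = np.nanmean(stack, axis=0)
    std = np.nanstd(stack, axis=0, ddof=1)
    return (stack[year_index] - mean) / std

# Hypothetical single-pixel stack for one month over three years.
stack = np.array([[0.30], [0.40], [0.50]])
print(standardized_anomaly(stack, 0), standardized_anomaly(stack, 2))
```

Negative values mark below-normal productivity and positive values above-normal productivity, in units of standard deviations.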

#### NDVI Trend

Using the NDVI time series of 2000 to 2020 as input, we computed vegetation change as trends and their significance based on the non-parametric Mann-Kendall (MK) test. Non-parametric approaches estimate trends in a time series by quantifying the rate of change in vegetation greenness for each pixel and characterise trends in the data using the median slope [38]. The MK tau (τ) coefficient ranges from −1 to +1, with values greater than 0 indicating a monotonically increasing (greening) trend and values less than 0 indicating a monotonically decreasing (browning) trend [39]. NDVI trends were reclassified into increase (>0), decrease (<0) and stable (0). The MK test is useful for determining the significance of changes in vegetation productivity over time and is robust to outliers [40]. The reclassified NDVI trends were afterwards assessed by LULC type. In conjunction with LULC types, NDVI trends are useful for detecting land areas that are potentially degraded if significant negative trends are found over time [35].
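For a single pixel, the MK statistic, Kendall's τ, a two-sided *p* value and the median (Sen's) slope can be computed from all pairs of observations. The sketch below is the textbook form of the test without a correction for ties (tied NDVI values are rare but possible) and uses the normal approximation for *p*; the study's software is not stated, and the seven-value series is hypothetical.

```python
import itertools
import math
import statistics

def mann_kendall(series):
    """Mann-Kendall trend test (no tie correction) with Sen's slope.

    Returns (tau, z, p_two_sided, sen_slope) for an evenly spaced series."""
    n = len(series)
    pairs = list(itertools.combinations(range(n), 2))
    s = sum((series[j] > series[i]) - (series[j] < series[i]) for i, j in pairs)
    tau = s / len(pairs)                    # S / (n(n-1)/2)
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S under no trend, no ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)      # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided p from the standard normal
    sen = statistics.median((series[j] - series[i]) / (j - i) for i, j in pairs)
    return tau, z, p, sen

def classify_trend(tau):
    """Reclassify a trend by the sign of tau, as described in the text."""
    return "increase" if tau > 0 else "decrease" if tau < 0 else "stable"

# Hypothetical NDVI series for one pixel.
tau, z, p, sen = mann_kendall([0.30, 0.31, 0.33, 0.32, 0.35, 0.36, 0.38])
print(classify_trend(tau), round(tau, 3), round(p, 4), round(sen, 4))
```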

#### 2.3.2. Drought Indicators

#### Vegetation Condition Index

Drought severity during the growing season was measured using the NDVI-based Vegetation Condition Index (VCI) as in Equation (3) [27]. VCI compares the NDVI of a given period *j* (NDVIj) with the long-term minimum NDVI (NDVIltmin) and long-term maximum NDVI (NDVIltmax) computed over a 10-year time series for the same period.

$$\text{VCI}_{j} = \left(\frac{\text{NDVI}_{j} - \text{NDVI}_{\text{ltmin}}}{\text{NDVI}_{\text{ltmax}} - \text{NDVI}_{\text{ltmin}}}\right) \times 100 \tag{3}$$
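Equation (3) rescales the NDVI of a period between its long-term minimum and maximum for that same period. The sketch below assumes the history is supplied as an array with years on axis 0, and it returns NaN where the long-term range is zero; the values are hypothetical.

```python
import numpy as np

def vci(ndvi_j, ndvi_history):
    """Vegetation Condition Index, Eq. (3):
    100 * (NDVI_j - NDVI_ltmin) / (NDVI_ltmax - NDVI_ltmin),
    with the long-term min/max taken over ndvi_history (axis 0 = years)."""
    hist = np.asarray(ndvi_history, dtype=float)
    lt_min = np.nanmin(hist, axis=0)
    lt_max = np.nanmax(hist, axis=0)
    rng = lt_max - lt_min
    out = np.full(rng.shape, np.nan)
    np.divide(np.asarray(ndvi_j, dtype=float) - lt_min, rng, out=out, where=rng > 0)
    return 100.0 * out

# Hypothetical history for two pixels over three years; pixel 2 never varies.
history = np.array([[0.2, 0.3], [0.4, 0.3], [0.6, 0.3]])
print(vci([0.3, 0.3], history))
```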

VCI is particularly useful for agriculture, as it tracks changes in NDVI over time that occur when vegetation is water-stressed, such as during drought. VCI was calculated for each growing season from 2000–2001 to 2017–2018, starting with the first dekad of October in one year and ending with the third dekad of March in the following year. VCI is expressed as a percentage, with values ranging between 0 (lowest) and 100 (highest); values equal to or below 40% are considered drought of varying degrees of severity (Table 1). The two VCI-based indicators used for characterising drought are drought intensity and drought frequency.


**Table 1.** Vegetation Condition Index (VCI)-based drought severity classes (adapted from [41]).

#### Drought Intensity

Drought intensities for each growing season were calculated through the 18-year study period. These are percentages of pixels (a proxy for the surface land area) whose VCI values fell within the different drought severity and non-drought categories [41]. The evolution of the drought hazard was examined for each year in the time series based on the VCI.
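Drought intensity as defined above is the percentage of pixels in each severity class. Because the class bounds of Table 1 are not reproduced here, the sketch below assumes 10-point steps with inclusive upper bounds (extreme ≤ 10, severe ≤ 20, moderate ≤ 30, mild ≤ 40, no drought > 40), consistent only with the stated rule that VCI ≤ 40 is drought; replace `UPPER_BOUNDS` with the values in Table 1. NaN pixels are excluded and the VCI map is hypothetical.

```python
import numpy as np

# Assumed class bounds (not taken from Table 1); upper bounds are inclusive.
CLASS_NAMES = ["extreme", "severe", "moderate", "mild", "no drought"]
UPPER_BOUNDS = [10, 20, 30, 40]

def drought_intensity(vci_map):
    """Percentage of valid (non-NaN) pixels in each severity class."""
    values = np.asarray(vci_map, dtype=float).ravel()
    values = values[~np.isnan(values)]
    # right=True puts VCI == 40 in 'mild', so every VCI <= 40 counts as drought.
    classes = np.digitize(values, UPPER_BOUNDS, right=True)
    counts = np.bincount(classes, minlength=len(CLASS_NAMES))
    return {name: 100.0 * c / values.size for name, c in zip(CLASS_NAMES, counts)}

# Hypothetical VCI map for one growing season.
vci_map = np.array([[5, 15, 25], [35, 40, 75], [np.nan, 90, 41]])
print(drought_intensity(vci_map))
```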

#### Drought Frequency

To relate the pixel-level VCI to each of the 17 constituencies in the CDB, the zonal statistics function in GIS was used. The median VCI value within each constituency was used because in drylands more years tend to have rainfall below the mean, so the median deviates noticeably from the mean. Because the mean is sensitive to outliers, the median is a better measure of the data distribution for gaining insight into the frequency of droughts in each constituency. Drought severity was compared with the government's official declarations of drought years. Drought frequency was computed as the number of drought years in the entire time series (growing seasons 2000–2001 to 2017–2018). A year with an annual VCI value of 40 or less is identified as drought-stricken, and the sum of such drought years per constituency over the entire time series gives the frequency of drought [26].
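The frequency calculation above combines a zonal median with a threshold count. The sketch below assumes the constituencies are supplied as an integer zone raster aligned with the VCI maps (a GIS zonal-statistics tool would normally do this step); the zone ids, season labels and VCI values are all hypothetical.

```python
import numpy as np

def zonal_median(vci_map, zones):
    """Median VCI per zone id, ignoring NaN pixels (a simple zonal statistic)."""
    vci_map = np.asarray(vci_map, dtype=float)
    zones = np.asarray(zones)
    return {int(z): float(np.nanmedian(vci_map[zones == z])) for z in np.unique(zones)}

def drought_frequency(vci_by_season, zones, threshold=40.0):
    """Number of seasons in which each zone's median VCI is <= threshold."""
    counts = {}
    for vci_map in vci_by_season.values():
        for zone, med in zonal_median(vci_map, zones).items():
            counts[zone] = counts.get(zone, 0) + int(med <= threshold)
    return counts

# Hypothetical 2 x 3 raster with two constituencies (ids 1 and 2).
zones = np.array([[1, 1, 2], [1, 2, 2]])
vci_by_season = {
    "2002-2003": np.array([[20, 30, 60], [35, 70, 80]]),
    "2009-2010": np.array([[50, 60, 45], [55, 30, 90]]),
    "2015-2016": np.array([[10, 25, 38], [90, 20, 35]]),
}
print(drought_frequency(vci_by_season, zones))
```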

#### 2.3.3. Land Use-Land Cover Change

In addition to vegetation trends, drought severity was examined by LULC type and at the constituency level, to account for the environmental and administrative dimensions of drought impacts, respectively. For each drought year, drought severity effects on each LULC type were assessed by intersecting the VCI and LULC maps for that year. Drought intensity is expressed as the percentage of the area under each LULC type affected by each drought severity class and by non-drought conditions; that is, the affected area was normalised by the size of the LULC type in that year. Because LULC configurations and sizes vary from year to year, the annual LULC map for each identified drought year was used. Only for the land change analysis were the 2000 and 2018 maps used, for post-classification change detection.
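A minimal sketch of the intersection and normalisation, assuming both layers are flattened rasters already on a common grid (the 1 km VCI and 300 m LULC would first need resampling, which is not shown) and that `None` marks masked pixels:

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence


def severity_by_lulc(
    severity: Sequence[Optional[str]],
    lulc: Sequence[Optional[str]],
) -> Dict[str, Dict[str, float]]:
    """Share (%) of each LULC class's area falling in each drought severity class.

    Both inputs are flattened rasters on the same grid for one drought year.
    Pixels that are None in either layer are skipped.
    """
    if len(severity) != len(lulc):
        raise ValueError("rasters must have the same number of pixels")
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for sev, cls in zip(severity, lulc):
        if sev is None or cls is None:
            continue
        counts[cls][sev] += 1
    result: Dict[str, Dict[str, float]] = {}
    for cls, by_sev in counts.items():
        size = sum(by_sev.values())  # size of this LULC class in this year's map
        result[cls] = {sev: 100.0 * n / size for sev, n in by_sev.items()}
    return result


# Hypothetical five-pixel scene for one drought year.
print(severity_by_lulc(
    ["severe", "severe", "mild", "extreme", None],
    ["grassland", "grassland", "grassland", "cropland", "cropland"],
))
```

Dividing by each class's own size, rather than by the total area, is what lets a small class such as wetlands be compared with a large one such as grassland.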

#### **3. Results**

#### *3.1. Vegetation Variability and Trend*

The spatial and temporal variations in vegetation productivity are presented for the growing seasons (October to March) of the 18 years. Providing insight into each growing season throughout the entire time series, the NDVIdiff maps (Figure 2) depict limited vegetation productivity in the CDB in 2000–2001, 2003–2004, 2005–2006, 2007–2009, 2013–2014 and 2015–2018. In other years, such as 2002–2003 and 2010–2011, vegetation productivity was low mostly in the northern and eastern parts of the district. Vegetation performance improved during 2005–2006, 2009–2010, 2010–2011 and 2014–2015.

Figure 3a,b compares the maximum, mean and cumulative NDVI of the entire time series (2000–2018) with those of the drought years (2002–2003 and 2003–2004) and a non-drought year (2009–2010). Vegetation productivity for 2020–2021, the ongoing growing season, is also shown. Vegetation productivity during the 2009–2010 non-drought growing season was higher than the time-series mean for most months, except from mid-February to March, whereas in the current growing season (2020–2021) it was well above the time-series mean from mid-October 2020 to February 2021. NDVI anomalies (Figure 3c) captured lower- or higher-than-normal vegetation productivity in the time series. There were some extended periods of very low vegetation productivity spanning multiple growing seasons, for example, from the middle of the 2002–2004 growing seasons, and from 2004–2005 to the start of the 2005–2006 growing season. The improved vegetation productivity during the growing seasons of 2005–2006, 2009–2010, 2010–2011 and 2014–2015 was further confirmed by the NDVI anomaly (Figure 3c).
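The anomaly and cumulative profiles can be sketched as follows. This assumes the anomaly is the simple difference from the long-term mean NDVI of the same dekad (some products use a percentage or standardised anomaly instead) and that each season is a list of dekadal values; an October to March season has 18 dekads, shortened to three here for brevity, and the values are hypothetical.

```python
from itertools import accumulate
from statistics import mean
from typing import Dict, List, Sequence


def dekadal_mean(seasons: Dict[str, Sequence[float]]) -> List[float]:
    """Long-term mean NDVI at each dekad position across all growing seasons."""
    if len({len(v) for v in seasons.values()}) != 1:
        raise ValueError("need at least one season, all with the same number of dekads")
    return [mean(values) for values in zip(*seasons.values())]


def ndvi_anomaly(season: Sequence[float], reference: Sequence[float]) -> List[float]:
    """ASSUMED anomaly definition: departure from the long-term dekadal mean."""
    return [x - r for x, r in zip(season, reference)]


def cumulative_ndvi(season: Sequence[float]) -> List[float]:
    """Running sum of NDVI through the season (cumulative profile)."""
    return list(accumulate(season))


# Hypothetical three-dekad seasons.
reference = dekadal_mean({"s1": [0.25, 0.5, 0.125], "s2": [0.75, 0.5, 0.375]})
print(ndvi_anomaly([0.25, 0.75, 0.25], reference))  # below, above, at normal
print(cumulative_ndvi([0.25, 0.5, 0.75]))
```

Negative values flag below-normal dekads; runs of negative anomalies spanning whole seasons correspond to the large-circle episodes marked in Figure 3c.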

**Figure 2.** Variability in vegetation productivity based on the seasonal NDVI difference for the growing seasons (October to March) in the time series.

The NDVI trend was analysed as an indicator of vegetation productivity change. Areas of negative trends amounted to about 90% of the land area in the CDB (Figure 4). Decreasing NDVI trends mostly occurred in the north-east (Nkange, Shashe west, Tonota) and in the south (e.g., Sefhare-Ramokgonami, Mahalapye east). Decreasing vegetation trends imply that these areas experienced an overall decline in vegetation cover and biomass. Increasing NDVI trends (4% of the CDB's land area), implying improved vegetation productivity, were most pronounced in the north-west (e.g., Boteti west) and at the eastern tip along the Motloutse River (Bobonong). Areas of stable vegetation productivity (6%) were mostly in the western parts around Serowe west, Shoshong, Palapye and Serowe north. The direction of these trends and the extent of land area affected are generally in line with national vegetation productivity dynamics [35].
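The trend estimator is not specified in this section, so the sketch below swaps in an ordinary least-squares slope per pixel and sorts pixels into decreasing, stable and increasing classes. The tolerance `eps` is hypothetical, and significance testing (Figure 4b) is not reproduced; the pixel series are illustrative.

```python
from typing import Dict, List, Sequence


def ols_slope(values: Sequence[float]) -> float:
    """Least-squares slope of NDVI against the season index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two seasons")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def trend_shares(pixels: Sequence[Sequence[float]], eps: float = 0.001) -> Dict[str, float]:
    """Percentage of pixels whose NDVI trend is decreasing, stable or increasing.

    `eps` is a HYPOTHETICAL tolerance: slopes within +/- eps count as stable.
    """
    if not pixels:
        raise ValueError("no pixels")
    labels: List[str] = []
    for series in pixels:
        slope = ols_slope(series)
        if slope < -eps:
            labels.append("decreasing")
        elif slope > eps:
            labels.append("increasing")
        else:
            labels.append("stable")
    n = len(labels)
    return {k: 100.0 * labels.count(k) / n for k in ("decreasing", "stable", "increasing")}


# Four hypothetical pixels, each with three seasonal NDVI values.
print(trend_shares([[0.5, 0.4, 0.3], [0.3, 0.3, 0.3], [0.2, 0.3, 0.4], [0.6, 0.5, 0.4]]))
```

A rank-based estimator (e.g., Theil-Sen with a Mann-Kendall test) is a common alternative in drylands because it is less sensitive to single drought or wet seasons.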

**Figure 3.** Profiles of NDVI metrics for the growing seasons: (**a**) Mean and maximum NDVI in the time series compared to multi-drought seasons (2002–2003 and 2003–2004), non-drought season (2009–2010) and 2020–2021 as the current growing season; (**b**) Seasonal cumulative NDVI for these metrics as in Figure 3a; (**c**) NDVI anomaly. Grey circles indicate when the anomaly occurred and sizes interpreted as follows: small circle = a part of the growing season was affected (indicated either as start or mid), medium circle = entire growing season was affected, large circle = multi-year growing seasons were affected.

#### *3.2. Spatio-Temporal Evolution of Drought Severity during Vegetation Growing Seasons*

Drought negatively impacts vegetation growth and the supply of water for nature's needs (e.g., to sustain wildlife) and human needs (e.g., for livelihoods and food security). Seasonal maps depicting spatial and temporal variations in drought severity in the CDB were produced (Figure 5a). The growing seasons of 2002–2004 and 2015–2016 were the worst drought periods in the entire series. The growing seasons of 2004–2007, 2012–2013 and 2016–2018 were also affected, but to a lesser extent. During droughts, the most affected areas were in the west, the south and the eastern tip of the district, except in 2002–2004, when the entire district was affected. Land areas most affected by drought were in Mahalapye west and east, Boteti west, Shoshong, Shashe west, Palapye and Bobonong. The magnitude of the drought hazard varied between years in the time series (Figure 5b). The highest percentages of extreme, severe, moderate and mild drought were recorded during 2002–2005. These years corresponded to drought years declared by the government [31]. Drought severity ranged from extreme to mild, as lower-than-normal and erratic rainfall was recorded in the CDB, as in most parts of Botswana, during these years.

**Figure 4.** Spatial patterns of vegetation productivity change between 2000–2001 and 2020–2021 (Makgadikgadi salt pans in the north is indicated as white using the waterbody mask): (**a**) Vegetation trends; (**b**) Significance of trends.

**Figure 5.** Evolution of VCI drought severity and intensity through the time series in the Central District: (**a**) Growing season VCI (October to March); (**b**) Percentage of land area affected by varying drought severity and non-drought conditions based on the VCI by years.

#### *3.3. Area of Land Use-Land Cover Change and Persistence*

In the CDB, areas of LULC persistence (i.e., unchanged) between 2000 and 2018 amounted to 88% (130,166 km<sup>2</sup>). Figure 6a depicts the spatial distribution of areas of persistence, as well as losses where LULC types have been displaced (i.e., transitioned to other land uses). Figure 6b shows the share of land under each persistent and transitioned land use between 2000 and 2018 as a percentage of the surface land area (Table S3 provides the LULC transition matrix in km<sup>2</sup> for 2000–2018). Land transitions in the matrix are to be interpreted as 'from-to' changes, whereby a LULC type in 2000 (initial year) transitions to another LULC type in 2018 (target year). For example, 37% and 2% of tree-covered areas were derived from grasslands and croplands, respectively. Thus, tree-covered areas increased from 11.5% of the total land area of the CDB in 2000 to 14.5% in 2018. Other notable transitions include the expansion of artificial surfaces such as settlements, of which 60% was derived from grassland, 1.6% from tree-covered areas, 1.4% from cropland and 1.8% from otherland. Thus, artificial surfaces increased from 0.05% in 2000 to 0.13% in 2018. The main gains by grassland were from tree-covered areas (2.8%) and cropland (1%). The main gains by cropland were from tree-covered areas (8%) and grassland (~28%). Although cropland expanded over time (from 6.3% of the total land area in 2000 to 8% in 2018), it lost 2.3% through conversion to tree-covered areas, 1% to grassland and 1.4% to artificial surfaces.
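A 'from-to' matrix of this kind can be sketched by cross-tabulating two co-registered maps. Under one reading of the statements above, "x% of tree-covered areas were derived from grasslands" is a column share: the fraction of the 2018 class that came from each 2000 class. The class names below are hypothetical, and a constant pixel area is a simplification, since pixel area on a geographic grid varies with latitude.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Matrix = Dict[Tuple[str, str], float]


def transition_matrix(
    lulc_2000: Sequence[str], lulc_2018: Sequence[str], pixel_area_km2: float
) -> Matrix:
    """Area (km2) of each (class in 2000, class in 2018) transition.

    Keys whose two classes are equal represent persistence (no change).
    """
    if len(lulc_2000) != len(lulc_2018):
        raise ValueError("maps must have the same number of pixels")
    counts = Counter(zip(lulc_2000, lulc_2018))
    return {pair: n * pixel_area_km2 for pair, n in counts.items()}


def persistence_share(matrix: Matrix) -> float:
    """Percentage of the mapped area whose class did not change."""
    total = sum(matrix.values())
    unchanged = sum(area for (src, dst), area in matrix.items() if src == dst)
    return 100.0 * unchanged / total


def origin_shares(matrix: Matrix, target: str) -> Dict[str, float]:
    """Where the 2018 area of `target` came from, as % of that area (column view)."""
    column = {src: area for (src, dst), area in matrix.items() if dst == target}
    total = sum(column.values())
    return {src: 100.0 * area / total for src, area in column.items()}


# Hypothetical four-pixel maps.
m = transition_matrix(["grass", "grass", "tree", "crop"], ["grass", "tree", "tree", "crop"], 1.0)
print(persistence_share(m), origin_shares(m, "tree"))
```

The row view (dividing by each 2000 class's total instead) answers the complementary question of where each class's original area went, as in the statement that cropland lost 2.3% to tree-covered areas.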

**Figure 6.** Changing land use-land cover conditions between 2000 and 2018: (**a**) Areas of land use-land cover loss and persistence; (**b**) Land use-land cover transitions from the year 2000 to 2018 in percentages (areas of persistence for each land use-land cover class are in bold, that is, areas of no change that remained in the same land class over the 18 years).

#### 3.3.1. Land Change and Associated Vegetation Trends

With about 90% of the land area in the CDB experiencing negative vegetation trends, we investigated how changes in vegetation productivity (i.e., increasing, stable and decreasing) are associated with land change. The focus is on the major LULC types (tree-covered area, cropland, otherland and grassland), as these made up over 95% of the study area as of 2018. Figure 7a–d depicts the spatial distribution of vegetation trends in these major LULC types. Figure 7e shows the percentage of land area by vegetation productivity change (direction and magnitude). By LULC category as of 2018, vegetation productivity decreased in about 98% of tree-covered areas (such as forests and woodlands), 94% of croplands and 94% of grasslands. In other words, the majority of tree-covered, cropland and grassland areas experienced decreasing vegetation productivity as of 2018. Seventeen percent (17%) of wetlands, 17% of settlement areas and 12% of otherland experienced increasing trends, signifying improved vegetation productivity.

**Figure 7.** Distribution of vegetation trends for major land use-land cover categories for the year 2018: (**a**) Tree−covered area; (**b**) Cropland; (**c**) Otherland; (**d**) Grassland; (**e**) Vegetation change as percent area of land in (i) 2018 land categories, (ii) areas of land change between 2000 and 2018 and (iii) areas of land persistence where no change occurred between 2000 and 2018.

Between 2000 and 2018, in the loss areas, the greatest percentage of decreasing vegetation productivity (above 90%) was found in tree-covered, settlement and cropland areas. The LULC types with increasing trends in loss areas are otherland (34%), wetlands (137%) and grasslands (2%). In areas of persistence, vegetation productivity declined mostly in the same LULC types as in loss areas, whereas it improved in 27% of settlement areas, 17% of wetlands, 11% of otherlands and 4% of grasslands. In areas of land loss, the minimum and maximum NDVI trend values were: forests (−0.3, 0.21), grassland (−0.28, 0.70), cropland (−0.27, 0.24), wetland (−0.26, 0.54), settlement (−0.24, 0.02) and otherland (−0.23, 0.35).

#### 3.3.2. Drought Severity by Land Use-Land Cover Type

Focusing on the drought years identified earlier in the time series analysis of vegetation variability and drought severity (Figures 2 and 5), we examined how drought severity differed between the major LULC types. The 2002–2003 drought-stricken growing season is used as an example because it was the worst drought in the entire time series (Figure 8a–d). In 2003–2004, for example, the land areas most impacted by extreme and severe drought, respectively, were otherland (32%, 31%), grassland (19%, 39%) and cropland (12%, 37%). Drought intensities for areas under each LULC type in the identified drought years are shown as percentages by drought severity class (Figure 8e).

**Figure 8.** Distribution of drought severity for major land use-land cover types: (**a**) Tree-covered area; (**b**) Cropland; (**c**) Otherland; (**d**) Grassland, during the growing season of 2002–2003 (a drought year); (**e**) Percentage of land area under varying drought severity and non-drought conditions based on VCI for selected drought years by land use-land cover types (T = Tree-covered areas, G = Grassland, C = Cropland, W = Wetlands/Waterbodies, S = Settlement, O = Otherland).

#### **4. Characterising Drought in the Constituencies**

*Drought Severity in Constituencies in Comparison with Drought Declaration*

Drought and household food security vulnerability assessments are conducted annually during the mid-growing season in Botswana [42]. Since assessment and interventions are conducted at local levels, we further examined the severity of the drought in the 17 constituencies (Figure 9). The drought frequency and heatmap reveal how the constituencies were affected by droughts of differing magnitudes throughout the entire time series.

**Figure 9.** Heatmap of drought severity at constituency level during the growing seasons of 2000–2001 to 2017–2018. The years with dashed lines across were declared drought years by the Botswana government [31].

Figure 9 depicts drought intensity for each growing season and drought frequency per constituency over the entire time series. This heatmap is based on the median VCI value for each constituency (heatmaps of drought severity using the minimum and mean VCI values are shown in Figures S1 and S2, respectively). The prolonged drought during the multiple seasons of 2002–2004 is evident in the heatmap. Based on the count of drought occurrences between 2000 and 2018, irrespective of severity class, the constituencies with the most frequent droughts, in descending order, are Mahalapye west (eight), Mahalapye east (seven), Boteti west (seven), Shoshong (six), Bobonong (five), Boteti east (five) and Palapye (five). Other constituencies experienced between two and four drought occurrences of lesser severity. Of the 16 drought years declared by the government (the years with dashed lines in Figure 9), eight were evident in the time series, whereas the remaining years were favourable for the CDB.

#### **5. Discussion**

#### *5.1. Vegetation Condition Change and Drought Severity*

There was high spatial and temporal variability in vegetation productivity during the growing seasons of the 18-year study period. This is typical of dryland ecosystems, which are often in non-equilibrium and respond dynamically to both climatic and anthropogenic perturbations [43]. Vegetation recovery at the start of the time series (2000–2001), after the prolonged droughts of the 1990s, was evident. This finding is corroborated by [21], which noted above-normal rainfall in the CDB in 2000. Similarly, analysing rainfall between 1960 and 2015 for Palapye, the authors of [32] noted that vegetation condition improved in 1999–2000 after 649 mm of rainfall was recorded that year, well above the long-term average of 351 mm.

Comparing vegetation productivity in the CDB during drought and non-drought years with the mean for the entire time series revealed the impacts of droughts on vegetation. For example, vegetation productivity was very limited relative to the mean in 2002–2003, the worst drought episode in the time series. Drought occurrence was evident during the growing seasons of eight declared drought years, in line with [21,30]. The FAO special alert for Southern Africa in 2015 noted the retarded growth of early-planted crops, as soil moisture was very low at the beginning of the growing season in most parts of the southern African region, including Botswana [44]. The 2017 drought and household food security outlook report [45] attributed the decrease in vegetation productivity in most parts of Botswana to negative drought impacts on vegetation.

The lower-than-normal vegetation productivity during some of the drought-stricken growing seasons can be attributed to droughts linked to the El Niño–Southern Oscillation (ENSO). For example, the prolonged droughts in the growing seasons of 2002–2004 and 2015–2016 coincided with El Niño years in the recent record [4]. Examining the association between ENSO and drought severity over the same growing-season months as used in this study, the authors of [4] found the highest statistically significant correlations in January, February and March in Botswana, and negative, non-significant correlations at the start of the season in October. At the regional level in southern Africa, the authors of [46] associated droughts with negative anomalies of the Standardised Precipitation Evapotranspiration Index and positive Sea Surface Temperature anomalies. At the global level, studies such as [46] have also documented the effects of ENSO on drought severity.

Comparing the ongoing growing season (2020–2021) with the mean for the entire time series, we found above-normal vegetation productivity from mid-October 2020 until February 2021. This suggests a full recovery of vegetation productivity this season from the impacts of the prolonged droughts of the preceding growing seasons. However, this observation is uncertain, given the below-normal vegetation productivity at the start of the season and the fact that the growing season has not yet ended. The growing season, spanning the first dekad of October to the third dekad of March, was chosen to align with the cropping and rainy seasons in Botswana, which allowed the dry season to be excluded from the drought analyses.

#### *5.2. Vegetation Trend and Drought Severity by Land Use-Land Cover and Change*

Many studies on droughts have not examined how drought effects differ between LULC types. Among those that incorporate land use, little is known about how drought severity affects LULC in terms of annual differences in its size and configuration, in either changing or persistent areas. Processes driving decreasing vegetation trends, whether climatic or anthropogenic, are better identified when LULC and its change are incorporated into the examination of drought severity. For example, in the CDB between 2000 and 2018, vegetation productivity declined in most forests, woodlands, croplands and grasslands. In land change areas, the proportion of declining vegetation trends was equally high. In areas of persistence, the greatest percentage of improved vegetation productivity was in wetlands, settlements and otherlands. The minimal improvement of vegetation trends in forested areas can be attributed to the overall increase in tree-covered areas during the study period.

Previous studies in drylands, in Botswana and elsewhere, partly attribute improvements in vegetation productivity to bush encroachment, which remote sensing vegetation indices capture as greening, although bush encroachment is undesirable in cattle-based systems [32,47,48].

Relating LULC change to land degradation, the conversion of tree-covered areas to grassland, otherland and cropland is degrading, because these transitions remove vegetation cover and contribute to land degradation processes. Transitions that convert grasslands into croplands, artificial surfaces and otherlands are likewise considered degrading. For example, in Palapye, the authors of [32,49] found increases in barelands and rock outcrops with limited vegetation growth because of prolonged droughts. This drought-induced bareland condition is further exacerbated by human activities such as overgrazing. In Bobonong, for instance, researchers [30] found increases in bareland patches in communal grazing areas. During a four-year prolonged drought (2002–2005), vegetation conditions declined, yet livestock continued to overgraze and natural pastures degraded, because pastoralists had no incentive to destock or sell their cattle: prices had slumped owing to the prevalence of Foot-and-Mouth disease during the drought. In other drylands, such as the Sahel, bareland areas with minimal vegetation growth alternate with grasslands in response to rainfall variability and drought [50].

Drought impacts on grasslands, forests and wetlands imply negative impacts on the cattle system and on biodiversity, including wildlife in savannas and the associated tourism and hospitality sector, whereas effects on croplands impact food production. For example, the water crisis of 2015–2016, resulting from relatively low and erratic rainfall, drastically reduced water levels and inflows into dams across the country [51]. Regarding drought impacts on agriculture, maize production in Botswana declined from 35,322 tons in 2011 to 13,911 tons in 2017, mainly due to drought constraints. This necessitated an expenditure of about P389 million (~US$36 million) on maize imports to meet cereal requirements not met by domestic crop production [31]. Other LULC changes observed between 2000 and 2018 in this study, such as changes in the extent of settlements, are not drought-related. Settlement expansion rather reflects increasing land demand for human habitation and infrastructure due to population growth in the CDB.

#### *5.3. Drought Severity in the Constituencies*

Relating drought severity to the years declared as drought-stricken by the government, the results reveal that not all constituencies were equally affected by drought, as severity differed from severe to mild drought. Moreover, drought severities in some declared drought years were not as widespread in the CDB as in other parts of the country. For example, the growing season of the year 2009–2010 had improved vegetation productivity in response to above-normal rainfall recorded in previous months, which resulted in flooding events in five sub-districts in the CDB (Serowe/Palapye, Tutume, Boteti, Mahalapye and Bobirwa) [52]. In other drought-prone contexts in southern Africa, such as Zimbabwe, researchers [53] found that a drought's distribution and effects differed geographically and from season to season.

Drought severity was further gauged by drought frequency at the constituency level. Over the study period, the constituencies experienced between two and eight drought events. For example, Mahalapye west and east experienced eight and seven drought occurrences, respectively, ranging from moderate to mild drought, while Boteti west experienced seven drought events ranging from severe to mild. For the Bobonong region, the authors of [30] found an average drought frequency of two to four (depending on the index) between 2000 and 2015. Our finding of five drought occurrences in Bobonong is broadly in line with this, given that our time series extends to 2018 and therefore captures more drought events.

Drought effects on land-based resources and livelihoods vary with drought severity, land use and land management practices. For example, the water use strategy adopted as part of land management practices was found to have affected the responses of tree plantations to drought in China [54]. Droughts cannot be avoided, as they are an integral part of the climate cycle; however, their impacts on social-ecological systems can be better minimised when monitoring informs both proactive and reactive measures [4,21].

#### **6. Conclusions**

This study proposed a spatial and temporal analysis of drought evolution in the Central District in eastern Botswana from 2000 to 2018. The results highlight the usefulness of incorporating land use-land cover and change in assessing the spatio-temporal variability of drought severity in drylands. Remote Sensing-based vegetation time series metrics were used to complement climatological indices. Indicators characterising changes in vegetation conditions and drought severity during the growing seasons (October to March) from 2000–2001 to 2017–2018 were used: the NDVI difference (NDVIdiff), NDVI anomaly, NDVI trends, the Vegetation Condition Index (VCI), and drought intensity and frequency. Although these NDVI-based indicators might seem redundant, they are useful as complementary measures because they are computed differently. The NDVI difference as used in this study captured the in-season variability in vegetation productivity, whereas the VCI compared each growing season with the long-term minimum and maximum conditions. For example, the limited vegetation productivity found during the growing seasons of 2002–2004 and 2015–2016, based on both the NDVI difference and NDVI anomalies, agreed with the heightened drought severity over the same periods derived from the VCI.

Results further showed high temporal and spatial variability in vegetation productivity between drought and non-drought conditions in our case study. The results also confirmed the negative impact of droughts on vegetation, in the form of limited vegetation productivity. Drought effects on vegetation productivity during the study period were characterised by decreasing vegetation trends in most parts of the district. Although drought of varying severity (severe, moderate and mild) occurred in the constituencies, the 2002–2003 and 2003–2004 growing seasons were the worst drought periods in the entire series, as most parts of the district were affected. Assessing drought severity and intensity by LULC in selected drought years revealed varying drought effects: they differed between LULC types, and between areas of land change and areas of persistence. Further examination of drought impacts in areas of no change is required, as our understanding of drought effects in such areas is still limited, and more empirical studies would provide useful insights. Using the example of the 2002–2003 drought-stricken growing season, the highest percentage of land impacted by extreme and severe droughts was found in tree-covered areas, croplands and grasslands, whereas improved vegetation trends were found mostly in wetlands and, in some instances, in otherland areas, including barelands. Moreover, the results suggest that even in declared drought years, drought severity varied and its effects differed between constituencies. A further insight is that drought severity in some declared drought years was not as widespread in the CDB. For example, no other severe drought levels were recorded in the CDB after the extended drought that affected the growing seasons of 2002–2004.

Differences in the spatial resolution of the datasets used, notably the coarse 1 km NDVI datasets compared with the 300 m annual LULC maps, are limitations of this study. With the increasing availability of higher-spatial-resolution imagery, such as from Sentinel-2, RS-based drought analysis can be improved. However, methodological challenges arise from the need to incorporate the newer images into the existing NDVI archives. For example, consideration should be given to parameterisation across sensors and to balancing the trade-offs between the superior spatial resolution of the newer satellite missions (e.g., Sentinel-1, 2 and 3) and the temporal resolution of images from the older missions (e.g., SPOT VGT and PROBA-V). As the 1 km NDVI time series datasets extend back to the 1990s, their use is currently indispensable for drought analysis. Moreover, the ease with which drought years were detected in the time series analyses demonstrates the usefulness of the 1 km NDVI time series. For example, the lower-than-normal vegetation productivity during the prolonged drought periods that negatively impacted the 2002–2004 and 2015–2016 growing seasons coincided with strong El Niño years. With the above-normal vegetation productivity in the ongoing season (2020–2021), the results suggest a reversal of the negative vegetation trends observed in the preceding growing seasons. How far these negative trends have been reversed remains uncertain, as the season is still ongoing. Future studies should examine the usefulness of RS-based indices for understanding the ongoing season's phenology in dryland contexts such as the CDB.

Remote Sensing-based time series enabled us to extend the analysis up to the ongoing season, demonstrating its usefulness for better characterisation of drought events. Remote Sensing-based results such as those obtained in this study, when provided at multiple administrative scales in a timely and cost-effective manner, have the potential to aid decision-makers to better plan and respond to drought situations. Scientific evidence is needed as input into the decision-making process to aid national resource mobilisation for drought management. Botswana requires both proactive and reactive approaches for drought management, for which remote sensing-based assessment and monitoring foster the implementation of drought early warning systems.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/2072-4292/13/5/836/s1, Table S1: Geographic and socioeconomic characteristics of the constituencies in the Central District, Table S2: Land cover classification scheme, Table S3: Land use-land cover transition matrix in km<sup>2</sup> (2000–2018), Figure S1: VCI minimum value heatmap of drought severity at constituency level during the growing seasons of 2000–2001 to 2017–2018. The years in bold are declared drought years by the Botswana government (Source: [41]), Figure S2: VCI mean value heatmap of drought severity at constituency level during the growing seasons of 2000–2001 to 2017–2018. The years in bold are declared drought years by the Botswana government (Source: [41]).

**Funding:** This research received no external funding.

**Data Availability Statement:** Data supporting reported results can be found through the Copernicus Land Monitoring Service (https://land.copernicus.eu/global/products/ndvi, accessed on 8 January 2021) and the land cover datasets from the European Space Agency CDS (https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-land-cover?tab=form, accessed on 8 January 2021).

**Acknowledgments:** The author is grateful for free access to all datasets used in this study including the provision of satellite image time series and support with the Drought Monitoring System software by the MESA-SADC Thema project. Four anonymous reviewers provided constructive feedback that improved the paper's quality.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


## *Article* **Classification Efficacy Using K-Fold Cross-Validation and Bootstrapping Resampling Techniques on the Example of Mapping Complex Gully Systems**

**Kwanele Phinzi <sup>1,\*</sup>, Dávid Abriha <sup>1</sup> and Szilárd Szabó <sup>2</sup>**


**Abstract:** The availability of aerial and satellite imageries has greatly reduced the costs and time associated with gully mapping, especially in remote locations. Regardless, accurate identification of gullies from satellite images remains an open issue despite the extensive literature addressing this problem. The main objective of this work was to investigate the performance of support vector machines (SVM) and random forest (RF) algorithms in extracting gullies based on two resampling methods: bootstrapping and k-fold cross-validation (CV). In order to achieve this objective, we used PlanetScope data, acquired during the wet and dry seasons. Using the Normalized Difference Vegetation Index (NDVI) and multispectral bands, we also explored the potential of the PlanetScope image in discriminating gullies from the surrounding land cover. Results revealed that gullies had significantly different (*p* < 0.001) spectral profiles from any other land cover class regarding all bands of the PlanetScope image, both in the wet and dry seasons. However, NDVI was not efficient in gully discrimination. Based on the overall accuracies, RF's performance was better with CV, particularly in the dry season, where its performance was up to 4% better than the SVM's. Nevertheless, class-level metrics (omission error: 11.8%; commission error: 19%) showed that SVM combined with CV was more successful in gully extraction in the wet season. On the contrary, RF combined with bootstrapping had relatively low omission (16.4%) and commission errors (10.4%), making it the most efficient algorithm in the dry season. The estimated gully area was 88 ± 14.4 ha in the dry season and 57.2 ± 18.8 ha in the wet season. Based on the standard error (8.2 ha), the wet season was more appropriate for gully identification than the dry season, which had a slightly higher standard error (8.6 ha).
For the first time, this study sheds light on the influence of these resampling techniques on the accuracy of satellite-based gully mapping. More importantly, this study provides the basis for further investigations into the accuracy of such resampling techniques, especially when using different satellite images other than the PlanetScope data.

**Keywords:** satellite imagery; gully mapping; machine learning; random forest; support vector machines; South Africa; semi-arid environment

**Citation:** Phinzi, K.; Abriha, D.; Szabó, S. Classification Efficacy Using K-Fold Cross-Validation and Bootstrapping Resampling Techniques on the Example of Mapping Complex Gully Systems. *Remote Sens.* **2021**, *13*, 2980. https://doi.org/10.3390/rs13152980

Academic Editor: Elias Symeonakis

Received: 6 June 2021 Accepted: 26 July 2021 Published: 28 July 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Defined as the detachment, transport, and deposition of soil particles by the erosive forces of raindrops and runoff [1,2], soil erosion by water is one of the most common forms of land degradation affecting many countries around the world [3]. Of its many negative effects, the most concerning is the decline in soil fertility, which limits food production [4,5]. This, in turn, contributes to food insecurity in several developing countries, particularly those where a considerable segment of the population relies on agriculture for survival [6].

South Africa, with approximately six million people deriving a livelihood from agriculture [7], is highly exposed to soil erosion. Formal agriculture employs about 930,000 farm workers, including seasonal and contract workers [7]. Given its geomorphological conditions and the strongly seasonal nature of its rainfall, it is not surprising that the country is predisposed to soil erosion, a serious threat to sustainable agriculture and natural environments [8]. Soil erosion in South Africa, especially in rural communities, has been further aggravated by human activities such as inappropriate agricultural practices and overstocking [9–12].

Although various types of water erosion occur in the country, gully erosion has been recognized as the major form of erosion in South Africa, accounting for considerable volumes of soil loss [13,14]. Accordingly, the Department of Agriculture, Forestry, and Fisheries (DAFF) in South Africa has identified the need to determine the spatial extent and severity of gullies at a national scale [15]. Gullies form when surface runoff scours and removes the soil and its parent material, producing V-shaped incised channels [16]. Gullies can be classified as either ephemeral or classical (also called permanent), mainly based on their depth. Unlike ephemeral gullies, classical gullies are deeper than 0.5 m and cannot easily be filled in by normal tillage [17], especially in highly dissected terrain [18]. Gullies also result from piping and tunneling, owing to the influence of soil chemistry on hydrological pathways [19]. The prevalence of erodible duplex and dispersive soils in certain parts of South Africa, especially the Eastern Cape, where subsurface (piping) erosion mostly occurs, considerably facilitates the formation and development of gullies [9,14]. Land use type and land-use change also trigger gully initiation [19]. In South Africa, gullies are more prominent on gently sloping land suitable for cultivation [15]. The spatial extent and severity of gully erosion vary from one province to another because of differences in land use, soil type, vegetation, rainfall, and topography. The Eastern Cape is one of the most gully-affected provinces in South Africa, with about 161,500 ha of land covered by gullies [15]. For this reason, most gully erosion studies in the country have been conducted in this province [9,14,20–22].

Accurate mapping of gullies is essential for monitoring gully erosion and understanding its environmental and socio-economic impacts [23], thereby supporting the implementation of practical erosion control measures [24,25]. Manual field-based assessments using tapes, rulers, and topographic profilers have been used for years to obtain gully information [26], but over the last few decades, digital aerial photography and, more recently, satellite imagery with different imaging capabilities have developed rapidly [23]. With such remotely sensed data available, gully information has been obtained either through visual interpretation or through automatic classification. Remote sensing-based mapping, whether by visual interpretation or by automatic methods, is presently the only practical approach for mapping gully features over large areas in arid or semi-arid regions, given the complexity of gully appearance (i.e., variability in size, shape, and occurrence) [27]. Although visual interpretation is nowadays regarded as the most traditional and time-consuming method, some researchers still prefer it over automatic methods [15,28] because automatically classified results remain dependent on the selected training samples, algorithms, and satellite images, among other factors [29]. However, the low efficiency, uncertainty, and high subjectivity associated with visual interpretation have led most researchers to investigate automatic methods [29].

The automatic extraction of gully information from satellite Earth observation data takes two forms: pixel-based and object-based analysis [23,30]. Pixel-based analysis is relatively simple and is the most frequently used and most direct approach for image classification, relying only on spectral information [31]. Such spectral information can be exploited by various image classification algorithms such as random forest (RF) and support vector machines (SVM), which are arguably the most commonly used algorithms thus far owing to their classification efficiency relative to other algorithms, including k-nearest neighbor (kNN), maximum likelihood (ML), artificial neural networks (ANN), convolutional neural networks (CNN), discriminant analysis (DA), and minimum distance (MD). One study mapped areas susceptible to gully erosion using RF and ANN [32] and found that RF performed better than ANN. Noi and Kappas [33] compared SVM, RF, and kNN in land cover classification and found that SVM, followed by RF, performed better than kNN. Phinzi et al. [34] reported that both SVM and RF outperformed linear discriminant analysis (LDA) in a study on gully detection. Although deep learning methods such as CNNs have shown better performance than SVM and RF [35], CNNs, like most deep learning methods, rely strongly on the availability of abundant high-quality training/ground truth data [36]. While CNNs perform well in detecting active gullies and differentiating them from other forms of surface erosion (e.g., sheet and rill erosion), they produce errors when detecting complex gully systems [37]. For these reasons, SVM and RF still attract most researchers' attention because of their lower computational complexity and greater interpretability compared with deep learning algorithms [36].

The wide use of these machine learning algorithms in remote sensing has shown that learning features from data is more efficient and practical than merely defining the features [38]. Although the application of machine learning in soil erosion research is not new, previous investigations have commonly used imagery of coarser spatial resolution, such as Landsat, ASTER, and Sentinel/Sentinel-SAR (Synthetic Aperture Radar), which makes economic sense given that such images are available at no cost. Moreover, these sensors are well suited for wide-area mapping of soil erosion. However, previous studies have made it apparent that such sensors cannot identify individual gullies (especially small discontinuous gullies) in sufficient detail, a limitation attributable to their low spatial resolution [15]. Although other optical sensors with relatively higher spatial resolution, such as IKONOS, WorldView, and RapidEye, exist for gully mapping, their imagery is not readily or freely available, and the high acquisition costs limit their application. Similarly, the use of LiDAR-derived elevation data from airborne surveys, including Unmanned Aerial Vehicles (UAVs), is limited by a lack of financial resources. Depending on the availability of data and the objective of a given study, multi-source and multi-sensor data fusion are common in remote sensing, since they provide synthetic data that combine the advantages of different sensors [39]. Multi-sensor (pixel-level) data fusion is mainly applied to optical images; for example, the fusion of high-resolution panchromatic and low-resolution multispectral images [40] was successfully applied in gully feature extraction [34]. Multi-source data fusion concerns feature-level and decision-level fusion of data from various sources such as SAR, optical images, LiDAR, geographic information system (GIS) data, and in situ data [40].
In our case, we did not perform any data fusion because data (including a panchromatic band) with the spatial resolution needed to detect individual gullies were lacking.

Despite the unavailability of a higher-spatial-resolution panchromatic band, the 3 m PlanetScope image, which is available free of charge for research purposes, offers great potential for detecting individual gullies. However, the capability of PlanetScope imagery to classify gullies in different seasons (dry and wet) in an arid or semi-arid environment has only been investigated for large gully forms (1–5 km long, 100–600 m wide) [41]. While machine learning algorithms such as SVM and RF have been frequently applied, little effort has been made to investigate the influence of resampling techniques, particularly bootstrapping and k-fold cross-validation (CV), on classification accuracy. We identified gullies from PlanetScope images based on these resampling methods. Our aims were (i) to compare the reflectance values of the satellite's bands with respect to gullies, (ii) to reveal which classifier (RF or SVM) and resampling technique (CV or bootstrapping) performs better in terms of overall and class-level accuracy metrics, and (iii) to determine which season is more appropriate for identifying gullies.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The study area is located in the rural part of eastern South Africa and is characterized by extensive erosion, with permanent gully erosion being the most prominent erosion type [42]. Geographically, the study area lies between 30°42′30″ and 30°43′55″ S and between 28°46′22″ and 28°48′47″ E, covering about 10 km<sup>2</sup> (Figure 1). Subsistence agriculture (e.g., crop farming and livestock rearing) and settlement are the main land use types. Grassland is the most common vegetation type throughout the area, with some forest patches in the north-western section. Elevation ranges from 1213 m to 1658 m, with the north-western and south-western sections being steeper than other parts of the area. Steep mountain slopes with gently undulating footslopes characterize the geomorphology of the area [14]. The climate is semi-arid, with temperatures ranging from 7 to 30 °C. Winters are cold and dry, with less vegetation due to limited rainfall. Rainfall occurs mostly during summer, averaging approximately 670 mm per year. Although annual rainfall is limited, the area experiences high-intensity rainfall events. Gully development in the area is further fostered by the predominance of highly erodible soils such as duplex and dispersive soils [9,43], underlain predominantly by mudstone and sandstone of the Beaufort Group [44]. Although vegetation is present in the wet season, inappropriate land-use practices such as overgrazing usually reduce vegetation cover, and with it the protection of the soil against erosion, making the area susceptible to soil erosion. The study area features both continuous and discontinuous gully networks with distinct occurrences and appearances, i.e., narrow, wide, vegetated, shallow, deep with shadows, etc. [14,42]. Additionally, some gullies resemble the unpaved road network in appearance. This complexity of gullies makes the area particularly suitable for the study.

**Figure 1.** Location of the study area (PlanetScope false-color images).

#### *2.2. Data Acquisition and Pre-Processing*

Two cloud-free PlanetScope orthorectified products (Level 3B) for the wet and dry seasons, acquired on 23 January 2017 and 25 June 2017, respectively, were used in this study. The images were downloaded from the Planet Explorer website (https://www.planet.com/explorer (accessed on 30 July 2020)). The orthorectified scenes had already been radiometrically and geometrically corrected and projected to the Universal Transverse Mercator (UTM) projection, referenced to the World Geodetic System 1984 (WGS84) datum. With a spatial resolution of 3 m and a temporal resolution of 1 day, the PlanetScope image comprises four spectral bands: red, green, blue (RGB), and near-infrared (NIR). The flowchart summarizing the workflow followed in this study is presented in Figure 2.
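The NDVI used later in the analysis is computed from the red and NIR bands as (NIR − Red)/(NIR + Red). The sketch below illustrates that arithmetic on plain Python lists. It assumes a blue, green, red, NIR band order within each pixel (the usual layout of 4-band PlanetScope analytic products, but this should be checked against the scene metadata), and the helper names are hypothetical.

```python
import math

# Assumed band layout per pixel: (blue, green, red, nir); verify against scene metadata.
RED_INDEX, NIR_INDEX = 2, 3  # 0-based positions in a per-pixel band tuple

def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red) for paired reflectance values.

    Pixels where NIR + Red == 0 are returned as NaN instead of raising an error.
    """
    out = []
    for r, n in zip(red, nir):
        denom = n + r
        out.append((n - r) / denom if denom != 0 else math.nan)
    return out

def ndvi_from_pixels(pixels):
    """pixels: iterable of (blue, green, red, nir) tuples (hypothetical layout)."""
    red = [p[RED_INDEX] for p in pixels]
    nir = [p[NIR_INDEX] for p in pixels]
    return ndvi(red, nir)
```

For example, a pixel with red reflectance 0.10 and NIR reflectance 0.50 yields an NDVI of 0.4/0.6 ≈ 0.67, a typical value for green vegetation.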

#### *2.3. Gully Classification*

Classification of gullies from the PlanetScope image was conducted in Python using random forest (RF) and support vector machines (SVM). These are among the most widely applied algorithms, and detailed descriptions are provided in the literature [10,34,36,45–47]. RF, developed by Breiman [48], is a robust machine learning algorithm that is becoming increasingly popular in remote sensing of soil erosion. The algorithm has several parameters that need to be tuned, of which ntree (number of trees) and mtry (number of features considered at each split) are the most important to consider when training the algorithm [49]. The models were built using only 4 variables (i.e., the four multispectral bands of the PlanetScope image); thus, we tested all possible values of the mtry parameter. For the ntree parameter, we tested values ranging from 50 to 1000. Beyond ntree = 100, the accuracies stagnated while the computational time kept increasing [50]; thus, the final model was trained with 100 individual decision trees, selecting 2 random variables at each split.
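To make explicit where the two tuned parameters enter the model, the generic form of Breiman's random forest classifier is given below. This is the standard textbook formulation rather than a description of the authors' code; the symbols $B$, $h_b$, and $\mathcal{D}_b$ are introduced here only for illustration.

$$
\hat{y}(\mathbf{x}) = \operatorname{mode}\left\{ h_b(\mathbf{x}) \right\}_{b=1}^{B}, \qquad B = \text{ntree} = 100,
$$

where each tree $h_b$ is grown on a bootstrap sample $\mathcal{D}_b$ of the training pixels, and at every node the best split is searched among only $m = \text{mtry} = 2$ of the $p = 4$ PlanetScope bands, drawn at random. With $p = 4$, the complete mtry grid is $\{1, 2, 3, 4\}$; the setting $m = p$ reduces the forest to bagged decision trees. Some implementations aggregate the trees by averaging class probabilities instead of taking a hard majority vote.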

The support vector machine (SVM) can be applied to both classification and regression problems [51,52]. For classification, SVM searches for the flat boundary (hyperplane) in some feature space that best separates the classes into homogeneous partitions, where each partition contains only data points of a given class [34,49]. In practice, however, it is often difficult to find a hyperplane that perfectly separates the classes using just the original features [49]. SVM overcomes this problem in two ways: first, by loosening what is meant by "perfectly separates" (i.e., tolerating some misclassified training points), and second, by using the so-called kernel trick to expand the feature space to the extent that perfect separation of classes becomes more likely [49]. The radial basis function (RBF) was chosen as the kernel type. For the RBF kernel, a penalty parameter C against misclassifications and a kernel coefficient (γ), which controls the shape of the decision boundary, have to be specified; both strongly affect the performance of the model [53]. Hyperparameter tuning was performed with the grid search method.
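The two ideas described above, relaxing "perfect separation" and the kernel trick, correspond to the textbook soft-margin SVM with an RBF kernel, written here for a binary problem with labels $y_i \in \{-1, +1\}$. This is the standard formulation, not necessarily the exact solver configuration used in the study:

$$
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \ \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad y_i\left(\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0,
$$

$$
K(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^{\top}\phi(\mathbf{x}') = \exp\!\left(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right), \qquad \gamma > 0.
$$

The slack variables $\xi_i$ implement the relaxed separation, and $C$ sets how heavily they are penalized; a larger $\gamma$ makes the kernel more local and the decision boundary more flexible. Multiclass problems such as the 7-class map here are usually handled by combining binary SVMs (e.g., one-vs-one), and a grid search evaluates candidate $(C, \gamma)$ pairs, commonly on logarithmic grids.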

#### *2.4. Reference Data Collection and Accuracy Assessment*

The reference data were collected through field surveys and visual interpretation of high-resolution Google Earth images. We divided the study area into 7 land cover classes, all of which were identifiable both in the field and in the images (Google Earth and PlanetScope): forest, built-up, agriculture, vegetation, gully, bare soil, and mixed bare soil (i.e., exposed rocks, unpaved/dirt roads, and exposed soil, mostly in ploughed fields). A total of 966 points were collected using stratified random sampling in ArcMap, with each land cover class assigned a number of points proportional to its area.
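Proportional allocation of the 966 points can be sketched as follows. This is not ArcMap's tool: it is a minimal stand-in, assuming class areas are known from a preliminary map and using largest-remainder rounding (an assumption) so that the integer counts add up exactly to the total. The helper `sample_points` then draws random pixel positions within each class from a hypothetical label grid.

```python
import random

def proportional_allocation(class_areas, total_points):
    """Split total_points among classes in proportion to their area.

    Floors each quota, then gives the leftover points to the classes with the
    largest fractional remainders, so the counts sum exactly to total_points.
    """
    total_area = sum(class_areas.values())
    quotas = {c: total_points * a / total_area for c, a in class_areas.items()}
    alloc = {c: int(q) for c, q in quotas.items()}  # floor of each (positive) quota
    shortfall = total_points - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - alloc[c], reverse=True)
    for c in by_remainder[:shortfall]:
        alloc[c] += 1
    return alloc

def sample_points(label_grid, alloc, seed=0):
    """Draw alloc[c] distinct random (row, col) positions labelled c in label_grid."""
    rng = random.Random(seed)
    pixels = {}
    for r, row in enumerate(label_grid):
        for col, label in enumerate(row):
            pixels.setdefault(label, []).append((r, col))
    return {c: rng.sample(pixels[c], n) for c, n in alloc.items()}
```

With seven hypothetical class areas (in km²) summing to 10 km², `proportional_allocation(areas, 966)` returns integer counts that differ from the exact proportional quotas by less than one point each and add up to 966.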

**Figure 2.** Workflow followed in this study (CV: cross-validation; boot: bootstrapping; SVM: support vector machines; RF: random forest).

We evaluated the overall performance of the RF and SVM algorithms using CV and bootstrapping. Kappa coefficients and overall accuracy (OA) are among the most commonly used metrics for evaluating classification accuracy [54]. However, the use of kappa in remote sensing classification accuracy assessment is becoming less common [33,55]. Pontius and Millones [56], Flight and Julious [57], and, more recently, Delgado and Tibau [58] recommend against using kappa because of its inherent limitations. A major limitation of kappa is that it is highly sensitive to the distribution of the marginal totals, potentially producing unreliable results [57]. Thus, we used OA to assess the overall performance of the models. Unlike the conventional error matrix, which uses all of the available data to test the model, CV splits the reference dataset into training and testing data: the majority of the data is used for training, and the remainder, often called the holdout sample, is used to test the model, which helps ensure that the model is robust [49]. In total, we used 17,757 pixels for the wet season and 30,597 pixels for the dry season, generated from PlanetScope. We repeated the 5-fold CV 20 times, meaning that the final accuracies were computed from 100 models. Before each repetition, the dataset was randomly shuffled and new folds were generated to increase the robustness of the models. In bootstrapping, by contrast, the original data are randomly sampled with replacement, meaning that after a data point (bootstrap sample) is selected for inclusion in the subset, it is still available for further selection [49]. Two parameters must be chosen before running bootstrapping: the sample size and the number of repetitions. In our case, the sample size was the same as that of the original dataset [59], and we applied 100 repetitions. The models were validated on the samples that were not included in the bootstrap sample.
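The two resampling schemes described above differ only in how training and validation indices are drawn. The sketch below generates them with the settings reported here (5 folds × 20 repetitions with reshuffling; 100 bootstrap repetitions of the original sample size, validated on the out-of-bag points). The function names are hypothetical, and the classifier fitting and OA averaging are left out.

```python
import random

def repeated_kfold_indices(n, k=5, repeats=20, seed=0):
    """Return (train, test) index lists for k-fold CV repeated `repeats` times.

    The data are reshuffled before every repetition, so each repetition uses new
    folds; k=5, repeats=20 gives 100 train/test splits (i.e., 100 models).
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]  # fold sizes differ by at most 1
        for test in folds:
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            splits.append((train, test))
    return splits

def bootstrap_indices(n, repeats=100, seed=0):
    """Return (train, out_of_bag) index lists for `repeats` bootstrap samples.

    Each training sample has size n and is drawn with replacement; validation
    uses the points never drawn (on average about 36.8% of the data).
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        train = [rng.randrange(n) for _ in range(n)]
        in_bag = set(train)
        oob = [i for i in range(n) if i not in in_bag]
        splits.append((train, oob))
    return splits
```

The reported OA for each scheme would then be the mean OA of the 100 models fitted on these splits. One design consequence is visible in the code: a CV repetition tests every pixel exactly once, whereas a bootstrap validation set varies in size from one repetition to the next.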

We used the traditional error matrix to assess model performance at the class level, as bootstrapping and CV do not provide class accuracies. An error matrix compares reference data with the classified map and supports various accuracy indices [54]; in this study, we focused only on class-level accuracies/errors: producer's accuracy (PA) and user's accuracy (UA). PA is also known as sensitivity or recall, while UA is sometimes referred to as precision. The omission error (100% minus PA) occurs when a pixel is excluded from the class to which it belongs. The commission error (100% minus UA) occurs when a pixel is incorrectly included in a class to which it does not belong. We computed unbiased area-based PAs and UAs, following "good practice" recommendations for accuracy assessment [60]. The F1-score was also reported as the harmonic mean of UA and PA [61]. Additionally, we computed unbiased areal coverages (ha) of gullies along with their standard errors (ha) and associated ±95% confidence intervals (ha). We generated 8 models based on combinations of the classifiers (svm and rf), seasons (dry (d) and wet (w)), and resampling methods (bootstrapping (b) and cross-validation (cv)), i.e., rf-d-b, rf-d-cv, rf-w-cv, rf-w-b, svm-d-cv, svm-d-b, svm-w-cv, and svm-w-b.
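The area-weighted metrics described above can be sketched with the widely used stratified estimators, in which each map class acts as a stratum with weight $W_i$ equal to its mapped area proportion. This is a minimal illustration assuming that estimator family (the source cites "good practice" recommendations [60] without giving formulas); the function name and the example matrix are hypothetical. It assumes every map class has at least two samples and a nonzero estimated reference proportion.

```python
import math

def area_weighted_accuracy(matrix, map_areas, z=1.96):
    """Area-weighted accuracy and area estimates from a sample error matrix.

    matrix[i][j]: sample count with map class i and reference class j
    map_areas[i]: mapped area of class i (e.g., ha), same class order as matrix
    Returns (overall accuracy, list of per-class dicts).
    """
    k = len(matrix)
    total_area = sum(map_areas)
    w = [a / total_area for a in map_areas]  # W_i: mapped area proportions
    n_row = [sum(row) for row in matrix]     # n_i.: samples per map class
    # Estimated population cell proportions p_ij = W_i * n_ij / n_i.
    p = [[w[i] * matrix[i][j] / n_row[i] for j in range(k)] for i in range(k)]
    p_col = [sum(p[i][j] for i in range(k)) for j in range(k)]  # p_.j (reference)
    overall = sum(p[i][i] for i in range(k))
    classes = []
    for j in range(k):
        ua = p[j][j] / w[j]       # equals n_jj / n_j.
        pa = p[j][j] / p_col[j]
        f1 = 2 * ua * pa / (ua + pa) if ua + pa > 0 else 0.0
        # Variance of the estimated proportion p_.j, then scaled to area units.
        var = sum(
            w[i] ** 2 * (matrix[i][j] / n_row[i]) * (1 - matrix[i][j] / n_row[i])
            / (n_row[i] - 1)
            for i in range(k)
        )
        se = total_area * math.sqrt(var)
        classes.append({"ua": ua, "pa": pa, "f1": f1,
                        "omission": 1 - pa, "commission": 1 - ua,
                        "area": total_area * p_col[j], "se": se, "ci": z * se})
    return overall, classes
```

For a hypothetical two-class map (gully: 60 ha mapped, 80 of 100 samples correct; other: 940 ha, 10 of 900 samples actually gullies), the estimated gully area is about 58.4 ha rather than the mapped 60 ha, with UA = 0.80, PA ≈ 0.82, and a standard error of about 4.1 ha.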

#### *2.5. Statistical Analysis*

NDVI values of the images, and specifically those of the gullies, were compared between the two seasons with the robust Mann–Whitney test, using the Monte Carlo *p* (*p*<sub>MC</sub>) with 9999 permutations. We applied the General Linear Model (GLM) to determine the effects of spectral bands (4 bands; RGB + NIR), seasons (wet and dry), and LULC classes (7 classes). Furthermore, we determined the statistical interactions to reveal whether the factorial variables had a common effect (e.g., whether the effects of spectral bands differed by LULC class or between the dry and wet seasons). We also determined the effect size (ω<sup>2</sup>) as a standardized measure of each variable's contribution to the model (higher values indicate a larger contribution; ω<sup>2</sup> > 0.14 was considered a large effect [62]).
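The two statistics named above can be illustrated in plain Python. The sketch computes the Mann–Whitney U statistic with tie-averaged ranks and a two-sided Monte Carlo p-value obtained by randomly relabelling the pooled values, plus the partial omega squared of one GLM term. It assumes the common estimator for partial ω² (matching the ω<sup>2</sup><sub>p</sub> column of Table 1), which is not spelled out in the source; all function names are hypothetical. For tens of thousands of pixels, a vectorized implementation would be much faster.

```python
import random

def average_ranks(values):
    """1-based ranks of values, with tied values given the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def mann_whitney_u(x, y):
    """U statistic of sample x against y: rank sum of x minus its minimum possible value."""
    ranks = average_ranks(list(x) + list(y))
    return sum(ranks[:len(x)]) - len(x) * (len(x) + 1) / 2

def mc_pvalue(x, y, permutations=9999, seed=0):
    """Two-sided Monte Carlo p-value: count random relabellings whose U lies at
    least as far from n1*n2/2 as the observed U."""
    rng = random.Random(seed)
    n1, n2 = len(x), len(y)
    centre = n1 * n2 / 2
    observed = abs(mann_whitney_u(x, y) - centre)
    pooled = list(x) + list(y)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        if abs(mann_whitney_u(pooled[:n1], pooled[n1:]) - centre) >= observed:
            extreme += 1
    return (extreme + 1) / (permutations + 1)  # +1 counts the observed labelling

def partial_omega_squared(ss_effect, df_effect, ms_error, n_total):
    """(SS_effect - df_effect * MS_error) / (SS_effect + (N - df_effect) * MS_error)."""
    return (ss_effect - df_effect * ms_error) / (ss_effect + (n_total - df_effect) * ms_error)
```

With 9999 permutations, the smallest attainable p-value is 1/10,000 = 0.0001, which is consistent with reporting *p*<sub>MC</sub> < 0.0001 as the lower bound.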

The Dunnett test [63] was used to determine whether gullies differed significantly from other land cover types (H<sub>0</sub>: the mean reflectance of gullies is identical to that of the other land cover types). The Dunnett test was developed to perform multiple comparisons against one control group; in this case, the gully land cover type was chosen as the control. Consequently, the number of comparisons is limited compared with a full pairwise approach (i.e., 6 instead of 21). Furthermore, the test compares each group's mean with the control group's mean (unlike other tests, which compare group means with the grand mean); thus, it can reveal small but significant differences [64], and our intent was to find all overlaps in reflectance with the gullies.
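For reference, the comparison of each land cover class $i$ with the gully control group ($i = 0$) is commonly written as below. This is the textbook form of the statistic, with $\mathrm{MS}_E$ the pooled within-group mean square on $N - k$ degrees of freedom, not a reproduction of the authors' software output:

$$
t_i = \frac{\bar{y}_i - \bar{y}_0}{\sqrt{\mathrm{MS}_E\left(\dfrac{1}{n_i} + \dfrac{1}{n_0}\right)}}, \qquad i = 1, \dots, k-1.
$$

Each $|t_i|$ is compared with a critical value from the joint (multivariate $t$) distribution of the $k-1$ statistics rather than from the ordinary Student's $t$ distribution; this is how the test controls the family-wise error rate. With $k = 7$ classes, the count of comparisons follows directly:

$$
k - 1 = 6 \quad \text{versus} \quad \binom{k}{2} = \frac{k(k-1)}{2} = 21.
$$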

#### **3. Results**

#### *3.1. Spectral Bands, Land Cover Classes and Seasons as Determinants of Reflectance*

The difference in NDVI values between the two seasons was significant (U = 35303, z = 19.102, *p*<sub>MC</sub> < 0.0001). NDVI in the wet season had relatively higher values, ranging from −0.36 to 0.81, while values in the dry season ranged from −0.41 to 0.59. The dry season had a bimodal distribution, whereas the wet season had a multimodal distribution (Figure 3). The bimodal distribution in the dry season represents non-vegetation (first mode) and vegetation pixels (second mode). As in the dry season, the first mode in the wet season indicates non-vegetation pixels, with lower NDVI values than the last two modes, which have relatively higher NDVI values. These last two modes represent vegetated areas: vegetation and forest pixels, respectively.

**Figure 3.** Distribution of NDVI values in the dry and wet seasons.

We also compared the NDVI of the gullies in the dry and wet seasons, and the difference was significant (U = 162, z = 9.5534, *p* < 0.0001). The mean difference was 0.08: in the wet season, green vegetation was also present in the gullies, so their NDVI was larger. The GLM showed that the spectral bands, LULC classes, and seasons, as factorial variables, and their interactions were significant (*p* < 0.001) and explained 92.3% of the variance. Among the factors, the difference between the dry and wet seasons had the largest effect on reflectance (0.868). The bands and LULC classes had almost equal, slightly lower effects (~0.6), which nevertheless also indicate large effects. Regarding the interactions, we confirmed that reflectance by band differed by LULC class and by season, and the contribution of these interactions was large (Table 1). The interaction between seasons and LULC classes had the lowest effect size (0.141), ranking third behind the interactions involving the bands, but it still indicated a large effect. The interaction of all three factors (spectral bands, seasons, LULC classes) also had a large effect, although with a comparatively small value (0.185).


**Table 1.** Results of General Linear Modelling (GLM) performed with reflectance as the dependent variable (SS: sum of squares, df: degrees of freedom, F: F-statistic, p: significance, ω<sup>2</sup><sub>p</sub>: effect size; *p* < 0.05: significance level).

The post hoc Dunnett test revealed significant differences (*p* < 0.001) between the gullies and the other LULC classes in the dry season (Figure 4). In the wet season, the difference was not significant between the gullies and the agricultural areas in the blue band, or between the gullies and the vegetation and agricultural areas in the green and red bands. Table 2 ranks the original bands by their importance in discriminating gullies. We also examined differences in NDVI and found that this spectral index was less successful than the original bands in discriminating the gullies: gully NDVI did not differ from that of mixed bare soil and vegetation in the dry season. Although NDVI performed better in the wet season, the difference from the built-up class was not significant.

**Figure 4.** Differences of gullies and other land cover types' reflectance by bands and seasons (G: gully; F: forest; Bu: built-up; BS: bare soil; MBS: mixed bare soil; V: vegetation; A: agriculture; mean ± 95% confidence intervals; the difference was not significant if confidence range intersects the dashed line).

**Table 2.** PlanetScope bands ranking in discriminating gullies against the surrounding land cover.


#### *3.2. Accuracy Assessment of Gully Mapping*

For both machine learning algorithms (RF and SVM), the cross-validation (CV) resampling method yielded better OA than bootstrapping in both the wet and dry seasons (Figure 5). Two trends are apparent from these OA results: (i) RF consistently performed better than SVM irrespective of the season or resampling method (bootstrapping or cross-validation); (ii) the dry season had better OAs than the wet season, but this was not reflected in the class-level accuracy indices for gully classification. Based on the unbiased UA, all algorithms performed well in gully classification, recording UAs above 70% (Figure 6). In particular, the best UA belonged to svm-d-b (93.4%), whereas the worst belonged to rf-w-b (77%). For most models, PA was generally low relative to UA. Only half of the models recorded a PA greater than 70%, with the best performance belonging to svm-w-cv (89.2%), while the other half fell below 70%, with svm-d-b recording the lowest PA (32.5%).

Unbiased area estimates of gullies (ha) are presented in Table 3. With the highest PA (89.2%) and the lowest standard error (3.7 ha), svm-w-cv provided the most accurate estimate of gully areal coverage (57.2 ha). The highest standard error (11.5 ha) belonged to the rf-w-b model, which had a gully area of 55.2 ± 25 ha. In the F1-score ranking, however, the rf-d-b and rf-d-cv algorithms achieved the best results (>0.90), whereas the wet-season RF algorithms had relatively low scores (0.82). All SVM algorithms (svm-d-cv, svm-d-b, svm-w-cv, and svm-w-b) recorded F1-scores of 0.85–0.88, lower than those of the dry-season RF algorithms. The two resampling techniques recorded the same omission error (85.1%) but slightly different commission errors: bootstrapping had a 40.8% commission error compared with 37.8% for k-fold CV (Table 4).

**Figure 5.** Accuracy assessment based on overall accuracy (OA) by the classification algorithm (RF: random forest, SVM: support vector machine), resampling method (boot: bootstrapping, CV: cross-validation), and season (wet and dry).

**Figure 6.** Unbiased user's accuracy and producer's accuracy (rf: random forest, svm: support vector machine, w: wet season, d: dry season, cv: cross-validation, b: bootstrapping, blue dashed line is 70% accuracy benchmark).

**Table 3.** Estimated gully area (ha) with associated standard error (ha) at ± 95% CI (ha) for each algorithm (rf: random forest, svm: support vector machine, d: dry, w: wet, b: bootstrapping, cv: cross-validation, CI: confidence interval).



**Table 4.** Summary of average error for resampling techniques, classifiers, and seasons (RF: random forest, SVM: support vector machine, CV: cross-validation).


#### *3.3. Gully Distribution*

The results indicated that gullies can be spectrally discriminated from other land cover classes in both the dry and wet seasons, although there were observable differences in the distribution of the extracted gullies between the two seasons (Figure 7). More gullies appear in the wet season than in the dry season. This difference in gully areal coverage between the two seasons is most evident when comparing Figure 7a (rf-d-b) with Figure 7b (svm-w-cv).

Differences in gully NDVI between the two seasons also had a bearing on gully classification. The Mann–Whitney test revealed that the difference was significant (U = 162, z = 9.5534, *p* < 0.0001), with a mean difference of 0.08. In the wet season, more vegetation covered the bare surfaces, so the spectral differences between gullies and their surroundings were more pronounced (Figure 8). In the dry season, by contrast, most gullies spectrally resembled the bare surfaces they dissect. Consequently, the algorithms were less efficient in extracting gullies occurring on bare soil surfaces in the dry season, which probably explains the high commission error (43.1%) and standard error (8.6 ha) in that season.

**Figure 7.** Spatial distribution of gullies: (**a**) rf-d-b and (**b**) svm-w-cv correspond to the best models for gully mapping in the dry and wet seasons, respectively (rf: random forest, svm: support vector machine, w: wet season, d: dry season, cv: cross-validation, b: bootstrapping).

**Figure 8.** An example of a vegetated gully (dashed yellow ellipse) in the dry and wet seasons.

#### **4. Discussion**

Remotely sensed data are inherently subject to errors; hence, error assessment is essential for data assimilation, one of the primary uses of satellite data products [65]. In this section, we discuss errors associated with the derived gully maps and offer possible explanations for their sources. The choice of resampling method plays an important role in classification accuracy and, hence, in final model selection. Specifically, we explored the influence of bootstrapping and k-fold cross-validation on gully classification, considering different seasons (dry and wet) and classifiers (SVM and RF). The results revealed that k-fold CV performs slightly better than bootstrapping in terms of commission error. Kohavi [66], in his study of CV and bootstrap for accuracy estimation and model selection, also recommended k-fold CV over bootstrapping. Kim [67] estimated classification error rates by comparing repeated k-fold CV, repeated hold-out, and bootstrap, and found that repeated k-fold CV performed better than bootstrap. The author further reported that bootstrapping had bias problems for both large and small samples, even though its small variance had raised expectations of better performance for small samples.

Although our results generally agree with previous studies, it is worth noting that the performance of bootstrapping and k-fold CV varied considerably at the class level with algorithm and season. In some instances, bootstrapping performed better than k-fold CV in gully classification; for example, the best model based on UA, svm-d-b, used bootstrapping. Such results are important because studies using either bootstrapping or k-fold CV rarely focus on class-level accuracy when evaluating the performance of these resampling techniques. More importantly, even at the class level, different accuracy metrics ought to be considered. This increases the robustness and reliability of the accuracy results, making it possible for researchers to draw correct conclusions about the behavior of the algorithms under investigation [61]. However, the various class accuracy metrics used in the current study (UA, PA, standard error, and F1-score), all derived from the confusion matrix, disagreed with one another in some instances. For example, some algorithms that obtained high PA values had low corresponding UA values, or vice versa. The same holds for the F1-score versus either PA or UA: based on the F1-score, the best algorithms were RF models (i.e., rf-d-b and rf-d-cv). Given the disagreement among the various accuracy metrics, we relied on the standard error as a reliable measure for judging the accuracy of the algorithms.

In the wet season, the algorithms proved more efficient in classifying gullies on bare soil surfaces, owing to the presence of vegetation cover on otherwise bare soil surfaces, which made it possible to discriminate gullies. These findings are similar to those of previous studies. For example, one study automatically identified gullies from ASTER images acquired during the dry and wet seasons [68] and concluded that the wet-season image performed better than the dry-season one. However, the wet season is not appropriate for gully identification in all situations. The success of gully identification depends on the complexity of gully appearance as influenced by morphological characteristics (shape, size, length, depth, etc.) [42], on sensor type and/or resolution, and on classification algorithms [69], among other factors. For example, Sentinel and Landsat images performed better in the dry season than in the wet season [70]. Although gully classification was more successful in the wet season than in the dry season, there were a few locations where gullies were filled with vegetation. Such gullies could not be classified automatically; in these cases, we relied on visual interpretation of high-resolution aerial photographs and/or dry-season PlanetScope images.

Gully appearance also played an important role in gully classification. Consistent with previous studies [42,71], the classification algorithms were efficient in detecting continuous, mostly linear gullies. Conversely, the algorithms were less efficient in areas with high gully density, which are often surrounded by transitional zones to non-gully surfaces [71]; however, these areas form a relatively small portion of the study area and had a negligible influence on accuracy. SVM combined with CV (svm-w-cv) performed best in the wet season, with the lowest standard error (3.7 ha) and the highest PA (89.2%), followed by an RF model (rf-d-b) with a higher standard error (6.1 ha) and a lower PA (83.6%). Nevertheless, 50% of the models obtained a PA below 70%. Despite this discrepancy, the estimated gully areas (ha), based on area-weighted metrics, are unbiased and can be relied upon.

From a practical point of view, identifying gullies from satellite images with reasonable accuracy is of paramount importance for gully rehabilitation. Like all remote sensing-derived products, gully maps are subject to errors; hence, accuracy assessment is a prerequisite [54]. However, most remote sensing-based gully studies rely on accuracy indices such as PA and UA without accounting for the uncertainty of the estimated gully areas. Although not a requirement, it is often recommended to report not only PA and UA but also unbiased quantitative area estimates, such as area-weighted metrics and confidence intervals [60]. In this study, we quantified gullied areas (ha) together with their associated uncertainties, namely, standard errors (ha) and confidence intervals (ha).
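An area estimate with a standard error and confidence interval can be derived from the confusion matrix using a stratified estimator that is widely used in good-practice accuracy assessment. The sketch below assumes stratified random sampling by map class and a 95% interval (z = 1.96); the excerpt does not state the exact estimator or sampling design used in this study, and the function name and numbers are hypothetical.

```python
import math
from typing import List, Tuple


def stratified_area_estimate(
    counts: List[List[int]], map_areas: List[float], k: int, z: float = 1.96
) -> Tuple[float, float, Tuple[float, float]]:
    """Area-weighted estimate of the area of reference class k.

    counts[i][j]: number of sample units mapped as class i and labelled class j
    in the reference data (stratified random sampling by map class assumed).
    map_areas[i]: mapped area of class i (e.g., ha).
    Returns (area, standard error, confidence interval) in the units of map_areas.
    """
    total = sum(map_areas)
    proportion = 0.0
    variance = 0.0
    for row, area_i in zip(counts, map_areas):
        n_i = sum(row)
        w_i = area_i / total      # stratum weight (mapped area proportion)
        p_ik = row[k] / n_i       # share of stratum-i samples that are class k
        proportion += w_i * p_ik
        variance += w_i ** 2 * p_ik * (1 - p_ik) / (n_i - 1)
    area = proportion * total
    se = math.sqrt(variance) * total
    return area, se, (area - z * se, area + z * se)


if __name__ == "__main__":
    # Hypothetical: 100 ha mapped as gully, 900 ha as non-gully
    area, se, ci = stratified_area_estimate([[40, 10], [20, 130]], [100.0, 900.0], k=0)
    print(f"gully area = {area:.1f} ha, SE = {se:.1f} ha, 95% CI = {ci[0]:.1f}-{ci[1]:.1f} ha")
```

With these hypothetical counts, 100 ha of mapped gullies yields an estimated gully area of 200 ha, because gully samples omitted from the large non-gully stratum are up-weighted by that stratum's area.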

RF combined with bootstrapping provided the best gully area estimate (88 ± 14.4 ha), with the lowest standard error (6.1 ha), in the dry season. In the wet season, SVM combined with CV estimated the gully area (57.2 ± 18.8 ha) with the lowest standard error (3.7 ha). These findings shed light on the influence of resampling techniques on the accuracy of satellite-based gully mapping and also provide a basis for further investigations into these techniques, especially with satellite images other than PlanetScope data, preferably freely available ones with higher spatial resolution. Initially, we planned to use both PlanetScope and SPOT-7 images, the latter also obtainable free of charge for the test area, but SPOT-7 scenes acquired in the wet- and dry-season months were not available for the test area. Because we mapped gullies in only a small part of the problem area, we plan to test the method in other areas with wider spatial coverage. However, mapping gullies over large areas, particularly with automatic methods, remains a challenge because of the complexity of gullies over such areas [14]. Thus far, even advanced methods such as CNNs have errors in detecting complex gully systems [37]. Notably, the detection of gullies depends mainly on the spatial resolution of the image used. For example, over large areas in South Africa, gullies have only been mapped at spatial resolutions of up to 2.5 m [14,15]. To overcome this challenge, future implementations of our method will, in part, require high spatial resolution (<2 m) images capable of detecting individual gullies, for instance, pansharpened SPOT-7 (1.5 m) or WorldView (0.5 m) images. Another limitation of this method relates to climate: the method is suitable for arid and semi-arid regions, where gullies are often not covered by trees [42].

Our study demonstrated that gullies could be better identified in the dry season with RF combined with bootstrapping, whereas SVM combined with k-fold CV is best for identifying gullies in the wet season. Therefore, we recommend RF and SVM for mapping gullies in the dry and wet seasons, respectively. Given that PlanetScope provides global coverage with a daily revisit time, we particularly recommend it for continuous monitoring of gullies at any location.

#### **5. Conclusions**

The aim of this study was to assess the efficacy of cross-validation and bootstrapping in gully classification and to reveal how well PlanetScope images perform in gully extraction in the dry and wet seasons of a semi-arid climate. We found the following outcomes.


Overall, both resampling techniques were efficient, but RF with bootstrapping in the dry season is recommended for mapping gullies. In the future, we plan to extend the mapping to larger areas to help landowners and managers combat erosion and plan interventions in hotspot areas.

**Author Contributions:** Conceptualization, K.P. and S.S.; methodology, K.P.; software, K.P. and D.A.; validation, K.P. and D.A.; formal analysis, K.P.; investigation, K.P.; resources, S.S.; data curation, K.P. and D.A.; writing (original draft preparation), K.P.; writing (review and editing), K.P. and S.S.; visualization, K.P., D.A., and S.S.; supervision, S.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Thematic Excellence Programme (TKP2020-NKA-04) of the Ministry for Innovation and Technology in Hungary and by the Department of Higher Education and Training (DHET) of South Africa.

**Data Availability Statement:** PlanetScope images can be purchased from PlanetLabs Inc. Limited, non-commercial access to PlanetScope imagery can also be obtained through the Education and Research Program (https://www.planet.com/markets/education-and-research/ (accessed on 30 July 2020)). Reference data are available from the authors on request.

**Acknowledgments:** The first author (K.P.) greatly acknowledges the Tempus Public Foundation for funding his Ph.D. studies through the Stipendium Hungaricum Scholarship Programme. The author is equally grateful to the Department of Higher Education and Training (DHET) of South Africa for the supplementary support.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


## *Article* **Quantitative Soil Wind Erosion Potential Mapping for Central Asia Using the Google Earth Engine Platform**

**Wei Wang 1,2,3, Alim Samat 1,2,3, Yongxiao Ge 1,2, Long Ma 1,2,3, Abula Tuheti 1,4, Shan Zou 1,2,3 and Jilili Abuduwaili 1,2,3,\***


Received: 1 September 2020; Accepted: 16 October 2020; Published: 19 October 2020

**Abstract:** A lack of long-term soil wind erosion data impedes sustainable land management in developing regions, especially in Central Asia (CA). Compared with large-scale field measurements, wind erosion modeling based on geospatial data is an efficient and effective method for quantitative soil wind erosion mapping. However, conventional wind erosion modeling on local computers is time-consuming and labor-intensive, especially when processing large amounts of geospatial data. To address this issue, we developed a Google Earth Engine-based Revised Wind Erosion Equation (RWEQ) model, named GEE-RWEQ, to delineate the Soil Wind Erosion Potential (SWEP). Using the GEE-RWEQ model, terabytes of Remote Sensing (RS) data, climate assimilation data, and other geospatial data were processed to produce monthly SWEP at a high spatial resolution (500 m) across CA between 2000 and 2019. The results show that the mean SWEP agrees well with the ground observation-based dust storm index (DSI), satellite-based Aerosol Optical Depth (AOD), and Absorbing Aerosol Index (AAI), confirming that GEE-RWEQ is a robust wind erosion prediction model. Wind speed factors primarily determined wind erosion in CA (*r* = 0.7, *p* < 0.001), and the SWEP has increased significantly since 2011 because of the reversal of global terrestrial stilling in recent years. The Aral Sea Dry Lakebed (ASDLB), formed by the shrinkage of the Aral Sea, is the most severely wind-eroded area in CA (47.29 kg/m2/y). Temporally, wind erosion, which is dominated by wind speed, reaches its largest spatial extent in spring (MAM). Meanwhile, because the timing of snowmelt varies spatially across CA, the wind erosion hazard center moved from the southwest (Karakum Desert) to the middle of CA (Kyzylkum Desert and Muyunkum Desert) during spring. Regarding the impacts of land cover change on the spatial dynamics of wind erosion, the SWEP of bareland was the highest, while that of forestland was the lowest.

**Keywords:** wind erosion modeling; RWEQ; GEE; central Asia; spatial-temporal variation; land degradation

#### **1. Introduction**

During the past few decades, global climate change and human disturbance have made land degradation one of the most serious environmental problems of the 21st century [1]. Despite the lack of strong political will, land degradation has attracted much attention worldwide [2]. As of today, over 120 countries have committed to the Land Degradation Neutrality (LDN) Target Setting Programme, which strives to achieve a land degradation-neutral world by 2030 [2]. Soil erosion is the major global soil degradation threat to land, freshwater, and oceans [3]. More than 83% of the global extent of land degradation is caused by soil erosion [4]. Driven by human activities and climatic variations, soil erosion causes topsoil loss, which leads to forms of land degradation that take centuries to recover from, such as loss of soil productivity and thinning of vegetative cover [5,6].

In arid lands, wind plays a more important role than water in removing the fertile topsoil, which requires centuries to build up [1]. The escalating loss of topsoil by wind erosion is a potential threat to sustainable agriculture and, thus, to food security [7]. Strong near-surface winds easily blow away soil organic matter and other nutrients in the topsoil, which in turn decreases soil fertility and plant productivity. According to a report from the European Soil Data Centre (ESDAC), about 28% of the globally degraded land area is affected by wind-driven soil erosion [8]. Wind erosion is therefore considered a significant threat to food security and human health, especially in arid and semi-arid regions of the world [9,10]. In addition, sand and dust storms are among the most severe consequences of wind erosion: wind-blown soil particles and chemicals cause air pollution that can affect the human respiratory system [11]. Accurately simulating and predicting soil wind erosion is thus essential for land degradation control, suitable agricultural management, and sandstorm prevention, especially in arid regions.

As one of the largest land-locked arid regions, Central Asia (CA) has a critical need to combat desertification [12]. Moreover, CA suffers from some of the most frequent sandstorms, owing to frequent strong winds, limited rainfall, low vegetation coverage, and intense human disturbance [13]. Among these factors, near-surface wind speed is generally accepted to play a vital role in wind erosion dynamics [14]. Investigating the soil wind erosion potential over CA is therefore of great importance, but it poses unique challenges, namely large-scale wind erosion modeling and insufficient ground measurements [15,16]. Research on near-surface wind speed has revealed a global declining trend since the 1970s, known as global terrestrial stilling [14,17–19]. Consistent with global terrestrial stilling, Li, et al. [20] found that the soil wind erosion modulus exhibited a declining trend across CA between 1986 and 2005. However, recent studies found a reversal in global terrestrial stilling around 2010 [21–23]. Little is known about wind speed variability across CA after 2010, let alone the impact of the reversed stilling on wind erosion dynamics [20,24]. Additionally, existing studies have not produced continuous, high spatiotemporal resolution, near-real-time (NRT) wind erosion products for the entire CA, especially for recent years [20,25]. Such wind erosion maps are critical for ecological protection and land use practice in CA.

Wind erosion is a complex physical process controlled by both natural factors and human activities, normally involving wind speed, soil characteristics, surface roughness, vegetation cover, agricultural activities, and so on [14,26–30]. It is well established that three conditions are required for soil wind erosion to occur: sufficiently strong wind, a susceptible soil surface, and no surface protection by vegetation or snow cover [31]. The measurement of wind erosion has always been a major obstacle in wind erosion research. According to the literature, two prominent categories of methods, the <sup>137</sup>Cs tracing technique and wind tunnel experiments, can estimate wind erosion more precisely [32,33]. However, these measurements are labor-intensive and can hardly describe the spatial variation of wind erosion. Additionally, owing to the complex physical processes and driving mechanisms of wind erosion, it is still difficult to monitor the process and to measure wind erosion quantitatively at a large scale. Over the past few decades, substantial efforts have been made to investigate the mechanisms and driving factors of wind erosion. Based on small-scale regional field studies and wind tunnel experiments, several quantitative assessment models of wind erosion have been developed [34–38]. Since Bagnold's scientific investigations of wind erosion prediction in 1941 [38], soil wind erosion models ranging from empirical to physics-based have been put forward.
The most widely accepted models developed to quantify soil wind erosion include the Wind Erosion Equation (WEQ) [39], Revised Wind Erosion Equation (RWEQ) [37], Wind Erosion Prediction System (WEPS) [40], Single-event Wind Erosion Evaluation Program (SWEEP) [41], Erosion Productivity Impact Calculator (EPIC) [42], Agricultural Policy/Environmental eXtender (APEX) [43], Texas Erosion Analysis Model (TEAM) [44], and Wind Erosion on European Light Soils (WEELS) [45]. Because the parameters and data these models require are difficult to obtain, it is hard to use them to simulate soil loss by wind erosion at larger geographic scales. In contrast, the RWEQ model proposed by Fryrear, et al. [37] employs a set of mathematical equations that take weather, soil, crop, and tillage data as inputs. Additionally, the RWEQ model has been validated with field erosion data from 45 site-years in several US states [37]. Because its input data are difficult to acquire, the original RWEQ was designed to calculate wind erosion loss at the field scale [36]. Zobeck, et al. [46] evaluated the feasibility of scaling up from fields to regions to estimate the soil wind erosion potential with a geographic information system (GIS)-based field-scale wind erosion model in Texas, US. Chi, et al. [47] used the RWEQ model to calculate the soil wind erosion modulus over China based on regression of field sampling point data and remote sensing data. Borrelli, et al. [48] developed the GIS-RWEQ model to evaluate the potential soil loss due to wind erosion in the European Union (EU). Although the RWEQ model has achieved some success in large-scale applications, a large amount of detailed local geodata and field work is still required [46–50]. With the development of Remote Sensing (RS) technology and cloud computing, NRT wind erosion data have become more valuable for guiding agricultural production in specific areas.
The challenge is to integrate global climate reanalysis data and remote sensing data into the RWEQ model so that it can provide essential knowledge about where and when wind erosion occurs. Another challenge is processing terabytes of geospatial data for continent-wide quantitative wind erosion mapping. Moreover, given limited computing resources and big-data scenarios, it is difficult to perform these computations with conventional software or programming languages.

The Google Earth Engine (GEE) platform, which combines cloud computing capabilities with a multi-petabyte catalog of geospatial data, is well suited to executing wind erosion models [51]. On this platform, the openly available geospatial data include RS data, ground observation data, model simulation data, assimilation data, and so on [52]. GEE's public data archive includes more than 40 years of historical imagery and scientific datasets, which cover almost all the geospatial data needed to build the RWEQ model, for example, climate data (wind speed, snow depth, soil moisture, and so on), vegetation cover, soil properties, and elevation. These datasets are easily accessible and can be processed in the cloud, so it is not necessary to download data locally. In fact, GEE has shown great potential in change detection, trend mapping, and quantifying differences over the past few years [52]. To date, several studies have been conducted on the GEE platform at regional to global scales, covering topics such as large-scale land cover classification, vegetation monitoring, soil salinity mapping, and disaster management [53–56].

As discussed above, GEE is a novel and powerful tool for the quantitative mapping of wind erosion. However, to the best of our knowledge, almost no research has used GEE to simulate the soil wind erosion potential, especially in CA. Additionally, in the context of the global terrestrial stilling reversal, it is important to determine the wind speed variability in CA to study wind erosion in recent years. In view of this, the purposes of this study are (1) to evaluate the near-surface wind speed trend in CA from 2000 to 2019 based on multiple climate data sources; (2) to quantitatively map the soil wind erosion potential (SWEP) in CA based on the RWEQ model using the GEE platform; and (3) to analyze the monthly and seasonal changes in soil wind erosion and the response of soil wind erosion dynamics to land cover change (LCC). This is the first study to execute a wind erosion model on the GEE platform, and it provides new ideas for the construction and use of empirical models based on batch geospatial data and high-performance computing. The main conclusions could benefit desertification control and land resource management in CA.

#### **2. Study Area and Dataset**

#### *2.1. Study Area*

The most common definition of CA is the official one of the Soviet Union, which includes the five former Soviet republics of Kazakhstan (KZ), Uzbekistan (UZ), Turkmenistan (TK), Kyrgyzstan (KG), and Tajikistan (TJ). The total area of CA is nearly 4 × 10<sup>6</sup> km2, mainly covered by bareland and sparse vegetation. The landforms of CA are mainly plains and hills. The mountains (the Tianshan, Pamir, and Altai Mountains), known as the "Water Tower of Central Asia", are mainly distributed in the southeast [57]. Because the Tianshan and Pamir Mountains block rain clouds that would otherwise enter CA from the east and south, CA is one of the largest land-locked arid regions in the world [58]. Most of CA lies in an arid climatic zone with low annual precipitation (less than 300 mm), high air temperature, and strong evaporation. Five large temperate deserts (Karakum Desert, Kyzylkum Desert, Muyunkum Desert, Sarresi-Atyray Desert, and Aralkum Desert) are distributed from the southwest to the central east (Figure 1). Additionally, desertification caused by large-scale agricultural practices has been an issue since 1960, and intensifying climate change presents many economic, social, and environmental problems in CA [15,24,59,60]. The most notorious example is the Aral Sea Crisis, which has been considered one of the planet's worst environmental disasters of the 21st century [59]. The large-scale construction of irrigation canals has reduced runoff from the Syr Darya and Amu Darya rivers into the Aral Sea, which in turn reduced the Aral Sea surface area from 68,000 square kilometers in 1960 to less than 7000 square kilometers in 2016 [60]. Meanwhile, a new anthropogenic desert, the Aralkum Desert, has formed in the dried eastern basin since 1960. Salt and dust storms caused by wind erosion in the Aralkum Desert represent one of the most serious problems for human health and agricultural activities in CA [16].

**Figure 1.** The study area (background image: Moderate-Resolution Imaging Spectroradiometer (MODIS) NDVI in 2019). It consists of the five former Soviet republics of Kazakhstan (KZ), Uzbekistan (UZ), Turkmenistan (TK), Kyrgyzstan (KG), and Tajikistan (TJ).

#### *2.2. Data Collection and Source*

Meteorological data, including wind speed, soil moisture, and snow depth, were derived from the Global Land Data Assimilation System 2.1 (GLDAS2.1), which integrates satellite- and ground-based observational products [61]. Three other climate assimilation datasets, The Fifth Generation ECMWF Atmospheric Reanalysis Data (ERA5), the NCEP Climate Forecast System Reanalysis (CFSR), and the Famine Early Warning Systems Network (FEWS NET) Land Data Assimilation System (FLDAS), were used to investigate wind speed variability across CA (Table 1). Soil particle-size composition, soil organic matter, and several other soil properties were obtained from the Harmonized World Soil Database (HWSD) and OpenLandMap (OLM); OLM is based on machine learning predictions from a global compilation of soil profiles and samples and provides six standard depths (0, 10, 30, 60, 100, and 200 cm) (Table 1). Because soil calcium carbonate content data are lacking in GEE datasets, and because a nonlinear positive correlation exists between soil pH and soil calcium carbonate [62,63], soil calcium carbonate was related to soil pH. Huang, et al. [62] found that an exponential equation gave the highest R<sup>2</sup> when modeling the relationship between the calcium carbonate content and pH of surface soils in East Central Asia. Liu, et al. [63] found a nonlinear positive correlation between soil pH and CaCO<sub>3</sub> content in a study conducted in China. Based on more than 15,000 different soil mapping units, we proposed a nonlinear equation (Equation (1)) to quantify the relationship between soil pH and soil calcium carbonate (CaCO<sub>3</sub>).

$$pH = 4.576 \times \mathrm{CaCO_3}^{0.08089} + 2.378,\tag{1}$$

where *pH* is the soil pH of different soil types in HWSD and *CaCO*<sub>3</sub> is the soil calcium carbonate content in HWSD (%).
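Because GEE lacks a CaCO<sub>3</sub> layer, Equation (1) would presumably be applied in inverse form to estimate CaCO<sub>3</sub> from a soil pH layer. The excerpt does not show this step explicitly, so the inversion below is our assumption, and the function names are hypothetical.

```python
def ph_from_caco3(caco3_percent: float) -> float:
    """Equation (1): soil pH as a function of CaCO3 content (%)."""
    return 4.576 * caco3_percent ** 0.08089 + 2.378


def caco3_from_ph(ph: float) -> float:
    """Invert Equation (1) to estimate CaCO3 content (%) from soil pH.

    Only defined for pH > 2.378 (the intercept of Equation (1)).
    """
    base = (ph - 2.378) / 4.576
    if base <= 0:
        raise ValueError("pH must exceed 2.378 to invert Equation (1)")
    return base ** (1 / 0.08089)


if __name__ == "__main__":
    for ph in (6.5, 7.5, 8.5):
        print(f"pH {ph}: CaCO3 ~ {caco3_from_ph(ph):.2f} %")
```

Because the inverse exponent (1/0.08089, about 12.4) is large, the estimated CaCO<sub>3</sub> content is very sensitive to small errors in the input pH.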



Note: \* means that the dataset can be accessed on GEE.

NDVI was derived from the NASA Terra Moderate-Resolution Imaging Spectroradiometer (MODIS) Vegetation Indices product (MOD13Q1). The National Aeronautics and Space Administration Shuttle Radar Topography Mission (NASA-SRTM) provided the Digital Elevation Model (DEM) data used to calculate slope. Land cover data were taken from the European Space Agency (ESA) Climate Change Initiative (CCI) 300 m global land cover products, which were developed using the GlobCover unsupervised classification chain and by merging multiple available Earth observation products.

Ground-observed wind speed data (2 m height) were derived from the NOAA Global Surface Summary of the Day (GSOD), which includes global data obtained from the United States Air Force (USAF) Climatology Center. We collected daily ground-measured wind speed data from more than 400 weather stations in CA. For political and other reasons, weather stations in the former Soviet Union were abandoned, and several new weather stations were established between 1990 and 2010 [60]. Most currently operating weather stations were established in the 1960s. We therefore used ground observations of wind speed from 204 weather stations for 2000 to 2019 (Figure 1). The average wind speed across all weather stations was used to study the temporal variation in wind speed in CA. Visibility data, which were used to calculate the Dust Storm Index (DSI), were also derived from GSOD. Additionally, RS techniques provide a new perspective for validating soil wind erosion estimates. Aerosol data, namely the Aerosol Optical Depth (AOD) and the Absorbing Aerosol Index (AAI), were used to compare with and validate the wind erosion results in this research. AOD was derived from MODIS using the Multi-angle Implementation of Atmospheric Correction (MAIAC) algorithm. AAI was derived from the Sentinel-5 Precursor, which was launched on 13 October 2017. Details on the research data are listed in Table 1.

#### **3. Methodology**

#### *3.1. GEE-RWEQ*

The comprehensive assessment of wind erosion in a large region like CA is complex and challenging. Field-scale models such as WEPS require dozens of parameters to calculate the soil wind erosion modulus. GEE is a cloud computing platform specially designed to process raster data, including satellite images, climate assimilation grids, and other geospatial data. The advantage of the GEE platform lies in the instant access, manipulation, visualization, and real-time analysis of large amounts of geospatial data [52]. The advent of GEE has therefore made it possible to launch global-scale environmental mapping and monitoring programs [53], and there is great potential for integrating environmental models into the GEE platform to build GEE-based production frameworks. Furthermore, most resource-limited developing countries have suffered from various environmental problems, including droughts, flooding, deforestation, soil degradation, and dust storms caused by wind erosion [16,53,64,65]. These countries often lack monitoring sites and networks for environmental problems, which makes these problems more serious [64–67]. In this study, using multisource geospatial data, we present a fully automated algorithm for mapping NRT monthly wind erosion dynamics at a global scale using the GEE platform.

Although computational efficiency is not a concern on the GEE platform, the limited ground observation data present a big challenge for simulating soil wind erosion. It is therefore necessary to build a simplified, more practical model that can estimate the SWEP at a large scale on the GEE platform. Although less accurate than mechanistic wind erosion models, such a relatively simple model is not limited by the input data, location, or scale of the study area. RWEQ has been extensively tested, and previous studies found good agreement between model results and field measurements. In this study, a GEE cloud computing-based RWEQ model was developed to map soil wind erosion quantitatively in an area with limited ground measurements. As mentioned above, owing to progress in Earth observation and numerical modeling, several parameters that previously had to be measured in the field or calculated can now be easily acquired on the GEE platform. Although most of the input parameters are retained in GEE-RWEQ, the soil roughness factor (K') is difficult to estimate for farming production at a regional scale. Ouyang, et al. [68] replaced the soil ridge roughness with the roughness caused by topography, calculated with the Smith–Carson equation. Because this equation has been widely used in many regions [68–71], it can be applied when the study area is scaled up from a field to a region. Owing to the limitations of RS-based wind erosion estimation at regional scales, the combined crop factor (C) was simplified based on previous findings [48–50,68,72].

The GEE-RWEQ is based on the following basic equations [37]:

$$SWEP = \frac{2x}{s^{2}}\, Q_{\max}\, e^{-(x/s)^{2}},\tag{2}$$

$$Q_{\max} = 109.8 \times WF \times EF \times SCF \times K' \times C,\tag{3}$$

$$s = 150.71 \times \left( WF \times EF \times SCF \times K' \times C \right)^{-0.3711},\tag{4}$$

where *SWEP* is the soil wind erosion potential per unit area (kg/m2); *Q*<sub>max</sub> is the maximum transport capacity (kg/m); *x* is the distance from the upwind edge of the field (m), set to 55 m for the study area; *s* is the critical field length (m); *WF* is the weather factor (kg/m); *EF* is the erodible fraction (dimensionless); *SCF* is the soil crust factor (dimensionless); *K'* is the soil roughness factor (dimensionless); and *C* is the combined crop factor (dimensionless).
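Equations (2) to (4) can be chained per pixel, since both *Q*<sub>max</sub> and *s* depend only on the product *WF* × *EF* × *SCF* × *K'* × *C*. The sketch below computes that product once. The function name is hypothetical, and returning zero for a non-positive factor product (which would otherwise make the power in Equation (4) undefined) is our assumption, not part of the source.

```python
import math


def swep(wf: float, ef: float, scf: float, k_prime: float, c: float, x: float = 55.0) -> float:
    """Soil wind erosion potential (kg/m2) from Equations (2)-(4).

    wf: weather factor (kg/m); ef, scf, k_prime, c: dimensionless factors;
    x: distance from the upwind field edge (m), 55 m as stated in the text.
    """
    product = wf * ef * scf * k_prime * c
    if product <= 0:
        return 0.0                            # assumption: no erodible conditions, no loss
    q_max = 109.8 * product                   # Eq. (3): maximum transport capacity (kg/m)
    s = 150.71 * product ** -0.3711           # Eq. (4): critical field length (m)
    return 2 * x / s ** 2 * q_max * math.exp(-((x / s) ** 2))  # Eq. (2)


if __name__ == "__main__":
    # Hypothetical factor values for one pixel and one month
    print(swep(wf=20.0, ef=0.5, scf=0.6, k_prime=0.99, c=0.3))
```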

The weather factor can be calculated as

$$WF = \frac{SW \times SD \times \sum_{i=1}^{N} u_2 (u_2 - u_t)^2 \times N_d \times \frac{\rho}{g}}{500},\tag{5}$$

where *SW* is the soil wetness (dimensionless), *SD* is the snow cover factor (dimensionless), *u*<sub>2</sub> is the wind speed at 2 m (m/s), *u*<sub>t</sub> is the threshold wind speed at 2 m (assumed to be 5 m/s), *N* is the number of wind speed observations with *u*<sub>2</sub> > *u*<sub>t</sub> in the period, *N*<sub>d</sub> is the number of days in the period, *ρ* is the air density (kg/m3), and *g* is the acceleration due to gravity (m/s2).
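A minimal sketch of Equation (5) as written, for one pixel and one period, is given below. The default air density (1.225 kg/m3) and gravitational acceleration (9.81 m/s2) are standard values assumed here rather than values stated in the text, and the function name is hypothetical.

```python
from typing import Iterable


def weather_factor(
    wind_speeds: Iterable[float],
    n_days: int,
    sw: float,
    sd: float,
    u_t: float = 5.0,
    rho: float = 1.225,
    g: float = 9.81,
) -> float:
    """Weather factor WF (kg/m) following Equation (5) as written.

    wind_speeds: 2 m wind speed observations (m/s) in the period; only those
    above the threshold u_t contribute, matching the definition of N.
    rho (kg/m3) and g (m/s2) defaults are standard values assumed here.
    """
    erosive_sum = sum(u * (u - u_t) ** 2 for u in wind_speeds if u > u_t)
    return sw * sd * erosive_sum * n_days * (rho / g) / 500.0


if __name__ == "__main__":
    # Hypothetical wind observations for a 30-day period
    print(weather_factor([3.2, 6.1, 8.4], n_days=30, sw=0.8, sd=1.0))
```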

The erodible fraction (EF) and soil crust factor (SCF) can be calculated as

$$EF = \frac{29.09 + 0.31\,Sa + 0.17\,Si + 0.33\,\frac{Sa}{Cl} - 2.59\,OM - 0.95\,\mathrm{CaCO_3}}{100},\tag{6}$$

$$SCF = \frac{1}{1 + 0.0066\,(Cl)^2 + 0.021\,(OM)^2},\tag{7}$$

where *Sa* is the sand content (%), *Si* is the silt content (%), *Sa*/*Cl* is the sand-to-clay ratio (dimensionless), *OM* is the organic matter content (%), *CaCO*<sub>3</sub> is the calcium carbonate content (%), and *Cl* is the clay content (%).
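Equations (6) and (7) translate directly into per-pixel functions, sketched below with hypothetical names. The guard against zero clay content (which would make the sand-to-clay ratio undefined) is our addition, and no clipping of *EF* to [0, 1] is applied because the text does not mention one.

```python
def erodible_fraction(sa: float, si: float, cl: float, om: float, caco3: float) -> float:
    """Equation (6): erodible fraction EF (dimensionless); inputs in %."""
    if cl <= 0:
        raise ValueError("clay content must be positive for the sand/clay ratio")
    return (29.09 + 0.31 * sa + 0.17 * si + 0.33 * (sa / cl)
            - 2.59 * om - 0.95 * caco3) / 100.0


def soil_crust_factor(cl: float, om: float) -> float:
    """Equation (7): soil crust factor SCF (dimensionless); inputs in %."""
    return 1.0 / (1.0 + 0.0066 * cl ** 2 + 0.021 * om ** 2)


if __name__ == "__main__":
    # Hypothetical sandy loam: 60% sand, 30% silt, 10% clay, 1% OM, 2% CaCO3
    print(erodible_fraction(60, 30, 10, 1, 2), soil_crust_factor(10, 1))
```

For the hypothetical soil above, *EF* ≈ 0.503 and *SCF* ≈ 0.595.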

The soil roughness factor (K') can be calculated as [70]

$$K' = \cos \alpha,\tag{8}$$

where α is the slope gradient (degrees), derived from the Digital Elevation Model (DEM). The combined crop factor (C) can be calculated as [70,73]

$$C = e^{-0.0483\,SC},\tag{9}$$

$$SC = \left( NDVI - NDVI_{\text{soil}} \right) / \left( NDVI_{\max} - NDVI_{\text{soil}} \right),\tag{10}$$

where *SC* is the vegetation coverage (%), *NDVI*<sub>soil</sub> is the NDVI value of a bare soil pixel, and *NDVI*<sub>max</sub> is the maximum NDVI value of the study area.
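Equations (8)–(10) can be sketched as follows. Two assumptions are made here that the text does not state explicitly. First, the fraction from Equation (10) is scaled to percent, because *SC* is described in % and, with a 0–1 fraction, e<sup>−0.0483·SC</sup> would never fall below about 0.95. Second, *SC* is clipped to 0–100% for pixels whose NDVI lies outside the *NDVI*<sub>soil</sub>–*NDVI*<sub>max</sub> range.

```python
import math


def roughness_factor(slope_deg):
    """Soil roughness factor K' = cos(alpha), Equation (8); slope in degrees."""
    return math.cos(math.radians(slope_deg))


def vegetation_cover_percent(ndvi, ndvi_soil, ndvi_max):
    """Vegetation coverage SC from Equation (10), expressed in percent.

    Assumption: the fraction is scaled by 100 and clipped to [0, 100].
    """
    frac = (ndvi - ndvi_soil) / (ndvi_max - ndvi_soil)
    return 100.0 * min(max(frac, 0.0), 1.0)


def crop_factor(sc_percent):
    """Combined crop factor C = exp(-0.0483 * SC), Equation (9)."""
    return math.exp(-0.0483 * sc_percent)
```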

#### *3.2. Model Performance Evaluation*

Given the diverse land cover types and the large extent of CA, measuring wind erosion across the whole region is extremely difficult. Moreover, almost no field measurements of wind erosion have been conducted in CA over the past two decades. Alternative methods are therefore needed to evaluate the reliability of wind erosion model results. Because ground-observed dust storms reflect the frequency and intensity of wind erosion events, DSI was used to validate the spatial variation of SWEP.

DSI was calculated from meteorological visibility records, which represent the frequency and intensity of wind erosion events. DSI was first proposed by McTainsh [74] in the National Collaborative Project on Indicators for Sustainable Agriculture (NCPISA). Building on the relationship between meteorological records and DSI, O'Loingsigh, et al. [75] used daily visibility data from 180 long-term meteorological stations to construct a long-term national wind erosion record (1965–2011) for Australia. DSI is thus a method for monitoring wind erosion based on long-term daily meteorological observations, and it is now generally accepted as an indicator of broad-scale wind erosion rates in Australia, Iran, and Northeast Asia [75,76]. Based on weather codes relating to wind erosion or visibility, wind erosion events were divided into three categories: (a) Severe Dust Storms (SDS); (b) Moderate Dust Storms (MDS); and (c) Local Dust Events (LDE). The DSI was calculated using the following equation [75]:

$$DSI = \sum\_{i=1}^{n} \left[ (5 \times SDS) + MDS + (0.05 \times LDE) \right], \tag{11}$$

where *i* indexes the *n* stations (*i* = 1 to *n*), *SDS* is the number of severe dust storms (visibility < 200 m), *MDS* is the number of moderate dust storms (200 m < visibility < 1000 m), and *LDE* is the number of local dust events (1000 m < visibility < 20,000 m).
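A minimal sketch of Equation (11), assuming the station records have already been filtered to observations whose weather codes indicate dust, so that visibility alone decides the event class. The actual DSI procedure classifies events from weather codes as well as visibility [75]; the function names and the half-open interval boundaries used here are assumptions.

```python
def classify_visibility(vis_m):
    """Map the visibility (m) of a dust-coded observation to an event class.

    Boundary handling (e.g. exactly 200 m) is an assumption; the text gives open intervals.
    """
    if vis_m < 200:
        return "SDS"
    if vis_m < 1000:
        return "MDS"
    if vis_m < 20000:
        return "LDE"
    return None  # too clear to count as a dust event


WEIGHTS = {"SDS": 5.0, "MDS": 1.0, "LDE": 0.05}


def dust_storm_index(stations):
    """DSI, Equation (11): weighted event counts summed over all stations.

    stations: one list of visibilities (m) per station, already filtered to
    dust-coded observations (an assumption of this sketch).
    """
    total = 0.0
    for visibilities in stations:
        for vis in visibilities:
            event = classify_visibility(vis)
            if event is not None:
                total += WEIGHTS[event]
    return total
```

For two hypothetical stations with visibilities `[150, 500, 5000]` and `[900, 15000, 30000]`, the weighted counts are 5 + 1 + 0.05 and 1 + 0.05, giving a DSI of 7.1.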

Because of the low population and urban density, soil mineral particles produced by wind erosion are the main source of atmospheric aerosols in CA [24]. Satellite-derived aerosol data were therefore used to evaluate the reliability of the SWEP simulated by the RWEQ model. Several satellite-based aerosol products with different spatial and temporal resolutions are available, such as CALIPSO Lidar Tropospheric Aerosol Profiles All sky data, VIIRS/SNPP Deep Blue L3 daily aerosol data, OMI/Aqua Multi-wavelength AOD Daily data, MODIS MO(Y)D08\_M3 Terra (Aqua) Atmosphere Monthly data, MODIS MCD19A2 Terra & Aqua MAIAC Land Aerosol Optical Depth Daily data, and Sentinel-5P NRTI AER AI. However, most satellite-based aerosol products have a low spatial resolution and cannot meet the requirements of quantitative spatial comparison [77–79]. In this study, we used the MODIS MCD19A2 dataset at the 0.47 μm blue band, with the parameter Optical\_Depth\_047, which has a spatial resolution of 1 km [79]. A second aerosol dataset, the Absorbing Aerosol Index (AAI), with a 0.01-degree spatial resolution, was extracted from the Sentinel-5P NRTI AER AI product. Because Sentinel-5P was launched on 13 October 2017 and its aerosol dataset was only released on 10 July 2018 [80], we compared the 2019 annual average AAI with the 2019 SWEP.

#### *3.3. Technical Flowchart of this Study*

Land Cover Change (LCC), which is influenced by both climate change and human activity, usually affects wind erosion through changes in surface roughness and in soil physical and chemical characteristics. We therefore studied the SWEP of different land cover types and the SWEP changes caused by conversions between land cover types. In this study, we chose the ESA-CCI 300 m global land cover product, which was developed using the GlobCover unsupervised classification chain and by merging multiple available Earth observation products. Based on the plant functional types (PFT) of the United Nations Land Cover Classification System (LCCS), the CCI-LC map is classified into 22 land types. According to the land characteristics of the study area, these land cover types were reclassified into nine categories using the look-up table for converting CCI-LC classes to PFTs in the product user guide [81].
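A look-up-table reclassification of this kind can be sketched with NumPy as below. The mapping uses a few legend codes from the CCI-LC product (for example 10 = rainfed cropland, 20 = irrigated cropland, 200 = bare areas, 210 = water bodies), but it is hypothetical and deliberately incomplete; the study's actual table is the one in the product user guide [81], and the study itself performed this step in ArcGIS.

```python
import numpy as np

# Hypothetical, partial mapping from CCI-LC legend codes to the nine study classes.
CCI_TO_STUDY = {
    10: "cropland rain-fed",
    20: "cropland irrigated",
    60: "forestland",
    120: "shrubland",
    130: "grassland",
    150: "sparse vegetation land",
    190: "urbanland",
    200: "bareland",
    210: "waterbody",
}


def reclassify(cci_codes, table=CCI_TO_STUDY, nodata="unclassified"):
    """Replace each CCI-LC code with its study class; unmapped codes get nodata."""
    codes = np.asarray(cci_codes)
    out = np.full(codes.shape, nodata, dtype=object)
    for code, label in table.items():
        out[codes == code] = label  # vectorized assignment for all matching pixels
    return out
```

The function works unchanged on 2-D raster arrays, since the masks keep the input shape.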

The manuscript is organized according to the technical flowchart of this study (Figure 2). The research consists of four main steps: first, the wind speed variability of ground measurement data and reanalysis data was explored using a time-series decomposition model; second, the monthly SWEP across CA was generated with GEE-RWEQ from multi-source geospatial data, and its spatiotemporal variation between 2000 and 2019 was explored; third, the reliability of annual SWEP was validated using DSI and satellite-based AOD; and finally, the responses of wind erosion to changes in ground measurement wind speed and land cover were investigated.

**Figure 2.** The technical flowchart of this study.

ArcGIS 10.6 was used for land cover type reclassification in this research. The Pearson correlation coefficient (*r*) was calculated in R 3.6.3, and the time-series decomposition model based on the "tseries" package was also run in R 3.6.3. The exponential fitting of soil pH and soil calcium carbonate was performed in MATLAB 2018a.

#### **4. Results, Analysis, and Validation**

#### *4.1. Variability of the Daily Average Wind Speed across CA*

As the key factor of wind erosion, wind speed variability plays a vital role in wind erosion dynamics. Many studies have reported a declining trend in global near-surface wind speed from 1970 to 2010 [14,17–19]. However, more recent work has reported a reversal of this decline: Zeng, et al. [22] found that, after several decades of global terrestrial stilling, wind speed has increased rapidly across the globe since 2010. Although Zeng, et al. [22] investigated the global temporal variation of wind speed, regional studies are still required because of the sensitivity of CA to global climate change [60].

To better understand the temporal variations of wind speed, a wind speed time series can be decomposed into sub-components using a time-series decomposition model. In this study, we used a multiplicative decomposition model, which is more effective when the amplitude of the seasonal variation changes over time [82]. The calculation of this model included the following three steps [83]. First, the trend component was determined with the moving average method and removed from the time series. Second, the seasonal component was calculated and centered by averaging all periods for each time unit. Finally, the random component was obtained by dividing the original series by the trend and seasonal components.

In this study, the multiplicative time-series decomposition model was applied to Ground Measurement Wind Speed (GMWS) data. Figure 3 shows the time series decomposed into its sub-components (the trend, seasonal, and random components). The trend component of the GMWS time series shows a significant decrease during 2000–2009 and a significant increase during 2009–2014. After 2014, GMWS exhibited steady fluctuations or a slight upward trend. To quantify these changes, we calculated the decadal change rate of GMWS in these three periods using ordinary least squares (OLS) linear regression. The daily average GMWS decreased significantly at a rate of −0.16 m s<sup>−1</sup> decade<sup>−1</sup> during 2000–2009 (*p* < 0.001). After the turning point in 2009, the increase rate of 0.42 m s<sup>−1</sup> decade<sup>−1</sup> during 2009–2014 was significantly higher than the earlier decrease rate (*p* < 0.001). Although GMWS shows a slight upward trend in recent years, this trend is not statistically significant (*p* > 0.05). The seasonal component indicates that the daily average wind speed was highest in spring and lowest in autumn, with the seasonal ranking spring > winter > summer > autumn.
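The three decomposition steps can be sketched with NumPy as a classical multiplicative decomposition, similar in spirit to R's `decompose(type = "multiplicative")`. This is an illustration under stated assumptions (evenly spaced data with a known period, such as 12 for monthly values, at least two full periods, and a centred moving average for the trend), not the authors' R code.

```python
import numpy as np


def decompose_multiplicative(x, period):
    """Classical multiplicative decomposition: x = trend * seasonal * random.

    Assumes evenly spaced data with a known period and at least two full
    periods; the ends of the trend (and hence of random) are NaN.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # Step 1: centred moving average (2 x period filter when the period is even).
    if period % 2 == 0:
        w = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        w = np.ones(period) / period
    half = w.size // 2
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(x, w, mode="valid")

    # Step 2: average the detrended ratios at each position in the cycle,
    # then rescale so that the seasonal factors average to 1.
    detrended = x / trend
    phase = np.arange(n) % period
    figure = np.array([np.nanmean(detrended[phase == k]) for k in range(period)])
    figure /= figure.mean()
    seasonal = figure[phase]

    # Step 3: the random component is what remains after removing both.
    random = x / (trend * seasonal)
    return trend, seasonal, random


# Synthetic monthly series: constant level 10 times a known seasonal pattern.
t = np.arange(48)
s = 1 + 0.2 * np.sin(2 * np.pi * t / 12)
trend, seasonal, random = decompose_multiplicative(10 * s, period=12)
```

With a constant level, the recovered trend equals 10, the seasonal factors reproduce the input pattern, and the random component is 1 wherever the trend is defined.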

**Figure 3.** The time series decomposition of the average Ground Measurement Wind Speed (GMWS) in Central Asia (CA) from 2000 to 2019. Note: The different colors in the graph on the seasonal component indicate different seasons (blue: Winter (Dec., Jan., Feb.), green: Spring (Mar., Apr., May), red: Summer (Jun., Jul., Aug.), and yellow: Autumn (Sept., Oct., Nov.)).

To ensure consistency between the reanalysis data used as model input and the actual observations in CA, the trends of four reanalysis datasets (GLDAS2.1, ERA5, CFSR, and FLDAS) were compared with GMWS. The daily average wind speed derived from GLDAS2.1 had the highest correlation with the ground measurement wind speed (Supplementary Materials Figure S1). To compare the trend and seasonal components of the two wind speed datasets, we also decomposed the GLDAS2.1 wind speed time series into trend, seasonal, and random components, and calculated the correlation coefficient (*r*) between ground measurement and reanalysis data for the daily average wind speed and for each component. The trend component had the highest correlation coefficient (*r* = 0.829), followed by the seasonal component (*r* = 0.552), the daily average wind speed (*r* = 0.125), and the random component (*r* = −0.007). According to the trend component, the turning point of the GLDAS wind speed (GLDASWS) time series occurred around 2011 (Figure 4). Moreover, the rates of change of GLDASWS are larger than those of GMWS. Linear regression shows that the daily average GLDASWS decreased significantly at a rate of −0.34 m s<sup>−1</sup> decade<sup>−1</sup> during 2000–2010 (*p* < 0.001). After the turning point of 2010, the increase rate of 1 m s<sup>−1</sup> decade<sup>−1</sup> during 2010–2014 was significantly higher than the earlier decrease rate (*p* < 0.001) (Figure 4). The seasonal components of the two wind speed datasets have a broadly similar pattern.
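Correlating decomposed components requires dropping the time steps where either series is undefined, for example the ends of a moving-average trend. A minimal NumPy sketch, assuming both series are already aligned on the same dates:

```python
import numpy as np


def pearson_r(a, b):
    """Pearson r between two aligned series, ignoring steps where either is NaN."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ok = ~(np.isnan(a) | np.isnan(b))  # keep only steps defined in both series
    return float(np.corrcoef(a[ok], b[ok])[0, 1])
```

For example, `pearson_r(gmws_trend, gldas_trend)` would compare the two trend components; both argument names are hypothetical.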

**Figure 4.** The time series decomposition of the average Global Land Data Assimilation System wind speed (GLDASWS) in CA from 2000 to 2019. Note: The different colors in the graph on the seasonal component indicate different seasons (blue: Winter (Dec., Jan., Feb.), green: Spring (Mar., Apr., May), red: Summer (Jun., Jul., Aug.), and yellow: Autumn (Sept., Oct., Nov.)).

#### *4.2. The Spatiotemporal Variation of Wind Erosion across CA*

Figure 5 shows the distribution of the annual mean SWEP in CA over the most recent 20 years (2000–2019). SWEP exhibits significant spatial variation, ranging from 0 to 256 kg/m². SWEP in southwestern CA is higher than in southeastern and northern CA, where vegetation coverage and soil moisture are higher. During the past 20 years, the Aral Sea dry lake bed (ASDLB), one of the most active dust sources, was the most severe wind erosion area in CA (47.29 kg/m²/y), followed by the Kyzylkum Desert (10.64 kg/m²/y), Karakum Desert (10.58 kg/m²/y), and Muyunkum Desert (6.81 kg/m²/y). Following the dramatic shrinkage of the Aral Sea since the second half of the 20th century, the ASDLB, also known as the Aral Sea Desert, has been left covered with the salts and chemicals originally contained in the water [24]. Strong winds blow these toxic sediments away, forming white sandstorms, and toxic particles from the dry Aral Sea lake bed have been found in Japan, Norway, Greenland, and even at the South Pole [84].

**Figure 5.** The spatial variation of the Soil Wind Erosion Potential (SWEP) over CA between 2000 and 2019.

From 2000 to 2019, the annual mean SWEP was 3.45 kg/m², with the lowest value in 2010 and the highest in 2015. As mentioned above, wind speed increased significantly around 2011, and the area and intensity of wind erosion have also increased significantly since 2011. Over the ASDLB, however, SWEP has gradually decreased since 2011, at a rate of −6.85 kg/m²/y (Figure 5). This reversal may have been caused by the recovery of the Aral Sea water body since 2010 [85], which means that more of the dry lake bed is covered by water and less bareland is exposed to wind erosion. The Amu Darya River (ADR) and Syr Darya River (SDR), the two largest rivers in CA, are the principal water suppliers of the Aral Sea. Since the 1960s, many irrigation canals have been constructed in the middle and lower reaches of the ADR and SDR [60]. According to ESA land cover data, more than 43% of the irrigated cropland in CA depends on these canals, especially in the Amu Darya Delta (ADD) and Syr Darya Delta (SDD). The main land cover type of these two deltas is irrigated cropland, which accounts for more than 20% of the total irrigated cropland in CA. Because of their high vegetation coverage, the ADD and SDD regions have a lower SWEP than the surrounding area (Figure 5). Similarly, irrigated oasis agriculture on the edge of the Kyzylkum Desert also greatly reduces SWEP.

The seasonal variation characteristics of SWEP in CA are shown in Figure 6. Although it is generally consistent with the spatial distribution of the annual mean SWEP, the spatial pattern of SWEP varies among seasons. As Figure 6 shows, more land in CA suffered from severe wind erosion in spring than in the other seasons (spring: 1.48 kg/m² > summer: 0.70 kg/m² > winter: 0.69 kg/m² > autumn: 0.59 kg/m²). In spring, pronounced wind erosion also occurs in several major desert regions, such as the Karakum, Kyzylkum, and Aralkum Deserts. Because of snow cover, there is less wind erosion in northern CA during winter. However, wind speed in the Aral region in winter markedly exceeds that in other seasons, especially in December, so the ASDLB region has higher SWEP in winter. Figure 6 also shows that northern Kazakhstan, Kyrgyzstan, and Tajikistan have been only slightly affected by wind erosion. In summer, SWEP shows significantly high values in the middle reaches of the Amu Darya (Figure 6). As shown in Figure 7, SWEP displays significant monthly variability, especially in the ASDLB, where March is the month with the most severe wind erosion. In addition, because of strong winds and dry surface soil in December, January, and February, sandstorms frequently occur in the ASDLB, as confirmed by MODIS true color images (Supplementary Figure S2). Because different latitudes receive different amounts of solar radiation energy, the snowmelt timing varies among regions. Figure 6 shows that the center of wind erosion moves from southwest to northeast during spring (March, April, and May), which helps explain why significant wind erosion in the other major deserts occurs in different months: the most severe month of wind erosion is April in the Karakum Desert and May in the Kyzylkum and Muyunkum Deserts. In general, however, Central Asia suffers the most severe wind erosion in April and the most widespread wind erosion in May.

**Figure 6.** The seasonal variation of SWEP in (**a**) spring (Mar., Apr., May), (**b**) summer (Jun., Jul., Aug.), (**c**) autumn (Sept., Oct., Nov.), and (**d**) winter (Dec., Jan., Feb.) during the period of 2000–2019.

**Figure 7.** The monthly average variation of SWEP during the period of 2000–2019.

#### *4.3. Responses to Wind Speed Change and Land Cover Change*

#### 4.3.1. Impacts of Ground Measurement Wind Speed Changes on the SWEP

Based on the results above, wind speed is the dominant factor of wind erosion. We therefore used ground measurement wind speed data to analyze the influence of wind speed changes, as a climate change factor, on wind erosion. Figure 8 shows a strong and significant positive correlation (*r* = 0.700, *p* < 0.001) between the average GMWS and the average ground SWEP. According to the trend lines of these two time series, the turning point occurred roughly between 2010 and 2011. The average GMWS showed a slowly decreasing trend (−0.07 m s<sup>−1</sup> decade<sup>−1</sup>) before the turning point in 2011, and a significant increasing trend during 2011–2019 (+0.6 m s<sup>−1</sup> decade<sup>−1</sup>, *p* < 0.001); the recent increase rate is almost ten times the decrease rate of the first decade. The average ground SWEP also showed a slightly decreasing trend during 2000–2010 (−0.027 kg m<sup>−2</sup> decade<sup>−1</sup>), followed by a significant increasing trend after the turning point (+0.37 kg m<sup>−2</sup> decade<sup>−1</sup>, *p* < 0.001). Shao, et al. [86] found that the global monthly mean dust concentration decreased from 2000 to 2012. Taken together, these results suggest that the quiet period of dust activity in Central Asia, and possibly globally, has ended and an active period has begun. Based on current research, it seems reasonable to relate this dust trend to the climate trend, especially the reversal of global terrestrial stilling [22].
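The before/after trend rates can be sketched as two separate OLS fits split at the turning year, with the per-year slope multiplied by 10 to give a per-decade rate. This sketch assumes annual values and a known turning year; it does not estimate the turning point itself or compute p-values (`scipy.stats.linregress` returns a p-value if one is needed). The synthetic data below are constructed only to exercise the function.

```python
import numpy as np


def decadal_rate(years, values):
    """OLS slope of values against time, scaled from per year to per decade."""
    slope_per_year, _intercept = np.polyfit(years, values, 1)
    return 10.0 * slope_per_year


def rates_around_turning_point(years, values, turning_year):
    """Decadal OLS rates before, and from, the turning year onward."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    before = years < turning_year
    after = ~before
    return (decadal_rate(years[before], values[before]),
            decadal_rate(years[after], values[after]))


# Synthetic annual series with a trend break in 2011 (illustrative values only).
years = np.arange(2000, 2020)
vals = np.where(years < 2011, 3.0 - 0.007 * (years - 2000), 2.9 + 0.06 * (years - 2011))
rate_before, rate_after = rates_around_turning_point(years, vals, 2011)
```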

**Figure 8.** Average GMWS and average SWEP across CA from 2000 to 2019.

Additionally, the highest monthly average SWEP, more than 2.4 kg/m², occurred in May 2014. The seven months with the strongest wind erosion (*SWEP* > 1.5 kg/m²) were in March (2), April (2), May (2), and December (1). Nine months had an average wind speed greater than 3.5 m/s (May: 3, March: 3, April: 2, December: 1), and the highest monthly average wind speed, 3.79 m/s, occurred in December 2015. Overall, wind speed and wind erosion followed similar tendencies over the past two decades: both show a distinct declining trend, followed by a sudden, marked increase, and then a slow decline or stabilization. Over the whole study period (2000–2019), wind speed (+0.38 m s<sup>−1</sup> decade<sup>−1</sup>, *p* < 0.001) and SWEP (+0.34 kg m<sup>−2</sup> decade<sup>−1</sup>, *p* < 0.001) increased rapidly, indicating increasingly serious soil degradation and air pollution problems in CA.

#### 4.3.2. Divergence of SWEP from Different Land Cover Types

According to the land characteristics of the study area, the ESA CCI land cover types were reclassified into nine categories (cropland irrigated, cropland rain-fed, forestland, shrubland, grassland, sparse vegetation land, bareland, urbanland, and waterbody). The last subplot of Figure 9 shows the areas of the different land cover types across CA in 2018. As Figure 9 shows, the monthly average SWEP and its rate of change differed substantially among land cover types. The monthly average SWEP of bareland was more than 0.836 kg/m², followed by shrubland (0.572 kg/m²), sparse vegetation land (0.203 kg/m²), grassland (0.073 kg/m²), irrigated cropland (0.043 kg/m²), rainfed cropland (0.033 kg/m²), and forestland (0.017 kg/m²). This ranking is broadly consistent with previous research conducted in CA and surrounding regions [20,87]. We also compared our soil wind erosion modulus with those reported for regions with similar conditions or obtained with different methodologies (Table 2). Li, et al. [20] assessed the variation of the soil wind erosion modulus in CA (including Xinjiang, China) between 1986 and 2005. Zhang, et al. [87] investigated RWEQ-based soil wind erosion, validated by <sup>137</sup>Cs, in Inner Mongolia (IM) during 1990–2015. Compared with other arid or semiarid regions, CA has relatively high soil wind erosion rates, which may result from the widespread distribution of deserts and the wind speed increase in the past decade. Grassland has a relatively low soil wind erosion rate because it is mainly distributed in northern CA, which has a more humid climate and a less erodible underlying surface [12]. Although the time periods and data sources differ, the consistent ranking of wind erosion across land cover types supports the reliability of our results.

**Figure 9.** Monthly change of the average SWEP of different land cover types and the area of different land cover types across CA in 2018.

SWEP in CA increased significantly over the past two decades, but the rate of change varied among land cover types. The rate of change was highest for bareland (0.0978 kg/m²/y), followed by shrubland (0.0730 kg/m²/y), sparse vegetation land (0.0206 kg/m²/y), grassland (0.0062 kg/m²/y), irrigated cropland (0.0045 kg/m²/y), rainfed cropland (0.0038 kg/m²/y), and forestland (0.0016 kg/m²/y). Combining SWEP with the areas of the land cover types, we calculated the total soil loss by wind for each land cover type over 2000–2019. More than 2.8255 × 10<sup>11</sup> t of soil was eroded by wind across CA during this period. Bareland (1.838 × 10<sup>11</sup> t) contributed more than 65% of the soil loss by wind in CA, followed by shrubland (0.3907 × 10<sup>11</sup> t), sparse vegetation land (0.3380 × 10<sup>11</sup> t), grassland (0.1812 × 10<sup>11</sup> t), rainfed cropland (0.0517 × 10<sup>11</sup> t), irrigated cropland (0.0223 × 10<sup>11</sup> t), and forestland (0.0028 × 10<sup>11</sup> t).
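Converting SWEP to a soil mass involves only unit arithmetic: 1 km² = 10⁶ m² and 1 t = 10³ kg, so a SWEP of 1 kg/m² over 1 km² corresponds to 1000 t. The sketch below assumes SWEP has already been accumulated over the period of interest; the `totals` dictionary holds the per-class values reported above and is used only to recompute the bareland share.

```python
def soil_loss_tonnes(swep_kg_per_m2, area_km2):
    """Soil mass (t) eroded from an area, given SWEP accumulated over a period."""
    return swep_kg_per_m2 * area_km2 * 1e6 / 1e3  # km^2 -> m^2, then kg -> t


# Reported 2000-2019 totals per land cover type, in units of 1e11 t
totals = {
    "bareland": 1.838, "shrubland": 0.3907, "sparse vegetation land": 0.3380,
    "grassland": 0.1812, "rainfed cropland": 0.0517,
    "irrigated cropland": 0.0223, "forestland": 0.0028,
}
listed_sum = sum(totals.values())
shares = {name: value / listed_sum for name, value in totals.items()}
```

Dividing by the sum of the listed classes gives a bareland share of about 65%, in line with the text.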


**Table 2.** Soil wind erosion rate studies in CA and other regions with similar conditions.

Land cover in CA changed continuously during 2000–2018: cropland, forestland, urbanland, and shrubland showed increasing trends, while bareland and grassland showed decreasing trends (Supplementary Table S1). More than 2.67 × 10<sup>5</sup> km² of land underwent land cover change, including conversions from bareland to grassland, sparse vegetation land to grassland, grassland to cropland (rain-fed), sparse vegetation land to cropland (rain-fed), and waterbody to bareland. To remove the effect of wind speed variability on SWEP, we calculated the annual SWEP of 2018 using the wind speed data of 2000 and compared it with the annual SWEP of 2000. The land cover conversions that most strongly inhibited wind erosion were bareland to shrubland (−0.782 kg/m²), sparse vegetation land to forestland (−0.106 kg/m²), and grassland to forestland (−0.073 kg/m²). In contrast, the conversions that most accelerated wind erosion were waterbody to bareland (+3.784 kg/m²), sparse vegetation land to bareland (+1.124 kg/m²), and grassland to bareland (+0.490 kg/m²).
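The wind-controlled comparison can be sketched as a per-transition average of the difference between the 2018 SWEP computed with year-2000 winds and the 2000 SWEP. This assumes co-registered pixel arrays of land cover labels and SWEP values; the function name and the toy labels are hypothetical.

```python
import numpy as np


def transition_effects(lc_start, lc_end, swep_start, swep_end_fixed_wind):
    """Mean SWEP change for each (from, to) land cover transition.

    swep_end_fixed_wind is the end-year SWEP computed with start-year winds,
    so the difference isolates the land cover effect.
    """
    lc_start = np.asarray(lc_start)
    lc_end = np.asarray(lc_end)
    diff = np.asarray(swep_end_fixed_wind, float) - np.asarray(swep_start, float)
    changed = lc_start != lc_end  # only pixels whose class actually changed
    effects = {}
    for a, b in set(zip(lc_start[changed].tolist(), lc_end[changed].tolist())):
        mask = (lc_start == a) & (lc_end == b)
        effects[(a, b)] = float(diff[mask].mean())
    return effects


# Toy example with four pixels (hypothetical labels and values).
effects = transition_effects(
    ["bare", "bare", "grass", "grass"],
    ["shrub", "shrub", "grass", "bare"],
    [1.0, 1.2, 0.1, 0.1],
    [0.2, 0.4, 0.1, 0.6],
)
```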

#### *4.4. Validation of the GEE-RWEQ Model*

Because long-term, spatially extensive ground measurements of wind erosion are lacking in CA, validating the GEE-RWEQ model is challenging; owing to the large area and complex terrain, almost no previous research has conducted wind erosion field measurements in CA. Considering that most local dust storms in CA are caused by surface wind erosion [24], the dust storm index (DSI) can serve as a proxy for evaluating wind erosion model performance [75]. We therefore used the DSI derived from weather station visibility records to evaluate the reliability of the spatial distribution of SWEP. Figure 10 displays the annual mean DSI across CA from 2000 to 2019, interpolated from the annual average DSI of more than 200 weather stations using the Natural Neighbor interpolation method. Figure 10 shows that the desert region of southwestern CA has a high DSI, indicating very frequent dust storms. However, there are also some high values in southeastern CA, where the wind erosion risk is lower. According to Liu, et al. [89], strong southwesterly winds transport dust particles from the western deserts to the eastern mountains and valleys, and dust episodes have been observed in these regions. In addition, southeastern CA is the most densely populated part of CA, and most weather stations are located near densely populated cities, where anthropogenic pollutants that reduce visibility may also be recorded as dusty weather. Overall, the spatial distribution of DSI is generally consistent with the spatial pattern of annual SWEP. Although station visibility records are a valuable data resource for wind erosion monitoring, they have several limitations; in particular, the low spatial density of weather stations, especially in southeastern CA, hinders highly accurate wind erosion mapping. Verification data with higher spatial resolution and greater continuity are therefore needed. Satellite-based atmospheric aerosols, that is, solid and liquid particles suspended in the atmosphere, have strong spatial correlations with wind erosion.

**Figure 10.** The location of weather stations that provide visibility records and the annual mean Dust Storm Index (DSI) for the period 2000 to 2019 across CA.

Figure 11 shows the spatial pattern of the annual AOD (a) and the average AAI in 2019 (b), as well as their comparisons with SWEP in the Aral Sea region (ASR). The Aral Sea region and the area to its southwest have the highest values in CA, because dust from the Aral Sea is transported southwestward by the prevailing northeasterly wind [90]. We therefore chose the ASDLB as a hotspot area for comparison with SWEP. SWEP showed moderate positive linear relationships with both aerosol parameters (Figure 11), with *r* values of 0.5623 (*p* < 0.001) for AOD and 0.5660 (*p* < 0.001) for AAI. Although absolute SWEP values are difficult to verify, these comparisons of spatial and temporal distribution patterns indicate that the RWEQ-based SWEP data for CA are reliable.

**Figure 11.** The spatial distribution of the annual average MODIS Aerosol Optical Depth (AOD) (**a**) and Sentinel-5P Absorbing Aerosol Index (AAI) of 2019 (**b**), and the comparisons of the average SWEP and AOD and AAI in the Aral Sea region.

#### **5. Discussion**

Our results show significant spatial and temporal differences in wind erosion in CA. Controlled by latitudinal and vertical zonality, higher SWEP values are mainly distributed in southwestern CA, which has low vegetation coverage and more fragile surface soil [91], while lower SWEP values are mainly concentrated in northern CA. According to the 2018 land cover map, more than 89% of rain-fed cropland and 78% of forestland are distributed in these northern regions. Owing to latitudinal and vertical zonality, these areas also receive more precipitation and have lower temperatures than other parts of CA; the resulting higher soil moisture, due to lower evapotranspiration, reduces wind erosion to a certain degree. With the restoration of the Aral Sea in recent years, the vegetation coverage and other underlying surface factors of the ASDLB have been improving [85]. Although SWEP showed a decreasing trend in the ASDLB (−6.85 kg/m²/y), this region was still the most severe wind erosion area in CA. Our results also show clear seasonal and monthly variations in SWEP. The area threatened by wind erosion is largest in spring, especially in May. Because of differences in solar radiation heating at different latitudes and altitudes, the snowmelt period varies across CA [92]. Both the higher soil moisture caused by snowmelt and the snow cover itself affected the movement of the wind erosion center across CA during spring [37]. Because the major deserts of CA are located at different latitudes and altitudes, the severe wind erosion regions migrate over time, moving from southwestern CA (Karakum Desert) to central CA (Kyzylkum and Muyunkum Deserts) during spring (MAM). However, owing to special meteorological conditions in certain areas of CA, such as the middle reaches of the ADR, severe wind erosion occurs in summer rather than spring [24]. We calculated the monthly average wind speed of four weather stations in this region and compared it with the monthly average wind speed of all weather stations in CA (Supplementary Figure S3). The maximum monthly average surface wind speed for CA as a whole occurred in March, whereas that in the middle reaches of the ADR occurred in July. This result indicates that wind speed plays the key role in the spatial distribution of SWEP.

As the dominant factor of wind erosion, wind speed variability, such as wind stilling and its reversal, accounts for most of the spatiotemporal variation of wind erosion in CA. Although global terrestrial stilling has been confirmed by many studies, most of them only examined global or regional wind speed changes from the 1980s to 2010, and few have addressed recent (post-2010) changes [14,17–19]. Using several climate assimilation datasets (GLDAS, ERA5, CFSR, and FLDAS) and a ground measurement dataset (GSOD), we found a turning point in wind speed stilling during 2009–2012 in CA. This finding is supported by other studies reporting that global terrestrial stilling has reversed, with wind speed increasing rapidly since 2010 [21–23]. Our results show that the increase rate of the average wind speed in CA (0.6 m s<sup>−1</sup> decade<sup>−1</sup>) is higher than that of the average global wind speed (0.24 m s<sup>−1</sup> decade<sup>−1</sup>) over the same period, implying stronger wind erosion in CA. Indeed, there is a strong and significant positive correlation (*r* = 0.7, *p* < 0.001) between the average GMWS and the average SWEP. A number of studies have demonstrated that CA is more sensitive to climate change than the global average [12,20,57,60,91,93]. While the global wind speed rebound is widely acknowledged to benefit the wind power industry in the near future [22], this study suggests that it has also intensified wind erosion activity in CA.

Owing to significant differences in natural conditions, such as air temperature and precipitation, and to human disturbances such as irrigation, land cover in CA exhibits strong spatial differentiation. We found that SWEP differs greatly among land cover types, which is roughly consistent with previous studies on wind erosion in CA and surrounding regions [20,87]. Most of the shrubland in CA consists of deserts and xeric shrublands, in which *Haloxylon ammodendron*, *Calligonum aphyllum*, and *Ephedra lomatolepis*, as well as grasses such as *Agropyron fragile*, grow [94–96]. These kinds of vegetation provide good windbreak and sand-fixing functions in CA. In addition, LCCs and wind erosion activity are closely related and affect each other. From 2000 to 2018, more than 2.6 × 10<sup>5</sup> km² of land changed land cover type (Supplementary Table S1). We compared the SWEP of land that underwent LCC from 2000 to 2018, with the effect of wind speed variability removed. The conversion of bareland to shrubland reduced wind erosion by 0.782 kg/m²/y, whereas the conversion of waterbody to bareland caused by the shrinkage of the Aral Sea increased wind erosion by 3.784 kg/m²/y. The restoration of the Aral Sea has not only directly reduced the possibility of wind erosion in the ASDLB, but also increased the vegetation coverage of the ASDLB through a higher groundwater level, making the long-distance transport of dust more difficult. Although wind speed has shown an increasing trend over the past 10 years, the wind erosion risk in the Aral Sea area is gradually decreasing owing to the continuous recovery of the Aral Sea. Over the past 30 years, many engineering projects have attempted to improve the Aral Sea environment, directly or indirectly. Although Aral Sea restoration is the most effective way to restrain wind erosion in the ASDLB, the complex political relations among the countries of the Aral Sea basin make cross-border water management difficult [84]. In other parts of CA, more effective artificial wind erosion control measures should be taken, for example, increasing the grassland area in regions with suitable temperature and precipitation, developing cropland within the limits of available water resources, and planting cold- and drought-resistant shrubs on bareland.

In this study, the RWEQ model was implemented as a wind erosion model on the GEE cloud computing platform. Compared with local computing, GEE can process large amounts of geospatial data in a short time, largely freeing the analysis from the storage and processing limits of local hardware. We therefore did not need to spend time downloading, preprocessing, and running the model on large volumes of geospatial data, which greatly shortens the time required for long-term wind erosion mapping. As noted above, GEE-RWEQ makes wind erosion monitoring possible in developing regions that lack on-site monitoring data. Meanwhile, the GEE platform makes it easier for researchers to share their results with decision makers and even the public [55]. Our research therefore has broader application value for decision makers than previous studies on wind erosion. In the future, we plan to develop an interactive Earth Engine App for exploring our results that can be used by experts and non-experts alike.

The RWEQ is a process-based, field-scale, empirical model that can quantitatively estimate wind erosion. However, it was originally developed for the midwestern United States [37] and therefore has some limitations in other regions [37,50,72,97]. Although the most important input parameters were retained in this study, the datasets required for some parameters were unavailable on the GEE platform. Several factors were therefore simplified so that global-scale wind erosion could be simulated more effectively and accurately. Soil moisture data were used to represent how surface wetness influences the wind speed required to erode the soil, and the cosine of the slope gradient, calculated from the DEM, was used to represent the soil roughness factor. In addition, we verified the wind erosion results in CA using only wind erosion-related data, such as visibility data and remote sensing data, which still leaves uncertainties at the global scale. Central Asia, one of the regions most severely affected by wind erosion, has seen few wind erosion modeling studies because wind erosion measurement data are lacking there. More ground-based soil loss measurements at the global scale should therefore be acquired to enable further verification studies.
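The slope-based roughness term described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' GEE code: the elevation array, the `cell_size` parameter, and the central-difference slope computed with `np.gradient` are stand-ins for whatever terrain routine was actually used on GEE.

```python
import numpy as np

def slope_cosine(dem, cell_size):
    """Cosine of the terrain slope angle for every DEM cell.

    dem: 2-D array of elevations (m); cell_size: grid spacing (m), assumed square.
    """
    dem = np.asarray(dem, dtype=float)
    # Elevation gradients along rows (y) and columns (x); np.gradient uses
    # central differences in the interior and one-sided differences at edges.
    dz_dy, dz_dx = np.gradient(dem, cell_size)
    # Slope angle from the magnitude of the gradient (rise over run).
    slope = np.arctan(np.hypot(dz_dx, dz_dy))
    # Flat cells give 1.0; steeper cells give smaller values.
    return np.cos(slope)
```

A flat surface yields 1 everywhere, and a uniform 45° ramp yields about 0.707.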

#### **6. Conclusions**

In this study, we developed a fully automated algorithm for quantitatively mapping wind erosion based on Google Earth Engine, processed terabytes of geospatial data, and retrieved the spatial and temporal patterns of monthly SWEP in CA over 20 years (2000–2019). The main conclusions are as follows:

(1) Unlike conventional methods, GEE-RWEQ does not require ground measurement data, whose collection demands considerable manpower and resources, especially in developing countries or sparsely populated regions. Instead, running on a cloud computing platform, GEE-RWEQ uses climate assimilation data, soil property data, vegetation data, terrain data, and other underlying data to automatically generate high spatial resolution NRT soil wind erosion potential products. Verification against ground observation-based DSI and satellite-based AOD showed that the results reach acceptable accuracy and can be used for quantitative wind erosion mapping. This methodology provides new ideas for constructing and applying empirical models based on batch geospatial data and high-performance computing;

(2) The comparison of GMMS and SWEP shows that wind speed is the main driving factor of wind erosion (*r* = 0.7, *p* < 0.001). Driven by wind speed variability, the SWEP first decreased and then increased markedly around 2011. In terms of temporal and spatial distribution, owing to sparse vegetation and particular meteorological conditions, the deserts in southwestern Central Asia are the most affected by wind erosion, especially ASDLB (47.29 kg/m<sup>2</sup>/y). The most severe wind erosion in CA occurred in spring (MAM), peaking in May. We also found that the SWEP distribution shows obvious latitudinal zonality due to the distribution of snow cover and the timing of snowmelt, and that the spring wind erosion hot spot moves from the southwest to the central area of CA;

(3) Land cover change has strong effects on soil wind erosion in CA, the most obvious being the conversion of bareland into water body in ASDLB. With the restoration of the Aral Sea, the SWEP in this area has shown a downward trend (−6.85 kg/m<sup>2</sup>/y) since 2011. Additionally, the conversion of bareland to shrubland reduced wind erosion by 0.782 kg/m<sup>2</sup>/y. Based on the SWEP variation associated with LCC, more effective artificial wind erosion control measures should be taken, for example, restoring the Aral Sea water area to prevent more bareland from being exposed to wind erosion, increasing the grassland area in regions with suitable temperature and precipitation, developing cropland using limited water resources, and planting cold- and drought-resistant shrubs on bareland.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2072-4292/12/20/3430/s1, Figure S1: Comparisons of the wind speed trend for different datasets (FLDAS, CFSR, ERA5, GLDAS2.1, and GMMS); Figure S2: Dust events in the Aral Sea during winter captured by MODIS; Figure S3: Monthly average wind speed of CA and the other four weather stations in the middle reaches of Amu Darya; and Table S1: The areas of different land cover types in Central Asia during the period of 2000–2018.

**Author Contributions:** Conceptualization, W.W.; data curation, W.W., Y.G., and A.T.; formal analysis, W.W. and Y.G.; funding acquisition, J.A.; methodology, W.W. and A.S.; project administration, A.S. and L.M.; resources, A.S. and J.A.; software, W.W. and S.Z.; supervision, J.A.; writing—original draft, W.W.; writing—review and editing, W.W. and A.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Strategic Priority Research Program of the Chinese Academy of Sciences, grant number XDA20060301; by the Youth Innovation Promotion Association Foundation of the Chinese Academy of Sciences under Grant 2018476; and in part by the National Natural Science Foundation of China, grant number 42071424.

**Acknowledgments:** The authors would like to give special thanks to the Google Earth Engine team for their support and allowing access. We appreciate the FLDAS, CFSR, ERA5, GLDAS, MODIS, Sentinel5P, and OLM soil properties data made available via the GEE. We are grateful for meteorological station data provided by the National Oceanic and Atmospheric Administration (NOAA). The GEE JavaScript code for retrieving SWEP can be made available by contacting the first author.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Using Synthetic Remote Sensing Indicators to Monitor the Land Degradation in a Salinized Area**

**Tao Yu <sup>1,2</sup>, Guli Jiapaer <sup>1,3,\*</sup>, Anming Bao <sup>1,3</sup>, Guoxiong Zheng <sup>1,2</sup>, Liangliang Jiang <sup>4</sup>, Ye Yuan <sup>1,2,5</sup> and Xiaoran Huang <sup>1,2</sup>**


**Abstract:** Land degradation poses a critical threat to the stability and security of ecosystems, especially in salinized areas. Monitoring land degradation in salinized areas facilitates land management and ecological restoration. In this research, we integrated the salinization index (SI), albedo, normalized difference vegetation index (NDVI) and land surface soil moisture index (LSM) through principal component analysis (PCA) to establish a salinized land degradation index (SDI). Based on the SDI, land degradation in a typical salinized area, the Amu Darya delta (ADD) in Central Asia, was analysed for the period 1990–2019. The results showed that the proposed SDI was highly positively correlated (R<sup>2</sup> = 0.89, *p* < 0.001) with soil salt content measured in field samples, indicating that the SDI can reveal the land degradation characteristics of the ADD. The SDI indicated that areas of extreme and strong land degradation increased from 1990 to 2019, mainly in the downstream and peripheral regions of the ADD. From 1990 to 2000, land degradation improved over a larger area than it developed; conversely, from 2000 to 2019, and especially from 2000 to 2010, development prevailed, with developing land degradation covering 32% of the area between 2000 and 2010, mainly in the downstream region of the ADD. Spatial autocorrelation analysis showed that the Moran's I values of the SDI in 1990, 2000, 2010 and 2019 were 0.82, 0.78, 0.82 and 0.77, respectively, indicating that the SDI was clearly clustered in space rather than randomly distributed. The expansion of unused land due to land use change, water withdrawal from the Amu Darya River and the discharge of salt downstream all contributed to land degradation in the ADD. This study provides valuable insights into land degradation monitoring and management in this salinized delta and similar settings worldwide.

**Keywords:** land degradation; salinization; remote sensing index; salinized land degradation index (SDI); Amu Darya delta (ADD)

#### **1. Introduction**

Land degradation can lead to reduced land productivity, population displacement, food insecurity and the destruction of ecosystems [1]. According to a report from the National Forestry and Grassland Administration of China (http://www.forestry.gov.cn/, accessed on 24 June 2021), 197 countries had signed the United Nations Convention to Combat Desertification (UNCCD) as of January 2019; nevertheless, the problem has not been alleviated in recent decades and has instead progressively worsened [2,3]. Monitoring land degradation and revealing its characteristics are essential for managing and restoring land quality.

**Citation:** Yu, T.; Jiapaer, G.; Bao, A.; Zheng, G.; Jiang, L.; Yuan, Y.; Huang, X. Using Synthetic Remote Sensing Indicators to Monitor the Land Degradation in a Salinized Area. *Remote Sens.* **2021**, *13*, 2851. https://doi.org/10.3390/rs13152851

Academic Editor: Bas van Wesemael

Received: 29 May 2021 Accepted: 16 July 2021 Published: 21 July 2021


**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Salinization induces land degradation, and this ecological problem is more prevalent in drylands [4]. In particular, in most irrigated areas of Central Asia, high-salinity water is used for irrigation, resulting in secondary salinization, which exacerbates land degradation [5]. Moreover, a severe ecological disaster was initiated with the gradual retreat of the Aral Sea due to the massive expansion of agricultural practices [6]. Consequently, the ecosystem around the Aral Sea has been almost destroyed, especially in the Amu Darya delta (ADD) [7–9], which has become one of the most severely degraded areas worldwide due to salinization [10]. The land degradation of the ADD caused by high levels of soil salinization has led to ecological and socio-economic problems, such as the withering of vegetation [5] and reduced agricultural yields [8]. Moreover, the ecology of the ADD is vulnerable to hydrological changes due to its dry climate. With the expansion of agricultural land, the structure of the water systems in the ADD has changed dramatically [8]. Numerous natural lakes and wetlands have disappeared and transformed into sparse vegetation or bare land [10,11]. The increasing land degradation is threatening the stability of ADD ecosystems [10]. However, the characteristics of land degradation in such a saline region remain unclear.

In recent decades, land degradation has attracted considerable research attention worldwide. Different indicators (e.g., vegetation index [12] and desertification index [13]) and methods (e.g., the Analytic Hierarchy Process (AHP) [14] and Entropy Weighting and Delphi [15]) have been used to monitor land degradation. These studies have improved the understanding of land degradation mechanisms at regional and global scales. However, land degradation takes different forms in different regions (e.g., rocky desertification [16], sandy desertification [17] and salinization [18]). These studies do not take the main characteristics of regional land degradation into account when establishing a land degradation assessment framework or index, which may affect the accuracy of the monitoring results. Previous studies have confirmed that the factors affecting land degradation vary by region [10,19]. Therefore, selecting indicators that represent the ecological characteristics of the region can make a land degradation assessment more rational. Accordingly, in saline areas, the salinization index (SI) [20], which reflects soil salinity, should be considered when monitoring land degradation. In addition, indicators extracted from remote sensing data, such as the normalized difference vegetation index (NDVI) [21], albedo [22,23] and soil moisture [13,24], have been widely used to monitor regional land degradation. The NDVI is one of the most widely used indicators of land degradation, as it accurately reflects vegetation greenness and biomass [25,26]. Surface albedo is closely related to soil exposure, so an increase in albedo can serve as an indirect indicator of soil degradation in drylands [12,27]. Moreover, the land surface soil moisture index (LSM) reflects soil water content and is a key indicator for monitoring land degradation in drylands [28,29].

The combination of these indicators (NDVI, LSM, SI and albedo) can provide a comprehensive understanding of land degradation in salinized areas to inform regional land management [10,12,30]. Thus, a salinized land degradation index (SDI) that incorporates information on salinity, vegetation, soil moisture and bareness needs to be constructed to reflect the land degradation characteristics of salinized areas. Recently, a method based on principal component analysis (PCA) was developed to assess regional ecological conditions [31–33]. PCA is a multidimensional data compression technique. It couples the information in the indicators and assigns the weight of each factor automatically and objectively according to its contribution to the principal component [31,34]. In contrast to weighting methods such as AHP and Delphi, PCA avoids variations or errors in weight definitions caused by individual subjective experience [35,36]. Therefore, in this study, we attempted to (1) construct the SDI based on PCA, (2) assess the reliability of the SDI in monitoring land degradation in salinized areas and (3) explore the spatial and temporal patterns of land degradation. Finally, the potential driving factors of land degradation were discussed. Understanding the spatiotemporal characteristics of land degradation can contribute to the management of ADD land and the sustainability of its ecosystem, and can guide future studies.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The ADD is located south of the Aral Sea, downstream of the Amu Darya River (Figure 1). The region spans Turkmenistan and Uzbekistan and covers an area of 6.3 × 10<sup>4</sup> km<sup>2</sup>. Runoff from large permanent glaciers in the mountains and melting snow are the main water sources of the Amu Darya River [37]. The ADD has a typical continental climate characterised by extreme dryness throughout the year. The average annual temperature is approximately 13 °C [37]. The potential annual evapotranspiration can be as high as 1600 mm, while the average annual precipitation is less than 100 mm [37]. Such a dry climate makes the ADD one of the most ecologically fragile regions worldwide [9].

**Figure 1.** Location of the Amu Darya delta (ADD). Colour composite map of Landsat-8 OLI images in the ADD for 2019 in a colour combination of shortwave-infrared band 1, near-infrared and red band. Green represents vegetation, and brown represents bare soil.

As the main grain-producing region of the Aral Sea basin, the ADD has experienced further salinization, exacerbated by its dry climate and by land use changes due to agricultural expansion [38]. Moreover, after the disintegration of the Soviet Union, changes in the political system intensified conflicts over water resources among the countries of the region, resulting in land degradation and a decline in ecosystem stability [39]. In response, a series of ecological conservation and restoration projects have been implemented or are about to be launched to mitigate land degradation caused by salinization in the ADD [40]. In this context, it is essential to investigate the spatial and temporal characteristics of land degradation in the ADD, a typical salinized region, to provide a reference for the ecosystem management of the delta.

#### *2.2. Data and Pre-Processing*

The datasets used in this study included satellite images, field soil salinity data, temperature and precipitation records, land use data, and water withdrawal and salt discharge data. Satellite images from the United States Geological Survey (http://earthexplorer.usgs.gov/, accessed on 5 June 2020) acquired on 19 and 26 July 1990, 21 and 30 July 2000, and 10 and 17 July 2010 (Landsat-5 TM) and on 4 and 11 August 2019 (Landsat-8 OLI/TIRS) were used. All selected images were acquired during the growing season at similar dates and were essentially cloud-free, which ensured the accuracy of the remote sensing index calculations [41,42]. The images were first radiometrically calibrated in ENVI 5.1 to convert digital numbers to radiance and then atmospherically corrected using the Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) module to remove atmospheric effects introduced during imaging [43,44]. Images acquired at different times were geometrically corrected using a second-order polynomial, with the root mean square error kept below 0.5 pixels [33]. Finally, the images were clipped to the ADD boundary, and water bodies were masked using the modified normalised difference water index (MNDWI) [45].
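The water-masking step can be sketched as below. MNDWI is computed from the green and first shortwave-infrared bands as (Green − SWIR1)/(Green + SWIR1); the threshold of 0 used here to flag water is an assumption for illustration only, since the text does not state the threshold applied, and the function names are hypothetical.

```python
import numpy as np

def mndwi(green, swir1):
    """Modified normalised difference water index: (Green - SWIR1) / (Green + SWIR1)."""
    green = np.asarray(green, dtype=float)
    swir1 = np.asarray(swir1, dtype=float)
    total = green + swir1
    # Pixels whose reflectances sum to zero are left undefined (NaN).
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(total != 0, (green - swir1) / total, np.nan)

def mask_water(band, green, swir1, threshold=0.0):
    """Copy of `band` with likely water pixels (MNDWI > threshold) set to NaN.

    The default threshold of 0 is an illustrative assumption, not the study's value.
    """
    masked = np.asarray(band, dtype=float).copy()
    masked[mndwi(green, swir1) > threshold] = np.nan
    return masked
```

Water reflects more green than shortwave-infrared light, so water pixels have positive MNDWI while bare soil and vegetation are usually negative.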

Soil salinity data sampled in the field on 18 March 2019 (see Figure S1 and Table S1 for details) were used to test the feasibility of using the SDI to assess land degradation in salinized areas. Average annual precipitation and temperature records for 1980–2016 were obtained from Nukus Station in the ADD. Annual statistics on water withdrawal and salt discharge for 1990–2015 were obtained from the Amu Darya River basin database of the Inter-State Commission for Water Coordination of Central Asia (ICWC, http://www.cawaterinfo.net/, accessed on 10 June 2020). The land use data, interpreted from Landsat images from 1990, 2000, 2010 and 2019, were obtained from the Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, and have been applied in related studies in the ADD [10,46]. Following this dataset, land use was divided into five categories: cropland, forest, grassland, built-up land and bare soil.

#### *2.3. Construction of the SDI*

#### 2.3.1. SI

Salinization is the major factor influencing land degradation in the ADD. The SI extracted from remote sensing images has been shown to characterise regional salinization [47–49]. In this work, an inversion model that has proved applicable in the ADD [10,50] was selected to construct the SI, calculated as follows [20]:

$$\text{SI} = \sqrt{\rho\_{\text{Blue}} \times \rho\_{\text{Red}}} \tag{1}$$

where ρBlue and ρRed are the blue and red bands of the Landsat TM and OLI imagery, respectively.
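Equation (1) translates directly into array code. The sketch below assumes the bands are already atmospherically corrected surface reflectances scaled to 0–1; the function name is hypothetical.

```python
import numpy as np

def salinity_index(blue, red):
    """Salinization index SI = sqrt(rho_Blue * rho_Red), as in Equation (1).

    Inputs are surface reflectances (assumed scaled to 0-1). Negative values,
    which atmospheric correction can occasionally produce, yield NaN.
    """
    blue = np.asarray(blue, dtype=float)
    red = np.asarray(red, dtype=float)
    with np.errstate(invalid="ignore"):
        return np.sqrt(blue * red)
```

Because saline crusts are bright in the visible bands, higher SI values correspond to more strongly salt-affected surfaces.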

#### 2.3.2. Albedo

The albedo is a key physical parameter related to the soil exposure [51]. In general, the albedo is higher in desert areas due to the sparse vegetation and soil exposure, and areas with high vegetation cover exhibit a lower albedo [23,52]. Therefore, the albedo was chosen to represent the surface exposure, and it was calculated as follows [51]:

$$\begin{array}{l}\text{Albedo} = 0.356 \text{ } \rho\_{\text{Blue}} + 0.130 \text{ } \rho\_{\text{Red}} + 0.373 \text{ } \rho\_{\text{NIR}} + 0.085 \text{ } \rho\_{\text{SWIR1}}\\ + 0.072 \text{ } \rho\_{\text{SWIR2}} - 0.018 \end{array} \tag{2}$$

where ρBlue, ρRed, ρNIR, ρSWIR1 and ρSWIR2 denote the blue, red, near-infrared and two shortwave-infrared bands of the Landsat TM and OLI imagery, respectively. The same band notation is used in the equations that follow.
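A sketch of Equation (2), using the coefficients exactly as written in the text, is shown below. The function name is hypothetical, and the inputs are assumed to be surface reflectances; note that the green band does not enter this formula.

```python
import numpy as np

def broadband_albedo(blue, red, nir, swir1, swir2):
    """Broadband albedo from five Landsat reflectance bands, using the
    coefficients of Equation (2) as written in the text."""
    b, r, n, s1, s2 = (np.asarray(x, dtype=float) for x in (blue, red, nir, swir1, swir2))
    return 0.356 * b + 0.130 * r + 0.373 * n + 0.085 * s1 + 0.072 * s2 - 0.018
```

For example, a surface with a reflectance of 0.2 in every band gives an albedo of 0.2 × 1.016 − 0.018 = 0.1852.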

#### 2.3.3. NDVI

The NDVI exploits the strong absorption of red light by chlorophyll and the strong reflection of near-infrared light by leaf structure, so it reflects plant biomass and vegetation coverage [53]. It has been successfully used to monitor land degradation at different scales [54] and is expressed as follows:

$$\text{NDVI} = (\rho\_{\text{NIR}} - \rho\_{\text{Red}}) / (\rho\_{\text{NIR}} + \rho\_{\text{Red}}) \tag{3}$$
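Equation (3) can be sketched in the same way. The guard against a zero denominator is an implementation choice added here, not something described in the text, and the function name is hypothetical.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), as in Equation (3); NaN where undefined."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(total != 0, (nir - red) / total, np.nan)
```

Dense green vegetation pushes NDVI towards 1, while bare or salt-crusted soil stays near 0.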

#### 2.3.4. LSM

Soil moisture plays a crucial role in regulating vegetation productivity and directly affects regional land degradation [55]. The LSM can be calculated as the wetness component of the tasselled cap transformation using the following formulas [56,57]:

$$\begin{aligned}\text{LSM}\_{\text{TM}} = {} & 0.0315\,\rho\_{\text{Blue}} + 0.2021\,\rho\_{\text{Green}} + 0.3102\,\rho\_{\text{Red}} + 0.1594\,\rho\_{\text{NIR}} \\ & - 0.6806\,\rho\_{\text{SWIR1}} - 0.6109\,\rho\_{\text{SWIR2}}\end{aligned} \tag{4}$$

$$\begin{aligned}\text{LSM}\_{\text{OLI}} = {} & 0.1511\,\rho\_{\text{Blue}} + 0.1972\,\rho\_{\text{Green}} + 0.3283\,\rho\_{\text{Red}} + 0.3407\,\rho\_{\text{NIR}} \\ & - 0.7117\,\rho\_{\text{SWIR1}} - 0.4559\,\rho\_{\text{SWIR2}}\end{aligned} \tag{5}$$

where ρ denotes the corresponding bands of the Landsat TM and OLI imagery.
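The two wetness formulas differ only in their coefficients, so a sketch can hold them in a lookup table keyed by sensor. The coefficients are copied from the LSM formulas above; the names `WETNESS_COEFFS` and `lsm` are hypothetical. For Landsat-8, the blue band is OLI band 2 (band 1 is coastal aerosol), so the six inputs are OLI bands 2–7, whereas for Landsat-5 they are TM bands 1–5 and 7.

```python
import numpy as np

# Tasselled cap wetness coefficients from the LSM formulas in the text,
# in band order Blue, Green, Red, NIR, SWIR1, SWIR2.
WETNESS_COEFFS = {
    "TM": (0.0315, 0.2021, 0.3102, 0.1594, -0.6806, -0.6109),
    "OLI": (0.1511, 0.1972, 0.3283, 0.3407, -0.7117, -0.4559),
}

def lsm(bands, sensor):
    """Land surface soil moisture index (tasselled cap wetness).

    bands: six reflectance arrays ordered Blue, Green, Red, NIR, SWIR1, SWIR2.
    sensor: "TM" (Landsat-5) or "OLI" (Landsat-8).
    """
    coeffs = WETNESS_COEFFS[sensor]
    if len(bands) != len(coeffs):
        raise ValueError("expected six bands: Blue, Green, Red, NIR, SWIR1, SWIR2")
    # Weighted sum of the six bands.
    return sum(c * np.asarray(b, dtype=float) for c, b in zip(coeffs, bands))
```

The large negative shortwave-infrared weights make the index drop as soils dry out, because water absorbs strongly in those bands.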

#### 2.3.5. Constructing SDI Based on PCA

Following previous studies [33,58,59], PCA was used to synthesise the four selected indicators (SI, albedo, NDVI and LSM) into the SDI. Before coupling the indicators with PCA for each of 1990, 2000, 2010 and 2019, the four indicators must be normalised to the range 0 to 1 [58] using Equation (6). The SDI was then obtained using Equation (7); higher SDI values indicate more severe land degradation. Figure 2 illustrates the SDI construction workflow.

$$I\_{normal} = (\mathbf{I} - \mathbf{I}\_{\rm min}) / (\mathbf{I}\_{\rm max} - \mathbf{I}\_{\rm min}) \tag{6}$$

$$\text{SDI} = \text{PC1}[\text{f(SI, Albedo, NDVI, LSM)}] \tag{7}$$

where I<sub>normal</sub> is the normalised index value, I is the original value of the index, and I<sub>max</sub> and I<sub>min</sub> are the maximum and minimum values of that index, respectively; PC1 denotes the first principal component of the four normalised indicators.
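Equations (6) and (7) can be sketched as follows for 1-D arrays of valid (water-masked) pixels. Several details are assumptions, since the text does not specify them: PCA is run on the covariance matrix of the normalised indicators (a correlation matrix is the other common choice), and because an eigenvector's sign is arbitrary, PC1 is oriented so that the SI loading is positive, matching the sign pattern reported in the results. The function names are hypothetical.

```python
import numpy as np

def minmax(x):
    """Rescale to 0-1 following Equation (6)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def sdi_from_pca(si, albedo, ndvi, lsm):
    """PC1 of the four normalised indicators, following Equation (7).

    Inputs are 1-D arrays over the same valid pixels.
    Returns (pc1_scores, pc1_loadings, share_of_variance).
    """
    X = np.column_stack([minmax(v) for v in (si, albedo, ndvi, lsm)])
    # Eigen-decomposition of the 4 x 4 covariance matrix (assumed choice).
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, 0]
    # Orient PC1 so the SI loading is positive: higher score = more degraded.
    if loadings[0] < 0:
        loadings = -loadings
    # Projecting uncentred data only shifts all scores by a constant.
    scores = X @ loadings
    return scores, loadings, eigvals[0] / eigvals.sum()
```

The returned variance share corresponds to the PC1 eigenvalue contribution reported in the results, and the scores can be rescaled with `minmax` again before classification.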

#### *2.4. Spatial Autocorrelation Analysis*

A spatial autocorrelation analysis is an effective way to test whether the values of adjacent samples of a spatial variable are correlated [60]. In this study, the global Moran's I index and the local indicator of spatial association (LISA) were used to analyse the spatial correlation of the SDI.

Moran's I generates a global assessment for spatial autocorrelation, with Moran's I values ranging from −1 to 1 [61,62]. Moran's I value > 0 means that the SDI has a positive spatial autocorrelation, while Moran's I value < 0 means that the SDI has a negative spatial autocorrelation. The closer the value to 1, the stronger the positive spatial autocorrelation, and the closer the value to −1, the stronger the negative spatial autocorrelation. Moran's I = 0 means that there is no spatial autocorrelation, and the SDI has a random spatial distribution. The global Moran's I index was calculated using the following equations:

$$\mathbf{I} = \frac{\sum\_{\mathbf{i}}^{n} \sum\_{\mathbf{j} \neq \mathbf{i}}^{n} \mathcal{W}\_{\mathbf{i}\mathbf{j}} (\mathbf{x}\_{\mathbf{i}} - \overline{\mathbf{x}}) (\mathbf{x}\_{\mathbf{j}} - \overline{\mathbf{x}})}{\mathbf{S}^{2} \sum\_{\mathbf{i}}^{n} \sum\_{\mathbf{j} \neq \mathbf{i}}^{n} \mathcal{W}\_{\mathbf{i}\mathbf{j}}} \tag{8}$$

$$\mathbf{S}^2 = \frac{1}{\mathbf{n}} \sum\_{\mathbf{i}}^n \left( \mathbf{x}\_{\mathbf{i}} - \overline{\mathbf{x}} \right)^2 \tag{9}$$

$$\overline{\mathbf{x}} = \frac{1}{\mathbf{n}} \sum\_{\mathbf{i}(\mathbf{j})}^{\mathbf{n}} \mathbf{x}\_{\mathbf{i}(\mathbf{j})} \tag{10}$$

$$Z\_{\text{Score}} = \frac{\mathbf{I} - \text{E}(\mathbf{I})}{\sqrt{\text{Var}(\mathbf{I})}} \tag{11}$$

where n is the number of spatial units, x<sub>i</sub> and x<sub>j</sub> are the SDI values at spatial locations i and j, respectively, x̄ is the mean SDI, S<sup>2</sup> is the variance (mean squared deviation) of the SDI, W<sub>ij</sub> is the spatial weight between locations i and j (an element of the n × n spatial weight matrix W), Z<sub>Score</sub> is the standardised test statistic of Moran's I, Var(I) is the variance of Moran's I, and E(I) is its expected value.
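Equations (8)–(11) can be sketched with a dense weight matrix, which is only practical for small examples; full Landsat scenes call for sparse weights, for instance through a spatial statistics library such as PySAL's `esda`. The variance formula below assumes normally distributed values, with E(I) = −1/(n − 1). The text does not say which variance assumption was used, so this choice, and the function name, are illustrative.

```python
import numpy as np

def global_morans_i(x, w):
    """Global Moran's I (Equations (8)-(10)) and its Z score (Equation (11)).

    x: 1-D array of SDI values; w: n x n spatial weight matrix.
    Var(I) uses the normality assumption (assumed, not stated in the text).
    """
    x = np.asarray(x, dtype=float)
    w = np.array(w, dtype=float)
    np.fill_diagonal(w, 0.0)                  # sums run over j != i
    n = x.size
    z = x - x.mean()
    s_sq = np.mean(z ** 2)                    # S^2, Equation (9)
    s0 = w.sum()
    moran_i = (z @ w @ z) / (s_sq * s0)       # Equation (8)

    expected = -1.0 / (n - 1)
    s1 = 0.5 * np.sum((w + w.T) ** 2)
    s2 = np.sum((w.sum(axis=1) + w.sum(axis=0)) ** 2)
    e_i_sq = (n**2 * s1 - n * s2 + 3 * s0**2) / ((n**2 - 1) * s0**2)
    variance = e_i_sq - expected**2
    z_score = (moran_i - expected) / np.sqrt(variance)
    return moran_i, expected, variance, z_score
```

On a row of six cells with neighbour weights of 1, the clustered pattern [1, 1, 1, 0, 0, 0] gives I = 0.6, while the alternating pattern [1, 0, 1, 0, 1, 0] gives I = −1.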

**Figure 2.** Flowchart of the construction of the salinized land degradation index (SDI). MNDWI: modified normalised difference water index; NDVI: normalised difference vegetation index; LSM: land surface soil moisture index; SI: salinization index; PCA: principal component analysis; SDI: salinized land degradation index.

LISA is a local spatial statistic that reveals the clustering characteristics of observations among spatially adjacent units [33,60]. A positive LISA value indicates that a location's SDI is similar to those of its neighbours, forming either high–high clustering (H–H, high values near other high values) or low–low clustering (L–L, low values near other low values). A negative LISA value indicates that the location is a spatial outlier, either a high–low outlier (H–L, a high value near low values) or a low–high outlier (L–H, a low value near high values).
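The LISA labelling can be sketched with the local Moran's I statistic. Row-standardised weights are assumed here, and no significance test is applied. In practice, only locations found significant, typically by a permutation test, would receive a cluster or outlier label. The names `QUADRANTS` and `local_morans_i` are hypothetical.

```python
import numpy as np

# (value above mean?, neighbour average above mean?) -> LISA label
QUADRANTS = {(True, True): "H-H", (False, False): "L-L",
             (True, False): "H-L", (False, True): "L-H"}

def local_morans_i(x, w):
    """Local Moran's I for each location plus its cluster/outlier label.

    x: 1-D array of SDI values; w: n x n contiguity matrix, which is
    row-standardised here (an assumption). No significance test is applied.
    """
    x = np.asarray(x, dtype=float)
    w = np.array(w, dtype=float)
    np.fill_diagonal(w, 0.0)
    row_sums = w.sum(axis=1, keepdims=True)
    w = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums != 0)
    z = x - x.mean()
    lag = w @ z                               # mean deviation of the neighbours
    local_i = z * lag / np.mean(z ** 2)
    # Positive local I: similar to neighbours (H-H or L-L);
    # negative local I: spatial outlier (H-L or L-H). Zeros count as "low".
    labels = [QUADRANTS[(bool(zi > 0), bool(li > 0))] for zi, li in zip(z, lag)]
    return local_i, labels
```

A single high value surrounded by low values, as in [0, 0, 1, 0, 0], is labelled H–L, and its low neighbours are labelled L–H.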

#### **3. Results**

#### *3.1. Integration of the Remote Sensing Indexes Based on PCA*

The SDI of the ADD was calculated as PC1 of the four indicators. As shown in Table 1, PC1 accounted for more than 78% of the total variance (eigenvalue contribution) in every study year, indicating that it integrated most of the information in the four indicators. Therefore, PC1 was chosen to construct the SDI in this study. The loadings of the four variables on PC1 fall into two groups by sign: albedo and SI had positive loadings, whereas NDVI and LSM had negative loadings. The opposite signs indicate that the two groups contribute to the SDI in opposite directions.

**Table 1.** Loadings of the four selected variables on the first principal component (PC1) and associated contributions in different study years.


SI: salinization index; NDVI: normalised difference vegetation index; LSM: land surface soil moisture index.

The descriptive statistics of PC1 (Table 2) show that the mean PC1 value increased from 0.42 in 1990 to 0.43 in 2000, and the median increased from 0.38 to 0.40. In contrast, from 2010 to 2019, the mean PC1 value decreased from 0.41 to 0.30, and the median decreased from 0.38 to 0.28. The positive skewness in 1990, 2000, 2010 and 2019 indicates that the right tail of the distribution was longer or heavier than the left. Overall, the PC1 statistics show that land degradation intensified from 1990 to 2000 and weakened from 2010 to 2019.



Figure 3 shows the correlation coefficients between each indicator and the SDI and among the indicators (significant at the 0.01 level). In all four years, the SDI was highly correlated with each individual indicator: positively with the SI and albedo, and negatively with the NDVI and LSM. The SI showed the highest correlation with the SDI, with coefficients of 0.971, 0.983, 0.973 and 0.972 in 1990, 2000, 2010 and 2019, respectively. The albedo showed its highest correlation with the SDI in 2000 (0.915). The NDVI showed its strongest correlation with the SDI in 2019 (−0.885). The correlation coefficients between the LSM and the SDI were all below −0.72 (i.e., |r| > 0.72), with the strongest correlation in 2000 (−0.911).


**Figure 3.** Correlations between pairs of the four selected indicators and their correlations with the SDI in different study years. SI: salinization index; NDVI: normalised difference vegetation index; LSM: land surface soil moisture index; SDI: salinized land degradation index. The blue and red colours represent negative and positive correlations, respectively (the darker the colour, the stronger the correlation).

#### *3.2. Spatiotemporal Changes in the Land Degradation*

To analyse the spatiotemporal characteristics of land degradation in the ADD over the different periods, the SDI values were normalised to the range 0 to 1 using Equation (6). As the SDI approximately follows a normal distribution, it was divided into five equal-interval categories representing different land degradation levels [36,58]: no degradation (0–0.2), slight degradation (0.2–0.4), moderate degradation (0.4–0.6), strong degradation (0.6–0.8) and extreme degradation (0.8–1). Overall, land degradation levels were unevenly distributed and varied over both space and time. As shown in Figure 4, extreme and strong degradation areas were clustered in the west and north of the ADD in all study years. Moderately degraded areas were sporadically distributed in the middle and south of the ADD, and most areas in the middle of the ADD showed slight degradation. The distribution of non-degraded areas differed among the four years: in 1990 and 2000, they were mainly located along the Amu Darya River, with a small portion in the northwest and northeast corners of the ADD, whereas in 2010 and 2019 they were mainly located in the middle of the study area, forming a "V" shape.
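The equal-interval classification can be sketched as below. The text does not say which class a value lying exactly on a boundary belongs to; here, following `np.digitize`'s default behaviour, a value such as 0.2 goes to the higher class and 1.0 falls in the extreme class, which is an assumption. The names `LEVELS`, `classify_sdi` and `level_shares` are hypothetical.

```python
import numpy as np

LEVELS = ["No degradation", "Slight", "Moderate", "Strong", "Extreme"]

def classify_sdi(sdi):
    """Normalise SDI to 0-1 (Equation (6)) and assign five equal-interval levels.

    Returns integer codes 0-4 indexing LEVELS. Boundary values go to the
    higher class (assumed; the text does not specify boundary handling).
    """
    s = np.asarray(sdi, dtype=float)
    s = (s - s.min()) / (s.max() - s.min())
    return np.digitize(s, [0.2, 0.4, 0.6, 0.8])

def level_shares(codes):
    """Proportion of pixels in each level, in LEVELS order."""
    codes = np.asarray(codes)
    return np.bincount(codes, minlength=len(LEVELS)) / codes.size
```

Multiplying each share by the number of valid pixels and the pixel area (0.0009 km² for a 30 m Landsat pixel) converts the proportions to areas.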

**Figure 4.** Spatial distribution of the land degradation levels in the ADD in each study year. Extreme: extreme degradation; Strong: strong degradation; Moderate: moderate degradation; Slight: slight degradation.

Figure 5 shows the percentage of the study area occupied by each of the five land degradation levels in 1990, 2000, 2010 and 2019. Throughout 1990–2019, slight degradation covered the largest area in the ADD, accounting for more than 26% of the total area covered by the Landsat images. From 1990 to 2000, the areas with extreme, strong and no degradation decreased, whereas the areas with slight and moderate degradation increased. From 2000 to 2010, the areas with extreme, strong and no degradation expanded, and the expansion of non-degraded regions was particularly pronounced. In contrast, the areas with slight and moderate degradation decreased; for slight degradation, the dynamic degree was 7%. From 2010 to 2019, the area with moderate degradation increased slightly, from 13.30% to 15.56% of the total area, whereas the areas of the other four levels did not change considerably.
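The dynamic degree is not defined in this section. A common definition in land-change studies is the single dynamic degree, K = (U_b − U_a) / U_a × 1/T × 100%, where U_a and U_b are a class's area at the start and end of a period of T years. The sketch below assumes that definition. Whether the 7% quoted above is annualised or refers to the whole period is not stated here, so treat the function as illustrative only.

```python
def dynamic_degree(area_start, area_end, years):
    """Single dynamic degree in % per year, assuming K = (Ub - Ua) / Ua / T * 100.

    Areas can be in km^2 or in % of the study area; the ratio is the same.
    A negative result means the class shrank over the period.
    """
    if area_start <= 0 or years <= 0:
        raise ValueError("area_start and years must be positive")
    return (area_end - area_start) / area_start / years * 100.0

# Hypothetical class shrinking from 30% to 27% of the area over 10 years.
print(dynamic_degree(30.0, 27.0, 10))
```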

**Figure 5.** Proportion of land degradation levels in different study years. Extreme: extreme degradation; Strong: strong degradation; Moderate: moderate degradation; Slight: slight degradation.

Using spatial analysis, we mapped the changes in the spatial distribution of land degradation from 1990 to 2000, 2000 to 2010, 2010 to 2019 and 1990 to 2019 (Figure 6). The change classes were defined as follows: an increase of 1 or 2 degradation levels was classed as "Developed" (e.g., a change from "No degradation" to "Slight degradation" or "Moderate degradation"); an increase of 3 or 4 levels as "Seriously developed" (e.g., from "No degradation" to "Strong degradation" or "Extreme degradation"); a decrease of 1 or 2 levels as "Improvement" (e.g., from "Extreme degradation" to "Strong degradation" or "Moderate degradation"); a decrease of 3 or 4 levels as "Significant improvement" (e.g., from "Extreme degradation" to "Slight degradation" or "No degradation"); and no change in level during a study period as "Stable". The areas of each change class for the four periods are shown in Table 3.
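The rule above depends only on how many levels a pixel moves between two dates, so it can be written as a lookup on the level difference. The following is a minimal sketch. It assumes levels are coded as integers from 0 (no degradation) to 4 (extreme degradation), for example as produced by an equal-interval classification. The function name is hypothetical.

```python
import numpy as np

# Change labels indexed by (level_after - level_before) + 4, for steps -4..+4,
# following the rules in the text (levels coded 0 = no ... 4 = extreme).
_CHANGE_LABELS = np.array([
    "Significant improvement",  # -4
    "Significant improvement",  # -3
    "Improvement",              # -2
    "Improvement",              # -1
    "Stable",                   #  0
    "Developed",                # +1
    "Developed",                # +2
    "Seriously developed",      # +3
    "Seriously developed",      # +4
])

def classify_change(level_before, level_after):
    """Map per-pixel degradation levels (0-4) at two dates to change classes."""
    before = np.asarray(level_before, dtype=int)
    after = np.asarray(level_after, dtype=int)
    if before.min() < 0 or before.max() > 4 or after.min() < 0 or after.max() > 4:
        raise ValueError("levels must be coded 0..4")
    return _CHANGE_LABELS[after - before + 4]

print(classify_change([0, 0, 4, 4, 2], [1, 4, 3, 1, 2]))
```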


**Table 3.** Area and proportion of developed and improved land degradation in the ADD during different time periods.

**Figure 6.** Spatial distribution of land degradation development and improvement in the ADD in different time periods.

Overall, stable areas accounted for a large proportion (more than 42%) in all four study periods (Table 3) and were mainly located in the west and southeast of the ADD. From 1990 to 2000, improvement areas covered 3653.78 km² (30.6%) and were mainly clustered in the north of the ADD (Figure 6), whereas developed areas covered 2687.53 km² (22.5%) and mainly occupied the southern and middle regions. A small proportion (1.4%) of significant improvement areas was observed in the northern part of the ADD, and the even smaller seriously developed areas (1.1%) were mainly located in the northwest corner. From 2000 to 2010, developed areas covered 3728.76 km² (31.3%) and were mainly concentrated in the northern part of the study area, whereas improvement areas covered 2886.02 km² (24.2%) and were mainly located in the eastern and central parts of the ADD. Seriously developed and significant improvement areas accounted for only 0.92% and 0.88%, respectively, and the seriously developed areas were mainly observed in the downstream region of the ADD. Compared with 2000–2010, both developed and improvement areas decreased during 2010–2019, occupying 20.5% and 19.7% of the total area, respectively, and were mainly clustered in the downstream region of the ADD. Seriously developed and significant improvement areas occupied less than 0.4% of the total area. From 1990 to 2019, developed areas covered 2901.65 km² (24.3%) and were mainly concentrated in the north of the ADD and in the downstream region of the Amu Darya River. Improvement areas, which occupied 27.8% of the total area, were mainly observed in the west and east of the ADD. The seriously developed areas, covering approximately 217.75 km², were mainly concentrated in the north of the ADD, and significant improvement areas were smaller, accounting for only 0.9% of the total area.
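Table 3 reports each change class both as an area and as a share of the study area. The sketch below shows that tabulation from a per-pixel change map. It assumes 30 m Landsat pixels with no resampling. This is an assumption, because the processing resolution is not restated in this section.

```python
import numpy as np

def change_area_table(change_labels, pixel_size_m=30.0):
    """Area (km^2) and share (%) of each change class in a labelled raster.

    pixel_size_m = 30 assumes Landsat-resolution pixels; adjust if resampled.
    """
    labels, counts = np.unique(np.asarray(change_labels).ravel(), return_counts=True)
    pixel_km2 = pixel_size_m ** 2 / 1e6
    total = counts.sum()
    return {str(lab): (float(n * pixel_km2), float(100.0 * n / total))
            for lab, n in zip(labels, counts)}

toy = np.array([["Stable", "Stable", "Developed"],
                ["Improvement", "Stable", "Developed"]])
for cls, (km2, pct) in change_area_table(toy).items():
    print(f"{cls:<12s} {km2:.4f} km2  {pct:.1f}%")
```

Dividing an area by its share gives a quick consistency check on Table 3. For example, 3653.78 km² / 0.306 implies a study area of roughly 11,940 km².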

#### *3.3. Spatial Autocorrelation Analysis of the SDI*

To further clarify the spatial and temporal variabilities in the land degradation, the spatial autocorrelation of the SDI was examined.

Figure 7 shows the global spatial autocorrelation of the SDI as Moran scatter plots. Most SDI values fall in the first (H–H) and third (L–L) quadrants, indicating a strong positive spatial correlation between neighbouring spatial units. The Moran's I values in 1990, 2000, 2010 and 2019 were 0.888, 0.856, 0.891 and 0.851, respectively, all well above zero, showing that the SDI of the ADD exhibited significant spatial clustering. Moran's I decreased from 1990 to 2000, increased from 2000 to 2010 and decreased again from 2010 to 2019, showing an overall decreasing trend over the study period.
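Global Moran's I compares each unit's deviation from the mean with the deviations of its neighbours: I = (n / S0) · Σᵢ Σⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where S0 is the sum of all weights. The sketch below assumes the SDI is a regular raster and uses binary rook-contiguity (4-neighbour) weights. The spatial weights and units actually used in the paper are not stated in this section, so both are assumptions. The sketch also returns each cell's Moran-scatter-plot quadrant, computed from a row-standardised spatial lag. The grid needs at least two cells.

```python
import numpy as np

def morans_i_rook(grid):
    """Global Moran's I on a 2-D raster with binary rook (4-neighbour) weights."""
    z = np.asarray(grid, dtype=float)
    z = z - z.mean()
    h = z[:, :-1] * z[:, 1:]   # products for horizontally adjacent pairs
    v = z[:-1, :] * z[1:, :]   # products for vertically adjacent pairs
    # Symmetric binary weights: each unordered pair appears twice in sum_ij w_ij z_i z_j.
    cross = 2.0 * (h.sum() + v.sum())
    s0 = 2.0 * (h.size + v.size)   # S0 = sum of all weights
    return (z.size / s0) * cross / np.sum(z ** 2)

def rook_lag(z):
    """Row-standardised spatial lag: mean of each cell's rook neighbours."""
    total = np.zeros(z.shape, dtype=float)
    count = np.zeros(z.shape, dtype=float)
    total[:, 1:] += z[:, :-1]    # left neighbour
    count[:, 1:] += 1
    total[:, :-1] += z[:, 1:]    # right neighbour
    count[:, :-1] += 1
    total[1:, :] += z[:-1, :]    # upper neighbour
    count[1:, :] += 1
    total[:-1, :] += z[1:, :]    # lower neighbour
    count[:-1, :] += 1
    return total / count

def moran_quadrants(grid):
    """Moran scatter-plot quadrant of each cell: H-H, L-L, H-L or L-H."""
    z = np.asarray(grid, dtype=float)
    z = z - z.mean()
    lag = rook_lag(z)
    high, high_lag = z >= 0, lag >= 0
    return np.where(high & high_lag, "H-H",
                    np.where(~high & ~high_lag, "L-L",
                             np.where(high, "H-L", "L-H")))

smooth = np.add.outer(np.arange(5.0), np.arange(5.0))  # toy gradient: strongly clustered
print(round(morans_i_rook(smooth), 3))
```

A permutation test is a common way to attach a p-value to I: shuffle the cell values many times, recompute I for each shuffle, and compare the observed value with that distribution.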

**Figure 7.** Moran scatter plot of the SDI in the ADD for each study year.

#### *3.4. Spatial and Temporal Changes in Land Use and Salinization*

Figure 8a displays the land use maps for 1990, 2000, 2010 and 2019, which show that cropland was the dominant land use type in the ADD. Grassland and forest were distributed in the north of the ADD, and bare soil mainly at the edges and in the north. Figure 8b displays the spatial changes in land use types; unchanged areas, transitions with small conversion areas and transitions involving built-up land were merged into "Stable and others". From 1990 to 2010, land use conversion occurred mainly between grassland and cropland. Conversion from grassland to cropland (GL to CL) was prominent in the northern part of the ADD, where the grassland area decreased by 158.74 km² from 1990 to 2000 and by 338.93 km² from 2000 to 2010 (Table 4). From 2010 to 2019, however, conversion was mainly from cropland to grassland (CL to GL), distributed in the northern and western parts of the ADD; the cropland area decreased by 602.26 km², and the grassland area increased by 492.37 km² during this period. In addition, part of the grassland in the northern ADD degraded to bare soil (GL to BS) from 2010 to 2019. Over the whole study period, land use conversion was mainly from grassland to cropland and from cropland to grassland.
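The GL to CL and CL to GL figures above come from cross-tabulating two classified maps. The following is a minimal sketch. It assumes both maps are co-registered rasters that use hypothetical integer codes 0–3 for CL, GL, FR and BS, with built-up land and water masked out beforehand. It also assumes 30 m pixels (0.0009 km²).

```python
import numpy as np

# Hypothetical integer codes for the four mapped classes.
CLASSES = ["CL", "GL", "FR", "BS"]

def transition_matrix(lu_before, lu_after, n_classes=len(CLASSES), pixel_km2=0.0009):
    """Rows = class at the earlier date, columns = class at the later date (km^2)."""
    a = np.asarray(lu_before, dtype=int).ravel()
    b = np.asarray(lu_after, dtype=int).ravel()
    # Encode each (from, to) pair as a single index and count occurrences.
    counts = np.bincount(a * n_classes + b, minlength=n_classes * n_classes)
    return counts.reshape(n_classes, n_classes) * pixel_km2

def net_change(matrix):
    """Net area change per class: area at the later date minus area at the earlier date."""
    return matrix.sum(axis=0) - matrix.sum(axis=1)

m = transition_matrix([1, 1, 0, 3], [0, 1, 0, 1], pixel_km2=1.0)  # toy maps
print(dict(zip(CLASSES, net_change(m))))
```

The off-diagonal cells give the gross conversions, such as GL to CL in row 1, column 0. The column sums minus the row sums give each class's net change, which is the quantity reported in Table 4-style summaries.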

**Figure 8.** Land use change maps from 1990 to 2019. CL: cropland; GL: grassland; FR: forest; BS: bare soil. (**a**) land use maps for 1990, 2000, 2010 and 2019; (**b**) changes between different periods.


**Table 4.** Areas and percentages of different land use types from 1990 to 2019 in the ADD.

Figure 9 shows the spatial distribution (Figure 9a) and changes (Figure 9b) of the SI over the study period. SI values were larger at the edges and in the north of the ADD and smaller in the centre. In the northern part of the ADD, the SI decreased from 1990 to 2000 but increased significantly from 2000 to 2010. From 2010 to 2019, the SI increased slightly in the central part of the ADD and decreased in the northeast. Over the whole study period (1990–2019), the SI increased significantly in the northern part of the ADD and decreased at the edges. To reveal the relationship between land use change and salinization, the mean SI of each land use type (excluding built-up land and water) was calculated (Figure 10). The mean SI of the land use types was largest in 2000 compared with the other three years, indicating higher soil salinization, and it decreased from 2000 to 2019. Within each year, bare soil had the largest mean SI, followed by grassland and forest.
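The per-class mean SI shown in Figure 10 is a zonal statistic: the SI raster is averaged within each land use class. The sketch below assumes a land use raster aligned with the SI raster and coded with hypothetical integer class codes. It illustrates the idea and does not reproduce the authors' processing chain.

```python
import numpy as np

def mean_by_class(values, classes, n_classes, valid=None):
    """Mean of `values` within each integer class code 0..n_classes-1.

    `valid` is an optional boolean mask, e.g. to drop built-up land and water.
    Classes with no pixels get NaN.
    """
    v = np.asarray(values, dtype=float).ravel()
    c = np.asarray(classes, dtype=int).ravel()
    if valid is not None:
        keep = np.asarray(valid, dtype=bool).ravel()
        v, c = v[keep], c[keep]
    sums = np.bincount(c, weights=v, minlength=n_classes)
    counts = np.bincount(c, minlength=n_classes)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts

# Toy SI values; the last pixel is masked out (e.g. water).
print(mean_by_class([1.0, 3.0, 5.0, 100.0], [0, 0, 2, 1], 3,
                    valid=[True, True, True, False]))
```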

Figure 11 illustrates the dynamics of salinization during land use change. ΔSI is the change in the SI over each study period (the SI of the later year minus that of the earlier year). For each land use transition, we calculated the percentage of the area with ΔSI < 0 and with ΔSI > 0. The SI increased in all land use transitions from 1990 to 2000 (ΔSI > 0 over more than 50% of the area), consistent with the maximum mean SI in 2000 noted above. From 2000 to 2010, the SI decreased for BS-CL and GL-CL (ΔSI < 0 over more than 80% of the area). Areas with ΔSI < 0 dominated land use change from 2010 to 2019. For the conversion of bare soil to cropland (BS-CL), forest (BS-FR) and grassland (BS-GL) from 1990 to 2019, the area with ΔSI < 0 exceeded 50%, indicating a decrease in the SI.
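The Figure 11 percentages can be sketched as follows. The function selects the pixels that made a given land use transition, differences their SI between the two dates, and reports the share with ΔSI < 0 and ΔSI > 0. Pixels with ΔSI exactly 0 count towards neither share. The class codes and function name are hypothetical.

```python
import numpy as np

def delta_si_shares(si_before, si_after, lu_before, lu_after, from_code, to_code):
    """Percent of pixels with dSI < 0 and dSI > 0 for one land use transition.

    dSI is the SI of the later year minus the SI of the earlier year.
    Returns (nan, nan) if the transition does not occur.
    """
    si_before = np.asarray(si_before, dtype=float)
    si_after = np.asarray(si_after, dtype=float)
    mask = (np.asarray(lu_before) == from_code) & (np.asarray(lu_after) == to_code)
    d = si_after[mask] - si_before[mask]
    if d.size == 0:
        return float("nan"), float("nan")
    return 100.0 * np.mean(d < 0), 100.0 * np.mean(d > 0)

# Toy rasters flattened to 1-D; code 3 = BS and 0 = CL (hypothetical), so this is BS-CL.
print(delta_si_shares([0.5, 0.5, 0.5, 0.2], [0.4, 0.6, 0.3, 0.9],
                      [3, 3, 3, 1], [0, 0, 0, 0], from_code=3, to_code=0))
```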

**Figure 9.** Spatial distribution of soil salinization from 1990 to 2019 (**a**) and spatial distribution of the salinity index (SI) changes between different periods (**b**).

**Figure 10.** Mean values of the salinity index (SI) for land use types over the study period.

**Figure 11.** The percentage of ΔSI area for the land use categories that changed during the study period. ΔSI: the salinity index (SI) of the next year minus the SI of the previous year. BS–CL: bare soil to cropland; BS–FR: bare soil to forest; CL–GL: cropland to grassland; FR–BS: forest to bare soil; GL–BS: grassland to bare soil; GL–CL: grassland to cropland; BS–GL: bare soil to grassland.

#### **4. Discussion**

#### *4.1. Effectiveness of the Proposed SDI*

In this study, we established a new SDI by integrating the SI, albedo, NDVI and LSM indices using PCA. The SDI was used to explore land degradation characteristics in a typical salinized area, the ADD. We found that regions with extreme land degradation were mainly distributed downstream and at the periphery of the ADD (Figure 4). These SDI-based results support previous findings that ecological risk and vulnerability are higher in the downstream and peripheral regions of the ADD [10,46], demonstrating that the SDI can reflect the land degradation conditions of the ADD. To further evaluate the proposed SDI, we validated it against field survey data (Figure S1 and Table S1). As shown in Figure 12, soil salt content was significantly positively correlated with the SDI (R² = 0.89, *p* < 0.001). The relationships between field-measured soil salinity and the four indices derived from 2019 Landsat imagery are presented in Figure S2: the SI and albedo were positively correlated with measured soil salinity (R² = 0.41 and 0.43, respectively), whereas the NDVI and LSM were negatively correlated with it (R² = 0.29 and 0.19, respectively). Compared with these single indices, the SDI agreed more strongly with measured soil salinity. Although land degradation is influenced by multiple environmental factors, this positive correlation suggests that the SDI captures the salinity features of land degradation, providing evidence for the effectiveness of the index in monitoring land degradation in salinized areas. In addition, the SI derived from remote sensing data was positively correlated with the SDI (Figure 3). The reliability of the SDI is also reflected in the other three indicators.
High vegetation cover and sufficient soil moisture reduce the risk of land degradation, which is consistent with the negative correlations between the SDI and both the NDVI and LSM in our study (Figure 3). Higher albedo values lead to a higher SDI. This reflects the soil exposure captured by albedo and may be attributed to a strong coupling between soil salinity and albedo (Table 1 and Figure 3). These characteristics have also been reported previously [33]. These results indicate that the SDI can help monitor land degradation in salinized areas reliably and efficiently. In addition, because the SDI is composed of accessible remote sensing indicators, it can be extended to other similar ecological environments [35,59,63].
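The sign pattern discussed above is what one expects if the SDI is essentially the first principal component of the four standardised indices: NDVI and LSM lower the SDI, while SI and albedo raise it. The sketch below assumes exactly that construction. It takes the first principal component, orients it so that SI loads positively, and then applies min–max scaling. The paper's own formula, given in its Methods, may differ in weighting or scaling. For a simple linear fit, the validation R² equals the squared Pearson correlation, which is what the helper computes. The data are synthetic.

```python
import numpy as np

def build_sdi(si, albedo, ndvi, lsm):
    """Sketch of a PCA-based SDI: first principal component of the four
    standardised indices, oriented so SI loads positively, scaled to [0, 1]."""
    shape = np.shape(si)
    X = np.column_stack([np.ravel(a) for a in (si, albedo, ndvi, lsm)]).astype(float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)          # standardise each index
    _, _, vt = np.linalg.svd(Z, full_matrices=False)  # rows of vt = PC loadings
    loadings = vt[0]
    if loadings[0] < 0:   # the sign of a PC is arbitrary; make SI load positively
        loadings = -loadings
    pc1 = Z @ loadings
    sdi = (pc1 - pc1.min()) / (pc1.max() - pc1.min())
    return sdi.reshape(shape), loadings

def r_squared(x, y):
    """Coefficient of determination of a simple linear fit (= Pearson r squared)."""
    r = np.corrcoef(np.ravel(x), np.ravel(y))[0, 1]
    return float(r ** 2)

# Synthetic check: one latent "degradation" signal drives all four indices.
rng = np.random.default_rng(0)
t = rng.normal(size=500)

def noisy(signal):
    return signal + 0.3 * rng.normal(size=signal.size)

sdi, load = build_sdi(noisy(t), noisy(t), noisy(-t), noisy(-t))
print("loadings (SI, albedo, NDVI, LSM):", np.round(load, 2))
print("R^2 of SDI vs latent signal:", round(r_squared(sdi, t), 3))
```

With real data, the latent signal is unknown, so the check would be `r_squared(sdi_at_sample_points, measured_salt_content)` using the field samples.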

**Figure 12.** Relationship between the SDI and field-measured soil salt content.

#### *4.2. Factors Influencing the Land Degradation in the ADD*

Climate mainly affects land degradation through changes in precipitation and temperature [10,64]. The annual mean precipitation (AMP) and annual mean temperature (AMT) at Nurkus Weather Station are shown in Figure 13: over the past 40 years, AMP decreased and AMT increased in the ADD. From 1990 to 2019, land degradation developed over an area of more than 3000 km² (Table 3). Previous studies have raised concerns about the withering of grasslands and sparse vegetation caused by the warming and drying climate in the ADD, warning that this could accelerate land degradation [10,65].
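The warming and drying statement above rests on trends in the station series. The paper does not state its trend method in this section. One common way to quantify such a trend, assumed here, is an ordinary least-squares slope expressed per decade. The Nurkus values are not reproduced here, so the example uses a synthetic series.

```python
import numpy as np

def trend_per_decade(years, values):
    """Ordinary least-squares linear trend, returned in units per decade."""
    slope, _intercept = np.polyfit(np.asarray(years, float), np.asarray(values, float), 1)
    return 10.0 * slope

# Synthetic series (not the Nurkus data): 1980-2016, falling 0.5 mm per year.
yrs = np.arange(1980, 2017)
amp = 120.0 - 0.5 * (yrs - 1980)
print(round(trend_per_decade(yrs, amp), 3))
```

An OLS slope alone does not establish significance. The Mann–Kendall test with a Sen's slope estimate is a common non-parametric alternative for short, noisy climate records.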

**Figure 13.** Changes in the annual mean precipitation and temperature measured at Nurkus Station from 1980 to 2016.

The Amu Darya River is the main source of water for domestic use and irrigation in the ADD. Overwatering has disrupted the water system of the ADD, leading to the disappearance of a number of lakes and subsequent local climate change, which has reduced the ecosystem stability of the ADD, particularly in the downstream region [7,10]. The widespread use of diffuse irrigation has led water users to compensate for its inefficiency by withdrawing large amounts of water. This reduces the ecological water that sustains ecosystems and causes vegetation to wither gradually for lack of water [9,66,67], exacerbating land degradation. The extent of land degradation followed the same trend as water withdrawal from the Amu Darya River. As water withdrawal declined in 1990–2000 (Figure 14), the improvement in ADD land degradation was most pronounced, whereas in 2000–2019, as water withdrawal increased, the area of developing degradation exceeded the area of improvement. More critically, the reservoir built in the upper reaches of the Amu Darya River intercepted a large amount of water [66], reducing the supply of ecological water downstream. Our results confirm the ecological effects of these factors: land in the downstream part of the ADD was significantly more degraded than in other regions during the study period (Figures 4 and 6). In addition, the higher levels of land degradation in the outer delta, farther from the Amu Darya River, were likely caused by the lack of water supply to the ecosystem and the difficulty of land management in the transboundary area [46,68]. These findings indicate that further effort and cooperation in the rational allocation of water resources are necessary to alleviate land degradation in the ADD and the ecological crisis in the Aral Sea basin.

**Figure 14.** Annual changes in the water withdrawal and salt discharge in the ADD from 1990 to 2015.

In addition, the impacts of land use change on land degradation cannot be ignored [10,69,70]. The consolidation and management of cropland can contribute to mitigating land degradation, and our results support this view. Non-degraded areas accounted for a larger proportion of cropland during most of the study period, whereas extreme land degradation was mainly distributed on bare soil (Figure 15). In general, the risk of land degradation was reduced where sparsely vegetated land and bare soil were reclaimed as cropland, as crops contribute to higher ecosystem productivity and stability. Land degradation is more severe in the northern part of the ADD, where part of the cropland has been abandoned and converted into grassland or bare soil (Figure 8). Previous studies have indicated that the ADD faces an ecological threat from the degradation of grassland and cropland to bare soil [10,46]. Compared with the other land use categories, bare soil has higher soil salinity (Figure 10), and the conversion of other land use types to bare soil not only reduces biomass but also increases the risk of soil salinization.
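Figure 15 expresses each land use class as 100% and splits it by degradation level. This is a row-normalised cross-tabulation of two aligned rasters. The following is a minimal sketch. It assumes hypothetical integer codes for land use (0 CL, 1 GL, 2 FR, 3 BS) and degradation levels coded 0 (no) to 4 (extreme).

```python
import numpy as np

def level_share_by_land_use(land_use, level, n_land_use=4, n_levels=5):
    """Percent of each land use class's area falling in each degradation level.

    Rows = land use codes (hypothetical: 0 CL, 1 GL, 2 FR, 3 BS),
    columns = degradation levels 0 (no) .. 4 (extreme). Non-empty rows sum to 100;
    classes with no pixels get NaN.
    """
    lu = np.asarray(land_use, dtype=int).ravel()
    lv = np.asarray(level, dtype=int).ravel()
    counts = np.bincount(lu * n_levels + lv, minlength=n_land_use * n_levels)
    counts = counts.reshape(n_land_use, n_levels).astype(float)
    row_totals = counts.sum(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        return 100.0 * counts / row_totals

print(level_share_by_land_use([0, 0, 3, 3], [0, 1, 4, 4]))
```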

**Figure 15.** Percentage graphs showing the proportion of each land degradation level in different land use types. Extreme: extreme degradation; Strong: strong degradation; Moderate: moderate degradation; Slight: slight degradation. CL: cropland; FR: forest; GL: grassland; BS: bare soil.

However, to reduce the salt content of cropland, a large amount of water is withdrawn from the Amu Darya River to leach salts from the cropland soil, which further aggravates the water deficit of other ecosystems. In addition, excess salt from cropland is discharged through widely distributed channels into the Amu Darya River and into the lakes downstream of the delta, increasing the salinity of the river water and causing significant salt accumulation downstream (Figure 9) [10]. The increase in salt discharge was most pronounced after 2000 (Figure 14). Extreme land degradation was clustered in the downstream part of the ADD (Figures 4 and 6), confirming the ecological effects of the large accumulation of salt in the downstream region. In addition, after the disintegration of the Soviet Union, the socialist economy was transformed into a market economy, and the gradual migration of the rural population to cities led to some cropland eventually becoming unused land with low biodiversity [71], thereby accelerating land degradation.

SDG 15 of the Sustainable Development Goals (SDGs), through target 15.3, aims to achieve land degradation neutrality by 2030. To effectively alleviate land degradation in the ADD and advance the SDGs, we propose the following measures based on the factors influencing the SDI identified above. First, salinization control technology should be implemented to mitigate salinity-driven land degradation, and salt-storage reservoirs could be built to reduce the ecological pressure caused by salt transported through the alkali drainage canal that discharges into the downstream area of the ADD. Second, drip irrigation could be promoted to achieve precision irrigation and improve irrigation efficiency, relieving the pressure on the water resources needed to keep land conditions stable. Furthermore, agricultural subsidy policies could encourage farmers to maintain the stability and biodiversity of cropland. Finally, ecological conservation projects could be considered to mitigate the impacts of climate change on land degradation.

#### **5. Conclusions**

We coupled multiple remote sensing indices (SI, NDVI, albedo and LSM) to construct a new SDI by using the PCA method. The proposed index integrated the soil salinity, soil bareness, soil moisture and vegetation coverage and made it possible to identify the characteristics of the regional land degradation, especially in salinized areas. To test the reliability of the SDI, the index was applied to the typical ADD region to monitor the spatial and temporal dynamics of the land degradation.

The results indicated that the NDVI and LSM were negatively associated with land degradation, whereas the SI and albedo were positively associated with it. The SI was strongly positively correlated with the SDI, with an average correlation coefficient of 0.97. Regions with extreme and strong land degradation were mostly clustered in the west and north of the ADD. The spatiotemporal dynamics of the SDI indicated that land degradation developed over approximately 26% of the ADD (including seriously developed and developed areas) from 1990 to 2019, mainly in the downstream region. Areas exhibiting improvement accounted for approximately 28% of the total area of the ADD and were mainly centred in the eastern and central parts. Among the sub-periods, the area of developing land degradation was largest from 2000 to 2010 (approximately 32%), when the improvement area was 25%. Spatial autocorrelation analysis showed that the Moran's I values of the SDI in 1990, 2000, 2010 and 2019 were 0.89, 0.86, 0.89 and 0.85, respectively, indicating that the SDI was clearly spatially clustered rather than randomly distributed.

The drying climate and excessive water withdrawal from the Amu Darya River exacerbated land degradation in the ADD, especially in 2000–2019, when, as water withdrawal increased, the area of developing degradation exceeded the area of improvement. The expansion of unused land increases the risk of land degradation, with higher levels of land degradation on unused land than on other land use types during the study period. In addition, the large amount of salt discharged from cropland into the downstream ADD makes the downstream region the most degraded part of the delta.

Clarifying the characteristics of land degradation in salinized areas supports the restoration and sustainable use of terrestrial ecosystems. Our study revealed the interannual land degradation characteristics of the ADD based on the SDI, providing an efficient basis for decision-making in regional land management. Nevertheless, this research has some limitations. For example, the seasonal and continuous dynamics of land degradation were not adequately considered because of the limited temporal resolution of the Landsat satellites. Constructing an SDI from high-temporal-resolution Moderate Resolution Imaging Spectroradiometer (MODIS) data could enable seasonal and continuous monitoring of land degradation at large scales, which deserves further work.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/rs13152851/s1: Figure S1: Soil sampling sites of the ADD. Table S1: Field sampling data of soil salt.

**Author Contributions:** T.Y. and G.J. designed the research. T.Y. processed the data and wrote the manuscript. A.B. and G.Z. revised the manuscript. L.J., Y.Y. and X.H. provided the analysis tools and technical assistance. All authors contributed to the final version of the manuscript by proofreading and offering constructive comments. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDA19030301) and the Open Foundation of State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences (G2019-02-03).

**Data Availability Statement:** The data is available upon request.

**Acknowledgments:** We thank the journal's editors and reviewers for their kind comments and valuable suggestions to improve the quality of this paper.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Soil Salinity Assessment in Irrigated Paddy Fields of the Niger Valley Using a Four-Year Time Series of Sentinel-2 Satellite Images**

**Issaka Moussa <sup>1,2</sup>, Christian Walter <sup>1,\*</sup>, Didier Michot <sup>1</sup>, Issifou Adam Boukary <sup>3</sup>, Hervé Nicolas <sup>1</sup>, Pascal Pichelin <sup>1</sup> and Yadji Guéro <sup>2</sup>**


Received: 31 August 2020; Accepted: 8 October 2020; Published: 16 October 2020

**Abstract:** Salinization is a major soil degradation threat in irrigated systems worldwide. Irrigated systems in the Niger River basin are also affected by salinity, but its spatial distribution and intensity are not currently known. The aim of this study was to develop a method to detect salt-affected soils in irrigated systems. Two complementary approaches were tested: salinity assessment of bare soils using a salinity index (SI) and monitoring of the indirect effects of salinity on rice growth using time series of a vegetation index (NDVI). The study area was located south of Niamey (Niger) in two irrigated systems of rice paddy fields that cover 6.5 km². We used remote-sensing and ground-truth data to relate vegetation behavior and reflectance to soil characteristics. We explored all available Sentinel-2 images from January 2016 to December 2019 and selected cloud-free images on 157 dates that covered eight successive rice-growing seasons. In the dry season of 2019, we also sampled 44 rice fields, collecting 147 biomass samples and 180 topsoil samples from January to June. For each field and growing season, time-integrated NDVI (TI-NDVI) was estimated, and the SI was calculated for dates on which bare soil conditions (NDVI < 0.21) prevailed. Results showed that, because there were few bare-soil periods, the SI could not differentiate salinity classes. In contrast, the high temporal resolution of Sentinel-2 images enabled us to describe rice-growing conditions over time. In 2019, TI-NDVI and crop yields were strongly correlated (r = 0.77 with total biomass yield and 0.82 with grain yield), while soil electrical conductivity was negatively correlated with both TI-NDVI (r = −0.38) and crop yield (r = −0.23 with total biomass and r = −0.29 with grain yield).
Considering the TI-NDVI data from 2016–2019, principal component analysis followed by ascending hierarchical classification identified a typology of five clusters with different patterns of TI-NDVI during the eight growing seasons. When applied to the entire study area, this classification clearly identified the extreme classes (i.e., areas with high or no salinity). Other classes with low TI-NDVI (i.e., during dry seasons) may be related to areas with moderate or seasonal soil salinity. Finally, the high temporal resolution of Sentinel-2 images enabled us to detect stresses on vegetation that occurred repeatedly over the growing seasons, which may be good indicators of soil constraints due to salinity in the context of the irrigated paddy systems of Niger. Further research will validate the ability of the method developed to detect moderate soil salinity constraints over large areas.

**Keywords:** salinization; irrigated systems; Niger River basin; salinity index; vegetation index; TI-NDVI; Sentinel-2 images; high temporal resolution

#### **1. Introduction**

Irrigated agriculture covers 275 million ha worldwide (i.e., 20% of cultivated land) and accounts for 40% of world food production [1]. In the semi-arid area of the middle Niger valley (Niger) (Figure 1), irrigation techniques have been developed in response to aridity and a growing population [2]. Three factors have contributed to the rapid development of irrigation: (i) successive droughts in 1972–1973 and 1983–1984, which made people aware of the serious risks to rainfed production; (ii) the high yields rapidly obtained in irrigated rice (*Oryza sativa*) and vegetable production; and (iii) the commitment of the national government, farmers' organizations and several donors to irrigated agriculture. The first irrigated systems in Niger were constructed during the colonial period in the 1930s, and from 1934 to 2011, 36 irrigated rice-growing systems were built along the middle Niger valley in Niger. Irrigation has improved the country's food security but has led to serious soil degradation by salinity. Irrigation facilities in many of these systems have aged without being renovated, and their drainage systems are not functioning. Excessive irrigation of poorly drained soils leads to waterlogging and soil salinization. The absence of a drainage network in the clayey soils characteristic of the study area [3] increases the concentration and precipitation of soluble salts at the soil surface, as well as in the subsoil and groundwater. Salt precipitates appear as white spots, mainly in the bright red oxidized clay horizons and in association with yellow iron spots. This salinization process threatens the sustainability of crop production in the study area by decreasing yields or reducing them to zero. The mechanisms by which soil salinity affects plant growth are generally known and have been summarized by many researchers [4–6]. In the middle Niger valley, rice is the main crop grown in irrigated fields.
Rice can tolerate salinity without a reduction in yield up to a threshold electrical conductivity of the saturated paste extract (ECe) of 3.0 dS/m [7]. To preserve soil as a natural resource and maintain sustainable crop production in the study area, the spatial extent and level of salinity must be known. This can be done through field surveys that measure soil EC or electrical resistivity. Given the large size of the Niger River basin (>85 km²), however, doing so would require considerable time, labor and money. Combining remote sensing with other data sources is currently a promising way to map salinity at a large scale with high accuracy.

**Figure 1.** Location of the study area and aerial view of the experimental fields in the Sébéri and Tchagriré irrigation systems south of Niamey (Niger).

The use of remote sensing to monitor and map soil salinity dates to the 1990s [8]. Many researchers have used remote-sensing and ground-truth data (i.e., direct or indirect measurements of soil) to assess and monitor salinity [9,10]. A variety of remote-sensing data have been used to identify and monitor salt-affected areas: aerial photographs, visible and infrared multispectral images, video images, infrared thermography, microwave images and data collected by airborne geophysics and electromagnetic induction [11]. Two remote-sensing methods exist to identify soil salinity: (i) observing surface conditions of bare soil (i.e., salt efflorescence) and (ii) monitoring the behavior of vegetation affected by salinity over time [8]. Before the most recent generation of satellites was developed, many researchers used a classic single-date approach [12] because most older satellites passed over a given location too infrequently to enable monitoring. However, this single-date approach has some limits: although highly saline and non-saline soils can be detected easily, intermediate salinity classes are difficult to differentiate. Moreover, if vegetation is present, bare soil cannot be seen, and vegetation behavior cannot be monitored with a single-date approach.

The most recent generation of satellites (Sentinel-2, Landsat 8, SPOT 6 & 7) offers high spatial resolution and frequent revisits, which should make it possible to identify salinity variations and monitor vegetation behavior. A variety of remote-sensing indices, such as the salinity index (SI) [13], brightness index, normalized difference salinity index and normalized difference vegetation index (NDVI) [14], have been developed to estimate the salinity of bare soil and monitor vegetation behavior in saline environments. These indices have been combined with ground-truth data. For instance, soil salinity measured by electromagnetic induction was combined with multi-year Landsat 7 reflectance data to map salt-affected soils in the western San Joaquin Valley, California, USA [9]. Electrical conductivity (EC) ground-truth measurements and various Sentinel-2 spectral parameters were used to create more reliable soil salinity maps in the Ebinur Lake region, Xinjiang, China [15]. To monitor salt-affected areas in Turkey, researchers performed multi-temporal monitoring of salinity using field EC measurements and spectral indices derived from multi-year Landsat 5 and 8 images [16]. In practice, most studies have focused on detecting severely salt-affected areas rather than on detecting and monitoring slightly or moderately affected areas.

This study aimed to develop a method to detect potential areas of soil salinity using multi-spectral and high-resolution Sentinel-2 satellite images combined with field data using two complementary approaches: (i) observing salinity of bare soil using the SI and (ii) monitoring vegetation behavior from 2016–2019 (eight growing seasons) in the arid zone of the Niger River basin. The study involved a four-year time series of Sentinel-2 remote-sensing images and field measurements of biomass and topsoil characteristics of cultivated rice fields.
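The three vegetation-based quantities this study relies on can be sketched as follows: NDVI, the bare-soil screen (NDVI < 0.21) used to select SI dates, and TI-NDVI. Sentinel-2 band 8 (NIR) and band 4 (red) are the standard NDVI bands. The authors' method for integrating NDVI over time is not described in this section, so the trapezoidal rule over acquisition dates is an assumption. The function names are hypothetical.

```python
import numpy as np

def ndvi(b8_nir, b4_red):
    """NDVI from Sentinel-2 reflectance: (B8 - B4) / (B8 + B4)."""
    nir = np.asarray(b8_nir, dtype=float)
    red = np.asarray(b4_red, dtype=float)
    return (nir - red) / (nir + red)

def bare_soil(ndvi_values, threshold=0.21):
    """Bare-soil condition used to select SI dates: NDVI < 0.21."""
    return np.asarray(ndvi_values) < threshold

def ti_ndvi(day_of_season, ndvi_series):
    """Time-integrated NDVI via the trapezoidal rule over the season's dates
    (an assumed integration scheme; irregular spacing is handled by np.diff)."""
    t = np.asarray(day_of_season, dtype=float)
    v = np.asarray(ndvi_series, dtype=float)
    return float(np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(t)))

# Toy season: NDVI sampled on three cloud-free dates, 10 days apart.
print(ti_ndvi([0, 10, 20], [0.2, 0.6, 0.2]), bare_soil([0.1, 0.3]))
```

Integrating over the actual acquisition days, rather than summing NDVI values, keeps seasons with different numbers of cloud-free images comparable.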

#### **2. Materials and Methods**

#### *2.1. Study Area*

The study area corresponded to the irrigated systems of Sébéri and Tchagriré (6.5 km<sup>2</sup>), located in the Niger River basin 50 km southeast of Niamey, the capital of Niger (13°16′35.32″ N, 2°21′31.81″ E) (Figure 1). The climate of the area corresponds to the dry tropical zone of the Sudano–Sahelian type. Annual precipitation has high spatial, temporal and interannual variability and a general trend towards a southward shift of isohyets over the past 30 years. Mean annual precipitation is ca. 510 mm/year (standard deviation (SD) = 100 mm/year) (National Meteorological Service of Niger). Precipitation is irregularly distributed in space and time: it peaks in August (150 mm) and is lowest in October (22 mm) and May (20 mm). Mean monthly temperature is 36 °C during the hottest period (April), with a maximum temperature of 47 °C. Minimum mean monthly temperatures are 25 °C and are observed from December to January. Evaporation varies from 1700 to 2100 mm/year. The water deficit is thus large during the dry season and is accentuated by the Harmattan, a dry continental trade wind from the Sahara. Consequently, the study area has two seasons: a dry season from October to May and a wet season from June to September [17].

Paddy fields in Sébéri and Tchagriré are located on the lowest alluvial terrace of the Niger Valley (Figure 1). The soils are Vertisols (60–74% clay in topsoil and 52–85% in subsoil) and are acidic, with a pH in water of 4.0–6.2 (SD = 0.5) and an EC of 0.01–7.20 dS m<sup>−1</sup> (SD = 2.3 dS m<sup>−1</sup>). They also have low hydraulic conductivity at water saturation, which ranges from 2.8 × 10<sup>−8</sup> m s<sup>−1</sup> in the topsoil horizon to 1.5 × 10<sup>−8</sup> m s<sup>−1</sup> at a depth of 50 cm [18,19]. In Sébéri, EC generally increases from the areas next to the sand dunes to the river's floodwater protection dike [19]. The types of salts found are hexahydrite, gypsum, epsomite and, secondarily, wattevilleite and sodium carbonate hydrate [19].

The observed long-term downward trend in rice yields can be explained by several factors: poor management of agricultural equipment, lack of technical supervision, soil degradation due to salinity and failure to respect the cropping calendar. Implementing a fixed, common cropping calendar for all rice-growing systems would create two growing seasons per year under optimal climatic conditions, providing two harvests: one in the dry cropping season (mid-November to mid-May) and the other in the wet cropping season (mid-June to mid-December) (Figure 2).

**Figure 2.** Double-cropping calendar for rice in the middle Niger valley (adapted from [20]).

#### *2.2. Field Data Collection Strategy*

In the Niger River basin, rice is grown in 0.25 ha (25 m × 100 m) fields. We selected 44 and 20 fields in the Sébéri and Tchagriré irrigated systems, respectively, for data collection on two dates in the 2019 dry season, each corresponding to a date when a Sentinel-2 satellite passed over the study area. The 64 fields were chosen based on the level of salinity in the study area and on whether they were cultivated with rice or bare. Three 1 m<sup>2</sup> sampling plots were set up along one diagonal of each selected field to collect soil and phenological data.

The first data collection campaign was performed on 8 February 2019, near the start of the growing season, when fields were already flooded and covered by a layer of irrigation water a few centimeters thick. The phenology of rice plants was assessed in a quarter (0.25 m<sup>2</sup>) of each 1 m<sup>2</sup> plot. Then, all aerial biomass of this quarter was cut, weighed and dried under laboratory conditions to determine dry matter. The topsoil (0–30 cm) was sampled in each plot using an auger, and one composite sample per field was obtained by carefully mixing the three elementary plot samples. Given the rapidity of auger sampling and the very low hydraulic conductivity of these clay soils, sampling under a water layer did not significantly change the EC and pH values of the soil, as shown in previous studies performed in these irrigated systems [19]. Consequently, salt losses during soil sampling are very low, ensuring good reliability of collected samples and good representativeness of soil salinity measurements. The composite samples were air-dried and ground to pass through a 2-mm sieve. Then, pH in water (pH1:5) and EC in water (EC1:5) were analyzed in the laboratory following ISO 10390 and ISO 11265, respectively.

The second data collection campaign was performed on 4 May 2019, during the harvest period, when fields were not flooded. Phenological assessment and soil sampling were identical to those during the first campaign. Moreover, rice grain was collected from an undisturbed 0.25 m<sup>2</sup> quarter to estimate grain yield. Soil and phenological parameters were analyzed statistically for each field.

#### *2.3. Remote-Sensing Data Collection*

Remote-sensing data and in situ observations were processed in multiple steps (Figure 3). First, optical images were obtained from Sentinel-2 satellites. The Sentinel-2 mission is a constellation of two satellites that are 180° out of phase on the same orbit, which increases the frequency of passes over a given location (i.e., every five days). Each satellite records 13 bands at three spatial resolutions: blue, green, red and near-infrared bands at 10 m; red-edge and mid-infrared bands at 20 m; and aerosol and water-vapor bands at 60 m. Sentinel-2's utility thus lies in its combination of high revisit frequency and high spatial resolution.
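For reference, the 13 bands described above can be organized as a small lookup table. This is a sketch under stated assumptions: the band identifiers (B1 to B12, B8A) follow ESA's Sentinel-2 MSI naming, which the text does not use; the text's "mid-infrared" bands are listed under ESA's term short-wave infrared; and the 60 m group also contains the cirrus band (B10), the kind of band used for cloud detection in MACCS pre-processing. The helper `resolution_of` is hypothetical.

```python
# Sentinel-2 MSI bands grouped by native spatial resolution (metres).
# Band identifiers follow ESA naming; descriptions are short labels.
SENTINEL2_BANDS: dict[int, dict[str, str]] = {
    10: {"B2": "blue", "B3": "green", "B4": "red", "B8": "near-infrared"},
    20: {
        "B5": "red edge", "B6": "red edge", "B7": "red edge",
        "B8A": "narrow near-infrared",
        "B11": "short-wave infrared", "B12": "short-wave infrared",
    },
    60: {"B1": "coastal aerosol", "B9": "water vapour", "B10": "cirrus (short-wave infrared)"},
}


def resolution_of(band: str) -> int:
    """Return the native resolution (m) of a Sentinel-2 band identifier."""
    for resolution, bands in SENTINEL2_BANDS.items():
        if band in bands:
            return resolution
    raise KeyError(f"unknown Sentinel-2 band: {band}")


# NDVI relies on the red (B4) and near-infrared (B8) bands, both native at 10 m.
n_bands = sum(len(bands) for bands in SENTINEL2_BANDS.values())
```

Because both NDVI bands are native at 10 m, resampling all images to 10 m leaves the NDVI inputs at their original resolution.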

**Figure 3.** General data-processing flowchart.

All existing pre-processed Sentinel-2 images from January 2016 to December 2019 were downloaded. Pre-processing was performed by the MACCS (Multi-sensor Atmospheric Correction and Cloud Screening) processing chain, which consists of three successive steps: (i) cloud detection (using the satellite cirrus band), (ii) aerosol-thickness estimation and (iii) atmospheric correction. This processing chain, developed by CESBIO and CNES, provides ortho-rectified, atmospherically corrected surface reflectance images of 100 km × 100 km as a final product [21].

Overall, 157 pre-processed Sentinel-2 images were downloaded and then resampled to a spatial resolution of 10 m. Reflectance values missing due to cloud cover were estimated as the mean of the reflectance values on the dates immediately before and after the missing date. Finally, the study area of the irrigated systems of Sébéri and Tchagriré was extracted from the full images.
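The gap-filling rule can be sketched as follows. This assumes each pixel's reflectance is held as a 1-D array ordered by acquisition date, with cloud-masked values stored as NaN, and reads "the two dates before and after" as the nearest valid date on each side of the gap. The function name `fill_cloud_gaps` is hypothetical, and leaving gaps at the ends of the series unfilled is an assumption the text does not address.

```python
import numpy as np


def fill_cloud_gaps(series: np.ndarray) -> np.ndarray:
    """Fill NaN (cloud-masked) values in a 1-D reflectance time series.

    Each missing value is replaced by the mean of the nearest valid value
    before it and the nearest valid value after it. Gaps at the start or end
    of the series have no neighbour on one side and are left as NaN.
    """
    series = np.asarray(series, dtype=float)
    filled = series.copy()
    valid = np.flatnonzero(~np.isnan(series))
    for t in np.flatnonzero(np.isnan(series)):
        before = valid[valid < t]
        after = valid[valid > t]
        if before.size and after.size:
            # Use the original (unfilled) neighbours, not previously filled values.
            filled[t] = 0.5 * (series[before[-1]] + series[after[0]])
    return filled
```

Consecutive cloudy dates all receive the same value, the mean of the two valid dates that bracket the gap.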

A second step consisted of distinguishing the periods during which a plot was bare or covered with vegetation. To do this, the NDVI was first calculated per pixel (Table 1) [22], and then the mean and standard deviation of the NDVI of each plot's pixels were calculated. A pixel was considered part of a plot if its centroid fell within the plot's polygon. Since the plots measured 25 m × 100 m, each contained ca. 25 pixels at 10 m resolution. Based on the literature [23,24], plots with a mean NDVI ≤ 0.21 were considered bare soil, and those with a mean NDVI > 0.21 were considered covered with vegetation.
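The per-plot classification step can be sketched as below. It is a minimal illustration, not the authors' code: the function names are hypothetical, the raster is assumed to be a north-up grid with a known upper-left corner and square pixels, and a simple ray-casting test stands in for the GIS polygon overlay. NDVI is computed as (NIR − RED)/(NIR + RED).

```python
import numpy as np

BARE_SOIL_THRESHOLD = 0.21  # mean NDVI <= 0.21: bare soil; > 0.21: vegetation


def ndvi(red: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Per-pixel NDVI = (NIR - RED) / (NIR + RED)."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    return (nir - red) / (nir + red)


def point_in_polygon(x: float, y: float, poly: list[tuple[float, float]]) -> bool:
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    for k in range(len(poly)):
        x1, y1 = poly[k]
        x2, y2 = poly[(k + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def plot_ndvi_stats(ndvi_img, x0, y0, pixel_size, poly):
    """Mean, SD and count of NDVI over pixels whose centroid falls in `poly`.

    Assumes a north-up grid with upper-left corner (x0, y0) and square pixels.
    """
    rows, cols = ndvi_img.shape
    values = [
        ndvi_img[r, c]
        for r in range(rows)
        for c in range(cols)
        if point_in_polygon(x0 + (c + 0.5) * pixel_size,
                            y0 - (r + 0.5) * pixel_size, poly)
    ]
    values = np.asarray(values)
    return float(values.mean()), float(values.std(ddof=1)), int(values.size)


def classify_plot(mean_ndvi: float) -> str:
    """Label a plot from its mean NDVI using the 0.21 threshold."""
    return "bare soil" if mean_ndvi <= BARE_SOIL_THRESHOLD else "vegetation"
```

For a field aligned with a 10 m grid anchored at (0, 0), a polygon spanning x = 2 to 27 m and y = −2 to −102 m captures 3 × 10 = 30 pixel centroids, while the same field shifted 5 m east captures 2 × 10 = 20, which is why the per-field count is only approximately 25.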

**Table 1.** Characteristics and equations of the bare soil and vegetation indices selected. RED and NIR indicate red and near-infrared bands of the Sentinel-2 images.


The per-plot NDVI means and standard deviations were used to build a 4-year NDVI time series for each plot. For the periods when a plot was classified as bare, an SI [25] (Table 1) was calculated; for the periods when vegetation was dominant, the NDVI time series was used to monitor vegetation behavior.

A few days before rice planting and during the first part of the vegetation development cycle, a water layer covers the soil. This layer is thin, no more than a few centimeters thick, and highly turbid. Before planting, the NDVI of this water layer ranges from 0.10 to 0.21, i.e., at or below the bare-soil threshold; its influence on the NDVI then decreases as the rice is transplanted and the vegetation canopy develops. In comparison, NDVI values of the river near the plots are negative and close to −0.3.

The last step consisted of calculating the time-integrated NDVI (TI-NDVI) [26,27] for each growing season for fields with vegetation. TI-NDVI for a given growing season was calculated as follows:

$$\mathrm{TI\text{-}NDVI} = \sum_{t=d_1}^{d_e} \left[ \frac{1}{n} \sum_{i=1}^{n} \left( \frac{NDVI_{i,t} + NDVI_{i,t+1}}{2} - 0.21 \right) \times \big( d(t+1) - d(t) \big) \right] \tag{1}$$

where $d_1$ and $d_e$ are the day of the year of the start and end (harvest) of the growing season (the same for all fields), respectively; $d(t)$ is the day of the year of date $t$ for the set of days for which Sentinel-2 images are available during the growing season; $n$ is the number of pixels within the field; and $NDVI_{i,t}$ is the NDVI of pixel $i$ on date $t$. TI-NDVI is expressed in NDVI.days.
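Eq. (1) amounts to a trapezoidal integration of the field-mean NDVI above the 0.21 baseline. A minimal sketch with a hypothetical `ti_ndvi` helper, not the authors' code, assuming the NDVI of each pixel is already arranged by date. Two details are assumptions: dates are given as a strictly increasing day count (the dry season runs from mid-November to mid-May, so raw day-of-year values would reset at the new year), and, as Eq. (1) is written, intervals with NDVI below 0.21 contribute negatively rather than being clipped at zero.

```python
import numpy as np

NDVI_BASELINE = 0.21  # bare-soil threshold subtracted in Eq. (1)


def ti_ndvi(ndvi: np.ndarray, days: np.ndarray) -> float:
    """Time-integrated NDVI (Eq. 1) of one field over one growing season.

    ndvi : shape (n_dates, n_pixels); NDVI of each pixel of the field on each
           image date between the start d1 and the end de of the season.
    days : shape (n_dates,); strictly increasing day numbers of those dates.
    Returns TI-NDVI in NDVI.days.
    """
    ndvi = np.asarray(ndvi, dtype=float)
    days = np.asarray(days, dtype=float)
    # Mean NDVI of consecutive dates t and t+1, minus the baseline, per pixel.
    heights = 0.5 * (ndvi[:-1, :] + ndvi[1:, :]) - NDVI_BASELINE
    # Average over the n pixels of the field: (1/n) * sum over i.
    field_heights = heights.mean(axis=1)
    # Multiply by the interval length d(t+1) - d(t) and sum over dates.
    return float(np.sum(field_heights * np.diff(days)))
```

Because the interval length does not depend on the pixel, averaging over pixels before multiplying by $d(t+1) - d(t)$ gives the same result as the term-by-term form of Eq. (1).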

Figure 4 illustrates the calculation of TI-NDVI for two successive growing seasons in 2019 derived from NDVI estimates based on existing Sentinel-2 images.

**Figure 4.** Example of normalized difference vegetation index (NDVI) dynamics for a given field for the dry and wet seasons of 2019 (derived from 42 cloud-free images among 60 existing Sentinel-2 images) and time-integrated NDVI (TI-NDVI) calculation for the two growing seasons.

#### *2.4. Multidimensional Analysis of the Data*

Principal component analysis (PCA) was performed using the FactoMineR package in R software [28] to analyze the multidimensional space of the remote-sensing data and the 2019 field data. Field data were pH and EC at the start and end of the 2019 dry growing season and the aerial biomass and grain yields of the 64 fields. Qualitative variables were added: pH class, salinity class, land use and position relative to the direction of river flow. Remote-sensing data were the mean SI of each field, the TI-NDVI of each of the eight growing seasons and the number of dry and wet seasons in which each field was used to produce rice. Remote-sensing data were used as active variables and field data as illustrative variables.
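To illustrate the distinction between active and illustrative variables, the sketch below runs a standardized PCA on the active variables only and then places illustrative variables on the correlation circle through their correlation with the component scores, so they do not influence the axes. It is a NumPy analogue under assumptions, not FactoMineR itself (where supplementary variables are passed through the `quanti.sup` and `quali.sup` arguments of `PCA`); the function name is hypothetical, qualitative illustrative variables are not handled, and the signs of the axes are arbitrary.

```python
import numpy as np


def pca_with_illustrative(active, illustrative, n_comp: int = 2):
    """Standardized PCA on `active` columns; `illustrative` columns are projected only.

    Returns (scores, explained_ratio, active_corr, illustrative_corr), where the
    *_corr arrays hold each variable's correlation with each component, i.e. its
    coordinates on the correlation circle.
    """
    X = np.asarray(active, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)            # unit-variance scaling
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_comp] * s[:n_comp]                  # coordinates of the fields
    eigenvalues = s**2 / Z.shape[0]
    explained = eigenvalues / eigenvalues.sum()

    def corr_with_scores(M):
        M = np.asarray(M, dtype=float)
        if M.ndim == 1:
            M = M[:, None]
        return np.array([[np.corrcoef(M[:, j], scores[:, k])[0, 1]
                          for k in range(n_comp)] for j in range(M.shape[1])])

    return scores, explained[:n_comp], corr_with_scores(X), corr_with_scores(illustrative)
```

In the study's setting, the active matrix would hold the eight TI-NDVI values, the mean SI and the season counts for each of the 64 fields, and the illustrative matrix the soil and crop measurements.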

PCA was followed by hierarchical clustering analysis (HCA) using the HCPC (hierarchical clustering on principal components) function [29] in R software [30] to create clusters of fields that behaved similarly. Finally, supervised classification for grids (SCG) was applied to 10 m resolution grids of the entire area that described the remote-sensing variables used in the PCA-HCA analysis (i.e., 8 grids with the TI-NDVI of each pixel for the 8 seasons and 1 grid with the SI of each pixel) to represent the spatial distribution of the clusters. Each pixel was assigned to the cluster with the shortest Mahalanobis distance, and the distance to the nearest cluster was analyzed to assess the quality of the assignment. SAGA-GIS 2.3.2 [31], run within QGIS 3.4.3, was used to perform the SCG with a maximum-likelihood algorithm.
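The pixel-assignment rule can be sketched as a minimum-Mahalanobis-distance classifier. This is an illustration under assumptions, not SAGA-GIS code: the function names are hypothetical, each cluster's mean vector and covariance are estimated from its training samples, and a pseudo-inverse guards against singular covariance matrices (likely when a cluster has few training samples relative to the nine features). Gaussian maximum likelihood, also named in the text, adds a log-determinant term to the squared Mahalanobis distance; the two rules coincide only when all clusters share the same covariance and prior.

```python
import numpy as np


def fit_clusters(X_train: np.ndarray, labels: np.ndarray) -> dict:
    """Mean vector and (pseudo-)inverse covariance of each training cluster."""
    stats = {}
    for k in np.unique(labels):
        members = X_train[labels == k]
        mean = members.mean(axis=0)
        cov = np.cov(members, rowvar=False)
        stats[k] = (mean, np.linalg.pinv(cov))  # pinv tolerates singular covariances
    return stats


def assign_min_mahalanobis(X: np.ndarray, stats: dict):
    """Assign each row of X to the cluster with the smallest Mahalanobis distance.

    Returns (labels, distance to the assigned cluster) for every row.
    """
    keys = list(stats)
    dist = np.empty((X.shape[0], len(keys)))
    for j, k in enumerate(keys):
        mean, inv_cov = stats[k]
        diff = X - mean
        # Squared distance diff^T * inv_cov * diff, computed row by row.
        sq = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        dist[:, j] = np.sqrt(np.maximum(sq, 0.0))  # guard tiny negative round-off
    best = dist.argmin(axis=1)
    return np.asarray(keys)[best], dist[np.arange(X.shape[0]), best]
```

In the study's setting, each row of `X` would hold the eight TI-NDVI values and the SI of one pixel, and the returned distance plays the role of the distance to the nearest cluster used to judge assignment quality.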

#### **3. Results**

#### *3.1. NDVI Dynamics over the Eight Growing Seasons*

Figure 5 shows the dynamics of mean NDVI for three example fields, each averaged over its ca. 25 pixels. The NDVI threshold of 0.21 was used to identify periods without live vegetation; it revealed the short intercropping period after harvest, when field irrigation was stopped, the soil plowed and irrigation restarted before transplanting the next rice crop.
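Identifying the intercropping periods from a field's mean NDVI series reduces to finding runs of consecutive dates at or below the threshold. A minimal sketch with a hypothetical `bare_periods` helper, assuming dates are given as increasing day numbers and that a single image at or below 0.21 is enough to open a bare period (the text does not state a minimum duration):

```python
import numpy as np


def bare_periods(days, mean_ndvi, threshold: float = 0.21):
    """Return (first_day, last_day) of each run of consecutive dates whose
    mean field NDVI is at or below `threshold` (no live vegetation)."""
    below = np.asarray(mean_ndvi, dtype=float) <= threshold
    runs, start = [], None
    for t, is_bare in enumerate(below):
        if is_bare and start is None:
            start = t                          # a bare period opens
        elif not is_bare and start is not None:
            runs.append((days[start], days[t - 1]))
            start = None                       # vegetation is back
    if start is not None:                      # series ends while still bare
        runs.append((days[start], days[len(below) - 1]))
    return runs
```

Each returned pair brackets one candidate intercropping window; the dates within it are those on which an SI could be computed.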

**Figure 5.** Example of dynamics of mean NDVI for three fields over eight growing seasons from January 2016 to December 2019: (**a**) Field CP3\_1, cultivated in season 2 but not in the other seasons; (**b**) Field CP11-1, always cultivated but with lower NDVI in dry seasons than wet seasons; and (**c**) Field TVP7, always cultivated and with high NDVI in both dry and wet seasons. For each field, mean NDVI and its 95% confidence interval were estimated from the field's ca. 25 pixels.

Non-cultivated fields had NDVI dynamics below the threshold or slightly above due to weed growth (Figure 5a). When cultivated during a growing season, some fields had lower NDVI due to constrained plant growth, which could have been due in part to soil salinity. Among the fields cultivated during both dry and wet seasons, two groups of cultivated fields were identified: (i) those with lower NDVI during the dry seasons than wet seasons (Figure 5b) and (ii) those with smaller differences between dry and wet seasons and generally high NDVI (Figure 5c).

#### *3.2. Spatial Variation in TI-NDVI over the Eight Growing Seasons*

For the 8 seasons and 44 irrigated fields monitored in Sébéri, TI-NDVI was always significantly lower during dry seasons than wet seasons (Figure 6a): mean TI-NDVI per season ranged from 15 to 19 NDVI.days during the four dry seasons and 29 to 34 NDVI.days during the four wet seasons. TI-NDVI also varied greatly among fields for a given season and showed systematic trends, with values frequently lower in some fields in the south and northwest of the irrigation system and significantly higher in fields in the center (Figure 6b). This spatial variation was lower in some seasons (e.g., seasons 2 and 6) but higher in others, e.g., season 7 in 2019, when field data were collected.

**Figure 6.** Temporal and spatial variations in the time-integrated normalized difference vegetation index (TI-NDVI) for the 44 fields of the Sébéri irrigation system: (**a**) mean and standard deviation of TI-NDVI for each season; (**b**) maps of TI-NDVI estimated for the eight growing seasons.

#### *3.3. Salinity Measured in the Field in 2019*

For the 64 fields monitored during the 2019 dry season, EC1:5 ranged from 0.01 to 5.36 dS/m at the start of the season and from 0.44 to 6.19 dS/m at the end of the season (SD = 1.16 and 1.33 dS/m, respectively) (Table 2). The pH ranged from 5.52 to 6.68 at the start of the season and from 5.29 to 6.25 at the end (SD = 0.44 and 0.52, respectively). Differences in means and SDs between the two dates were not significant for either EC1:5 or pH. Total aerial biomass and grain yield at harvest varied greatly (Table 2), and 18 fields had no rice production.

**Table 2.** Statistics of field data for the dry growing season in 2019 (i.e., season 7) (n = 64, including 18 non-cultivated fields).


#### *3.4. Correlation between Remote-Sensing Data and Field Data during the 2019 Dry Season*

A strong and significant (*p* < 0.05) positive correlation was observed between TI-NDVI in season 7 and both total aerial biomass (r = 0.77, *p* < 2 × 10<sup>−14</sup>) and grain yield (r = 0.82, *p* < 2.2 × 10<sup>−16</sup>) (Table 3), which confirms that TI-NDVI is a good indicator of rice vegetation growth and final yield. Soil EC1:5 at both the start and the end of the season was negatively correlated with TI-NDVI (r = −0.38, *p* = 0.002 in both cases), weakly and not significantly with total biomass (r = −0.23, *p* = 0.062) and significantly with grain yield (r = −0.29, *p* = 0.019).
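The coefficients reported here are Pearson correlations. As a reminder of how their significance is usually assessed, the sketch below computes r and the t statistic for testing a zero correlation; the helper name is hypothetical, and the p-value is left to a Student t distribution with n − 2 degrees of freedom (for instance via `scipy.stats.pearsonr`, which returns r and p directly).

```python
import numpy as np


def pearson_r_and_t(x, y):
    """Pearson correlation r and the t statistic for H0: rho = 0.

    Under H0 (and bivariate normality), t = r * sqrt((n - 2) / (1 - r^2))
    follows a Student t distribution with n - 2 degrees of freedom.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = float(np.corrcoef(x, y)[0, 1])
    t = r * np.sqrt((n - 2) / (1.0 - r**2))
    return r, float(t)
```

With n = 64 fields, the same |r| yields a larger t than in a small sample, which is why moderate correlations such as r = −0.29 can still reach significance.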

**Table 3.** Pearson correlations between the TI-NDVI indicator derived from Sentinel-2 images and soil (EC1:5, pH1:5) and crop indicators (total biomass, grain yield). Data were collected for 64 fields during the dry season of 2019 (i.e., season 7). SS and ES indicate the start and end of the growing season, respectively. Bold values indicate significant (*p* < 0.05) correlations.


#### *3.5. PCA and HCA Analysis of Remote-Sensing Data over the Eight Growing Seasons*

The first axis of the PCA was positively correlated with all of the TI-NDVI variables (Figure 7), which indicates that spatial variability in TI-NDVI among fields was the main source of variation in the multidimensional space. The second axis was correlated with SI but also separated the TI-NDVI estimates of the dry seasons (1, 3, 5 and 7) from those of the wet seasons (2, 4, 6 and 8).

While soil pH was poorly represented in the first PCA plane, soil EC1:5 measured at the start and end of season 7 was negatively correlated with the first axis and thus with the TI-NDVI variables. Surprisingly, neither soil EC1:5 variable was correlated with SI. Following PCA, HCA identified five clusters of fields that behaved similarly during the eight growing seasons according to the remote-sensing data (Figure 8).
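HCPC builds its hierarchy with Ward's criterion on the retained principal components (here, the first three axes according to Figure 8). The naive NumPy sketch below shows the agglomeration rule itself; it is hypothetical code whose cost grows quickly with the number of fields (acceptable for 64), and it omits the optional k-means consolidation (`consol` argument) that FactoMineR's `HCPC` can apply after cutting the tree.

```python
import numpy as np


def ward_clusters(scores: np.ndarray, n_clusters: int) -> np.ndarray:
    """Agglomerative clustering with Ward's criterion on PCA scores.

    Starts with one cluster per field and repeatedly merges the pair whose
    merge least increases the within-cluster sum of squares:
        delta = n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2
    Returns an integer label (0 .. n_clusters - 1) for each row of `scores`.
    """
    X = np.asarray(scores, dtype=float)
    clusters = {i: [i] for i in range(len(X))}  # cluster id -> member rows
    while len(clusters) > n_clusters:
        ids = list(clusters)
        best = None
        for pos, a in enumerate(ids):
            ca, na = X[clusters[a]].mean(axis=0), len(clusters[a])
            for b in ids[pos + 1:]:
                cb, nb = X[clusters[b]].mean(axis=0), len(clusters[b])
                delta = na * nb / (na + nb) * np.sum((ca - cb) ** 2)
                if best is None or delta < best[0]:
                    best = (delta, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # merge b into a
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters.values()):
        labels[members] = label
    return labels
```

Applied to the 64 × 3 matrix of field scores on the first three axes with `n_clusters=5`, this would yield a five-cluster partition of the kind shown in Figure 8, up to the consolidation step.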

**Figure 7.** First plane of principal component analysis with variables derived from remote sensing as active variables (black) and soil and vegetation variables as illustrative variables (blue). Abbreviations: TI-NDVI\_SN: time-integrated normalized difference vegetation index for season N; Mean\_SI: mean salinity index; biom\_yield: biomass yield at harvest; grain yield: grain yield at harvest; Soil EC\_start\_s: soil electrical conductivity at the start of the season; soil EC\_end\_s: soil electrical conductivity at the end of the season; soil pH\_start\_s: soil pH at the start of the season; Soil pH\_end\_season: soil pH at the end of the season; Nbr\_dry\_s: number of dry growing seasons during the time series; and Nbr\_wt\_s: number of wet growing seasons during the time series.

**Figure 8.** Representation on the first plane of principal component analysis (PCA) of the five clusters of fields defined by ascending hierarchical classification applied to the first three axes of the PCA. Points are labeled with field codes starting with S or T for the Sébéri or Tchagriré irrigation system, respectively, while squares indicate cluster barycenters.

#### *3.6. Description of the Field Clusters*

The five clusters differed in their remote-sensing and field data (Table 4).


**Table 4.** Mean (and standard deviation) per cluster of fields defined by ascending hierarchical clustering for the variables derived from Sentinel-2 images (TI-NDVI, SI) and from field data collected in 2019 (EC1:5, pH, total biomass, grain yield). For each variable, different letters indicate significant differences in the mean among clusters according to Tukey's test (*p* < 0.05).


#### *3.7. Mapping the Clusters over the Entire Study Area*

Using the eight grids with the TI-NDVI of each pixel for the eight seasons and the grid with the SI of each pixel, SCG assigned each pixel to one of the five clusters defined from the 64 study fields, which served as training areas. Figure 9 shows the distribution of the five clusters over the entire study area, along with the distance to the nearest cluster, which indicates how well the cluster represents a given pixel (well for a small distance, poorly for a large one). Clusters 1 and 2, both of which had low TI-NDVI during the dry seasons since they were often not cultivated, were located mainly in the north and south of the Sébéri system (Figure 9). Cluster 5, with the highest TI-NDVI in both dry and wet seasons, predominated in the north of the Tchagriré system and the center of the Sébéri system. Cluster 3, with moderate TI-NDVI in both seasons, covered large areas in Sébéri in intermediate positions between clusters 1 or 2 and cluster 5, but also non-agricultural areas (e.g., paths, natural areas), which had large distances to the nearest cluster. Cluster 4 occupied small areas in Sébéri and the south of Tchagriré, often near fields of cluster 2.

**Figure 9.** Mapping of the five clusters over the entire study area of the irrigated systems of Sébéri and Tchagriré using supervised classification that assigned each pixel to the nearest cluster.

#### **4. Discussion**

This study continuously monitored spatio-temporal dynamics of SI and NDVI in paddy fields using Sentinel-2 satellite images over eight growing seasons from 2016 to 2019. Based on field data collected (biomass, EC and pH) during the 2019 dry growing season, a relation between spectral indices (NDVI and SI) and field data was established to understand the behavior of rice crops and relate the spatio-temporal variation and pattern of spectral indices to salinity.

#### *4.1. Variation in Spectral Indices among Crop Seasons*

In the 2016–2019 time series, the two spectral indices (SI and NDVI) behaved differently. First, SI was estimated on dates when bare soil prevailed according to the NDVI threshold (NDVI ≤ 0.21). SI averaged over the 4 years of study varied in a narrow range and was weakly correlated with soil EC1:5; only highly saline areas, generally not cultivated, had higher SI values, due to the presence of salt crusts at the surface. These results may be explained by the short periods during which bare soil can be observed in irrigated paddy field systems: they generally last less than a month between successive crops, which corresponds to 3–5 dates with available Sentinel-2 images. Thus, only a few dates were available on which SI could be estimated. The soil may also be covered by crop residues or change drastically in water content when irrigation stops after harvest or restarts before preparatory work for planting rice. These factors may interfere with the estimation of SI and limit its ability to distinguish soil salinity.

In contrast, NDVI dynamics could be followed at a fine temporal resolution, since cloud-free images were available on 133 dates during the eight growing seasons, with a mean of 21 dates per season since the end of 2017. We observed significant differences in TI-NDVI among fields and among the eight growing seasons. Because TI-NDVI differed greatly between fields in a given season and over time, it was used to differentiate areas with constraints on vegetation growth in the irrigated systems. Non-cultivated areas, or cultivated areas with constraints, had low TI-NDVI in dry and wet seasons (clusters 1 and 2) (Table 4). Areas cultivated throughout the year with few constraints had high TI-NDVI in wet seasons and moderate TI-NDVI in dry seasons (cluster 3), while zones cultivated in dry and wet seasons without any constraints had high TI-NDVI in both seasons (cluster 5) (Figure 8). Nonetheless, some areas near the main drainage channel, which had salinity constraints and were cultivated only in wet growing seasons, had moderate TI-NDVI. This may be because these areas remain wet and wild grasses grow there in the dry seasons, which raises the NDVI. TI-NDVI also differed significantly between dry and wet seasons (Figure 5a,b): mean TI-NDVI per season ranged from 29 to 34 NDVI.days in wet seasons and from 15 to 19 NDVI.days in dry seasons. This result is due to the climatic conditions in the study area in dry seasons (hot, dry wind, high temperature), which limit crop growth. In addition, high temperature favors the rise of salt from lower horizons to the rooting zone of crops, which damages their roots. In wet seasons, conditions are more favorable for crop growth. Over the time series, TI-NDVI in wet and dry seasons also varied among years, again due to climatic conditions (temperature in dry seasons and precipitation in wet seasons).

#### *4.2. Temporal and Spatial Patterns of NDVI*

Soil salinity influences vegetation density and crop growth, which NDVI captures. As in previous studies [32], TI-NDVI was used here to indicate constraints on crop growth and density. Five clusters were obtained to differentiate areas with constraints on rice growth in the two irrigated systems. Non-cultivated areas had the lowest TI-NDVI and were considered likely to be areas with high salinity constraints (cluster 1) (Figure 9). Areas cultivated only in wet seasons because of constraints had low TI-NDVI and may be considered to have moderate salinity constraints that limit crop growth (clusters 2 and 4) (Figure 9). Areas cultivated in both seasons with few constraints had high TI-NDVI and may be considered to have few salinity constraints (cluster 3) (Figure 9). Finally, areas cultivated without constraints had the highest TI-NDVI (cluster 5) (Figure 9) and were considered zones without salinity constraints. Our results show that the spatial pattern of TI-NDVI corresponds to the spatial distribution of problematic patches in the study area. The main constraint may be salinity, while other constraints could arise from soil type, microtopography and farming practices. Mean ECe measured during ground truthing was calculated for each cluster and assigned to the respective TI-NDVI of the clusters to validate the four salinity classes obtained from the classification of TI-NDVI. Based on EC measurements, visual observations and knowledge of the terrain [33], the mean TI-NDVI values of the clusters were classified into four classes of soil salinity [34]: non-saline, slightly, moderately and very saline (i.e., ECe < 2, 2–4, 4–8 and 8–16 dS/m, respectively). Measured soil EC alone could not explain the level of salinity of the clusters (Table 4); however, an integrated interpretation combining EC, mean TI-NDVI, total biomass and grain yield with visual observations of the study area could.
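The four-class scheme can be written as a simple threshold function. A sketch under assumptions: the function name is hypothetical, class boundaries are treated as half-open intervals (e.g., exactly 2 dS/m falls in the slightly saline class) because the text does not say which side a boundary value belongs to, and the input must be ECe, the EC of the saturated-paste extract. EC1:5 values such as those in Table 2 would first need a conversion factor, which the text does not provide.

```python
def salinity_class(ece_ds_per_m: float) -> str:
    """Map ECe (dS/m) to the four salinity classes used in the text:
    < 2 non-saline, 2-4 slightly, 4-8 moderately, 8-16 very saline."""
    if ece_ds_per_m < 0:
        raise ValueError("ECe cannot be negative")
    if ece_ds_per_m < 2:
        return "non-saline"
    if ece_ds_per_m < 4:
        return "slightly saline"
    if ece_ds_per_m < 8:
        return "moderately saline"
    if ece_ds_per_m <= 16:
        return "very saline"
    # Values above 16 dS/m fall outside the four classes considered here.
    raise ValueError("ECe above 16 dS/m lies outside the four classes used here")
```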
Cluster 1, consisting of abandoned fields with low mean TI-NDVI in dry and wet seasons, no rice biomass or grain yield, a maximum EC of 5.36 dS/m and white efflorescence on the soil surface, can be considered saline. Clusters 2 and 4, consisting of fields cultivated mainly in wet seasons and rarely in dry seasons, have low TI-NDVI in dry seasons despite a lower level of surface salinity, low biomass and grain yield in the few fields cultivated in dry seasons, and no yield in non-cultivated fields. Salt efflorescence is present in these fields. Their maximum EC was 2.59 dS/m, which may not reflect the true EC of the clusters. These two clusters can be considered moderately saline. Clusters 3 and 5 consisted of fields cultivated in both dry and wet seasons. Based on measured EC, both would be classified as non-saline, but since cluster 3 has lower TI-NDVI, biomass and grain yields than cluster 5, it is likely subject to a few salinity constraints. Thus, cluster 3 can be considered slightly saline and cluster 5 non-saline.

#### *4.3. Field EC Variation and Salinity*

In the Sébéri irrigated system, EC generally increased from the areas next to the sand dunes to the river's floodwater protection dike, perhaps because the dike has modified the functioning of the soils next to it [18]. In the Tchagriré irrigated system, EC generally increased from the areas next to the river to the main drainage ditch, likely due to topography. Areas at lower elevations had higher EC than those at higher elevations. Overall, the pattern of EC followed those of TI-NDVI, biomass and grain yield. EC measured in the two systems does not reflect the real situation in the field. Areas that showed signs of salinity (i.e., abandoned fields, low NDVI, low yields) had low measured EC, which indicates that a salt stock may lie in the lower horizons. To map salinity in this situation, measured EC should be compared to other maps (e.g., TI-NDVI, yield, soil, elevation) of the same site with similar sampling patterns or resolution. Doing so may provide useful insights into other parameters that could explain salinity.

#### **5. Conclusions**

This study developed a step-by-step method to estimate constraints on the growth of rice, the most important of which is salinity. Dense time series of Sentinel-2 images over eight growing seasons enabled us to describe the behavior of rice biomass and to differentiate fields where rice is subjected to stress during growing seasons. In irrigated systems, periods of bare soil were brief, and the SI derived from Sentinel-2 images could not differentiate soil salinity of fields. Monitoring vegetation behavior over four years by deriving the NDVI from Sentinel-2 images and calculating the TI-NDVI was able to differentiate fields based on constraints that limit rice growth. Several constraints can occur in areas subjected to stresses but can be related locally to soil salinity and verified by field sampling, which can be guided by TI-NDVI classification. This approach is particularly adapted to irrigated rice systems in which monoculture prevails and differences in TI-NDVI are not caused by different crops.

**Author Contributions:** I.M., C.W. and D.M. conceived the topic and developed the method. P.P. performed the technical work of downloading and processing images and calculating spectral indices. I.M., C.W. and D.M. wrote the manuscript, which was improved by H.N.; Y.G. and I.A.B. participated actively in the fieldwork. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was made possible thanks to financial support of the EU-funded Agrocampus ERASMUS+ cooperation project in Rennes, France. This program financed a one-year internship for M.I. and his supervision.

**Acknowledgments:** This study was performed with the administrative and technical support of the soil science laboratory from UMR SAS, INRAE and Institut Agro.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Framework for Accounting Reference Levels for REDD+ in Tropical Forests: Case Study from Xishuangbanna, China**

**Guifang Liu <sup>1,2</sup>, Yafei Feng <sup>1</sup>, Menglin Xia <sup>1</sup>, Heli Lu <sup>1,2,3,\*</sup>, Ruimin Guan <sup>1</sup>, Kazuhiro Harada <sup>4</sup> and Chuanrong Zhang <sup>5</sup>**


**Abstract:** The United Nations' expanded program for Reducing Emissions from Deforestation and Forest Degradation (REDD+) aims to mobilize capital from developed countries to reduce emissions from these sources while enhancing the removal of greenhouse gases (GHGs) by forests. To achieve this goal, an agreement between the Parties on reference levels (RLs) is critical. RLs have profound implications for the effectiveness of the program, its cost efficiency, and the distribution of REDD+ financing among countries. In this paper, we introduce a methodological framework for setting RLs for REDD+ applications in tropical forests in Xishuangbanna, China, by coupling the Good Practice Guidance on Land Use, Land Use Change, and Forestry of the Intergovernmental Panel on Climate Change with land use scenario modeling. We used two methods to verify the reliability of the land classification. First, using high-spatial-resolution images and a hybrid matrix, the classification accuracy reached 84.43%, 85.35%, and 82.68% in 1990, 2000, and 2010, respectively. Second, the 2010 Globeland30 data were used as a reference to verify the accuracy of forest land extraction, which reached 86.92% for area and 83.66% for location. Based on the historical land use maps, we identified rubber plantations as the main contributor to forest loss in the region. Furthermore, under the business-as-usual scenario for the RLs, Xishuangbanna will lose 158,535 ha (158,535 × 10<sup>4</sup> m<sup>2</sup>) of forest area in the next 20 years, resulting in emissions of approximately 0.23 million t (0.23 × 10<sup>9</sup> kg) CO2e per year. Our framework can potentially increase the effectiveness of the REDD+ program in Xishuangbanna by accounting for a wider range of forest-controlled GHGs.

**Keywords:** reference levels; REDD+; greenhouse gas emissions; Xishuangbanna; monitoring and reporting

#### **1. Introduction**

Forests account for almost half of the global terrestrial carbon pool, and the vegetation within them alone (excluding soils) holds approximately 75% of all living carbon. The total carbon content in forest ecosystems is estimated to be 638 Gt [1–5]. Tropical forests play a particularly important role in the global carbon budget because they contain as much carbon in their vegetation and soils as all the temperate-zone and boreal forests combined [6–12]. Per unit area, tropical forests store, on average, approximately 50% more carbon than their nontropical counterparts. Scientists agree that to achieve the goals of the United Nations Framework Convention on Climate Change (UNFCCC), namely avoiding irreversible damage to the climate system, global warming must not exceed 2 °C [13–16]. However, concentrations of CO2 in the atmosphere are already so high that global emissions will likely peak before they start to decline. Thus, in order to remain under the above-mentioned threshold, emissions from all major sources (i.e., from developed countries, major developing country emitters, and deforestation) must begin to decline within the next decade [17–19].

**Citation:** Liu, G.; Feng, Y.; Xia, M.; Lu, H.; Guan, R.; Harada, K.; Zhang, C. Framework for Accounting Reference Levels for REDD+ in Tropical Forests: Case Study from Xishuangbanna, China. *Remote Sens.* **2021**, *13*, 416. https://doi.org/10.3390/rs13030416

Received: 30 December 2020 Accepted: 20 January 2021 Published: 26 January 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The Conference of the Parties (COP) agreed that Reducing Emissions from Deforestation and Forest Degradation (REDD+), together with the enhancement of greenhouse gas (GHG) removals by forests in developing countries, could support the goals of the Convention through positive incentives provided under the UNFCCC. The general consensus at Doha in 2012, following COP17 the previous year, was that progress on financing, safeguards, and measurement, reporting, and verification for REDD+ had been mixed, and that significant progress had been made only in the technical area of reference levels (RLs). The 19th Conference of the Parties to the UNFCCC (COP19) and the 9th session of the Conference of the Parties serving as the Meeting of the Parties to the Kyoto Protocol (CMP9) were held jointly in Warsaw, Poland, in 2013, where REDD+ funding and the associated action points were discussed in depth.

GHG-based compensation for REDD+ requires an agreement on emission RLs. Key elements for setting these RLs include the ability to measure changes throughout all forested areas, the use of consistent methodologies at repeated intervals to obtain accurate results, and the verification of results with ground-based or very high-resolution observations [20–24]. RLs have profound implications for the effectiveness of climate-related policies, cost efficiency, and the distribution of REDD+ financing, and they involve a number of tradeoffs [25–30]. In this paper, based on the business-as-usual scenario, we introduce a methodological framework for setting RLs for REDD+ applications in tropical forests in Xishuangbanna, China, by coupling the Good Practice Guidance (GPG) on Land Use, Land Use Change, and Forestry published by the Intergovernmental Panel on Climate Change (IPCC) with land use scenario modeling. This study contributes to the literature by highlighting key challenges in setting RLs for the REDD+ program.

#### **2. Data and Methodology**

#### *2.1. Research Area*

The forest of the Xishuangbanna region (Figure 1) is not only the world's largest preserved tropical forest at the northern limit of the tropics but also home to the majority of China's tropical forest ecosystems. The geology, climate, and soil of Xishuangbanna favor the growth and reproduction of a wide range of organisms. Some 4500 species of higher plants have been recorded in Xishuangbanna, about one-seventh of all higher plant species in China. The native vegetation types include tropical rain forests, montane rain forests, tropical monsoon forests, subtropical evergreen broad-leaved forests, deciduous broad-leaved forests, warm coniferous forests, and bamboo forests, as well as shrubs and grasses [31–34]. In recent years, population growth, intensified anthropogenic activity, a favorable climate, and suitable terrain have led to a rapid expansion of rubber and other economically important tropical crops. As a result, the forest has changed dramatically.

**Figure 1.** Landscape of the Xishuangbanna, China.

#### *2.2. IPCC's Good Practice Guidance*

The IPCC's existing GPG for Land Use, Land Use Change, and Forestry provides the recommended approach for accounting for changes in carbon stocks resulting from changes in the use and management of forests. This framework was accepted by all Parties in the Bali Action Plan of COP13 [35–37]. The GPG framework relies on two basic inputs for forest carbon accounting, namely activity data and emission factors. In the REDD+ context, activity data refer to the areal extent of an emission-generating activity; for deforestation, for example, activity data are the area deforested, in hectares (10⁴ m²), over a known time period. Emission factors refer to the emission or removal of GHGs per unit of activity. Land use conversion alters ecosystem carbon stocks, and these stock changes determine the resulting emission or removal of GHGs.
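The activity data × emission factor logic can be illustrated with a minimal Python sketch. It assumes that the emission factor comes only from the drop in aboveground biomass (t dry matter per ha) when forest is converted, a carbon fraction of 0.47 (the IPCC 2006 default for tropical and subtropical wood; the study does not state which fraction it used), and the 44/12 ratio that converts carbon to CO2. Belowground biomass, dead organic matter, and soil carbon are ignored, and the function names are illustrative only.

```python
# Sketch: CO2 emissions = activity data (ha) x emission factor (t CO2e per ha).
# Assumptions (not from the study): aboveground biomass only, carbon fraction 0.47.

C_TO_CO2 = 44.0 / 12.0  # molecular-weight ratio of CO2 to C


def emission_factor(agb_before, agb_after, carbon_fraction=0.47):
    """t CO2e released per ha when biomass drops from agb_before to agb_after (t/ha)."""
    return (agb_before - agb_after) * carbon_fraction * C_TO_CO2


def emissions(area_ha, agb_before, agb_after, carbon_fraction=0.47):
    """Activity data (ha) multiplied by the emission factor (t CO2e/ha)."""
    return area_ha * emission_factor(agb_before, agb_after, carbon_fraction)


# Hypothetical example: 1000 ha of forest at 200 t/ha converted to a cover holding 50 t/ha.
print(emissions(1000, 200.0, 50.0))
```

Keeping the emission factor in its own function mirrors the GPG separation of the two inputs: the same factor can be reused with activity data from any period.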

#### 2.2.1. Emissions Factors

To estimate emission factors, the number of sample plots required to reach the necessary accuracy was determined from the size of the forest area and the available resources. Preliminary surveys and/or existing data can be used to establish sample sizes, and tools also exist to calculate sample sizes for a fixed precision level or a fixed inventory cost [38–41]. If carbon stocks and flows are to be monitored over the long term, permanent plots should be considered to reduce between-site variability and to capture actual trends rather than short-term fluctuations [42].
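A fixed-precision sample size is often computed with the large-population formula n = (t · CV / E)², where CV is the coefficient of variation of plot carbon stocks and E is the allowable error, both in percent. The sketch below assumes that formula and t = 1.96 (about 95% confidence); the study does not report which formula or inputs it used, and the example values are hypothetical.

```python
import math


def plots_for_precision(cv_percent, allowable_error_percent, t_value=1.96):
    """Number of plots n = (t * CV / E)^2, rounded up (large-population formula)."""
    n = (t_value * cv_percent / allowable_error_percent) ** 2
    return math.ceil(n)


# Hypothetical inputs: 40% coefficient of variation, +/-10% allowable error.
print(plots_for_precision(40, 10))  # 62
```

Halving the allowable error quadruples the number of plots, which is why precision targets and inventory budgets are usually traded off against each other.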

In this study, the aboveground biomass density map was obtained from Global Forest Watch (http://www.globalforestwatch.org/). This global map of aboveground biomass density for the year 2000 was produced using the method devised by Baccini [43]; with the improved methodology, its resolution can be as fine as 30 m. The aboveground biomass density map of the Xishuangbanna region was then extracted by masking the global map with the study area boundary.

More recently, Santoro et al. [44] proposed an integrated methodology for estimating aboveground biomass density around the year 2010 by combining SAR, LiDAR, and optical observations with auxiliary datasets from forest inventories, additional remote sensing observations, climate variables, and ecosystem classifications. We compared this more recent biomass map by Santoro et al. with the Global Forest Watch product.

#### 2.2.2. Activity Data

For national-level deforestation monitoring, activity data can in practice be estimated only via remote sensing [45–49]. Changes in forest area have been monitored reliably from space since the early 1990s, and some countries have had well-established operational systems for over a decade.

Taking into account data availability and consistency, this study used Landsat Enhanced Thematic Mapper Plus/Thematic Mapper (ETM+/TM) images covering the study area (path/row 130/044, 131/045, 130/045, and 129/045) for 1990, 2000, and 2010 to interpret land use changes (Table 1). The TM/ETM+ images and normalized difference vegetation index (NDVI) data were obtained from the US Geological Survey (USGS, http://earthexplorer.usgs.gov/). We mosaicked the TM/ETM+ images of each year, performed geometric and radiometric correction in ENVI, reprojected all data to WGS 84 / UTM zone 47N (EPSG:32647), and clipped the images to Xishuangbanna using the administrative boundary vector map as a mask.
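As a quick check on the projection choice, the UTM zone follows from longitude (6° zones counted eastward from 180° W), and the WGS 84 / UTM EPSG code is 32600 plus the zone number in the northern hemisphere (32700 plus the zone in the southern). The sketch assumes an approximate centre for Xishuangbanna of about 100.8° E, 22.0° N and ignores the special zones around Norway and Svalbard.

```python
import math


def utm_epsg(lon_deg, lat_deg):
    """EPSG code of the WGS 84 / UTM zone containing (lon, lat).

    Standard 6-degree zones only; the Norway/Svalbard exceptions are ignored.
    """
    zone = int(math.floor((lon_deg + 180.0) / 6.0)) + 1
    zone = min(zone, 60)  # longitude exactly 180 belongs to zone 60
    base = 32600 if lat_deg >= 0 else 32700  # northern / southern hemisphere
    return base + zone


# Assumed approximate centre of Xishuangbanna.
print(utm_epsg(100.8, 22.0))  # 32647
```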

**Table 1.** Landsat imagery used in this study.


Vegetation in the study area varies markedly with the seasons, so NDVI values acquired on different dates can strongly influence the results. This study therefore composited the NDVI data from multiple acquisitions within each year by taking the maximum value, which also reduces the influence of cloud cover on the results.
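The maximum-value compositing step can be sketched in plain Python as follows. NDVI is computed as (NIR − Red)/(NIR + Red); each acquisition is assumed to be a 2-D list of the same shape, with `None` marking cloud-masked or no-data pixels. A real workflow would operate on raster arrays, but the per-pixel logic is the same.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); returns None where the sum is zero."""
    total = nir + red
    return None if total == 0 else (nir - red) / total


def max_value_composite(scenes):
    """Per-pixel maximum NDVI across dates; None marks cloud or no-data pixels.

    scenes: list of 2-D lists (rows x cols) with identical shape.
    """
    rows, cols = len(scenes[0]), len(scenes[0][0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            valid = [s[r][c] for s in scenes if s[r][c] is not None]
            out[r][c] = max(valid) if valid else None
    return out


# Two hypothetical acquisitions of a 1 x 3 strip; None = cloud-masked pixel.
dry = [[0.31, None, 0.52]]
wet = [[0.74, 0.66, None]]
print(max_value_composite([dry, wet]))  # [[0.74, 0.66, 0.52]]
```

Because clouds depress NDVI, keeping the per-pixel maximum tends to select the clearest, greenest observation in the year, which is why the same step also suppresses cloud effects.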

Based on the characteristics of land use and land cover in the study area (Table 2), we divided land cover into eight types: forestland, shrubland, grassland, cultivated land, rubber forest, tea gardens, construction land, and water. Training samples were identified using QuickBird images in Google Earth. Because the terrain of the study area is relatively complex, spectral confusion (different land cover types sharing similar spectra) occurs frequently when interpreting the images; to mitigate this, as many training samples as possible should be selected. Different band combinations of Landsat 7 ETM+ images highlight different features, and the training samples were selected accordingly. The ETM+ 5-4-1 band combination, supplemented by NDVI data, helps distinguish vegetation types and was used to select training samples of natural forests, shrubs, rubber plantations, and tea gardens. The 4-5-3 combination was used to select cultivated land and water, construction land was extracted using the 7-4-3 combination, and the remainder was categorized as other land. Supervised classification was performed with the selected training samples to obtain preliminary results, whose accuracy was then tested. If the results did not agree with the reference data, the training samples were reselected, and the supervised classification and accuracy tests were repeated until the accuracy was satisfactory. Finally, the classification results were recoded, clustered, and eliminated, and the broken patches were merged into the largest adjacent class so that all patches met a common minimum mapping unit.
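The final clean-up step, merging broken patches into the largest adjacent class, resembles a sieve or eliminate filter. Below is a minimal sketch, assuming 4-connected patches, a minimum mapping unit expressed in pixels, and that "largest adjacent class" means the neighbouring class that shares the longest border with the patch; the study does not specify these rules, and GIS packages implement their own variants.

```python
from collections import Counter, deque


def eliminate_small_patches(grid, min_pixels):
    """Merge 4-connected patches smaller than min_pixels into the neighbouring
    class that shares the longest border with them (single pass)."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            cls = grid[r0][c0]
            patch, border = [], Counter()
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:  # breadth-first flood fill of one patch
                r, c = queue.popleft()
                patch.append((r, c))
                for dr, dc in steps:
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if grid[nr][nc] == cls:
                        if not seen[nr][nc]:
                            seen[nr][nc] = True
                            queue.append((nr, nc))
                    else:
                        border[grid[nr][nc]] += 1  # one shared edge with another class
            if len(patch) < min_pixels and border:
                target = border.most_common(1)[0][0]
                for r, c in patch:
                    out[r][c] = target
    return out


# Hypothetical 3 x 3 map: a one-pixel patch of class 2 inside class 1.
print(eliminate_small_patches([[1, 1, 3], [1, 2, 3], [1, 1, 3]], min_pixels=2))
```

Patches are detected on the input grid and written to a copy, so the result does not depend on the order in which small patches are visited.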


**Table 2.** Land use classification followed for this study.

The Land Use Dynamic Index accounts for transfers between land use types during the study period and reflects the intensity of regional land use change over that time. It is useful for identifying hotspots of land use change at different spatial scales and is one of the key parameters for analyzing the spatial dynamics of land use [50]. Equation (1) was used to calculate the index for each land use type.

$$K\_i = \frac{U\_{t\_2} - U\_{t\_1}}{U\_{t\_1}} \times \frac{1}{t\_2 - t\_1} \times 100\% \tag{1}$$

*K*<sub>*i*</sub> is the land use dynamic degree of land use type *i* over a given period; *U*<sub>*t*1</sub> and *U*<sub>*t*2</sub> are the areas of land use type *i* at the start (*t*<sub>1</sub>) and the end (*t*<sub>2</sub>) of the period, respectively; and *t*<sub>2</sub> − *t*<sub>1</sub> is the length of the period in years.
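A minimal sketch of the single land use dynamic degree, signed so that a loss of area yields a negative value (as in the −1.42% reported for forestland in Section 3.1). The areas in the example are hypothetical.

```python
def single_dynamic_degree(area_start, area_end, t1, t2):
    """K_i in % per year: relative change in the area of one land use type per year."""
    return (area_end - area_start) / area_start / (t2 - t1) * 100.0


# Hypothetical type shrinking from 100 ha to 80 ha between 1990 and 2010.
print(single_dynamic_degree(100.0, 80.0, 1990, 2010))
```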

$$S = \left[ \sum\_{i=1}^{n} \frac{\Delta S\_{i-j}}{S\_i} \right] \times \frac{1}{t} \times 100\% \tag{2}$$

*S* is the comprehensive land use dynamic degree of the study area over the period *t*; Δ*S*<sub>*i*−*j*</sub> is the area of land use type *i* converted to other land use types during the study period; *S*<sub>*i*</sub> is the area of land use type *i* at the beginning of the study period; and *t* is the length of the period of land use change.
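The comprehensive dynamic degree can be computed directly from a transition matrix of areas. In this sketch (an illustration, with hypothetical numbers), the bracketed sum in Equation (2) is taken as a fraction and the result is expressed once in percent per year; Δ*S*<sub>*i*−*j*</sub> is the row total minus the diagonal (area of type *i* that changed) and *S*<sub>*i*</sub> is the row total (area of type *i* at the start).

```python
def comprehensive_dynamic_degree(transitions, years):
    """S in % per year from a square transition matrix of areas.

    transitions[i][j] = area that was type i at the start and type j at the end.
    """
    total = 0.0
    for i, row in enumerate(transitions):
        start_area = sum(row)            # S_i
        converted = start_area - row[i]  # Delta S_{i-j}: area of type i that changed
        if start_area > 0:
            total += converted / start_area
    return total / years * 100.0


# Hypothetical two-class matrix (ha) over a 10-year period.
print(comprehensive_dynamic_degree([[80, 20], [10, 90]], 10))
```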

#### *2.3. Land Use Scenario Modeling for Reference Levels*

Land use simulation uses land use change observed over past years to predict future land use change. Most models that simulate the process of land use change need to solve two problems: the quantity problem and the distribution problem. The quantity problem concerns how much land will change, while the distribution problem concerns where those changes will occur. This study applied the Land Change Modeler (LCM) [51–53], which uses a Markov chain model to predict the amount of future land use change and then allocates these changes spatially using a Multilayer Perceptron (MLP) model.

A Markov chain is a memoryless ("no after-effect") stochastic process: it assumes that the state of a variable depends only on its immediately preceding state, not on its states at earlier times. It is therefore easy to operationalize and is widely used to simulate land use change. Formally, for any positive integer *n* and possible states *i*<sub>0</sub>, *i*<sub>1</sub>, ..., *i*<sub>*n*</sub> of the random variables, the Markov property in Equation (3) holds:

$$P(X\_n = i\_n \mid X\_{n-1} = i\_{n-1}) = P(X\_n = i\_n \mid X\_0 = i\_0, X\_1 = i\_1, \dots, X\_{n-1} = i\_{n-1}) \tag{3}$$

Because land use change exhibits the basic characteristics of a Markov process, it can be treated as one. Markov chain analysis can therefore describe the land use change process and predict future trends, and it is an important tool in land use change modeling. However, the following prerequisites must be met [54–56]: (1) within a given area, different land use types can be converted into one another; (2) conversions between land use types can involve many events that are difficult to describe with a specific formula; and (3) within the study period, the structure of land use conversion is relatively stable, which satisfies the requirements of the Markov chain. Moreover, the proportion of each land use type's area converted to another type is taken as the state transition probability.
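Under these assumptions, the quantity step can be sketched as follows: the transition-area matrix from a calibration period (for example, 1990–2000) is row-normalised into transition probabilities, which are then applied to the current area vector. This is a generic first-order Markov projection, not LCM's internal implementation, and the two-class numbers are hypothetical.

```python
def transition_probabilities(transitions):
    """Row-normalise a transition-area matrix into Markov transition probabilities."""
    probs = []
    for row in transitions:
        total = sum(row)
        probs.append([a / total if total else 0.0 for a in row])
    return probs


def project_areas(areas, probs, steps=1):
    """Apply a_{t+1}[j] = sum_i a_t[i] * P[i][j]; one step = one calibration interval."""
    for _ in range(steps):
        areas = [sum(areas[i] * probs[i][j] for i in range(len(areas)))
                 for j in range(len(probs[0]))]
    return areas


# Hypothetical two-class example (forest, rubber), areas in ha.
P = transition_probabilities([[80, 20], [10, 90]])
print(project_areas([100.0, 100.0], P))  # [90.0, 110.0]
```

Because each row of P sums to 1, the projection redistributes area between classes without changing the total, which matches the assumption that the study area is fixed.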

The MLP is a widely used neural network in remote sensing image processing, especially image classification. In the model, the MLP was used primarily to calculate the land use change potential, that is, the future conversion probability between each pair of land use types. This involved building a model of the driving forces of land use and the quantitative relationships between land use types to assess the probability of change. Based on the resulting potential distribution of land use change, the locations of likely future changes can be determined. The back propagation algorithm used in the MLP consists of two parts: the forward propagation of information and the backward propagation of errors. In forward propagation, the input is passed from the input layer through the hidden layer to the output layer, and the state of the neurons in each layer affects only the neurons in the next layer [57,58]. If the output layer does not produce the expected output, the error at the output layer is calculated and propagated backward along the original connections through the network, and the weights of the neurons in each layer are adjusted until the desired target is reached. During forward propagation, the neuron states are updated layer by layer from the input layer to the output layer, as shown in Equation (4):

$$x\_j = \sum\_{i} a\_{i} w\_{ji} \tag{4}$$

where *x*<sub>*j*</sub> is the total input received by neuron *j*, *w*<sub>*ji*</sub> is the weight between neurons *j* and *i*, and *a*<sub>*i*</sub> is the output (activation) of neuron *i*. Once *x*<sub>*j*</sub> is calculated, it is mapped to the output of neuron *j*; the most commonly used mapping function is the sigmoid (S-shaped) function, as shown in Equation (5).

$$a\_j = f(x\_j) = \frac{1}{1 + \exp(-x\_j)} \tag{5}$$
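The forward-propagation rule (the weighted sum of the previous layer's activations, passed through the logistic sigmoid) can be sketched in plain Python. Only the forward pass is shown; training the weights by back propagation is omitted, and the weights and driver inputs in the example are hypothetical rather than values from the study's LCM run.

```python
import math


def sigmoid(x):
    """Logistic activation f(x) = 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def layer_forward(inputs, weights):
    """One forward step: weights[j][i] is w_ji; returns a_j = f(sum_i a_i * w_ji)."""
    return [sigmoid(sum(a_i * w_ji for a_i, w_ji in zip(inputs, row)))
            for row in weights]


def mlp_forward(inputs, layers):
    """Propagate inputs through successive layers (input -> hidden -> output)."""
    activations = inputs
    for weights in layers:
        activations = layer_forward(activations, weights)
    return activations


# Hypothetical: two normalised drivers -> 2 hidden neurons -> 1 transition potential.
hidden = [[0.5, -1.0], [1.5, 0.2]]
output = [[1.0, -0.5]]
print(mlp_forward([0.2, 0.8], [hidden, output]))
```

Each output lies in (0, 1), which is why the sigmoid output can be read as a transition potential for a pixel.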

It is crucial to check the accuracy of the model to determine whether it needs to be adjusted. The Receiver Operating Characteristic (ROC) test evaluates the model by comparing the predicted land change probability map with a binary map of actual change (changed pixels = 1, unchanged pixels = 0) [59–61]. For each threshold applied to the probability map, the simulated and reference images are cross-tabulated into a 2 × 2 table whose cells contain the pixel counts A (changed, predicted as change), B (unchanged, predicted as change), C (changed, predicted as no change), and D (unchanged, predicted as no change). Each threshold thus yields one point (*x*, *y*) of the ROC curve, where *y* = A/(A + C) is the true positive rate. The true negative rate is D/(B + D); so that the *x* axis increases with the proportion of false alarms, its complement, the false positive rate *x* = B/(B + D), is plotted instead. The ROC test then summarizes model performance as the Area Under the ROC Curve (AUC), which is obtained using the following formula:

$$\text{AUC} = \sum\_{i=1}^{n} (x\_i - x\_{i+1}) \times \left\{ y\_i + \frac{y\_{i+1} - y\_i}{2} \right\} \tag{6}$$

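where *x*<sub>*i*</sub> is the false positive rate B/(B + D) at threshold *i*, *y*<sub>*i*</sub> is the corresponding true positive rate A/(A + C), and the thresholds are ordered so that *x*<sub>*i*</sub> ≥ *x*<sub>*i*+1</sub>.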
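A minimal sketch of the ROC/AUC computation: each threshold produces the A, B, C, D counts and one (*x*, *y*) point, and the trapezoidal rule integrates the curve. The points are sorted by ascending *x* here, which gives the same area as summing with descending *x*. The thresholds must include one above the highest score and one at or below the lowest so that the curve runs from (0, 0) to (1, 1). Scores and labels are hypothetical.

```python
def roc_point(scores, changed, threshold):
    """(x, y) = (B/(B+D), A/(A+C)) when pixels with score >= threshold are predicted as change."""
    a = b = c = d = 0
    for s, is_change in zip(scores, changed):
        predicted = s >= threshold
        if predicted and is_change:
            a += 1  # changed, predicted as change
        elif predicted:
            b += 1  # unchanged, predicted as change
        elif is_change:
            c += 1  # changed, predicted as no change
        else:
            d += 1  # unchanged, predicted as no change
    x = b / (b + d) if b + d else 0.0
    y = a / (a + c) if a + c else 0.0
    return x, y


def auc(points):
    """Trapezoidal area under ROC points."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + (y1 - y0) / 2) for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


scores = [0.9, 0.8, 0.2, 0.1]  # hypothetical change potentials
changed = [1, 0, 1, 0]         # 1 = changed in the reference map
thresholds = [1.1, 0.9, 0.8, 0.2, 0.1, 0.0]
print(auc([roc_point(scores, changed, t) for t in thresholds]))
```

An AUC of 0.5 corresponds to a random allocation of change and 1.0 to a perfect one.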

#### **3. Results and Discussion**

#### *3.1. Analysis of Historical Land Use*

The land use maps for 1990, 2000, and 2010 and their accuracies are shown in Figure 2 and Appendix A. First, we generated 2866, 2549, and 2481 random sample points for 1990, 2000, and 2010, respectively, using stratified random sampling; of these, 1520, 1227, and 1008 sample points, respectively, were located in forest areas. We then evaluated the classification accuracy at these sample points against high-resolution imagery in Google Earth.
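Stratified random sampling can be sketched as below. The study does not state how points were allocated among strata, so this sketch assumes proportional allocation (points in proportion to the pixel count of each mapped class) with largest-remainder rounding; the function names and pixel lists are hypothetical.

```python
import random


def proportional_allocation(n_total, stratum_sizes):
    """Split n_total points across strata in proportion to their size
    (largest-remainder rounding so the counts sum to n_total)."""
    total = sum(stratum_sizes.values())
    raw = {k: n_total * v / total for k, v in stratum_sizes.items()}
    counts = {k: int(x) for k, x in raw.items()}
    leftover = n_total - sum(counts.values())
    # Give the remaining points to the strata with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda k: raw[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


def stratified_sample(pixels_by_class, n_total, seed=0):
    """Draw pixel locations without replacement within each mapped land use class."""
    rng = random.Random(seed)
    sizes = {k: len(v) for k, v in pixels_by_class.items()}
    alloc = proportional_allocation(n_total, sizes)
    return {k: rng.sample(pixels_by_class[k], alloc[k]) for k in pixels_by_class}


# Hypothetical map with 60 forest and 40 rubber pixels; draw 10 validation points.
pixels = {"forest": [(r, 0) for r in range(60)], "rubber": [(r, 1) for r in range(40)]}
print({k: len(v) for k, v in stratified_sample(pixels, 10).items()})
```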

**Figure 2.** Land use maps and the accuracies in 1990, 2000, and 2010.

GlobeLand30, developed by the National Geomatics Center of China, is an open-access global land cover product at 30 m resolution with an overall classification accuracy of over 80% [62,63]. We compared the area and spatial location of forest land in 2010 extracted from GlobeLand30 with those from this study (Figure 3). First, about 600 sample points were randomly generated within the Xishuangbanna administrative region. These sample points were then overlaid on GlobeLand30 and on our land use map. Finally, we evaluated the accuracy of our land use map based on its agreement with the forest and nonforest classes in GlobeLand30.

**Figure 3.** (**a**) Forest area in 2010 from GlobeLand30 and this study; (**b**) spatial distribution of the validation samples in GlobeLand30; (**c**) spatial distribution of the validation samples in this study.

In terms of forest land area, the area extracted from GlobeLand30 was 1.21 × 10⁶ ha and that from this study was 1.05 × 10⁶ ha, giving an area accuracy of 86.92%. In terms of the spatial location of forest land, of the randomly generated sample points (606 in total), 376 were forest land and 230 were nonforest land in GlobeLand30, whereas 331 were forest land and 275 were nonforest land in this study. The overall accuracy is 83.66%, and the kappa coefficient is 0.657.
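Overall accuracy and the kappa coefficient are both derived from a confusion matrix of mapped versus reference classes. The sketch uses Cohen's kappa and a hypothetical forest/nonforest matrix; the study's own cross-tabulation is not reported cell by cell, so its values are not reproduced here.

```python
def accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] = number of samples mapped as class i and labelled j in the reference.
    """
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2  # chance agreement
    return observed, (observed - expected) / (1 - expected)


# Hypothetical forest / nonforest cross-tabulation of 100 validation points.
oa, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
print(round(oa, 4), round(kappa, 4))  # 0.85 0.7
```

Kappa discounts the agreement expected by chance from the marginal totals, so it is always lower than overall accuracy unless the agreement is perfect.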

The areas, changes, and dynamic degrees of each land use type in 1990, 2000, and 2010 are shown in Figure 4.


**Figure 4.** (**a**) Land use areas in 1990, 2000, and 2010; (**b**) changes in land use type; and (**c**) land use change degree in Xishuangbanna for 1990–2000, 2000–2010, and 1990–2010.

As shown in Figure 4, the areas of cultivated land, forestland, and water bodies in Xishuangbanna decreased from 1990 to 2010. The decrease in forest area is the most pronounced, with a total reduction of 360,819 ha (360,819 × 10⁴ m²) over the 20 years and a dynamic land use change degree of −1.42%: forest area decreased by 265,491 ha (265,491 × 10⁴ m²) from 1990 to 2000 and by 95,328 ha (95,328 × 10⁴ m²) from 2000 to 2010. Cultivated land increased during the first decade (1990–2000), by 18,153 ha (18,153 × 10⁴ m²), with a dynamic land use change degree of 1.24%, and then decreased by 21,456 ha (21,456 × 10⁴ m²) during the second decade (2000–2010), with a dynamic degree of −1.31%. The area of water bodies declined continuously over the two decades, with a total reduction of 3996 ha (3996 × 10⁴ m²) and a dynamic degree of −2.17%. Grassland, rubber plantations, shrubland, tea gardens, and construction land in the Xishuangbanna region increased from 1990 to 2010. Among these, rubber plantations grew the most, with a total increase of 249,948 ha (249,948 × 10⁴ m²) in 20 years and a dynamic land use change degree of 9.87%. Tea gardens increased by 43,686 ha (43,686 × 10⁴ m²) over the 20 years, with a dynamic land use change degree of 6.26%. Although grassland, shrubland, and construction land also increased, these changes were relatively small.

In summary, economic development in the Xishuangbanna region and improvements in living standards drove an expansion of cash crops such as rubber and tea over the past 20 years, resulting in extensive forest clearing.

During 1990–2000, carbon emissions estimated with the Global Forest Watch and Santoro datasets were 7.85 million t CO2e and 5.63 million t CO2e, respectively, a difference of 28.30%. During 2000–2010, the corresponding estimates were 2.82 million t CO2e and 2.00 million t CO2e, respectively, a difference of 28.81%. Carbon emissions for 1990–2000 were about 2.8 times those for 2000–2010 (Figure 5).
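The percentage differences are consistent with expressing the gap between the two estimates relative to the Global Forest Watch value; that definition is an assumption, since the text does not state it. With the published totals, which are rounded to two decimals, the sketch gives about 28.28% and 29.08%. These values are close to, though not identical with, the reported 28.30% and 28.81%, presumably because the unrounded totals were used.

```python
def relative_difference(reference, other):
    """Difference between two estimates as a percentage of the reference estimate."""
    return (reference - other) / reference * 100.0


# Reported totals (million t CO2e): Global Forest Watch vs. Santoro et al.
for period, gfw, santoro in [("1990-2000", 7.85, 5.63), ("2000-2010", 2.82, 2.00)]:
    print(period, round(relative_difference(gfw, santoro), 2))
```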

**Figure 5.** Carbon emissions for the period 1990–2000 and the period 2000–2010.

#### *3.2. Influencing Factors of Land Use Change*

There are many drivers of deforestation and forest degradation relevant to REDD+. Direct drivers are human activities or immediate actions that directly affect forest cover and cause carbon loss, such as agricultural expansion (both commercial and subsistence), infrastructure extension, and wood extraction. Indirect drivers are complex interactions of social, economic, political, cultural, and technological processes that cause deforestation or forest degradation. They act at multiple scales: international (markets, commodity prices), national (population growth, domestic markets, national policies, governance), and local (subsistence, poverty) [64–67]. Because RLs refer to the business-as-usual scenario, which assumes no change over time in the drivers (national circumstances, governance, socio-economic forces, etc.), this study considered only seven factors influencing land use change: distance to a road, distance to a river, elevation, slope, aspect, distance to an administrative center, and nature reserves (Table 3 and Figure 6).

**Table 3.** Factors influencing land use change and data acquisition methods.


**Figure 6.** Influencing factors of land use change in Xishuangbanna region.

Cramer's V coefficients (Table 4) were calculated to measure the strength of association between each of these factors and the distribution of land use types; larger values indicate a stronger association.
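Cramer's V is derived from Pearson's chi-squared statistic on a contingency table, for example, rows as bands of a binned driver such as distance to a road and columns as land use classes: V = √(χ² / (n · (min(r, c) − 1))). The sketch computes this overall coefficient. How the per-class values in Table 4 were computed by the modeling software is not described, so they are not reproduced here, and the tables in the example are hypothetical.

```python
import math


def cramers_v(table):
    """Cramer's V for an r x c contingency table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    if k < 2:
        raise ValueError("need at least two rows and two columns")
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical counts: 2 distance bands x 2 land use classes.
print(cramers_v([[30, 10], [10, 30]]))
```

V ranges from 0 (no association) to 1 (perfect association), independent of table size, which makes coefficients for factors with different numbers of classes comparable.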


**Table 4.** Cramer's V coefficients indicating correlations between the influencing factors of land use change and land distribution.

#### 3.2.1. Distance to a Road

Besides playing a very important role in the economic and social development of a region, traffic conditions affect its land use. The overall Cramer's V between land type and distance to a road is 0.1334. First, the coefficients for shrubland and tea gardens (0.2735 and 0.2148, respectively) are much higher than the overall value, so distance to a road is a relatively important factor for shrubland and tea gardens. Second, the coefficients for rubber plantations and construction land (0.1550 and 0.1411, respectively) are slightly above the overall value. Thus, the land types most influenced by road traffic in the Xishuangbanna region are shrubland, tea gardens, rubber plantations, and construction land, all of which are shaped by anthropogenic activity. Shrubland shows the highest correlation with roads because, in Xishuangbanna, roads are commonly built across shrubland rather than other areas.

#### 3.2.2. Distance to a River

Precipitation in the Xishuangbanna region is abundant and evenly distributed, so the dependence of most land use types on rivers is not obvious, except for shrubland and tea gardens. The overall Cramer's V between distance to a river and land type is 0.0905, and the coefficient for tea gardens (0.1277) is higher than this overall value, indicating that distance to a river is an important factor affecting tea gardens.

#### 3.2.3. Terrain-Related Factors

Topographic factors strongly constrain many production activities. The study area is mainly mountainous, so elevation, slope, and aspect cannot be ignored. First, the overall Cramer's V for elevation is 0.2539, and only forestland and rubber plantations have higher coefficients (0.3482 and 0.5297, respectively). During 1990–2010, rubber plantations in Xishuangbanna expanded continuously from low-altitude flat valleys into high-altitude mountainous areas, driven by high international rubber prices, population pressure, and economic development; this explains the strong correlation between elevation and rubber plantations. The coefficients of elevation for shrubland, grassland, cultivated land, tea gardens, construction land, and other land (0.1192, 0.1289, 0.1355, 0.1230, 0.0451, and 0.2130, respectively) are all lower than the overall value.

Second, slope affects the water distribution, wind speed, and soil texture that crops require. The overall Cramer's V for slope is 0.1608, while the coefficients for shrubland, rubber plantations, and tea gardens (0.2131, 0.1984, and 0.1665, respectively) are higher than the overall value, so slope can be regarded as a main factor for these land uses. In contrast, the coefficients for grassland, cultivated land, construction land, and other land (0.0406, 0.0210, 0.1076, and 0.0256, respectively) are lower than the overall value.

Finally, aspect mainly affects the duration of sunlight and the temperature during crop growth, and hence final yield. The overall Cramer's V for aspect is 0.0431, while the coefficients for shrubland, grassland, and rubber plantations (0.0729, 0.0560, and 0.0570, respectively) are all greater than the overall value.

#### 3.2.4. Distance to an Administrative Center

Government administrative organizations are typically located in townships. Given the increasingly strict forest protection policies in the Xishuangbanna region, areas closer to these organizations can be supervised and regulated more easily, which deters forest destruction and the illegal mining of local resources. The overall Cramer's V for distance to a township is 0.1252. The coefficients for rubber plantations and tea gardens (0.1542 and 0.1516, respectively) are higher than the overall value, whereas those for grassland, cultivated land, construction land, and other land (0.0933, 0.0455, 0.1025, and 0.0361, respectively) are lower. Therefore, rubber plantations and tea gardens are clearly (and expectedly) influenced by distance to a township, whereas this is not the case for the remaining land use types.

#### 3.2.5. Limiting Factor (Nature Reserve)

Xishuangbanna Nature Reserve is a national nature reserve consisting of five small subreserves: Mengyang, Menglun, Mengla, Shangyong, and Manzhang. These subreserves are not geographically connected and cover a total area of 242,500 ha (242,500 × 10⁴ m²). Notably, 12.68% of the total area of the Prefecture is allocated to nature conservation, namely the protection of the tropical forest ecosystem and its rare wildlife. Relatively little land change has been observed in the protected area, and human-caused damage has been effectively contained. In this study, the conversion rate of certain land use types, such as forestland, in the protected area was therefore set to 0; in other words, anthropogenic conversion in these areas was assumed to be completely restricted.

#### *3.3. Future Land Use Simulation Results and Inspection*

The expansion of rubber and other cash crops has caused massive forest loss and fragmentation in Xishuangbanna, and the region experienced its most severe forest loss and degradation during 1990–2010. We therefore chose 1990–2010 as the baseline period for REDD+ in Xishuangbanna; such a baseline is crucial for measuring emission reduction performance and, consequently, for negotiating meaningful targets for reducing deforestation emissions. Accordingly, the land use data for 1990 and 2000 were used as inputs to the Markov chain and MLP models, and the 2010 land use data were used for validation before simulating future land use. The ROC validation yielded an AUC of 0.8, indicating that the model performs well. The predicted land use for the Xishuangbanna region over the next 20 years (2016–2035) is shown in Figure 7.

Forestland shows a downward trend and the largest change over the 20 years, with a reduction of 158,535 ha (158,535 × 10⁴ m²). Conversely, the areas of rubber plantations, tea gardens, and cultivated land increase, with rubber plantations showing the largest increase (108,450 ha, or 108,450 × 10⁴ m²). Tea gardens and cultivated land increase more modestly, by 39,204 ha (39,204 × 10⁴ m²) and 31,707 ha (31,707 × 10⁴ m²), respectively. The areas of shrubland, grassland, construction land, and water bodies remain stable. Thus, over the next 20 years, the Xishuangbanna region is projected to undergo further deforestation; at the same time, with continued economic development and rising human demand for resources, the cultivation of cash crops such as rubber and tea will continue to increase, adding pressure on the region's forests.

**Figure 7.** Land use forecast for the Xishuangbanna region for the next 20 years (unit: ha, i.e., 10⁴ m²).

#### *3.4. Reference Levels in Xishuangbanna*

According to the IPCC's Good Practice Guidance, source and sink estimates were determined by multiplying the activity data by carbon stock coefficients (i.e., emission factors) at two points in time. In this study, combining the IPCC method with the land use change model showed that carbon emissions from the study region increase steadily year by year over the 20-year simulation period (Figure 8); the simulated trend gives average annual carbon emissions of 0.35 million t CO2e. At the same time, the large increase in rubber plantations raises carbon uptake, resulting in average annual carbon sequestration of 0.13 million t CO2e. Although the carbon sequestration attributable to cultivated land, grassland, shrubland, and tea gardens also changes, the overall increase is small. Overall, total carbon emissions in Xishuangbanna rise year by year over the two decades: average annual carbon emissions over this period are estimated at 0.23 million t CO2e, and total carbon emissions amount to 4.6866 million t CO2e, a clear upward trend.
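The bookkeeping behind a reference level can be sketched with two small helpers, assuming yearly series of gross emissions and removals are available (the per-year values behind Figure 8 are not reproduced here, so any series passed in is hypothetical). The only source number used is the reported 20-year total of 4.6866 million t CO2e, whose mean over 20 years is about 0.234 million t CO2e per year, consistent with the reported 0.23.

```python
def net_emissions(gross, removals):
    """Year-by-year gross emissions minus removals (e.g., uptake by new plantations)."""
    return [g - r for g, r in zip(gross, removals)]


def reference_level_summary(annual_emissions):
    """Return (total, mean annual) emissions for a projected series (any consistent unit)."""
    total = sum(annual_emissions)
    return total, total / len(annual_emissions)


# Reported 20-year total (million t CO2e) expressed as a mean annual value.
print(round(4.6866 / 20, 4))  # 0.2343
```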

**Figure 8.** Reference levels in the Xishuangbanna region.

#### **4. Conclusions**

A careful assessment of RLs for REDD+ in Xishuangbanna, China, provides useful insights for REDD+ projects. The implications that emerge from this study are as follows.

1. We developed a methodological framework to estimate carbon emissions for the REDD+ program in the tropical forests of Xishuangbanna, China. By coupling the IPCC's GPG with land use scenario modeling, we were able to estimate the RLs. Within this framework, Enhanced Thematic Mapper Plus/Thematic Mapper (ETM+/TM) remote sensing images of the study area were used to interpret land use in 1990, 2000, and 2010. The Land Use Dynamic Index was used to quantify transfers between land use types during the study period and identified rubber plantations as the main contributor to forest loss in the region. The Markov chain model was used to predict the quantity of future land use change, and the Multilayer Perceptron model was used to allocate these changes spatially.
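The Markov chain step answers "how much" land changes class per period. The sketch below shows only that quantity projection, using a hypothetical three-class transition matrix rather than the probabilities estimated in this study; the spatial allocation performed by the Multilayer Perceptron is not shown.

```python
# Markov-chain quantity projection with a hypothetical 3-class transition matrix.
EXAMPLE_AREAS = {"forest": 1000.0, "rubber": 200.0, "other": 300.0}  # ha, hypothetical
EXAMPLE_TRANSITION = {  # per-period transition probabilities; each row sums to 1
    "forest": {"forest": 0.90, "rubber": 0.08, "other": 0.02},
    "rubber": {"forest": 0.00, "rubber": 1.00, "other": 0.00},
    "other": {"forest": 0.00, "rubber": 0.05, "other": 0.95},
}


def project_areas(areas, transition, steps):
    """Apply the transition matrix `steps` times to a dict of class areas."""
    current = dict(areas)
    for _ in range(steps):
        nxt = {cls: 0.0 for cls in transition}
        for src, area in current.items():
            for dst, prob in transition[src].items():
                nxt[dst] += area * prob  # area moving from src to dst
        current = nxt
    return current


one_period = project_areas(EXAMPLE_AREAS, EXAMPLE_TRANSITION, steps=1)
# forest -> 900 ha, rubber -> 295 ha, other -> 305 ha; total area is conserved
```

Because each row sums to 1, the total area is conserved; only its distribution among classes shifts from one period to the next.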

2. According to Paragraph 71 of Decision 1/CP.16, forest RLs are one of the elements developing country Parties need in order to implement REDD+ activities. Moreover, the COP recognizes the importance and necessity of adequate and predictable financial and technology support for developing such RLs. Identifying these RLs is, therefore, a critical step in providing financial incentives and/or creating carbon markets; they also guide the design of the REDD+ strategy. Under the business-as-usual scenario of the RLs in this study, Xishuangbanna will lose 158,535 ha (158,535 × 10⁴ m²) of forest over the next 20 years, resulting in approximately 0.23 million t (0.23 × 10⁹ kg) CO2e of emissions per year. This loss is driven by economic development and rising human demand for resources, such as land for rubber and tea cultivation.

#### **5. Future Scope**

Estimating carbon emissions based on RLs is a multidisciplinary task. It requires expertise in forestry science, ecological modeling, statistics, remote sensing, and field techniques. Undertaking this exercise is demanding given global geographical diversity, and, thus, building technical capacity to this end is essential. Modeling future emissions based on historical trend rates and understanding the relationships between deforestation patterns and the drivers of deforestation are essential for RL estimation [68–70].

Remote sensing with optical sensors can estimate the carbon content of different forest types when supported by field information, for example from sample plots used to calibrate the estimates. With this approach, a multitemporal set of remotely sensed data can be used to detect forest changes over time [71–73]. Thus, freely available Landsat images can provide reliable measurements of forest change, especially when complemented with high-resolution satellite imagery from sensors such as QuickBird, which provides data for training and validating the image analysis.

**Author Contributions:** Conceptualization, H.L.; Data curation, G.L., C.Z. and K.H.; Formal analysis, K.H., Y.F. and M.X.; Investigation, Y.F., M.X. and R.G.; Methodology, G.L. and C.Z.; Project administration, H.L.; Resources, G.L.; Writing—original draft, H.L. and G.L.; Writing—review and editing, H.L. All authors have reviewed and agreed to the published version of the manuscript.

**Funding:** This study was conducted under the auspices of NSFC42071267, NSFC41371525, the Program for Innovative Research Team (in Science and Technology) in University of Henan Province (21IRT-STHN008), the Dabieshan National Observation and Research Field Station of Forest Ecosystem at Henan, and SYL20060111.

**Acknowledgments:** We are grateful to Quntao Yang's contribution to the research. We are also indebted to Liangyun Liu and Bowen Song, our colleagues at the Aerospace Information Research Institute (AIR) under the Chinese Academy of Sciences (CAS), for their kind help on China's GlobeLand30.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**


**Table A1.** The accuracies in 1990.

**Table A2.** The accuracies in 2000.


**Table A3.** The accuracies in 2010.


#### **References**

1. FAO. *Global Forest Resource Assessment 2005*; FAO: Rome, Italy, 2006.


## *Article* **Assessment of Land Degradation in Semiarid Tanzania—Using Multiscale Remote Sensing Datasets to Support Sustainable Development Goal 15.3**

### **Jonathan Reith <sup>1,2,3,\*</sup>, Gohar Ghazaryan <sup>2,4,5</sup>, Francis Muthoni <sup>3</sup> and Olena Dubovyk <sup>2,4</sup>**


**Abstract:** Monitoring land degradation (LD) to improve the measurement of the sustainable development goal (SDG) 15.3.1 indicator ("proportion of land that is degraded over total land area") is key to ensuring a more sustainable future. Current frameworks rely on the default medium-resolution remote sensing datasets available for assessing LD and cannot identify subtle changes at the sub-national scale. This study is the first to combine local datasets with high-resolution imagery to monitor the extent of LD in the semiarid Kiteto and Kongwa (KK) districts of Tanzania from 2000 to 2019. It incorporates freely available datasets such as Landsat time series and customized land cover maps and uses open-source software and cloud computing. Further, we compared the results of the LD assessment based on the adapted high-resolution data and methodology (AM) with those of the default medium-resolution data and methodology (DM) suggested by the United Nations Convention to Combat Desertification. According to the AM, 16% of the area in the KK districts was degraded during 2000–2015, whereas the DM indicated LD on 70% of the area. Furthermore, based on the AM, 27% of the land was degraded overall from 2000 to 2019. To achieve LD neutrality by 2030, spatial planning should focus on hotspot areas and implement sustainable land management practices based on these fine-resolution results.

**Keywords:** land degradation neutrality; SDG; land productivity; land cover; NDVI; Landsat; vegetation-precipitation relationship; soil organic carbon; Google Earth Engine

#### **1. Introduction**

Land degradation (LD) is defined as the "continuous reduction or loss of the productivity of the land due to a combination of natural and anthropogenic causes" [1]. It is a global problem that affects people, their livelihoods and nature. Studies suggest that up to 3.2 billion people live on and depend on degraded lands [2] and that approximately a quarter of the world's lands are affected by LD [3,4]. Poor people, who often rely on agriculture, are the most vulnerable to LD [5,6]. Ecosystem services lost through land use and land cover (LULC) change and LD account for losses of up to USD 10.5 trillion per year, about a sixth of the world's gross domestic product (GDP) [7]. Furthermore, biodiversity is declining globally, with tremendous losses in sub-Saharan Africa because of LD [6]. Projections suggest that lower productivity under climate change will drive LULC change globally. Moreover, population growth, combined with changing diets, will strongly influence agriculture and thus LD [8]. For these reasons, the world community introduced sustainable development goal (SDG) 15.3, which aims to

**Citation:** Reith, J.; Ghazaryan, G.; Muthoni, F.; Dubovyk, O. Assessment of Land Degradation in Semiarid Tanzania—Using Multiscale Remote Sensing Datasets to Support Sustainable Development Goal 15.3. *Remote Sens.* **2021**, *13*, 1754. https://doi.org/10.3390/rs13091754

Academic Editor: Elias Symeonakis

Received: 31 March 2021; Accepted: 26 April 2021; Published: 30 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

"restore degraded land and strive to achieve an LD-neutral world", highlighting the global importance of this issue [9,10].

Tanzania is a hot spot of LD, with more than half its area showing signs of degradation [2,11]. It has the highest annual net forest loss in East Africa and the fifth-highest worldwide [12]. The cost of LD was estimated at USD 2.3 billion annually in the first decade of the new millennium [13]. Seventy-five percent of the total labor force, mostly rural people, work in and depend on the agricultural sector, which accounts for about 30% of the GDP [14]. Although the cultivated area has increased in recent years, output per hectare (ha) has decreased for both annual and perennial crops, even though fertilizer consumption quadrupled over the same period [15]. The share of undernourished people is growing and currently exceeds 30% [16]. The population is increasing while agricultural productivity stagnates, and economic dependence on natural resources remains high. The consequences of this dilemma are persistent pressure on land and, thus, a probable conversion of natural into cultivated land in the coming years. The food security of poor people is also at risk, and new insecurities are likely to arise in the coming years under climate change [17]. This holds especially true for the rural semiarid central districts of Kiteto and Kongwa (KK).

Agricultural intensification and sustainable land management (SLM) are key to halting and reversing LD [18–20]. One major constraint that prevents action is the lack of spatial information on the extent and magnitude of LD [18]. In contrast to laborious fieldwork, remote sensing offers the unique opportunity to consistently assess vast areas over long periods [2–4]. Unfortunately, existing LD maps have a coarse spatial resolution and provide inconsistent estimates of the affected area [8]. For example, previous estimates of the extent of LD in Tanzania range from 41% to half of the country [2,3,11]. These variations stem from differences in definitions of LD, monitoring methods and the lack of appropriate data [6,21]. In the course of SDG 15.3 implementation, standard methods for assessing LD were introduced, making reports more comparable.

This new standard methodology, recommended by the United Nations Convention to Combat Desertification (UNCCD), uses three sub-indicators for the complementary assessment of LD [22]. The first sub-indicator, land cover (LC), reports changes in vegetation cover. The second, land productivity (LP), captures changes in ecosystem functions. The last, soil organic carbon (SOC), indicates slower changes resulting from biomass alterations [20]. The three sub-indicators are aggregated into the land degradation indicator. Because they are complementary rather than additive, improvements in one sub-indicator cannot compensate for losses in others. Thus, the "one-out, all-out" approach is applied: if even one sub-indicator shows decline while the others are positive, the land is deemed degraded [23].

The recent Tanzanian national LD-neutrality (LDN) report follows these guidelines [24]. However, it only assesses LD for the first ten years of the 21st century and mainly uses global default data with a coarse spatial resolution. This coarse 1 km resolution is inadequate for monitoring LD in small, mountainous and highly fragmented landscapes, as it may miss LD patches smaller than a pixel [25].

Overall, only a few studies have been published on the subject of SDG 15.3.1 monitoring and assessment. Gichenje and Godinho [26], for example, conducted a baseline assessment of the SDG indicator 15.3.1 for the years 1992 to 2015 using the Advanced Very High Resolution Radiometer (8 km, AVHRR) Normalized Difference Vegetation Index (NDVI) time series and the European Space Agency (ESA) Climate Change Initiative (CCI) LC map in Kenya. In Mozambique, Frederique et al. [27] analyzed only the LD sub-indicator LP trend using the Moderate Resolution Imaging Spectroradiometer (250 m, MODIS) NDVI from 2001 to 2016.

However, these studies share the common disadvantage of applying only the default methodology and global datasets to national and sub-national LD assessments. Though Akinyemi et al. (2020) used a customized 30 m resolution LC map to assess the LC sub-indicator of SDG 15.3.1 in Botswana, that study relied on an AVHRR time series for the LP sub-indicator. Furthermore, no studies in Africa have used high-spatial-resolution datasets to assess more than one sub-indicator of SDG 15.3.1. It is therefore vital to close these research gaps and use high-resolution spatial data to provide improved information on SDG 15.3 [28].

In this light, the main aim of our study was to assess the SDG 15.3.1 indicator with a newly adapted approach based on 30 m Landsat time series and 30 m LC maps, which have a higher resolution than the default UNCCD datasets, and to compare our results with the SDG 15.3.1 estimates based on the default UNCCD data and methods.

Our study addressed the following research questions:


#### **2. Materials and Methods**

#### *2.1. Study Area*

The study site is situated in the Kiteto and Kongwa districts, located in the Manyara and Dodoma regions of Central Tanzania, respectively (Figure 1). The elevation ranges between 850 and 2100 m above sea level. The study area has a hot semi-arid steppe climate [29]. The average monthly temperature stays between 19 and 25 °C all year, and annual precipitation is roughly 600 mm, with interannual variation between 500 and 800 mm. Large parts of northern Kiteto and smaller areas of the mountainous region in Kongwa are protected areas for nature and landscape conservation.

#### *2.2. Materials*

The SDG 15.3.1 indicator and its three LDN sub-indicators were computed using the recommended default method (DM) with Trends.Earth [30] and the adapted methods (AM) using high-resolution Landsat (and other) datasets (Table 1).

**Figure 1.** Location of the study area in Central Tanzania (**A**,**B**) and protected areas (**C**).

The DM LC map provided by the UNCCD is based on the 300 m ESA CCI LC map (Table 1). The AM used 30 m LC maps of the study area for 2000–2018 developed by the Regional Centre for Mapping of Resources for Development (RCMRD). Both datasets were reclassified into the six LC classes defined by the Intergovernmental Panel on Climate Change (IPCC), i.e., forestland, grassland, cropland, wetland, urban, and other lands [31].

The recommended global default dataset uses the MOD-13Q1-coll6 (250 m) MODIS NDVI product [30]. In contrast, the AM was calculated from 30 m resolution NDVI derived from a combination of Landsat 5, 7 and 8 (Table 1). The Landsat time series were accessed and analyzed in Google Earth Engine [32], based on atmospherically corrected surface reflectance collections (Table 1). The Landsat 5 and 7 data were spectrally harmonized with the Landsat 8 series using a linear transformation [33]. To further improve image quality, Fmask was used to mask out clouds and cloud shadows [34,35], and images with a cloud cover score above 80% were removed. Finally, the NDVI was calculated for each image, and images acquired on the same date were merged and clipped to the extent of the study area. As it is recommended to constrain the observation period to the growing season to reduce the number of irrelevant images in the computation and improve the quality of the time series [22], we used imagery from November to June. Trends.Earth does not allow the computation to be restricted to the growing season, so the DM uses the whole calendar year. To integrate rainfall information, data from Climate Hazards Group InfraRed Precipitation with Station (CHIRPS) were used (Table 1).
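The per-image filtering described above can be sketched for a single pixel in plain Python. This is not the Google Earth Engine API; the scene dictionaries, their keys and the reflectance values are hypothetical, and Fmask cloud masking and the Landsat harmonization step are not shown. Only the November to June window and the 80% cloud-cover cut-off come from the text.

```python
from datetime import date

GROWING_SEASON_MONTHS = {11, 12, 1, 2, 3, 4, 5, 6}  # November to June
MAX_CLOUD_COVER = 80.0  # scenes with a higher cloud-cover score are dropped


def ndvi(nir, red):
    """NDVI from near-infrared and red surface reflectance (0-1 scale)."""
    total = nir + red
    return None if total == 0 else (nir - red) / total


def keep_scene(scene):
    """True if the scene falls in the growing season and passes the cloud filter."""
    return (scene["date"].month in GROWING_SEASON_MONTHS
            and scene["cloud_cover"] <= MAX_CLOUD_COVER)


def seasonal_mean_ndvi(scenes):
    """Mean NDVI of one pixel over the scenes that pass both filters."""
    values = [ndvi(s["nir"], s["red"]) for s in scenes if keep_scene(s)]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None


scenes = [  # hypothetical per-pixel reflectances
    {"date": date(2010, 1, 15), "cloud_cover": 10.0, "nir": 0.40, "red": 0.10},
    {"date": date(2010, 3, 2), "cloud_cover": 90.0, "nir": 0.45, "red": 0.05},  # too cloudy
    {"date": date(2010, 8, 20), "cloud_cover": 5.0, "nir": 0.25, "red": 0.15},  # dry season
    {"date": date(2010, 12, 4), "cloud_cover": 20.0, "nir": 0.30, "red": 0.10},
]
season_ndvi = seasonal_mean_ndvi(scenes)  # mean of 0.6 and 0.5
```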



The SOC metrics for both the DM and the AM were derived from the SoilGrids250m dataset [41], as there is no national SOC database for Tanzania. SOC is measured to a depth of 30 cm and is stated as mass per area (e.g., tons per hectare (t/ha)) [22].

#### *2.3. Methods*

The calculation of the SDG 15.3.1 indicator is based on the "one-out, all-out" approach (Ref. [23] and Figure 2). The three LD sub-indicators (LC change, LP decline and loss of SOC) are estimated, and if any one of them signals degradation, the LD indicator reflects this as well. A baseline is needed to measure progress towards LDN. The baseline year (t0) was set to 2015 and is computed as the average of the period leading up to t0 (2000–2015). The indicators are then remeasured at regular intervals until 2030, and the change is used to monitor progress towards LDN [20].
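The "one-out, all-out" rule can be expressed per pixel as in the minimal sketch below. The class labels are hypothetical, and treating a pixel as improved when at least one sub-indicator improves and none degrades is our reading of the rule, stated here as an assumption. The final share also assumes equal-area pixels.

```python
DEGRADED, STABLE, IMPROVED = "degraded", "stable", "improved"


def one_out_all_out(lc, lp, soc):
    """Combine the three sub-indicator classes of one pixel."""
    states = (lc, lp, soc)
    if DEGRADED in states:  # a single degraded sub-indicator is decisive
        return DEGRADED
    if IMPROVED in states:  # assumption: improved if any improves and none degrades
        return IMPROVED
    return STABLE


def proportion_degraded(pixels):
    """Share of pixels flagged as degraded (equal-area pixels assumed)."""
    flags = [one_out_all_out(*p) for p in pixels]
    return flags.count(DEGRADED) / len(flags)


example_pixels = [  # (LC, LP, SOC), hypothetical
    (STABLE, DEGRADED, STABLE),
    (IMPROVED, STABLE, STABLE),
    (STABLE, STABLE, STABLE),
    (IMPROVED, DEGRADED, IMPROVED),
]
share = proportion_degraded(example_pixels)  # 2 of 4 pixels -> 0.5
```

The last example pixel shows why the rule is not additive: two improving sub-indicators do not offset one degrading sub-indicator.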

**Figure 2.** Steps to derive the sustainable development goal (SDG) indicator 15.3.1 from the sub-indicators. I represents Improvement, S represents Stable and D represents Degraded (based on [31]).

To calculate the indicator for the reporting year 2019 (t1), the baseline up to t0 is assessed first, and then the change from the baseline to t1 is calculated (Figure 2). As a final step, both results are combined. The calculation of each sub-indicator is detailed in the following sections. The three LD sub-indicators were derived from satellite images using cloud-based geospatial computing: Trends.Earth [30] for the DM and Google Earth Engine [32] for the AM. As Trends.Earth currently only supports computation for the baseline period (BP), the DM is only available for 2000 to 2015.

#### 2.3.1. Sub-Indicator 1: Land Cover Transitions and Degradation

The first SDG 15.3.1 sub-indicator is LC change. To assess LC degradation, the transitions between 2000 and 2015 and between 2015 and 2018 were analyzed for the baseline and the first monitoring period (MP), respectively. Whether a change from one LC class to another is interpreted as degradation is determined with a transition matrix (Table 2) based on the UNCCD Good Practice Guidance [31]. The guidance recommends adapting this matrix to the national context. Therefore, for the AM, transitions from grassland to cropland were not considered LD, to avoid trade-offs between ecosystems and food security and between nomadic and sedentary livelihoods.
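A transition matrix is essentially a lookup from (class before, class after) to a status. The sketch below uses a hypothetical, illustrative subset of entries rather than the full UNCCD matrix. Only the grassland-to-cropland entry reflects the AM choice described above, and defaulting unlisted transitions to "stable" is a simplification made here.

```python
# Illustrative subset of a transition matrix; NOT the full UNCCD default matrix.
TRANSITIONS = {
    ("forest", "cropland"): "degraded",
    ("forest", "grassland"): "degraded",
    ("cropland", "forest"): "improved",
    ("grassland", "cropland"): "stable",  # AM choice described above
}


def classify_transition(before, after, matrix=TRANSITIONS):
    """Look up the status of an LC transition; unlisted changes default to stable here."""
    if before == after:
        return "stable"
    return matrix.get((before, after), "stable")


def area_by_status(changes):
    """Sum areas (km2) per status from (before, after, area) records."""
    totals = {"degraded": 0.0, "stable": 0.0, "improved": 0.0}
    for before, after, area in changes:
        totals[classify_transition(before, after)] += area
    return totals


summary = area_by_status([  # hypothetical areas in km2
    ("forest", "cropland", 12.0),
    ("grassland", "cropland", 30.0),
    ("cropland", "forest", 4.0),
    ("grassland", "grassland", 100.0),
])
```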

#### 2.3.2. Sub-Indicator 2: Loss of Land Productivity

LP is described as "the biological productive capacity of the land". It is closely associated with net primary productivity [42], which can be measured directly with earth observation methods [22]. NDVI is a widely used index for detecting LP [26,43,44]. The LP sub-indicator consists of three distinct components, namely trend, state and performance.

The LP trend component measures the trajectory of change in productivity over time. It is calculated at the pixel level using linear regression and the Mann-Kendall significance test [22,45,46]. Positive and negative changes in NDVI indicate increasing and decreasing productivity, associated with vegetation recovery and degradation, respectively. The eight most recent years of data were used to create a shorter time series that is more responsive to present land conditions. Further, following [47], we accounted for the effect of rainfall variability on vegetation productivity trends by using the rain use efficiency (RUE) method.
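One way to combine these steps is sketched below. RUE is taken as yearly NDVI divided by yearly rainfall, and significance uses a Mann-Kendall Z statistic without tie correction. The 1.96 critical value (two-sided, 5%) is an assumption, not a parameter given in the text. The sign of the least-squares slope then sets the direction.

```python
import math


def ols_slope(y):
    """Least-squares slope of y against its time index 0..n-1."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def mann_kendall_z(y):
    """Mann-Kendall Z statistic (no tie correction, kept short for illustration)."""
    n = len(y)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = y[j] - y[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s == 0:
        return 0.0
    return (s - 1) / math.sqrt(var_s) if s > 0 else (s + 1) / math.sqrt(var_s)


def lp_trend(ndvi_by_year, rain_by_year, years=8, z_crit=1.96):
    """Classify the RUE trend (NDVI / rainfall) over the most recent `years`."""
    rue = [v / r for v, r in zip(ndvi_by_year, rain_by_year)][-years:]
    z = mann_kendall_z(rue)
    if abs(z) < z_crit:
        return "stable"  # not significant at the assumed level
    return "degrading" if ols_slope(rue) < 0 else "improving"
```

A pixel whose NDVI simply tracks wet and dry years (for example NDVI 0.3 at 300 mm and 0.6 at 600 mm) has constant RUE and is therefore classed as stable rather than as fluctuating.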

The LP state component compares recent LP with that of the BP. The yearly mean NDVI images of the shortened BP (2000–2012) were normalized and assigned to classes 1 to 10 based on their percentiles. To smooth annual fluctuations, the mean values of the three years preceding t0 and t1 were classified using this scheme. Areas that dropped by two or more classes were classified as degraded, while a rise of two or more classes was interpreted as improvement [31].
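A per-pixel sketch of the state comparison follows. It assumes the decile cut points come from the pixel's own reference-period yearly means, computed with `statistics.quantiles`. It compares the class of the reference mean with the class of the recent three-year mean. This is one plausible reading of the procedure, not the exact implementation.

```python
import statistics
from bisect import bisect_right


def decile_class(value, cuts):
    """Map a value to classes 1..10 using nine decile cut points."""
    return bisect_right(cuts, value) + 1


def lp_state(reference_ndvi, recent_three_years, min_shift=2):
    """Compare the recent 3-year mean with the reference-period mean, in decile classes."""
    cuts = statistics.quantiles(reference_ndvi, n=10)
    ref_class = decile_class(statistics.mean(reference_ndvi), cuts)
    recent_class = decile_class(statistics.mean(recent_three_years), cuts)
    shift = recent_class - ref_class
    if shift <= -min_shift:
        return "degraded"
    if shift >= min_shift:
        return "improved"
    return "stable"


reference = [0.30 + 0.01 * i for i in range(13)]  # hypothetical 2000-2012 yearly means
```

Working with percentile classes rather than raw NDVI values makes the two-class threshold comparable across pixels with very different absolute productivity.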

The LP performance component compares local productivity with that of similar ecoregions, defined by unique combinations of SoilGrids [41] soil taxonomy great groups and LC classes (Table 1). The 90th percentile of NDVI in each ecoregion was calculated as a proxy for the maximum productivity level. LP performance was then calculated as the ratio of the observed mean NDVI per pixel to this NDVImax (90th percentile). Values below 0.5 indicate regions where LP is low and LD may prevail [31].
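The performance ratio can be sketched as follows. Pixels are grouped by a hypothetical ecoregion key (a soil great group and LC class pair), the 90th percentile is computed with linear interpolation, and pixels whose ratio falls below 0.5 are flagged. The pixel identifiers and values are illustrative only.

```python
from collections import defaultdict


def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def lp_performance(pixels, threshold=0.5):
    """pixels: (pixel_id, ecoregion_key, mean_ndvi) tuples -> {pixel_id: flag}."""
    by_region = defaultdict(list)
    for _, region, value in pixels:
        by_region[region].append(value)
    ndvi_max = {r: percentile(v, 90) for r, v in by_region.items()}  # NDVImax proxy
    return {
        pid: "degraded" if value / ndvi_max[region] < threshold else "stable"
        for pid, region, value in pixels
    }


REGION = ("great_group_A", "grassland")  # hypothetical ecoregion key
example = [("p1", REGION, 0.2)] + [(f"p{i}", REGION, 0.6) for i in range(2, 6)]
flags = lp_performance(example)  # p1: 0.2 / 0.6 < 0.5 -> degraded
```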

The overall LP sub-indicator is calculated from the three components described above. As the LP trend is based on a statistical significance test, it is the most influential component, and its status determines LP degradation. Otherwise, the LP indicator shows degradation only if both LP state and LP performance are negative [22]. If only the LP state component shows degradation, this could indicate "early signs of decline", because the other components may not yet have detected the most recent LD. Further, if only performance shows degradation, there is no temporal trend, and the land is classified as "stable but stressed" [22]. In contrast to the UNCCD Good Practice Guidance, Trends.Earth (DM) also counts the "early signs of decline" state component as LP degradation [30].
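The combination rules can be written as a short decision function. The `count_early_signs` flag is a hypothetical switch that mimics the DM behaviour described above. Where the rules are silent, for example how an improving trend interacts with a negative performance, the order of checks is our assumption.

```python
def combine_lp(trend, state, performance, count_early_signs=False):
    """Combine LP trend, state and performance classes for one pixel.

    trend: "degrading" | "stable" | "improving"; state/performance: "degraded" | "stable".
    count_early_signs=True treats "early signs of decline" as degraded (DM-like).
    """
    if trend == "degrading":  # significant trend decides
        return "degraded"
    if state == "degraded" and performance == "degraded":
        return "degraded"
    if state == "degraded":
        return "degraded" if count_early_signs else "early signs of decline"
    if trend == "improving":  # assumption: checked after the decline rules
        return "improved"
    if performance == "degraded":
        return "stable but stressed"
    return "stable"
```

Applying the function with and without `count_early_signs` to the same pixels shows how a state-driven signal alone can inflate the degraded share. This is consistent with the large "early signs of decline" contribution reported for the DM in the Results.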

**Table 2.** Land cover transition matrix (2000–2015) based on the adapted methods (AM). Green, beige and brown colors indicate improving, stable and declining conditions of land cover categories, respectively. The area in km2 and the possible cause of the land cover transition are indicated in the matrix. The change is based on the high-resolution land cover dataset.


#### 2.3.3. Sub-Indicator 3: Degradation of Soil Organic Carbon

The Good Practice Guidance for the SOC sub-indicator is based on the maximum equilibrium SOC content at a location, which is controlled by environmental factors such as rainfall, evaporation, solar radiation, and temperature [22]. This content can change through three distinct change factors. First, the land-use factor represents SOC stock changes due to the type of land use. Second, the management factor reflects the management practice of the land use (e.g., grazing intensity on grasslands). Third, the input factor represents different amounts of carbon input into the soil [22,48,49]. While the LULC change factor can be derived using LC as a proxy, no sufficient datasets are presently available on management or input for the other two factors. Thus, SOC changes can only be assessed through the LC change sub-indicator [22].
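Because only the land-use factor is available, an LC transition rescales the baseline SOC stock by the ratio of the land-use factors of the new and old classes. In the sketch below the factor values are hypothetical placeholders, not IPCC defaults. The ±10% threshold mirrors the one used for the SOC results but is treated here as an assumption.

```python
# Hypothetical land-use factors (relative SOC stock levels), for illustration only.
LAND_USE_FACTOR = {"forest": 1.0, "grassland": 1.0, "cropland": 0.7}


def soc_after_transition(soc_t0, lc_before, lc_after, factors=LAND_USE_FACTOR):
    """Scale the baseline SOC stock (t/ha) by the ratio of land-use factors."""
    return soc_t0 * factors[lc_after] / factors[lc_before]


def soc_status(soc_t0, soc_t1, threshold=0.10):
    """Flag a relative SOC change beyond +/- threshold (10% assumed here)."""
    change = (soc_t1 - soc_t0) / soc_t0
    if change < -threshold:
        return "degraded"
    if change > threshold:
        return "improved"
    return "stable"


soc_t1 = soc_after_transition(60.0, "forest", "cropland")  # 42 t/ha, a 30% loss
```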

#### **3. Results**

Three sub-indicators, namely LC transitions, LP decline, and SOC loss, were estimated to derive the SDG 15.3.1 indicator using the default and adapted methods. The patterns of each sub-indicator based on the DM and AM are described in the following sections, starting with the BP from 2000 to 2015 for both methods. The first monitoring period from 2015 to 2019 is only assessed using the AM, as the data necessary for this period are currently not available in Trends.Earth.

#### *3.1. Sub-Indicator 1: Land Cover Transitions and Degradation*

According to the DM based on the medium-resolution 300 m LC maps, over 99% of the study area remained stable in the BP (2000–2015) (Table A1). Urban areas covering less than 0.1% of the study area experienced the highest relative expansion (56%). The forestlands were the only other LC class that increased in area significantly (4.4%) in the BP.

In contrast to the DM, the AM with high-resolution (30 m) LC data revealed that 6.7% of the total area changed to a less desirable LC class, signifying LD, and only 2.3% of analyzed areas improved. The area of (semi)natural LC, such as forestlands (−19%), grasslands (−6.6%) and wetlands (0.1%), mostly declined, whereas the croplands recorded the highest spatial gain (24.2%) (Figure 3 and Table 2).

**Figure 3.** Sankey plot describing the land cover transitions between the years 2000, 2015 and 2018 using high-resolution land cover data. Bands represent the actual proportion of land that changed class over time.

The trend observed in the BP continued in the first years of the MP (Figure 3 and Table A2). Overall, from 2015 to 2018, 3.3% of the total area was degraded, while 1.2% of the area changed to a more desirable LC. Grasslands and forestlands continued to decline, by 3% and 9%, respectively, while anthropogenic(-influenced) covers such as cropland and urban areas expanded. Compared with about 3000 ha of forest lost per year (ha/a) in the BP, the rate doubled to 6000 ha/a in the MP. Similarly, cropland expansion increased from 6000 ha/a in 2000–2015 to 7500 ha/a in 2015–2018.

#### *3.2. Sub-Indicator 2: Loss of Land Productivity*

The DM revealed that the LP sub-indicator showed degradation in 71.1% of the area during the BP from 2000 to 2015 (Table 3). The LP component trend showed "decline" in 26.8% of the area (Figure A1A). Another 44.3% of the study area showed "early signs of decline" (LP component state, Figure A2A), and the rest (28.9%) remained stable (Figure 4A). According to DM, croplands were most affected (48.4%) by LP decline in 2000–2015 (Figure 5). Forestlands with only about 11.7% marked as degraded were less affected compared to their actual LC share (Figure 5).

**Table 3.** The land productivity (LP) status in percent for the default (DM) and adapted methods (AM) for the baseline period from 2000 to 2015, as well as for the first monitoring period of 2015-2019. Furthermore, the land cover share of the degraded area in the target year is depicted.


**Figure 4.** The land productivity sub-indicator generated using (**A**) the default approach with MODIS imagery, (**B**) the adapted approach with Landsat imagery for the baseline period, and (**C**) the adapted approach with Landsat imagery for the monitoring period 2015–2019.

**Figure 5.** The bar chart showing the distribution of land productivity (LP) decline sub-indicators over the land cover classes using the default (DM) and adapted methods (AM). The dashed lines show the actual land cover share.

Based on the AM for 2000–2015, the final composite LP decline indicator showed that 8.2% of the study area was degraded (Table 3). This is almost entirely due to the 8.2% "decline" of the LP trend component (Figure A1B). Further, 9.1% and 1.4% of the study area were marked as showing "early signs of decline" (Figure A2B) and as "stable but stressed" (Figure A3B), respectively (Figure 4B). Grasslands and croplands accounted for 43.5% and 42% of the degraded area, respectively (Figure 5). Decline in forestlands, in turn, was detected on only 2.6% of the total degraded area.

LP declined over 12.2% of the study area during the MP from 2015 to 2019 (Figure 4C). With an increase from 9.1% up to 17% of the area, the share of areas with "early signs of decline" (state component) was higher than during the BP (Figure A2). The area where LP was improving was reduced from 855 to 171 km<sup>2</sup> compared to the BP.

#### *3.3. Sub-Indicator 3: Degradation of Soil Organic Carbon*

Soil organic carbon was not directly computed but rather assessed through changes in LC classes and the related change factors [49]. SOC did not change significantly with the DM during the BP from 2000 to 2015: on 99.9% of the land, SOC content did not change by more than 10% (Table 4). Changes in the individual LC classes were also negligible.

In contrast to DM, the AM approach revealed that during the BP of 2000–2015, 8.4% of the land was degraded due to SOC diminishment, while 2.1% increased in SOC content (Figure 6). The average SOC stock declined from 51.2 to 50.2 t/ha in 2015, losing 1,592,423 t of carbon over 16 years (Table 4). Forestlands had significantly higher SOC stocks (62.2 t/ha) at t0 than the other LC classes. Based on the transitions in LC, the amount of SOC in forests dropped by 19%, while SOC under agricultural use increased by 25.1%.

**Figure 6.** The soil organic carbon sub-indicator generated using the adapted approach with SoilGrids250m for the (**A**) baseline and (**B**) monitoring period.

In the MP, the SOC content experienced significant losses on 3.7% of the land. The same trend was observed in other LC classes (forest-, grass- and wetlands) that gradually lost SOC in the MP (Table 4).


**Table 4.** The soil organic carbon (SOC) content for the default (DM) and adapted methods (AM) for the baseline period from 2000 to 2015 as well as for the first monitoring period of 2015–2018.

#### *3.4. Combined Sustainable Development Indicator 15.3.1 for the Baseline and First Monitoring Period*

During the BP from 2000 to 2015, the DM identified 71.1% of KK's area as degraded and only 0.5% as improved (Figure 7A). This result is mainly caused by the LP sub-indicator, while the two other sub-indicators, LC and SOC, showed nearly no degradation. The LP degradation was mainly driven by the state component of LP, on 70.3% of the total area.

**Figure 7.** The sustainable development goal (SDG) 15.3.1 indicator "proportion of land that is degraded over total land area" for the baseline period with the (**A**) default and (**B**) adapted methods, and (**C**) for the first monitoring period using the adapted method.

On the contrary, during the BP, the AM showed that 16.4% of the area was degraded and 2.7% improved (Figure 7B). The distinct sub-indicators influenced the final indicator more evenly with 52.4%, 50% and 31.7% by SOC, LP and LC, respectively, compared to the DM.

The AM for the first MP (2015–2019) showed that 16% of the total area was degraded, 1.5% improved and more than 82% remained stable (Figure 7C). Forests and grasslands were the least affected among LC classes. Croplands (38%) and wetlands (7%) experienced the most degradation between 2015 and 2019. Over three-fourths of the degradation was driven by the LP sub-indicator, whereas LC and SOC only contributed 20% and 23% to LD, respectively.

#### *3.5. Combined Sustainable Development Indicator 15.3.1 over 20 Years Using the AM*

Over the whole 20-year period (2000–2019), which yields the SDG 15.3.1 indicator at time step t1, 27.7% of KK was degraded and 2.8% improved (Figure 8A). LD was thus widespread across the two studied districts and formed several LD clusters (Figure 7B,C). The degradation was not evenly distributed over the study area: the biggest LD hotspots were Central and Western Kiteto, as well as Western Kongwa (Figure 8A). Even though the land covered by forests decreased and the land covered by crops increased from 2015 to 2018, their shares of the degraded area moved in the opposite direction: the share under forests increased to 3.9%, while the share under crops fell to 41.9%. While the area degraded through SOC changed only slightly, its relative contribution fell from 50% to 30% (Figure 8B). The share of the degraded area driven solely by LP rose to over 50%, and to over 70% when LP acting together with the other sub-indicators is included.

**Figure 8.** (**A**) The sustainable development goal (SDG) 15.3.1 indicator "proportion of land that is degraded over total land area" for the years 2000–2019 and (**B**) the contribution to the SDG 15.3.1 indicator by its three sub-indicators land cover (LC) change, land productivity (LP) decline and soil organic carbon (SOC) loss.

#### **4. Discussion**

The presented study is the first in Africa to support the monitoring of the SDG 15.3.1 indicator using fine-spatial-resolution (30 m) satellite time series data for LD assessment. This is a key contribution considering that previous studies used 250 m to 8 km resolution data [24,26,50] for LP sub-indicator monitoring, unlike our study that utilized long-term Landsat time series for SDG 15.3.1 monitoring. Furthermore, it is the first sub-national study that assesses the SDG 15.3.1 indicator in Tanzania for the BP and includes the MP until 2019. The first 4 out of 15 years of the SDG time frame are assessed and could help identify hotspot areas for targeting the appropriate measures to combat LD in the study area.

The presented LD assessment in the KK districts confirmed that LD is an acute problem in Tanzania. Tanzania's target is to achieve LDN by 2030 [24]. Both districts lie in declared LD hotspot regions, which need to improve 25% of their area relative to the status at t0. According to our analysis, however, only 2.7% of the land area has improved, while 27.7% is degraded. In addition to the (sub)national targets, Tanzania has specific targets to avoid, minimize and reverse LD [24]. Among others, about half of the current national forest area should be restored, LP should improve on 50% of the national croplands, and the SOC content in croplands should rise to 54.5 t/ha [51]. Despite these specific and ambitious targets, our results show a negative trend in all LD sub-indicators analyzed, suggesting that more effort is needed to combat LD in the study area.

Specifically, instead of forest areas being restored, even more trees were cut over the 19 years: tree cover fell from 14.7% to 10.9% of the area, a relative loss of about 26%. In croplands, LP degradation was above average, while the SOC content in croplands improved marginally. A possible explanation is that restoration efforts using SLM practices have not yet shown effects, because it takes several years before such changes can be detected remotely [52,53]. Moreover, SOC takes decades to change [49,54]. Hence, it is of paramount importance to prioritize the detected LD hotspots for rehabilitation and SLM practices to reverse LD processes.

There are currently no other sub-national studies for the KK districts. With around 27% of its area degraded, KK appears less affected by LD than national assessments suggest [2,3,11]. However, comparison with these studies is difficult, as they used different monitoring periods (ending in the 2000s and in 2016), only a subset of the methodology (the LP trend) and coarse-resolution imagery (i.e., 8 km AVHRR data). Our study thus takes LD assessment in Tanzania one step further by assessing all three components of LD according to the SDG 15.3.1 indicator. Furthermore, the substantially higher spatial resolution of our datasets allowed us to reveal LD patterns finer than the 8 km [2,3,11] or 1 km [24] pixel sizes used previously.

Our study compared the results of LD assessments based on the default UNCCD-suggested datasets (250 m MODIS data for the LP sub-indicator and 300 m ESA CCI LC maps) with those based on customized, relatively high-resolution datasets (30 m Landsat data for the LP sub-indicator and 30 m RCMRD LC maps). The differences between the LD estimates based on the DM and AM were striking and could be attributed primarily to the difference in pixel size, 6.25 ha (MODIS) versus 0.09 ha (Landsat), which can be critical in areas where fine LD patterns prevail. This finding is supported by several studies highlighting the importance of high-resolution imagery for detecting LD, especially in heterogeneous landscapes such as the KK districts, which are dominated by small-scale farms [50,55,56]. Recent studies that validated their results with ground-truth data showed that, for the LC sub-indicator, Landsat data captured LD better than the ESA-based 300 m datasets [50]. Nevertheless, certain factors could have affected the AM, such as the scan-line corrector failure of Landsat ETM+. To reduce its potential influence on our analysis, we applied several preprocessing steps that have proven effective in similar studies [56].
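The effect of pixel size can be illustrated with a toy aggregation. One 250 m MODIS pixel (6.25 ha) covers the area of roughly 69 Landsat pixels of 0.09 ha (6.25 / 0.09 ≈ 69.4). The sketch below aggregates a hypothetical fine-resolution binary degradation map (1 = degraded) into coarse blocks by majority vote. To keep the grid simple, it assumes an integer block factor of 8 × 8 fine pixels (240 m instead of 250 m); real resampling of MODIS and Landsat grids differs. Small degraded patches that do not dominate a block disappear at the coarse scale, whereas a degraded majority is extended to the whole block.

```python
def majority_aggregate(fine, factor):
    """Aggregate a 2-D 0/1 grid into factor x factor blocks by majority vote.

    A block is flagged 1 only if more than half of its fine pixels are 1;
    trailing rows/columns that do not fill a whole block are dropped.
    """
    rows, cols = len(fine), len(fine[0])
    coarse = []
    for r0 in range(0, rows - rows % factor, factor):
        coarse_row = []
        for c0 in range(0, cols - cols % factor, factor):
            block = [fine[r][c] for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            coarse_row.append(1 if 2 * sum(block) > len(block) else 0)
        coarse.append(coarse_row)
    return coarse


def degraded_area_ha(grid, pixel_ha):
    """Total area (ha) of pixels flagged 1 in a 2-D 0/1 grid."""
    return sum(sum(row) for row in grid) * pixel_ha


# Hypothetical 16 x 16 grid of 30 m pixels (0.09 ha each).
fine = [[0] * 16 for _ in range(16)]
for r in range(2):           # small patch: 10 pixels in the top-left block
    for c in range(5):
        fine[r][c] = 1
for r in range(8, 13):       # larger patch: 40 of 64 pixels in the bottom-right block
    for c in range(8, 16):
        fine[r][c] = 1

coarse = majority_aggregate(fine, 8)              # [[0, 0], [0, 1]]
fine_ha = degraded_area_ha(fine, 0.09)            # 50 pixels, about 4.5 ha
coarse_ha = degraded_area_ha(coarse, 64 * 0.09)   # one 240 m block, about 5.76 ha
```

In this toy case, the coarse map misses the small patch entirely and overstates the large one. This illustrates how coarse data can both under- and overestimate LD locally.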

NDVI was applied in this study, although it is affected by soil brightness in areas with low vegetation cover. Other vegetation indices, such as MSAVI or MSAVI2, are less sensitive to soil optical properties in sparsely vegetated areas and can therefore be used to detect declines in vegetation productivity [57]. However, these alternative indices perform significantly better than NDVI only in areas where bare soils prevail. Further, Tüshaus et al. [58] compared NDVI with the Soil-Adjusted Vegetation Index (SAVI) and the MERIS-based Terrestrial Chlorophyll Index (MTCI) and found only small differences between the indices. Nevertheless, the impact of different vegetation indices on the estimated LDN sub-indicators could be tested further.
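For reference, the standard formulas of the indices mentioned above are sketched below for a single pixel. Inputs are assumed to be surface reflectances in the range 0–1. The soil adjustment factor L = 0.5 for SAVI is the commonly used default. The reflectance values are hypothetical.

```python
import math


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index; 0.5 is the commonly used soil factor L."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def msavi2(nir, red):
    """Modified SAVI (MSAVI2), which derives the soil adjustment from the data itself."""
    return (2 * nir + 1 - math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2


# Hypothetical surface reflectances of a sparsely vegetated pixel.
nir, red = 0.30, 0.20
values = {"NDVI": ndvi(nir, red), "SAVI": savi(nir, red), "MSAVI2": msavi2(nir, red)}
```

For this pixel, NDVI is 0.2, SAVI is 0.15 and MSAVI2 is about 0.137. In a per-pixel workflow, any of these functions could replace NDVI as the productivity proxy to test the sensitivity of the LP sub-indicator.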

Furthermore, our results show that the ESA CCI LC maps did not reflect significant LC changes in the KK districts during the BP. Other local estimates, such as the National Forest Resources Monitoring and Assessment of Tanzania Mainland [21] and the Tanzanian Forest Reference Emission Level [59], suggest change rates three and twenty times higher, respectively, for a similar period. Our result is in line with the study of Kimaro et al. [60], who investigated LC change in the study area from 1987 to 2010 and found that LC change was already under way more than 30 years ago, with heavy declines in (semi-)natural landscapes. This suggests that our research advances the sub-national assessment of LD in heterogeneous landscapes.
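The LC sub-indicator classifies each pixel by the transition between its land cover class at the start and at the end of a period, using a transition matrix like those in Tables A1 and A2. A minimal sketch with a hypothetical, deliberately small subset of transitions: the class names loosely follow the UNCCD default land cover classes, but the flags shown are illustrative assumptions, not the study's full matrices (see Appendix A). Unlisted changes default to stable here only to keep the example short.

```python
from collections import Counter

DEGRADED, STABLE, IMPROVED = -1, 0, 1

# Hypothetical subset of a transition matrix: (from_class, to_class) -> flag.
TRANSITIONS = {
    ("tree", "cropland"): DEGRADED,        # deforestation
    ("grassland", "cropland"): DEGRADED,   # agricultural expansion
    ("cropland", "artificial"): DEGRADED,  # urban expansion
    ("cropland", "tree"): IMPROVED,        # afforestation
    ("grassland", "tree"): IMPROVED,       # woody vegetation establishment
}


def lc_change(before, after):
    """Per-pixel LC sub-indicator flags; unchanged or unlisted transitions are stable."""
    return [STABLE if a == b else TRANSITIONS.get((a, b), STABLE)
            for a, b in zip(before, after)]


def transition_areas(before, after, pixel_km2):
    """Area (km2) per observed (from, to) pair, i.e., the cells of a transition matrix."""
    counts = Counter(zip(before, after))
    return {pair: n * pixel_km2 for pair, n in counts.items()}
```

For example, with 30 m pixels (0.0009 km2), `lc_change(["tree", "tree", "cropland", "grassland"], ["cropland", "tree", "tree", "cropland"])` returns `[-1, 0, 1, -1]`. `transition_areas` fills four matrix cells of 0.0009 km2 each.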

Using the AM, our study revealed that the LP sub-indicator contributed most to LD in the study area (50%). The remaining half was driven by SOC, by LC, or by a combination of more than one sub-indicator. In contrast, the LD indicator based on the DM was driven almost solely by the LP sub-indicator, and primarily by its state component. This suggests two things. First, our AM better reflects the ongoing multidimensional degradation in the KK districts. Second, even if the ongoing LULC change stops, degradation will not halt because of the decline in LP.
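The state component compares recent productivity with the distribution of productivity over a reference period. A simplified sketch, assuming the following rule: the annual NDVI values of the reference years are split into ten percentile (decile) classes. The mean of the comparison years (e.g., the last three years) and the mean of the reference years are each assigned to a class. A drop (rise) of two or more classes is flagged as degradation (improvement). The class count, the comparison window and the two-class threshold are assumptions for illustration, not confirmed parameters of this study.

```python
from bisect import bisect_right
from statistics import mean, quantiles

DEGRADED, STABLE, IMPROVED = -1, 0, 1


def state_metric(baseline, recent, threshold=2):
    """Compare recent mean productivity with decile classes of the reference period.

    baseline: annual NDVI values of the reference years (at least 2 values)
    recent:   annual NDVI values of the comparison years (e.g., last 3 years)
    threshold: assumed minimum class shift that counts as a change
    """
    cuts = quantiles(baseline, n=10, method="inclusive")  # nine decile cut points

    def decile(value):
        return bisect_right(cuts, value) + 1  # class 1 (lowest) to 10 (highest)

    change = decile(mean(recent)) - decile(mean(baseline))
    if change <= -threshold:
        return DEGRADED
    if change >= threshold:
        return IMPROVED
    return STABLE
```

With a hypothetical reference series rising evenly from 0.50 to 0.72, a comparison mean of 0.46 falls into the lowest class and is flagged as degraded. A comparison mean near 0.61 stays in the same or an adjacent class and is flagged as stable.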

This is well reflected in croplands, which were the worst-affected land cover class, not only in LP decline but also in SOC loss. Owing to continuous cultivation of agricultural land combined with overgrazing and low fertilizer inputs, crop yields in the study area are reportedly low because of the limited availability of soil nutrients and organic matter [18]. Another study that assessed LD in Kenya under similar environmental and land use settings found that croplands experienced the highest decline in LP, indicating that unsustainable farming practices are widespread throughout Eastern Africa [26]. This has serious consequences, as 30% of the Tanzanian population is already undernourished [16], and the yield gap for the main crops needs to be closed for the population to sustain itself in the coming decades [61].

According to our study, the soils in the KK districts lost 1.6 million t of SOC due to LULC change between 2000 and 2018. This is especially serious, as SOC is vital for soil quality and is a key ecosystem indicator [62]. The study by van der Esch et al. [63] suggests that a further 27 Gt of SOC will be lost globally by 2050 due to LULC change, mainly in sub-Saharan Africa. Studies conducted in Tanzania found that higher SOC values at the farm level resulted in financial benefits for farmers [64]. Thus, increasing SOC through SLM practices would not only improve farmers' living conditions but also help slow down ongoing SOC degradation.

In contrast to the LP and LC sub-indicators, which have a continuous data basis in the Landsat and Sentinel missions [65] and for which further high-resolution maps are available [66], the SOC sub-indicator still lacks good spatial and temporal coverage. Further, sufficient datasets on land management or inputs for the SOC indicator are currently unavailable. The SOC change is therefore only approximated through the LC change sub-indicator, which biases the overall SDG 15.3.1 indicator towards LULC change. At the time of the analysis, the high-spatial-resolution SOC data from Innovative Solutions for Decision Agriculture (iSDA) based on [67] were not available. Further work should address this limitation by incorporating high-resolution SOC data as they become available and by conducting field validation of both approaches. At the beginning of 2021, the UNCCD updated the first version of the SDG 15.3.1 good practice guidance and revised the methodology [68]. Future studies should therefore adopt this new approach in conjunction with newly available datasets.
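Because SOC change is approximated from LC change, the stock of a pixel after a transition can be sketched as its baseline stock multiplied by a land-use stock change factor, in the spirit of an IPCC Tier 1 approach. Pixels whose stock changes by more than an assumed 10% are flagged as degraded or improved, and the per-pixel changes (t/ha × ha) are summed to a total in tonnes. The stock change factors below are hypothetical placeholders, not IPCC defaults for central Tanzania, and the 10% threshold is an assumption.

```python
DEGRADED, STABLE, IMPROVED = -1, 0, 1

# Hypothetical stock change factors for land cover transitions (placeholders,
# not IPCC default values for the study area).
CHANGE_FACTOR = {
    ("grassland", "cropland"): 0.75,
    ("tree", "cropland"): 0.70,
    ("cropland", "grassland"): 1.20,
}


def soc_change(soc0_t_ha, before, after, pixel_ha, threshold=0.10):
    """Approximate SOC change from LC transitions.

    Returns per-pixel flags (relative change beyond +/- threshold, assumed 10%)
    and the total stock change in tonnes (negative = loss).
    """
    flags, total_change_t = [], 0.0
    for soc0, lc0, lc1 in zip(soc0_t_ha, before, after):
        factor = 1.0 if lc0 == lc1 else CHANGE_FACTOR.get((lc0, lc1), 1.0)
        soc1 = soc0 * factor
        rel = (soc1 - soc0) / soc0 if soc0 > 0 else 0.0
        if rel < -threshold:
            flags.append(DEGRADED)
        elif rel > threshold:
            flags.append(IMPROVED)
        else:
            flags.append(STABLE)
        total_change_t += (soc1 - soc0) * pixel_ha  # t/ha x ha = t
    return flags, total_change_t
```

Applied to three hypothetical 30 m pixels (0.09 ha) with baseline stocks of 40, 40 and 30 t/ha, a grassland-to-cropland conversion, an unchanged tree pixel and a cropland-to-grassland conversion yield the flags `[-1, 0, 1]` and a net change of -0.36 t. This also shows why such an approximation can register SOC change only where LC changes.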

Improving sub-national analysis with freely available data, using cloud computing platforms and making the source code for LD assessment available together offer an opportunity to upscale the analysis and to transfer the methods to other study areas.

#### **5. Conclusions**

The presented study demonstrates the potential of Earth observation for LD monitoring with high-spatial-resolution data and cloud computing in Google Earth Engine. It also improves the measurement of the SDG 15.3.1 indicator in the Tanzanian study area up to 2015 and 2019 at two levels of spatial detail. Our study thus advances sub-national assessments of land degradation (LD) in heterogeneous landscapes. Improving sub-national analysis with high-resolution data, using cloud computing platforms and providing the source code used here for LD assessment should encourage the transfer of the approach presented here to other study areas and/or the upscaling of this study's results to the national level.

To this end, we compared two approaches to assessing SDG indicator 15.3.1 in the Kiteto and Kongwa districts of Tanzania. The first, the default method (DM), applied the global medium-resolution datasets proposed by the UNCCD to monitor LD for the baseline period (BP, 2000–2015). The second, the adapted method (AM), applied local 30 m land cover maps and 30 m Landsat imagery to monitor LD for the baseline and the first monitoring period (MP, 2015–2019). The LD assessment for the BP revealed large differences between the DM and AM. Using the DM, nearly all of the degraded area stemmed from the LP sub-indicator based on 250 m MODIS imagery. In contrast, degradation was below 1% for the LC and SOC change sub-indicators calculated from ESA CCI LC (300 m) maps. The LD captured by the AM, based on Landsat time series and 30 m LC data, was evenly distributed among the three sub-indicators and revealed LD on 27.7% of the area. We therefore concluded that results derived from medium-resolution datasets are likely to over- or underestimate LD for different sub-indicators and thus might misinform policy- and decision-makers and land managers if used operationally. Our study further concluded that local datasets and high-resolution imagery are essential to capture subtle changes within the heterogeneous landscape of semiarid central Tanzania.

Our results confirmed that LD is ongoing in the study area. LD did not halt after 2015 but spread further across the districts and formed several severe LD clusters. Therefore, to achieve the national LDN targets, it is crucial to address the most important causes of LD in the study area, such as overgrazing and unsustainable farming. The application of SLM practices would enhance the low LP in croplands and prevent LULC change in the KK districts.

Further work should incorporate high-resolution SOC data in the analysis and conduct field validation of LD assessments resulting from both approaches.

**Author Contributions:** Conceptualization, methodology, software and writing—original draft preparation, J.R.; writing—review and editing, J.R., G.G., F.M. and O.D.; visualization, J.R. and G.G.; supervision, F.M. and O.D.; funding acquisition, F.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the United States Agency for International Development, grant number AID-BFS-G-11-00002.

**Data Availability Statement:** The Google Earth Engine code is available online at https://github.com/JAReith/SDG15.3.1.

**Acknowledgments:** The authors thank the German Academic Exchange Service for the generous funding by the PROMOS-program and the financial support by the International Institute for Tropical Agriculture for the fieldwork in Tanzania.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **Appendix A**

**Table A1.** Land cover transition matrix (2000–2015) based on the default method (DM). Green, beige and brown colors indicate improving, stable and declining conditions of land cover categories, respectively. The area in km<sup>2</sup> and the possible cause of each land cover transition are indicated in the matrix. The change is based on the moderate-resolution land cover dataset.


**Table A2.** Land cover transition matrix in km<sup>2</sup> (2015–2018) based on the adapted method (AM). Green, beige and brown colors indicate improving, stable and declining conditions of land cover categories, respectively. The area and the possible cause of each land cover transition are indicated in the matrix. The change is based on the high-resolution land cover dataset.


**Figure A1.** The land productivity component trend generated using (**A**) the default approach with MODIS imagery, (**B**) the adapted approach with Landsat imagery for the baseline period, and (**C**) the adapted approach with Landsat imagery for the monitoring period 2015–2019.

**Figure A2.** The land productivity component state generated using (**A**) the default approach with MODIS imagery, (**B**) the adapted approach with Landsat imagery for the baseline period, and (**C**) the adapted approach with Landsat imagery for the monitoring period 2015–2019.

**Figure A3.** The land productivity component performance generated using (**A**) the default approach with MODIS imagery, (**B**) the adapted approach with Landsat imagery for the baseline period, and (**C**) the adapted approach with Landsat imagery for the monitoring period 2015–2019.
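The performance component compares a pixel's productivity with the highest productivity observed in comparable land units. A minimal sketch, assuming that a pixel is flagged as degraded when its mean NDVI is below half of the 90th percentile of mean NDVI within its land unit. Such a land unit could be, for example, a combination of ecoregion, soil type and land cover. The land unit labels, values and the exact percentile estimator are hypothetical choices for illustration.

```python
from collections import defaultdict
from statistics import quantiles

DEGRADED, STABLE = -1, 0


def performance_flags(mean_ndvi, land_units, ratio_threshold=0.5, percentile=90):
    """Flag pixels whose mean NDVI is low relative to similar land units.

    mean_ndvi:  per-pixel mean NDVI over the period
    land_units: per-pixel land unit label (hypothetical, e.g., ecoregion/soil/LC)
    Each land unit needs at least two pixels for the percentile estimate.
    """
    by_unit = defaultdict(list)
    for value, unit in zip(mean_ndvi, land_units):
        by_unit[unit].append(value)
    # Reference productivity: the given percentile of mean NDVI within each unit.
    reference = {
        unit: quantiles(values, n=100, method="inclusive")[percentile - 1]
        for unit, values in by_unit.items()
    }
    return [
        DEGRADED if value / reference[unit] < ratio_threshold else STABLE
        for value, unit in zip(mean_ndvi, land_units)
    ]
```

In a hypothetical land unit with mean NDVI values of 0.2, 0.5, 0.6, 0.7 and 0.8, the 90th percentile is 0.76. Only the pixel at 0.2 falls below half of this reference and is flagged as degraded.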

#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Remote Sensing* Editorial Office E-mail: remotesensing@mdpi.com www.mdpi.com/journal/remotesensing


ISBN 978-3-0365-4228-7