**Mapping Three Decades of Changes in the Brazilian Savanna Native Vegetation Using Landsat Data Processed in the Google Earth Engine Platform**

**Ane Alencar 1,\*, Julia Z. Shimbo 1, Felipe Lenti 1, Camila Balzani Marques 1, Bárbara Zimbres 1, Marcos Rosa 2, Vera Arruda 1, Isabel Castro 1, João Paulo Fernandes Márcico Ribeiro 1, Victória Varela 1, Isa Alencar 1, Valderli Piontekowski 1, Vivian Ribeiro 1,3, Mercedes M. C. Bustamante 4, Edson Eyji Sano <sup>5</sup> and Mario Barroso <sup>6</sup>**


Received: 11 February 2020; Accepted: 10 March 2020; Published: 13 March 2020

**Abstract:** Widespread in the subtropics and tropics of the Southern Hemisphere, savannas are highly heterogeneous and seasonal natural vegetation types, which makes change detection (natural vs. anthropogenic) a challenging task. The Brazilian Cerrado represents the largest savanna in South America, and the most threatened biome in Brazil owing to agricultural expansion. To assess the native Cerrado vegetation (NV) areas most susceptible to natural and anthropogenic change over time, we classified 33 years (1985–2017) of Landsat imagery available in the Google Earth Engine (GEE) platform. The classification strategy used combined empirical and statistical decision trees to generate reference maps for machine learning classification and a novel annual dataset of the predominant Cerrado NV types (forest, savanna, and grassland). We obtained annual NV maps with an average overall accuracy ranging from 87% (at level 1 NV classification) to 71% over the time series, distinguishing the three main NV types. This time series was then used to generate probability maps for each NV class. The native vegetation in the Cerrado biome declined at an average rate of 0.5% per year (748,687 ha yr<sup>−</sup>1), mostly affecting forests and savannas. From 1985 to 2017, 24.7 million hectares of NV were lost, and now only 55% of the NV original distribution remains. Of the remnant NV in 2017 (112.6 million hectares), 65% has been stable over the years, while 12% changed among NV types, and 23% was converted to other land uses but is now in some level of secondary NV. Our results were fundamental in indicating areas with higher rates of change in a long time series in the Brazilian Cerrado and to highlight the challenges of mapping distinct NV types in a highly seasonal and heterogeneous savanna biome.

**Keywords:** Cerrado; land cover; grasslands; forests; monitoring; random forest; spectral indexes; vegetation seasonality

#### **1. Introduction**

Widespread in the subtropics and tropics of the Southern Hemisphere, savannas cover approximately 20% of the Earth's terrestrial surface—around 65% of Africa, 60% of Australia, and 45% of South America [1,2]. They are naturally heterogeneous in terms of climate, soil, biodiversity, and threats posed by human activities and land occupation [3]. This heterogeneity is a consequence of varying edaphic and climatic conditions, occurrence and frequency of fires and the impacts of herbivore populations, the latter mainly in the African continent [4,5]. These variations result in the provisioning of a myriad of ecosystem services, with benefits to human populations that depend directly and indirectly on the savannas as source of food, water, materials, and pollinators [5].

The Brazilian tropical savanna (Cerrado) is the second largest biome in South America, occupying approximately 2 million km2. Although mainly distributed in the central part of Brazil, the Cerrado presents a large latitudinal and longitudinal variation, resulting in different ecoregions [6]. It is formed by a mosaic of open grasslands, shrublands, savanna woodlands, deciduous or semi-deciduous forests, and evergreen riparian forests [7]. It is the most biologically diverse savanna in the world and is considered one of the global hotspots for biodiversity conservation, as it is under severe human-induced threats [8,9]. The Brazilian Cerrado has already lost half of the biome to croplands and planted pastures [6,10,11], and the biome currently holds strategic importance for the Brazilian economy, owing to the production of agricultural commodities. The rate of conversion of native Cerrado vegetation (NV) is up to two times the conversion observed in the Amazon in the past five years [12], and most of the native vegetation conversion tends to occur in areas with dense vegetation (favorable climate and soil conditions) and flat terrains (suitable for mechanized farming) [13].

The conversion of NV has significant impacts on ecosystem functioning, such as regional climate regulation, hydrological stability, and biogeochemical cycles, associated with the loss of significant carbon stocks and the replacement of biodiverse ecosystems, presenting heterogeneous canopy and deep root systems with shallow-rooted monocultures [11,14].

It is, therefore, crucial to understand the spatial and temporal dynamics of land conversion taking into account the three main Cerrado vegetation types (grasslands, savanna, and forests), as differences in carbon stocks and water fluxes are related to the degree of woodiness of the vegetation. In that sense, a long time series of remotely sensed data with high to medium spatial and temporal resolutions provide the means to better understand the ongoing land cover processes in the biome in order to support decision making regarding its conservation [13,15].

The lack of systematic land use and land cover (LULC) maps for the Cerrado biome is partially because of the difficulty in classifying highly complex gradients of natural vegetation with important differences in woody and herbaceous layers [16]. The differentiation of anthropogenic land-use classes and the natural cover is not always straightforward. The Cerrado is also highly seasonal, with marked dry and rainy seasons, different vegetation strata, and different levels of deciduousness during the dry season [6,7], which makes change detection by remote sensing a challenge. The spectral responses of the vegetation change drastically from the rainy to dry season [17], and can be confused with land conversion processes. Moreover, disturbances such as fire are common in savannas [18,19], and pose other challenges to the discrimination of changes related to natural processes from those associated with anthropogenic conversion and degradation. Further, the spectral responses of savannas change drastically from the rainy to dry season (Jacon et al. 2017), often being confused with land conversion processes. Therefore, a method to effectively monitor land cover change over time, taking into consideration the effects of fire and seasonal variations, still needs to be developed.

There are several studies that have aimed to distinguish and map NV types in the Cerrado using moderate to low spatial resolution imagery [6,15,20–26]. Other initiatives have provided snapshots of Cerrado LULC at specific points in time. They include the following: Project of Conservation and Sustainable Use of the Brazilian Biological Diversity (PROBIO) from 2002 [26]; TerraClass Cerrado Project from 2013 [27]; Brazilian Foundation for Sustainable Development (FBDS) from 2013 [28]; National GHG Inventory Initiative, from 1994, 2002, and 2010 [29]; and the Brazilian Institute for Geography and Statistics (IBGE), from 2000, 2010, 2012, and 2014 [30] (Table S1). Therefore, the mapping efforts conducted so far for the biome have been done considering either a single year, a smaller area, or a short time interval, at low to moderate spatial resolution satellite imagery, and usually not distinguishing the three Cerrado NV types. Although quite challenging, an extended yearly time series of the spatio-temporal dynamics of the different Cerrado vegetation types using moderate spatial resolution satellite images still needs to be produced.

In this paper, we present a novel method for mapping the changes in Cerrado NV cover using Landsat imagery, which can be applied to other highly seasonal biomes. Landsat time series are the most suitable dataset for monitoring spatial and temporal dynamics because of their moderate spatial resolution (30 m), 16-day repeat pass, and the availability of historical data since 1985, with basically the same acquisition mode (similar spatial, spectral, and temporal resolutions, and same swath width) [31,32]. The drawback here is the number of scenes necessary to cover the entire biome (approximately 120 Landsat scenes) which is time-consuming in terms of downloading and image interpretation. With the recent development of cloud computing platforms (e.g., Google Earth Engine) [33,34], entire series of satellite images such as from the Landsat mission can be easily accessed, processed, and analyzed in a fully automated way. Within this context, the objective of this study was to map three decades of changes in native Brazilian Cerrado vegetation using a machine learning approach, based on Landsat time series obtained and processed in the Google Earth Engine platform. The image processing involved empirical and probabilistic decision tree classification and was based on spectral metrics that indicated seasonal and phenological variations, previously assessed using available field plot data. These maps were used to retrieve the temporal dynamics of the NV types, including loss and gain (NV net loss), to indicate the level of stability, and to identify the NV areas that have faced natural and anthropogenic changes over the last 33 years (1985–2017).

This effort is part of the MapBiomas project (https://mapbiomas.org), which aims to generate annual land use and land cover classification of Brazil using machine learning algorithms available in the Google Earth Platform. The project is a collaborative network in which different expert groups are responsible for the mapping of different Brazilian biomes and/or cross-cutting themes. The project is in constant development and methodological improvement, and has so far launched six collections. Collection 1.0 covered the period of 2008 to 2015 (published in 2016). Collections 2.0 and 2.3 covered the period of 2000 to 2016 (published 2018), while Collection 2.3 was the first time that the random forest algorithm was applied. Collections 3.0 and 3.1 expanded the period covered to 1985–2017. The work presented here consists of the methodology for mapping the native vegetation for the Cerrado biome in Collection 3.1, and our results comprise the final integration of our classification with other cross-cutting themes (e.g., pasture, agriculture, coastal zone, and urban infrastructure). All resulting maps and methodologies are freely available on the MapBiomas platform, to be downloaded and replicated to other regions in the world.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The Cerrado biome occupies more than 2 million km2, with ecotonal transitions with the Amazon (tropical rainforest), Caatinga (semi-arid), Atlantic Forest (coastal tropical forest), and Pantanal (wetland) [35]. It ranges over 10 Brazilian states (Bahia, Goiás, Maranhão, Mato Grosso, Mato Grosso do Sul, Minas Gerais, Paraná, Piauí, São Paulo, Tocantins), and the Federal District. The core area is concentrated in the Brazilian highlands, the central part of the country (Figure 1). This biome presents wide latitudinal variation (22.4 degrees), elevation ranging from sea level to 1800 m, and strong climatic seasonality (rainy season from October to March and dry season from May to September). The annual precipitation varies between 600 mm and 2000 mm, increasing from east (border with the semi-arid Caatinga) to west (boundary with the Brazilian Amazon rainforest). During the rainy season, short periods of drought (dry spells) can occur, while in the dry season, rainfall levels are deficient, and there is frequently no rain for three to five months. Relative air humidity can be lower than 15% in July and August [36]. The average annual temperature is approximately 22–23 ◦C, while the absolute maximum temperature does not vary much over the year, but can be higher than 40 ◦C [37]. The absolute minimum temperature, however, varies widely, reaching negative values in May to July, causing frosts in some regions in the southern part of the biome.

The Cerrado has three predominant NV types, which can be classified according to their degree of woodiness, from grasslands to savanna woodlands and forests. According to the classification system proposed by Ribeiro and Walter [7], grasslands can be classified into two types: *campo limpo* and *campo sujo*, characterized by the predominance of herbaceous-shrub species and, to a lesser extent, sparsely distributed small trees. Savanna vegetation has a variable tree-shrub stratum, with a canopy cover ranging from 50% to 90%, which allows the coexistence of a grass layer. Forests (forested savannas, also known as *cerradão*, and riparian forests) present dense vegetation, with relatively large trees and low cover of grasses. Total plant biomass and carbon stocks in the Cerrado vary according to the type of vegetation, with an average of 18.5 t C ha−<sup>1</sup> in grasslands, 39.9 t C ha−<sup>1</sup> in savannas, and 68.6 t C ha−<sup>1</sup> in forests [29]. Data on biomass and vegetation structure from 262 field inventory plots in the Cerrado compiled by Roitman et al. [38] were used to assess the spectral metrics that defined the different NV types, as detailed in the next session. The biome was partitioned in a regular grid of tiles compatible with the 1:250,000 international cartographic charts, where individual classifications were run for each tile. Each tile covers an area of 1◦ of latitude by 1◦ 30' of longitude [30], generating a total of 172 tiles.

**Figure 1.** (**A**) Location of the Cerrado biome in Brazil and its subdivision into tiles of 1.5 degree by 1.0 degree, and field plots (in red), which were used to explore the spectral differences between vegetation types; (**B**) "fisheye" field photographs showing the canopy cover in a forest (top), savanna (middle), and grassland (bottom); (**C**) panoramic field photos of forest, savanna, and grassland; (**D**) R5G4B3 representative Landsat-8 color composites of forest, savanna, and grassland; and (**E**) average carbon stock estimates in each vegetation [29].

#### *2.2. Classification Approach*

The procedure to generate multi-temporal land cover maps in the Cerrado and to detect changes over time in the NV followed three main steps. The first step was the production of the year-based Landsat mosaics for the entire biome by defining the boundaries between the dry and wet seasons (Figure 2). These mosaics were used to generate spectral metrics, including sub-pixel fractions, indexes, and individual spectral bands. The second step included the selection of the best spectral variables for the definition of a preliminary empirical decision tree (EDT) classification, which in turn was used to derive a greater feature space to train the statistical decision tree (SDT) classifier. The last step was the classification of the annual mosaics using a machine learning approach, based on the class consistency map derived from the previous step.

**Figure 2.** Workflow of the procedures conducted to map annual native vegetation cover in the Brazilian Cerrado biome, including the production of the annual mosaics, the definition of the empirical decision tree classification, and the statistical decision tree classification. L1T TOA = Collection 1, Tier 1 top-of-atmosphere reflectance; SMA = spectral mixing analysis; SEFI = savanna ecosystem fraction index; GVS = green vegetation and shade; GV = green vegetation; NDFI = normalized difference fraction index; and NPV = non-photosynthetic vegetation.

#### 2.2.1. Annual Landsat Mosaics

The historical changes from 1985 to 2017 in the Cerrado NV were mapped based on the Landsat-5 ThematicMapper (TM), Landsat-7 Enhanced ThematicMapper Plus (ETM+), and Landsat-8 Operational Land Imager (OLI). The Landsat top-of-atmosphere reflectance collection (Collection 1 Tier 1 TOA reflectance), with a 30 m spatial resolution, was accessed via Google Earth Engine (GEE) platform. The entire image processing procedure was also conducted in this platform for cloud removal, classification, and post-classification routines (links to the scripts available in the Supplementary Materials—Table S2).

For each tile, best-pixel annual mosaics were produced using a combination of pixels from distinct Landsat scenes gathered during the year in consideration. The temporal window defining the seasonal limits was evaluated at the state level because of the high variations in annual precipitation over the extent of the Cerrado. On the basis of the annual precipitation patterns for each state, we identified the optimum period (OP) of images to compose the annual mosaics, by integrating the six months from the end of the rainy season to the end of the dry season for each tile (Figure S1). This procedure aimed to reduce NV commission errors provided by a greener mosaic (for example, by considering images from the end of the rainy season), or NV omission errors by considering a drier mosaic, composed by images from the end of the dry season (July to September).

After the definition of the initial and final dates of the optimum period of the mosaic, a cloud and cloud shadow masking procedure, as well as a data dimensionality reduction based on pixel median, were conducted. The cloud and cloud shadow pixels were removed by a mask that was produced with the temporal dark outlier mask (TDOM) (see details of the TDOM in [39]) and complemented using the quality assessment bitmask (BQA) (see details in https://www.usgs.gov/land-resources/nli/ landsat/landsat-collection-1-level-1-quality-assessment-band). The median algorithm was applied in each of the original bands, resulting in one final value per pixel and spectral band. The aggregation of this pixel-based composition of the annual mosaics was used for classification. To better represent seasonality, we used all normalized difference vegetation index (NDVI) values of the pixel in each year to produce dry and rainy median bands. The year-based NDVI values of each pixel were divided into quarters and the median pixels of the quarter with high NDVI were considered the rainy image and the median of the quarter with lower NDVI were considered the dry image.

#### 2.2.2. Empirical Decision Tree

In order to derive a greater feature space used to train the statistical decision tree (SDT) classifier, an empirical decision tree was built with the most relevant parameters as inferred by the statistical evaluation of the input variables. This empirical decision tree (EDT) covered the period of 2000–2016, and sought to identify areas classified as the same NV over the entire period (stable samples). The factors selected for the EDT included the median of the original Landsat bands; spectral vegetation indices; and vegetation, non-photosynthetic vegetation, soil, and shade fractions derived from these bands (Table 1). A layer of slope data from the ALOS (Advanced Land Observing Satellite) global digital surface model with a 30 m resolution was also included (https://www.eorc.jaxa.jp/ALOS/en/aw3d30/index.htm).

This evaluation was conducted according to our feature space, which consisted of a sample of pixels selected based on 262 field inventory plots of the three NV types (see Figure 1 for the location of the plots within the biome) as well as samples from pasture and agriculture fields based on visual inspection. This procedure sought to identify the best metrics for highlighting the differences between NV types and major land-use classes such as pasture and agriculture fields (Figure S2). Among the metrics considered, those with the highest performances were the green vegetation (GV) index, non-photosynthetic vegetation (NPV) index, and soil and shade fractions, in addition to the green vegetation and shade (GVS) index and the savanna ecosystem fraction index (SEFI). While GVS highlights the difference between roughness and greenness of the vegetation, captured as the shade formed by the tree canopy [40], SEFI represents the combination of the amount of shade provided by the green vegetation with the non-photosynthetic materials and soils. Both GVS and SEFI were essential for separating forests and savannas from grasslands and other land use classes (pasture and agriculture) owing to the combination of vegetation heterogeneity and roughness, the amount of green material, and bare soil. Except for SEFI, which was developed for this study as an adaptation of the normalized difference fraction index (NDFI), the sub-pixel fraction images were based on the sub-pixel modeling and spectral library by Souza Jr. et al. [40].



Once the main input variables were selected, an EDT classification was semi-automatically created and calibrated with the variables for each tile and each year. A total of 2924 tiles (172 × 17 years) were calibrated based on the standard deviation values of each variable for each node, and manually adjusted using visual interpretation whenever necessary. The EDT is a conditional control statement algorithm used to support multistage decisions as a tree-like model consisting of a root node, several interior nodes, and several terminal nodes [42,43]. We used Fmask [45] as the first EDT node, which is a routine developed within the GEE platform to separate clouds, cloud shadow, and snow. Higher values of Fmask tend to be cloud and cloud shade, which, in this classification, were classified as non-observed areas. Non-observed areas on flat terrain were classified as water. Once the non-observed areas and water were classified, the remaining pixels were classified based on the V1 node defined by the SEFI variable. The combination of these fractions was used to distinguish the denser vegetation types from the sparser ones, capturing the differences between forests, savannas, and grasslands (Figure 3).

**Figure 3.** Empirical decision tree classification scheme based on the values of the spectral metrics in 262 known field inventory plots in the Cerrado biome, seeking to separate native vegetation classes (grassland, savanna, and forest) from land use-related classes (mostly agriculture and pastures) and other non-vegetated areas (e.g., urban areas, bare soils). Values below each split refer to the factors at each node above it.

The V1 node represented the main node of the EDT classification scheme, where areas with medium and high SEFI values were classified as forest or savanna with higher tree density. In the upper branch of this node, the GVS index (V2 node) was used to define the first separation of the savanna from other classes. This node is followed by the GV index (V6 node), which aimed to separate young crops, dense forests, and savannas, with higher levels of GV and GVS, from sparser forests and savannas, and drier, more mature crops. In these last two branches, NPV (V7 node) was used to separate savanna from forest, as the former has a higher content of non-photosynthetic material than the latter. Finally, shade (V10 node) was used to distinguish areas classified as forest, but that were, in fact, non-observed areas shaded by topography. The lower branch of the tree is related to areas with a more open vegetation structure. Therefore, nodes V3, V4, and V5 were used to separate open savanna vegetation types from grasslands, and grasslands from pasture and agriculture fields. This branch (specifically, node V3) also included the definition of non-vegetated areas (e.g., urban areas, bare soil). The main variables used to define the nodes in this last branch were the combination of NPV and soil, and the amount of shade, which represents the heterogeneity of surface cover.

Given the lack of robust temporal datasets of land cover maps for the entire Cerrado to be used as reference for machine learning classification (MLC), we used the resulting annual land cover maps from the EDT classification from 2000 to 2016 (Mapbiomas Collection 2.0 classification of Cerrado biome with distinct vegetation types, available in https://mapbiomas.org) as a reference for the distribution of the random training samples for the MLC. This land cover database also used the TerraClass LULC map for 2013 [12] as the reference for the spatial distribution of the three main native vegetation types.

#### 2.2.3. Statistical Decision Tree Classification

On the basis of the time series maps produced with the preliminary EDT classification, frequency maps were built to demonstrate the spatial and temporal consistency of each class, indicating the number of times in which each pixel was consistently classified as a given class throughout the period from 2000 to 2016. This routine created a map with frequency values ranging from 0 to 17, which is the total number of years in that period. The resulting values of this procedure ended up representing the probability of each pixel belonging to a given class. This probability map was used as the reference map to collect training samples for the SDT routines.

The application of the SDT to map NV types over 33 years was based on a machine learning approach with two main steps. In the first step, a new reference dataset of annual maps was created for an updated period from 2000 to 2017 (MapBiomas Collection 2.3, available in https://mapbiomas.org). In this initial SDT classification, the random forest classification algorithm (Breiman, 2001) available in the GEE platform was trained using 50,000 samples (pixels) per tile selected randomly in the areas mapped with higher frequency/consistency for each one of the three main NV types. The minimum number of years in which a pixel would have to be classified to consider a given class as consistent varied between classes (Table 2). The final number of training samples, therefore, varied between tiles as a function of the consistent area availability. Each combination of tile and year had its training sample dataset for application in the random forest classification round (number of trees = 100). The results of this classification were visually assessed to identify areas of spatial inconsistency between adjacent tiles. Additional training samples were included from neighboring tiles to increase consistency in areas of spatial discontinuity.

In the second step, the NV maps resulting from the first random forest classification model were used to produce a new map of consistently classified NV classes. Over this map, new sampling points were selected to train the final map classification for the entire period (1985–2017) (MapBiomas Collection 3.1, available in https://mapbiomas.org). To address the issue of the spatial discontinuity between adjacent tiles in this final classification, the automatic sampling of points considered a buffer of 50 km around each tile. Such a procedure ensures that part of the training samples (about 10%) was representative of the variation found in the contact zone between tiles. Finally, additional samples were selected visually over the annual mosaics to complement the training samples in tiles with few consistently classified NV classes. The STD classifications were based on a larger number of spectral metrics including the ones extracted from the optimum mosaic and used for the EDT (Table 1), as well as other metrics retrieved for the entire annual dataset collection (Table S3).

**Table 2.** Probability thresholds were used to define areas of consistent classification for each land use and land cover (LULC) class. Values inside parentheses indicate the minimum number of years a pixel had to be classified as a given class to be considered consistent, considering a total of 17 years for the initial classification from 2000 to 2016, and 33 years for the final classification from 1985 to 2017. NA = not applicable. SDT, statistical decision tree.


#### *2.3. Post Classification*

A series of spatial and temporal filters were applied to the resulting maps in the GEE platform. The spatial filter segmented and indexed the classes of each collection into contiguous regions, which were subsequently identified and reclassified based on the following criterion: areas less than or equal to 0.5 ha (i.e., approximately 5 pixels) are reclassified based on the majority of the neighboring classes. A temporal filter was applied to identify and correct class transitions throughout the time series (i.e., 3 to 5 consecutive years), as well as to classify pixels with no data caused by cloud cover. For example, a pixel classified as non-forest in a given year *ti* (where *i* = 1985–2017), and forest in year *ti* − 1 and *ti* + 1, was reclassified as forest for the year *ti*. Several transition rules were defined and applied to be used in the temporal filter to deal with specific phenological and land use transitions (Table S4).

#### *2.4. Integration with MapBiomas Cross-Cutting Themes*

In order to implement the final MapBiomas land use and land cover maps for the Cerrado biome, the results of the Cerrado NV classification were integrated with other MapBiomas cross-cutting theme maps (e.g., pasture, agriculture, coastal zone, and urban infrastructure classes), which were independently developed [41,45]. This integration is hierarchical and follows prevalence rules to combine the classification results from all themes. For instance, urban areas and agriculture (i.e., crop fields) had prevalence over NV classes. More details are described in the ATBD (algorithm theoretical basis document) available at the MapBiomas website (http://mapbiomas.org/). After integration, the last step is spatial filtering to remove isolated class patches smaller than half a hectare as well as to remove noise caused by Landsat data misregistration. This spatial filter procedure is the same as described above in Section 2.3.

#### *2.5. Accuracy Assessment*

To assess the accuracy of each year and the NV classes, we used a set of 21,000 independent sampling points based on visual interpretation of Landsat data by three expert analysts (Figure S3). The sampling design and visual interpretation were performed by collaborators at the Image Processing and Geoprocessing Laboratory at the Federal University of Goiás, Brazil (LAPIG/UFG), using in the Temporal Visual Inspection web platform (tvi.lapig.iesa.ufg.br) [46]. The number of pixels collected to compose the reference dataset was pre-determined by statistical sampling techniques considering four tiles of the Cerrado subdivision (described in Section 2.1) as a minimum unit of analysis as well as six classes of slope, according to the shuttle radar topography mission data. The accuracy analysis was based on the approach proposed by Stehman et al. [47], using confusion matrix, overall accuracy, and omission and commission errors. The evaluation of a pixel in a given year was considered valid only when two or three interpreters agreed on the class observed in that pixel. The accuracy was then calculated using metrics that compare the class mapped with the class observed by the interpreters following the good practices proposed by Olofsson et al. [48]. We report year-based accuracy estimates with circa 5% error for each class in the mapping. The accuracy was reported for two levels of disaggregation: Level 1, which encompasses all three vegetation types as one class (NV); and Level 2, which separates all three types of NV.

#### *2.6. Native Vegetation Net Loss*

Once the annual NV maps were spatially and temporally filtered, NV net loss was calculated. The concept of NV net loss considered in this paper comprised the difference between the first and the last NV maps of the time series, representing the area difference of the Cerrado vegetation types in 1985 in comparison with 2017. Owing to the characteristics of the annual NV maps, which were built independently for each year, the net loss represented land cover change undergone by the NV over the past 33 years, accommodating both gains and losses over the whole time series.

#### *2.7. Stability of Native Vegetation*

The stability of the Cerrado NV was defined as the number of times in which a pixel was classified as the same given class (NV and non-NV) in the 17-year period initial SDT classification and in the 33-year period final SDT classification, indicating class consistency over time. The individual annual land cover maps were reclassified into four classes (forest; savanna; grassland; and non-NV land cover classes—agriculture, pasture, water, and non-vegetated areas). Next, they were overlaid and reduced to a single map to build this class consistency map. The NV stability map was classified into four classes for the final SDT classification (1985–2017): (i) stable NV areas classified as the same NV type in the 33 years of analysis; (ii) unstable NV areas, which had been classified as more than one vegetation type over time; (iii) stable non-NV areas, which had never been classified as NV over the time series; and (iv) NV areas converted to other non-NV land cover classes at some point in the time series, regardless of whether they subsequently recovered or not. The stability map was overlaid with the 2017 NV map indicating the current stable and unstable NV areas as well as the NV areas under recovery. The unstable NV areas and the NV areas converted to other land cover classes indicate natural and anthropogenic instability in the biome over the past three decades, respectively.

#### **3. Results**

#### *3.1. Accuracy of Native Cerrado Vegetation Mapping*

The strategy of using the semi-automatic calibrated EDT to retrieve the first set of annual consistent NV maps from 2000 to 2016 was fundamental for the improved performance of SDT classification. It guided the application of the SDT for the entire time series (1985 to 2017) in the highly complex Cerrado landscape, which lacked reference maps for gathering sets of training sampling points. The machine learning approach used here, which combined the SDT based on randomly selected samples in areas previously classified by the EDT as stable over the first time series (2000–2016), increased accuracy at Level 1 (where all NV types were aggregated as one NV class) by 4%: accuracy at Level 1 for the EDT alone was 83%, while accuracy for the final SDT classification (built upon the EDT-derived samples) was 87%.

The average overall accuracy of the final SDT classification at Level 2, analyzing the three NV classes separately, averaged 71%, ranging from 67% to 74% over the 33 years. The average balanced accuracy of forest (84%) was consistently higher than the accuracy of savanna (73%) and grassland (73%) throughout the time series (Figure S4). A confusion matrix was produced by aggregating the 33 annual confusion matrices into a single, average matrix (presented in Table S5). Also, omission and commission errors for all three classes throughout the time series are presented in Table S6. All classes presented high commission errors, with higher errors in the grasslands (Tables S5 and S6). These results suggest a relatively high degree of confusion between savanna and forests, and between savanna and grasslands, as expected in the naturally heterogeneous Cerrado. This is especially the case of *cerradão*, which, even in the field, can be alternately classified as a dense typical savanna or sparse forest. Confusion was higher with grasslands. Although the accuracy decreased from Level 1 to Level 2, as expected, it increased from the beginning to the end of the time series, and from the first to the final classifications.

#### *3.2. Spatial and Temporal Patterns of the Cerrado Native Vegetation*

The time series maps resulting from the SDT machine learning classification using the random forest algorithm indicated that, in 2017, 112 million hectares (Mha) (55%) of the original Cerrado range was still covered by native vegetation (Table 3). Among the three NV types, savanna is the most abundant. This class covered 52% (52.5 Mha) of the NV area in the biome in 2017 (i.e., 26% of the total area of the biome), followed by forest and grassland covering 36% (37.9 Mha) and 22% (22.2 Mha) of the total Cerrado NV, respectively (18% and 11% of the total area of the biome, respectively). Between 1985 and 2017, the Cerrado faced an overall net loss of 24.7 Mha, representing 10% of the original Cerrado distribution and 18% of the existing NV in 1985, resulting in 55% of the original NV distribution. Forests faced a net loss of almost half the total NV loss in the period, with 75% of this loss happening from 1985 to 2000. For savannas, a net loss of approximately 11 Mha could be observed, but, in contrast with the forests, the largest proportion of the net loss in the savanna happened after the year 2000 (52%) (Figure 4). Grasslands remained practically stable throughout this period, with an observed net loss of 1.9 Mha, representing 1% of the original range of the biome (Figure 5).

In the past 33 years (1985 to 2017), the native vegetation in the Cerrado biome has been declining at an average rate of 0.5% year (748,687 ha yr<sup>−</sup>1), particularly affecting forests and savannas (Figure 5, Table 3). Although savannas and forests presented similar amounts of area converted to other land use classes (around 11.3 million hectares), proportionally, forests lost more NV area than savannas (23% of forest loss in relation to 18% of savanna loss by 2017). Moreover, the forests presented a higher conversion rate from 1985 to 2017 (0.7% yr−1) compared with savannas (0.5% yr−1) and grasslands (0.2% yr<sup>−</sup>1), as the original area of forest is almost one-fourth of the original area of savanna (Table 3).


**Table 3.** Native vegetation area in the Brazilian Cerrado at the beginning (1985) and the end (2017) of the time series, and net loss of native vegetation over the whole period between 1985 and 2017. Absolute and proportional net loss rates per year were also calculated. NV = native Cerrado vegetation.

**Figure 4.** Area (hectare) occupied by the three native vegetation types (forest in dark green, savanna in light green, and grassland in beige) in the Brazilian Cerrado over the time period of 1985–2017. Mha = million hectares; NV = native Cerrado vegetation.

The remnant NV is located mostly in the center and northwestern part of the biome, more specifically in Mato Grosso (19%), Tocantins (18%), Maranhão (15%), Minas Gerais (18%), and Goiás (12%) (Table S7; Figure S6). These states account for almost 80% of the remaining Cerrado NV in 2017. Other states such as Bahia, Mato Grosso do Sul, and Piauí accounted for 19% of the current NV area in the biome (Table S7). Mato Grosso was the state that presented the highest levels of NV lost in the past 33 years, accounting for one-fourth of the whole NV net loss in the Cerrado from 1985 to 2017 (this state contains approximately 17% of the Cerrado area). The NV type lost that lost the most in this period in the Mato Grosso was savanna, followed by forest (Figure S7). Other states with important net loss of NV were Goiás and Mato Grosso do Sul (17% each), followed by Tocantins (13%), Maranhão (10%), Minas Gerais (8%), Bahia (7%), and Piauí (2%) (Table S7). Most of the NV net loss in Mato Grosso occurred between 1995 and 2005 (Figure S8), a period with large scale expansion of soybean in this state [49]. Mato Grosso do Sul and Goiás faced considerable losses in the 1985–1995 time period, being considered older agriculture frontier. The most recent NV net loss is happening in the states of the Matopiba region (Maranhão, Tocantins, Piauí, and Bahia). These four states together represent 55% of the loss of NV that happened recently from 2005 to 2017, representing the new region of Cerrado where agriculture is growing at the expense of NV loss [11,50].

Only 14% of the remaining Cerrado NV is protected in indigenous lands and conservation units. Grasslands are the most protected NV types (22% of their total area), followed by forests (17%) and savannas (9%) (Table S8). In the states where the majority of the NV conversion has already taken place, the remnant NV is mostly located within protected areas (states of São Paulo, Mato Grosso do Sul, and Mato Grosso) (Figure 5).

**Figure 5.** Comparison between the first (1985) and last (2017) maps of the time series, showing the net loss of the three predominant native vegetation classes in relation to the total area of the Brazilian Cerrado. Mha = million hectares.

The distribution of the three NV classes and their net losses also varied depending on the regions of the Cerrado (Figure 6). Forests are dominant in the transition of the Cerrado to the Amazon biome, for example, in the northwestern portion of the states of Maranhão and western Mato Grosso. These two areas concentrated the largest NV losses. Savannas are currently concentrated mainly in the northeast of the biome (states of Tocantins, Bahia, and Piauí). The states of Mato Grosso, Mato Grosso do Sul, and Goiás presented the largest savanna losses along the time series. Grasslands are mainly clustered in the state of Tocantins and surrounding areas. Only a few tiles presented more than 50% of grassland loss over time.

**Figure 6.** Temporal distribution of (**A**) forest, (**B**) savanna, and (**C**) grassland per year and per tile from 1985 to 2017 in the Brazilian Cerrado. Tiles with a reduction of over 50% in each NV class over the time series are highlighted in red.

#### *3.3. Stability of the Cerrado NV Classes*

Forests were the vegetation type that presented the highest intra-annual map consistency, mostly concentrated in the northern portion of Maranhão state and western portion of Mato Grosso along the border with the Amazon biome (Figure 7; Table 4). Approximately 43% of NV areas mapped as forest from 1985 to 2017 were consistently mapped as forest from 27 to 33 times. Savanna was the second NV class in terms of temporal consistency, with 36% mapped consistently for more than 27 years over the biome. Concentration of savanna is found especially in Bahia and Piauí states. For grasslands, consistently mapped areas comprised 32% of the total grassland mapped, especially located in Tocantins and in the bordering Mato Grosso and Bahia states. Stable areas are primarily located in large fragments of NV (Figures 7 and 8).

**Figure 7.** Classification probability maps of three native vegetation classes (**A**—forest, **B**—savanna, and **C**—grassland) in the Brazilian Cerrado, indicating the levels of consistency (low: 1–11 years; medium: 12–26 years; and high: 27–33 years) observed for each class in the 1985–2017 period.

**Table 4.** Consistency of the native vegetation classes in terms of the number of years in which the pixels were classified as the same class. Three consistency categories were considered: low: 1–11 years; medium: 12–26 years; and high: 27–33 years.


The stability map of the final SDT classification indicated that 36% of the Cerrado is composed of stable NV (classified as the same NV class over the entire time series) (Figure 8). Unstable areas of NV, which varied among different types of NV class, occupy 7% of the biome. Meanwhile, 38% of the biome was covered by NV that was converted to other land cover classes (e.g., pasture and crop fields), and 19% of areas that were classified as non-NV throughout the time series. These results suggest that the majority of the standing vegetation in 2017 is stable in terms of NV type (65%), while 12% varied among NV classes (natural instability) and 23% was converted at some point in time to other non-NV classes and recovered to NV by the end of the period (anthropogenic instability; Table 5).

**Figure 8.** Analysis of stability of native vegetation (NV) and other non-NV classes in the Brazilian Cerrado from 1985 to 2017, presenting stable NV and non-NV areas, as well as the unstable areas that either changed to other NV classes or to other non-NV land cover classes.

**Table 5.** Out of the area classified as native vegetation (NV) in 2017, the total area that did not change class (stable), the total area that changed among other NV classes (natural instability), and the total area that had been previously converted (anthropogenic instability followed by NV recovery).


#### **4. Discussion**

#### *4.1. Innovative Machine Learning Approach to Map Temporal Dynamics of Cerrado Native Vegetation*

The combination of a priori EDT with STD classification was able to adequately discriminate the three main NV types from a heterogeneous and seasonal biome, which is the case of the Brazilian Cerrado, over the entire time series of the Landsat mission (1985–2017). The use of cloud computing and the GEE platform has allowed the generation and calibration of automatic or semi-automatic methods for continuously monitoring land use and land cover with a medium to high spatial and temporal resolution over decades [51,52]. This procedure is a milestone in terms of creating robust baseline reference maps that can be used to train machine learning algorithms for monitoring and detecting changes in NV classes in complex and highly seasonal landscapes. This is especially relevant in regions where reference maps for guiding times series classification are insufficient or completely lacking. This is especially important for savanna ecosystems such as the Cerrado, which are currently undergoing changes owing to severe pressure for conversion, as well as facing degradation related to climate change [6,10,11].

Even though the Landsat image collection does not represent the most adequate imagery for capturing intra-annual or phenological variability of the NV because of the 16-day interval of image acquisition, it has been commonly used to map changes in NV in highly complex and variable ecosystems [22,26,53]. In addition, the Landsat mission is the only one that currently provides a historical perspective of change within a period of at least three decades of imagery with medium to high spatial resolutions [31,32]. It can also provide a consistent measure of the inter-annual variability of NV over the entire Earth's surface. Some other initiatives have generated classification maps for the entire Brazilian Cerrado biome using Landsat free-archives (Table S1). The most widely used approaches in Brazil are the visual interpretation and traditional supervised classifications. These processes are very slow and costly, and classification is possible only in specific time periods [6,12,29,30] and was never applied to the entire Landsat series. Therefore, the generation of a semi-automatic method applied to long time series to discriminate the main types of NV present in the Cerrado biome with levels of accuracy compatible with existing initiatives was innovative and our main goal.

The results of this effort offer, to the scientific community, the first long-term annual dataset of NV (forest, savanna, and grassland) compatible with some of the few existing single-year NV maps for the biome. The average overall accuracy at Level 1 of 87% from 1985 to 2017 was similar to that obtained by Sano et al. [15] in their land cover map of the Cerrado for 2002 (90% overall accuracy), and by the TerraClass Cerrado Initiative for 2013 (80% overall accuracy). The distinction between NV classes from other land use types (Level 1 aggregation) presented an overall accuracy of 71%, which was the same as that reported by Sano et al. [15] (71% overall accuracy), but using much fewer points (315 field data validation points compared with 21,062 validation points used in this study) and only for a single year (2002). The accuracy increased from the beginning to the end of the time series, representing a gain of stability when we have a greater quality and quantity of images available.

The main challenges in distinguishing NV classes in the Cerrado include the misclassification between grasslands and planted pasture (ca. 8%), as well as the confusion between savannas and the other two NV types (forest and grasslands) [15]. Savannas are often misclassified as forests (omission error, ca. 39%), and grasslands are misclassified as savannas (commission error; ca. 46%) (Table S5). About 20% of pixels classified as forest were, in fact, savanna, while approximately 75% of the pixels classified as forest in the reference data were correctly classified (Table S5). Even though we had considerable errors of omission and commission at the edge of the NV types, this long time series allowed us to identify those areas where the stable vegetation is concentrated, and those where confusion among vegetation types is expected. This contributed greatly to the definition of the more unstable areas in the biome, and highlights where the map can and must be improved.

Moreover, the confusion among NV types occurs because they are distributed along a natural gradient of vegetation structure, with the savannas ranging from dense to very sparse woodlands dominated by grasses [7]. The study conducted by Ribeiro and Walter [7], or even the system of classification of vegetation proposed by the Brazilian Institute of Geography and Statistics (IBGE), subdivided the Cerrado into at least seven more detailed vegetation types, known as phytophysiognomies. The classification of these phytophysiognomies, which are defined by specific ranges of vegetation structure, is even more challenging. One way to improve this classification is by adding structural characteristics to the spectral components of the NV, such as height. Some studies have demonstrated the ability of LiDAR (Light Detection and Ranging) technology to differentiate between Cerrado vegetation types [54,55]. However, this time series represents the best comprehensive temporal data set on NV distribution in Cerrado.

#### *4.2. Temporal and Spatial Dynamics of the Cerrado Native Vegetation*

Up until 2017, remnant NV covered about 55% of the Brazilian Cerrado, although it was very unevenly distributed over the biome. While NV comprises 90% in the northern part of the biome, only 15% is left in its southern portions. The area classified as NV in 2017 included some areas of secondary NV. Although the approach used in this study generates temporally independent annual land cover maps that do not differentiate primary NV areas from regenerating areas, this proportion of remnant NV (55%) is compatible with those found in other studies [6,12].

Savannas still represent the predominant NV type in the Cerrado, occupying 52% of the NV area in 2017. Forests and grasslands occupy 36% and 22%, respectively. The prevalence of savannas is consistent with the results obtained by other authors [6,12]. This distribution is explained by the spatial configuration of the Cerrado NV, forming a gradient of mosaics related to a series of continuous environmental characteristics such as topography, soil, and climate [7].

Over the last three decades, the annual net loss rate of the Cerrado NV was 0.50% per year or 748,687 ha per year, which is 20% lower than the official annual gross deforestation estimates for the past 10 years (944,521 ha yr−1) [12]. This difference was expected as our concept of NV net loss includes NV areas that recovered after conversion, while gross deforestation estimates by the National Institute of Spatial Research (INPE) [12] do not include them. Beuchle et al. [23] reported a 0.60% net loss rate from 2000 to 2010 for the Cerrado biome, which is comparable to our observed average rate. Most of the net loss occurring in the first half of the time series (1985 to 2000) was in forests owing to cropland expansion.

The majority of the NV net loss occurred in the southern part of the Cerrado, in the states of Goiás, Mato Grosso do Sul, and Mato Grosso (Table S7), and mainly in the early stages of the time series, so that they are considered as older frontiers of agricultural expansion in the country. In these regions, most of the remaining NV mapped in 2017 is located within protected areas. The newest agriculture frontier in the Cerrado, responsible for large-scale deforestation, is located in the northern part of Cerrado, in a region known as MATOPIBA (southern portion of states of Maranhão and Piauí, north of Tocantins and west of Bahia) [11,56]. Native vegetation conversion to agricultural areas in the Cerrado biome is related to biophysical characteristics and infrastructure improvements [57].

Savannas and forests were the most affected NV types in terms of net loss due to the increase of pastures and agriculture in the biome along with the time series. Even though savannas are the most abundant vegetation type in the biome, forests faced most of the net loss over the time series. While savannas lost 11 Mha in 33 years, forests accounted for a comparable net loss in absolute terms, even though they represent two-thirds of the area of savanna. Grasslands correspond to the areas mostly unsuitable for agriculture usually owing to them being located over shallow and poor soils, and despite being mostly unprotected by law [57]. Detecting conversion of grasslands was quite difficult because of the spectral confusion between native grasslands and planted pastures, usually requiring observations of phenological timings [26]. These timings are easily missed by monitoring approaches that do not capture intra-annual changes, which is corroborated by the lower accuracies observed for this class.

#### *4.3. Stability of the Cerrado Native Vegetation Classes*

In a highly seasonal and heterogeneous biome, vegetation stability evaluation is a challenge that can be overcome with consistent long-term information. We managed to identify areas where specific types of NV are dominant, as well as areas with high instability in terms of NV mapping. This strategy demonstrated that the forests presented the highest consistency over time compared with savannas and grasslands in the Brazilian Cerrado. On the other hand, grasslands were the NV with the lowest mapping consistency over time. These patterns are expected and can be explained by the effects of seasonality on the spectral response of NV types dominated by grasses, which are more susceptible to drought [15,20]. These results also help to guide further improvements in classifying the unstable NV areas using either higher temporal and/or spatial resolution imagery.

The consistency maps can be also used to suggest whether the main drivers of instability are either natural (e.g., seasonality, fire) or anthropogenic (e.g., NV conversion to agriculture). In this way, we identified that 36% of the Cerrado is covered by NV that was mapped consistently over time as a specific NV class, so it can be considered as stable. Only 7% of the biome presented NV classes that were highly unstable in terms of mapping, probably associated with natural disturbances. Other areas of high NV instability were related to areas that, at some point in time, were converted to other land uses (e.g., agriculture and pasture), representing 38% of the biome, and were considered anthropogenic. This analysis suggests that, from 55% of the Cerrado area with NV, only 43% remained as NV during at least 33 years. When the stability map was masked by the 2017 NV map, it revealed that the majority (62%) of the NV types were stable over time, while 23% were in the process of recovery. This indicates that around 12% of the Cerrado biome is currently in the process of regeneration. These numbers are fundamental for assisting public policies towards Cerrado conservation.

#### **5. Conclusions**

This study was the first to apply a combination of EDT and SDT classification techniques for mapping different types of native vegetation in the Cerrado biome, using a long time series of satellite data (1985–2017). The use of Landsat archive and cloud computing freely available in the Google Earth Engine platform allowed the generation and calibration of a semi-automatic method for monitoring the spatiotemporal land cover dynamics over three decades. We managed to obtain annual-based estimates of the remnant NV in this biome dominated by a very heterogeneous landscape, marked seasonality, and intensive human land occupation. The results showed that approximately 45% of the Cerrado NV was converted into some type of land use and land occupation by 2017. The conversion is occurring rapidly, especially in the northern part of the biome where most of the remnant NV is found nowadays. This study also showed some areas where the NV appeared as a mosaic of grasslands, savannas, and forests, indicating a limitation of Landsat data to discriminate these three vegetation types properly, perhaps because of its moderate spatial resolution (30 m). Despite our promising results, discriminating different NV types in highly seasonal and heterogeneous ecosystems such as the Cerrado remains challenging, especially if the goal is to discriminate between phytophysiognomies rather than general classes. As a future investigation, we suggest including other satellite data such as the Sentinel-1 (radar) and Sentinel-2 (optical) images obtained with 10 m spatial resolution and repeat pass of five days, which are available on the Internet without cost.

**Supplementary Materials:** The following supplementary materials are available online at http://www.mdpi.com/ 2072-4292/12/6/924/s1, **Table S1.** Characteristics of the existing native vegetation maps covering the entire Brazilian Cerrado and produced by semi-automatic methods and visual interpretation of varying remote sensing data; **Table S2.** Scripts used in the initial and final statistical decision tree classification of the native Cerrado vegetation; **Table S3.** Spectral metrics, indexes, and other variables used to train the random forest SDT classification; **Table S4.** Rules of the temporal filters applied to classify native Cerrado vegetation. RG = General Rule; RP = First Year Rule; RU = Last Year Rule; FF = Forest; SV = Savanna; GL = Grassland; AG = Mosaic of Agriculture and Pasture; AR = Rocky Outcrop; WB = Water; and NO = Non-Observed; **Table S5.** Confusion matrix of NV classes at level 2. Other land cover classes include agriculture and pasture fields and non-vegetated areas. UA = user´s accuracy; PA = producer's accuracy; **Table S6.** Omission and commission errors of the 33 classified maps; **Table S7.** State-level

distribution of native vegetation (NV) and NV net loss from 1985 to 2017; **Table S8.** Distribution of the native Cerrado vegetation (NV) inside protected areas (indigenous lands and conservation units) from 1985 to 2017; **Figure S1.** Mean monthly precipitation and normalized difference vegetation index (NDVI) data from forest and savanna vegetation. Gray rectangle corresponds to the six-month optimum period for deriving the Landsat mosaic of the Brazilian Cerrado; **Figure S2.** Boxplots representing the distribution of the dominant land cover classes of the Brazilian Cerrado over six spectral metrics included in the empirical decision tree classification. GVS = green vegetation and shade; GV = green vegetation; NPV = non-photosynthetic vegetation; SEFI = savanna ecosystem fraction index; **Figure S3.** Location of 21,000 sampling points used for accuracy assessment. Other = water, non-vegetated, agriculture, and pasture fields; **Figure S4.** Overall accuracy for the three classifications (empirical decision tree (EDT), initial statistical decision tree (SDT), and final SDT), considering the aggregated legend; **Figure S5.** State-level proportion of native vegetation (NV) in 2017, other non-NV land cover types in 2017, and NV net loss from 1985 to 2017. MS—Mato Grosso do Sul, GO—Goiás, MT—Mato Grosso, TO—Tocantins, MA—Maranhão, MG—Minas Gerais, BA—Bahia, PI—Piauí, SP—São Paulo, and DF—Federal District; **Figure S6.** State-level net loss of native vegetation (NV) from 1985 to 2017. Mha = million hectares. See Figure S6 for state identification; **Figure S7.** State-level native vegetation (NV) net loss in three different periods: 1985–1995, 1995–2005, and 2005–2017. M ha = million hectares. See Figure S6 for state identification.

**Author Contributions:** Conceptualization, A.A. and J.Z.S.; Methodology, A.A., M.R., F.L., C.B.M., and J.Z.S.; Validation, A.A., F.L., C.B.M., I.C., I.A., and J.Z.S.; Formal analysis, A.A., F.L., C.B.M., B.Z., I.C., V.A., and J.Z.S.; Investigation, A.A., F.L., C.B.M., V.A., J.P.F.M.R., J.Z.S., I.C., I.A., M.M.C.B., E.E.S., M.B., V.R., and V.P.; Writing—original draft preparation, A.A., J.Z.S., and B.Z.; Writing—review and editing, A.A., J.Z.S., F.L., B.Z., C.B.M., M.M.C.B., M.R., E.E.S., and M.B.; Visualization, A.A., B.Z., F.L., C.B.M., V.A., J.P.F., M.R., V.V., and I.C.; Supervision, A.A.; Project administration, A.A. and J.Z.S.; Funding acquisition, A.A., J.Z.S., and M.B. All authors have read and agreed with the published version of the manuscript.

**Funding:** This research was conducted by the MapBiomas Initiative. It was financed by the Moore Foundation, with the support of The Nature Conservancy, as well as the UK Space Agency's International Partnership Programme (IPP) under the Global Challenge Research Fund (GCRF), with the support of Ecometrica.

**Acknowledgments:** The authors would like to thank the MapBiomas team and Google Earth Engine for providing access to the Landsat cloud processing. The authors are grateful for the field plot locations and accuracy analysis from University of Brasília and Federal University of Goiás/LAPIG, respectively.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study, in the collection, analysis, or interpretation; in the writing of the manuscript; or in the decision to publish the results.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Environmental Drivers of Water Use for Caatinga Woody Plant Species: Combining Remote Sensing Phenology and Sap Flow Measurements**

**Rennan A. Paloschi 1,\*, Desirée Marques Ramos 2, Dione J. Ventura 1, Rodolfo Souza 3, Eduardo Souza 4, Leonor Patrícia Cerdeira Morellato 2, Rodolfo L. B. Nóbrega 5, Ítalo Antônio Cotta Coutinho 6, Anne Verhoef 7, Thales Sehn Körting <sup>1</sup> and Laura De Simone Borma <sup>1</sup>**

	- <sup>3</sup> Department of Biological and Agricultural Engineering, Texas A&M University, College Station, TX 77843, USA; rodolfo.souza@tamu.edu
	- <sup>4</sup> Academic Unit of Serra Talhada, Federal Rural University of Pernambuco, Serra Talhada, PE 52171-900, Brazil; eduardo.ssouza@ufrpe.br
	- <sup>5</sup> Department of Life Sciences, Imperial College London, Buckhurst Road, Ascot SL5 7PY, UK; r.nobrega@imperial.ac.uk
	- <sup>6</sup> Department of Biology, Federal University of Ceara, Fortaleza, FL 33612, Brazil; italo.coutinho@ufc.br
	- <sup>7</sup> Department of Geography and Environmental Science, The University of Reading, Reading RG6 6AR, UK; a.verhoef@reading.ac.uk
	- **\*** Correspondence: rennan.paloschi@inpe.br

**Abstract:** We investigated the water use of Caatinga vegetation, the largest seasonally dry forest in South America. We identified and analysed the environmental phenological drivers in woody species and their relationship with transpiration. To monitor the phenological evolution, we used remote sensing indices at different spatial and temporal scales: normalized difference vegetation index (NDVI), soil adjusted vegetation index (SAVI), and green chromatic coordinate (GCC). To represent the phenology, we used the GCC extracted from in-situ automated digital camera images; indices calculated based on sensors included NDVI, SAVI and GCC from Sentinel-2A and B satellites images, and NDVI products MYD13Q1 and MOD13Q1 from a moderate-resolution imaging spectroradiometer (MODIS). Environmental drivers included continuously monitored rainfall, air temperature, soil moisture, net radiation, and vapour pressure deficit. To monitor soil water status and vegetation water use, we installed soil moisture sensors along three soil profiles and sap flow sensors for five plant species. Our study demonstrated that the near-surface GCC data played an important role in permitting individual monitoring of species, whereas the species' sap flow data correlated better with NDVI, SAVI, and GCC than with species' near-surface GCC. The wood density appeared to affect the transpiration cessation times in the dry season, given that species with the lowest wood density reach negligible values of transpiration earlier in the season than those with high woody density. Our results show that soil water availability was the main limiting factor for transpiration during more than 80% of the year, and that both the phenological response and water use are directly related to water availability when relative saturation of the soil profile fell below 0.25.

**Keywords:** plant water availability; tree phenology; phenocams; Sentinel-2; MODIS

#### **1. Introduction**

A better understanding of plant water availability and water use is of great importance for reliable assessments of ecosystem's resilience to droughts [1]. Water availability is critical for plant growth, inducing phenological transitions and, ultimately, plant survival.

**Citation:** Paloschi, R.A.; Ramos, D.M.; Ventura, D.J.; Souza, R.; Souza, E.; Morellato, L.P.C., Nóbrega, R.L.B.; Coutinho, Í.A.C.; Verhoef, A.; Körting, S.; et al. Environmental Drivers of Water Use for Caatinga Woody Plant Species: Combining Remote Sensing Phenology and Sap Flow Measurements. *Remote Sens.* **2021**, *13*, 75. https://doi.org/ 10.3390/rs13010075

Received: 31 October 2020 Accepted: 16 December 2020 Published: 28 December 2020

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Additionally, information on plant water balance also allows the development of more realistic soil water assessment tools and land surface models, which is a widely acknowledged requirement for research related to the plant–soil–atmosphere continuum [2].

Plant species have their own specific adaptive mechanisms to cope with droughts [3,4], which are particularly important in water-limited ecosystems such as seasonally dry tropical forests (SDTFs).

The SDTF is a unique biome that occurs in low-latitude areas of fertile soils with a low annual precipitation (ranging from about 250 to 1500 mm year−1) and a prolonged dry season that extends for five to six months [5–7]. The vegetation ranges from tall forests with closed canopies to scrublands rich in succulents and thorn-bearing plants [7,8]. The long water-limited period has been shown to be selective for deciduous, thorny, and succulent plant species that show a marked leaf senescence during the dry season followed by synchronous leaf flushing at the beginning of the rainy season [3,4,7–9]. The deciduity and absence of a grassy layer are important characteristics that distinguish SDTFs from other mild seasonally dry tropical biomes such as the savannas [8].

In South America, SDTFs are largely represented by the Caatinga vegetation, the predominant vegetation in northeast Brazil and located in a semi-arid region that extends over 800.000 km<sup>2</sup> [10–12]. The climate is predominantly hot and dry, with temperatures around 26 ◦C and high evapotranspiration (1500 to 2000 mm year−1) coupled with low annual rainfall (400 to 800 mm year<sup>−</sup>1) [11,13]. This results in high water deficits, which are aggravated by the short rainy season that usually lasts three to five months and show erratic rain episodes [10,13]. The water deficit is even more severe during catastrophic severe droughts that may last three to five years [11].

For most Caatinga plant species, vegetative and reproductive structures develop exclusively during the rainy season [3,14]. In fact, the fast metabolic response, which is unique to the Caatinga, allows species to produce a synchronous leaf flush with the onset of the rainy season [3,14].

Caatinga tree species are able to cope with long periods of drought by exhibiting different transpiration-reducing leaf morphological and photosynthetic characteristics. They also display a considerable variability in wood density and related water storage properties [3,14]. Lima et al (2012) show a high correlation between wood density and water storage in Caatinga vegetation and relate these characteristics to a seasonal phenology. Several studies have shown that high wood density and thick cell walls tend to protect plants from cavitation but, in general, both are associated with low water storage in stem tissues [15–19]. On the other hand, low wood density and thin cell walls increase the chances of cavitation but water storage may be higher and the maintenance of a high water potential may be prolonged [15–17]. The combination of these diverse plant traits produce different phenological evolutions throughout the growing season that depend on different plant water use strategies [18–20].

Eddy covariance (EC) and remote sensing (RS) techniques have been used to evaluate terrestrial environments because they allow vegetation systems to be studied at different scales: from local to global scales. The use of RS is gaining importance as satellites provide low-cost products and images with increasing spatial and temporal resolutions. The advances in technology have allowed several RS vegetation indices (VIs) to be used, improving our understanding of temporal and spatial changes in plant communities [21–23]. The direct relationship between water use and phenological response observed by RS has been used in models that seek to estimate evapotranspiration (ET) and gross primary production [24], mainly through the use of empirical or semi-empirical parameterisations, taking advantage of vegetation indices such as normalized difference vegetation index (NDVI) [25]. This approach is particularly successful in the Brazilian semi-arid regions due to the strong synchronicity between phenology and plant-water availability [3].

An alternative approach to monitor vegetation phenology is near-surface RS, which uses in-situ automated cameras (phenocams) [26]. This technique, in combination with EC measurements and remote sensing, has been applied in Brazilian SDTFs [4] to investigate

the drivers that regulate phenological patterns and vegetation responses to seasonal and severe droughts.

Regarding water use, EC and RS can be used to estimate ET [25], a flux that encompasses processes other than transpiration, such as evaporation of soil water, canopyintercepted water, and other free water surfaces. However, RS generally provides discontinuous time series of ET (e.g., due to cloudy conditions) [27]. Moreover, none of these tools allow direct measurements of transpiration at the individual tree or species level. Hence, these techniques cannot explain species-specific strategies that are strongly dependent on individual structural, phenological, and physiological traits. For such, in-situ or near-surface methods make monitoring possible.

Continuous in-situ estimation of transpiration is possible with sap flow sensors. Among the methods used to estimate sap flow, thermal dissipation probes (TDP) have been widely used and improved throughout the years [28]. Information on transpiration is essential for studies related to plant water use, including those in SDTFs [29,30]. Sap flow measurement techniques can provide time series of diurnal and seasonal water fluxes to estimate plant water use. The TDP method, suggested by Granier [31,32], is widely applied to estimate tree transpiration because the sensor system is simple to construct, easy to install, and relatively inexpensive [33,34].

When analysing time series and patterns of water use, vapor pressure deficit (VPD) has been considered as the main environmental driver of plant transpiration under most conditions, representing the atmospheric demand Grossiord 2019 [35]. performed continuouslylogged sap flow monitoring experiments in humid tropical forests and observed that sap flow was highly dependent on VPD. However, for the less humid regions studied, very high VPD led to a reduction in transpiration as a result of the strong negative dependence of stomatal opening on VPD Grossiord et al. Similarly, Butz et al (2018) monitored sap flow of both deciduous and non-deciduous species and showed that evergreen species have a stronger relationship with seasonal VPD variation than deciduous ones, and a slightly higher dependence of sap flow on soil moisture than on VPD. Overall, these studies indicate that in more water-limited systems, species will be more dependent on soil water availability.

Here, we combine several datasets and approaches, as well as ground truth data, in order to test the efficiency of remote sensing data to monitor a significant physiological process such as water use. We aim to investigate the relationship between plant water use and phenology for Caatinga woody plant species at both the species and community level using different scales: moderate RS spatial resolution, high RS spatial resolution, and nearsurface cameras. In addition, we intend to evaluate the environmental drivers and factors that limit plant phenology (leaf flush and senescence) and transpiration. We hypothesize that: (1) phenology at the community level (represented by RS data) and at the species level (represented by near surface remote sensing data) are fully explained by soil water availability; (2) RS data for the Caatinga vegetation are strongly correlated to variations in water use strategies among species; (3) variations in the sap flow signal are fully explained by changes in water availability.

#### **2. Materials and Methods**

#### *2.1. Study Site*

The study site (Figure 1) is located in central-northern part of the Brazilian state of Pernambuco, in the municipality of Serra Talhada (7◦58'12" S, 38◦23'06" W, 455 m a.s.l.). The soils are predominantly Aridisols Argid and Entisols orthents [36]. The area has a relatively smooth relief and comprises various xerophilous vegetation types [37]. The local climate is Bsh (arid-steppe-hot arid) according to the Köppen system in a transition region to the Aw (winter dry season) [38–40], with a mean annual rainfall of 680 mm yr−<sup>1</sup> and a mean temperature of 23.8 ◦C [40]. The experimental site includes a 50 × 100 m monitoring plot (Figure 1) which is part of the Nordeste project plot network [41], and a 10 m tall eddy

covariance (EC) flux tower (site ID BR-CST in AmeriFlux network; 7.9682◦ S, 38.3842◦ E) located 200 m away from the monitoring plot.

**Figure 1.** Study site: Map of the study area showing eddy covariance (EC) flux tower location (where the phenocam is installed), and the monitoring plot (where sap flow and soil moisture sensors were installed); location of the study site in the context of the Brazilian Caatinga. Spatial resolution: 15 cm.

Our study was carried out from April 2018 to March 2019, during which the in-situ pluviometer registered a pronounced rainy season from December to April (total precipitation of 499 mm), whereas the dry period lasted from May to November (total precipitation of 40 mm). Regarding the total rainfall in April, which marked the beginning of our experiment, that month was considered the wet-to-dry transition period. These precipitation values and their timing are consistent with the normal climate found for this region [40]. Therefore, the results found in this work can be extended to other years with normal rainfall.

#### *2.2. Experimental Strategy*

The experimental design is shown in the flowchart (Figure 2). In terms of ground data, the experimental strategy consists in: (i) measuring the sap flux density (F*d*) on individual tree species found within the monitoring plot and normalizing it among the sampled species (F*dn*); (ii) upscaling the normalized flux density (F*dn*) to the level of the plot (or community; F*dnc*); (iii) comparing the F*dn* with the soil water availability. The purpose of these measurements was to evaluate the main drivers affecting the physiological response of Caatinga trees (i.e., the plant-water availability and VPD). To test the relationship between plant physiology patterns (sap flow density) and leaf phenology, we used VIs obtained from different sources. Besides being used as a proxy of leaf phenology, VIs from different sources were also evaluated in terms of their ability to represent certain physiological process, such as intra-annual variability of sap flow.

**Figure 2.** Flowchart of the experimental design.

#### 2.2.1. Plant Species

A total of 24 species were found in the plot, of which 11 showed a dominance over 1.0: *Anadenanthera colubrina* (Vell.) Brenan, *Aspidosperma pyrifolium* Mart. & Zucc, *Astronium urundeuva* Engl., *Cenostigma nordestinum* Gagnon & G.P.Lewis, *Cereus jamacaru* DC., *Commiphora leptophloeos* (Mart.) J.B. Gillett, *Croton echioideus* Baill., *Erythroxylum pungens* O.E.Schulz, *Mimosa acutistipula* Benth. *Piptadenia flava* Benth. and *Senegalia polyphylla* (DC.) Britton & Rose.

We selected five species to monitor sap flow density, which accounted for 80% of the total dominance: *C. leptophloeos*; *A. colubrina*; *S. polyphylla* and; *A. pyrifolium*; *C. nordestinum*. All selected species are deciduous and were selected based on their abundance registered in the inventory of the monitoring plot and the Nordeste project plot network [41], and based on their relative abundance, as indicated in Table 1.

**Table 1.** Tree species monitored, wood density (*Wd*, g·cm−3), mean diameter at breast height (DBH) mean species height (H.), dominance in the plot (D.), estimated sum of the sapwood area per species per species, percentage of the sapwood area in relation to the total sapwood area in the plot.


*Wd* values obtained from [3].

All plant specimens were collected and identified by botanists at the Herbarium of the State University of Feira de Santana (HUEFS) Bahia, Brazil. The inventory included all plant individuals with a diameter at breast height (DBH) over 5 cm; the average height of trees in the plot was 5.2 m. Abundance of each species was determined by establishing the frequency of individuals of a given species within the monitoring plot.

We estimated sapwood depth using the linear regression given by 0.377 \* DBH −1.075, R<sup>2</sup> of 0.67 (*p* < 0.05 for both coefficients). Additionally, we estimated the total sapwood area for all individuals of a given species, calculated the sapwood area of each species, and estimated the percentage of the total active xylem area for each species Table 1.

#### 2.2.2. Ground Measurements

Sap flux density (F*d*,gm−<sup>2</sup> min−1) for the selected individuals was measured using Granier's TDP method (see Supplementary Materials, Section S1C). The sap flux density at the individual level (F*d*) was first normalized, yielding F*dn*, and then scaled-up to the community level, F*dnc*, considering the relative sapwood area of each species (Table 1). At the community level, F*dnc* was calculated as the weighted average considering the relative contribution of the sapwood area for each species. The environmental drivers and plant traits used in the sap flow and phenology analysis are described in Table 2. The environmental drivers and plant traits used in the sap flow and phenology analysis were: vapour-pressure deficit (VPD) and plant water availability—expressed in terms of soil water content (S*w*) and relative soil water saturation (*Se*,*pro f* ; Table 2). S*<sup>w</sup>* is the amount of water for the total soil profile of 77.5 cm. *Se*,*pro f* is a fraction between; dependent on the actual profile soil moisture, *θpro f* , and the difference between the saturated profile moisture content, *θs*,*pro f* , and residual profile moisture content, *θr*,*pro f* (see Supplementary Materials Equation (S2)). For more details, see Supplementary Materials, as indicated in Table 2.


**Table 2.** Summary of measured or derived environmental variables: category of the variable; name; description; units; section number. See the Supplementary Material for more details.

#### 2.2.3. Remote Sensing Data

Information on land surface phenology and vegetation indexes (VIs) was obtained from remote sensing (RS) sensors which operate at different spatial and temporal scales: remote sensors MODIS/Terra + Aqua (250 m products MYD13Q1 and MOD13Q1) and MSI/Sentinel-2A + B (10 m), and near-surface remotely sensed data, obtained by one phenocam (Mobotix AG-Germany) installed at the top of the EC flux tower (see Supplementary Materials, Section S1A). The VIs derived from these RS products are NDVI (MODIS and Sentinel-2A), GCC (Sentinel-2A and phenocam) and SAVI (Sentinel-2A), as presented in Table 3.

Because of the rapid dynamics of the Caatinga vegetation, which responds to rain and soil water content within a few days, a short revisit time satellite was necessary. The Sentinel-2 satellite was chosen because of its spatial resolution (10 m) and mainly because of its revisit time (3–10 days). The MODIS sensor was chosen for comparison purposes. Despite its moderate spatial resolution (250 m) considering the dimensions of our plot, the high revisit frequency and well known product quality provides a simple and effective way to compare the performance of the Sentinel-2 index. Since our objective is to analyze forest change over time, comparing the variables with a product resulting from a very low revisit time is essential (Terra and Aqua satellites cross every day, morning and afternoon, providing temporal VI products with much less cloud probability). The phenocam provided the field truth for the region and also enabled us to evaluate the phenology of each species separately.

**Table 3.** Remote Sensing data: Variable; Sensor/Satellite, spatial resolution; temporal resolution; respective section. Please see the Supplementary Materials for more details.


\* pixel composition from daily image acquisitions; \*\* camera spatial resolution depends on the distance from camera's nadir.

The proportion of canopy coverage in relation to soil exposure is also seen by Sentinel-2 and MODIS. The coverage was considered spatially homogeneous since no significant differences (*p* > 0.5) were found between the temporal behavior of the sentinel individual pixels and individual species. This was due to the proximity of trees from different species, sometimes less than two meters, but also due to the spatial autocorrelation of the pixels. Each pixel influences its neighborhood [42], reducing the effective pixel resolution. It is worth mentioning that the spatial miss-registration in the Sentinel-2 images series vary by around 12 m (more than one pixel) but can be greater than 3 pixels according to [43]. For all these reasons, it was not possible to analyze the temporal behavior of Sentinel-2 pixels individually. Thus, a temporal profile was extracted considering the pixel mean of the monitoring plot area.

Regarding the phenocam images, the regions of interest (ROIs) were selected at the species and at the community level. At the species level, we selected pixels corresponding to the tree crowns of the four species found within the phenocam's field of view (Figure 3). *S. polyphylla*, which represents only 1.7% of the total sapwood area, was not present within the camera's field of view. We then extracted the GCC for each ROI and calculated the mean GCC for each species (GCC*ns*,*e*. At the community level, we selected the ROI pixels corresponding to the entire vegetated area in the image, excluding bare soil and the tower (Figure 4). The community ROI includes trees, shrubs, and eventual forbs if present. The GCC was extracted from the community's ROI. Conservatively, the same ROI was used for the community during the entire study period, including the dry and rainy seasons. Figure 4 shows the contrast between the seasons: the fully developed tree crowns in the rainy season and the leafless crowns during the dry season.

#### *2.3. Data Analysis*

In order to establish relationships between plant water use and environmental drivers, we compared the temporal variability of F*dnc* with the environmental data obtained from the micrometeorological flux tower (i.e., VPD, R*<sup>n</sup>* and T*air*) and with the relative saturation, *Se*,*pro f* . To identify the relationship at the community level, we used the same approach to compare F*dnc* with NDVI (MODIS and Sentinel-2) and GCC*S*2.

The VIs (GCC, SAVI, and NDVI) used to compare with the results obtained from the phenocams (GCC*ns*) were calculated for the tower region; we assumed that the tower region consisted of a circle around the flux tower with a radius of 50 m, which is compatible with the camera field of view. The VIs (Table 3) used to compare with the sap flow (F*dn* and F*dnc*) data and with *Se*,*pro f* were calculated using the pixels that covered the monitoring plot area (Figure 1).

To compare the VIs obtained from different scales, we used the transition dates of start (SOS) and end (EOS) of the growing season. The length of the growing season and associated phenological transitions, SOS and EOS [21], were calculated from the phenocam GCC and the RS vegetation indices GCC, SAVI, and NDVI based on Alberton et al. 2019.

**Figure 3.** Example of a hemispherical image used for the proximal remote sensing phenological analyses at the study site, PE-Brazil. The areas selected in white represent tree crowns of individuals from four species: A, *Anadenanthera colubrina*; B, *Aspidosperma pyrifolium*; C, *Cenostigma nordestinum* and D, *Commiphora leptophloeos*.

**Figure 4.** Hemispherical image used for the proximal remote sensing phenological analyses at the study site, PE-Brazil; (**A**) In the rainy season day of year (DOY) 107, 2018. (**B**) In the dry season DOY 256, 2018; the area selected in white represents the entire vegetation sampled as the region of interest (ROI) of the community for the extraction of phenological indices.

This method uses the confidence intervals of curve derivatives to identify changes in phenology [4]. First, we fitted a generalized additive model (GAM) using the species and community GCC as the response variable fitted by the sequence of consecutive days (time) of the phenological observations, which was used as an independent smoother variable to produce the phenological curves. We then calculated which regions of the curve had derivatives and detected when the derivatives were increasing (to determine SOS) or decreasing (to determine EOS) [4]. We considered SOS as the first day detected on the derivative at the left side of the GCC curve and EOS as the last day of the derivative at the right side of the curve. The transition dates calculated from the phenocam GCC were verified by visual inspection of the digital images and adjusted when necessary (for more details see Supplementary Materials Section S1D).

In addition, to evaluate whether variations in wood density could explain differences in water use and transition dates, we assessed species' wood density (*Wd*, g·cm<sup>−</sup>3) and the difference between F*dn* time series and near-surface GCC for each species.

Finally, the F*dnc* data were compared with the VI that showed the greatest correlation with F*dnc*. Thus, we compared NDVI indices (NDVI*modis* and NDVI*S*2) with *Se*,*pro f* using scatter plots in order to explicitly assess the dependence between variables across their data range.

#### **3. Results**

#### *3.1. Hydroclimatological Drivers of Sap Flux Density*

Figure 5a shows that the net radiation, R*n*, was relatively low between April and August. This is part of the dry and cooler period where shortwave and longwave incoming radiation are lower than during the wet period, while albedo and longwave outgoing radiation are relatively more frequent, as previously observed in the Caatinga [22,44]. R*n* remained approximately constant after September. The ET, shown in Figure 5b, varies between 0 and 125 mm month−1, and the lowest values were found between August and November, when rainfall, (P) was negligible (see Figure 5a). Despite the low P in May and June, ET is still considerable because of the total water stored (S*w*) in the soil profile (see Figure 5c), which allows transpiration to continue. It is worth noting that for leaves that are well-coupled to the atmosphere (i.e., with open stomata), the transpiration at the leaf level can be approximated by the product of the leaf-to-air VPD and the stomatal conductance according to Fick's law of diffusion [45]. However, in this region, our results show that intra-annual ET was inversely related to VPD (*p* < 0.05) (Figure 5b), most likely because, despite the higher VPD, plant species are mostly leafless from September to November.

The ET is in phase with S*w*. The latter varied from a maximum of 184 mm at the beginning of April to a minimum of 74 mm at the end of November. However, after September S*w* remained fairly constant (approximately 75 mm) until November when the rains started (Figure 5c). The highest air temperatures, T*air*, coincided with the lowest values of ET, underlining the harsh climatic conditions during the dry season. T*air* varied between 24 ◦C in July and 28.5 ◦C in November.

The community-level sap flow, F*dnc* (Figure 5d), is in phase with S*<sup>w</sup>* and ET. However, ET increased rapidly during the first rainy month, while F*dnc* increased gradually, suggesting that a large part of the ET recorded during the first rainy month may be due to the evaporation of intercepted water and/or to soil water evaporation. Furthermore, the F*dnc* varies in synchrony with the vegetation indices GCC*S*2, NDVI*S*2, and NDVI*modis* (Supplementary Materials, Figure S2).

**Figure 5.** Climate variables, soil water availability, and community-level sap flow: (**a**) Net Radiation (R*n*,Wm−2) and rainfall (P, mm month−1); (**b**) vapour pressure deficit (VPD, hPa) and EC evapotranspiration (ET, mm month<sup>−</sup>1); (**c**) soil water storage (S*w*, mm) and monthly mean air temperature (T*air*; ◦C); (**d**) Normalized sap flow for the plot community (F*dnc*, dimensionless).

#### *3.2. Transpiration, Soil Water Condition, and Remote Sensing Data*

The normalized sap flux density at the species level, F*dn*, was strongly affected by soil conditions when *Se*,*pro f* was below 0.25 (Figure 6), indicating the high dependency of transpiration on stored water and demonstrating the threshold when plant water stress starts to limit transpiration. GCC and F*dn* increased rapidly after the first rain, even with low water availability (*Se*,*pro f* around 0.18), for all species except *A. pyrifolium*. After December 2019, the sap flow signal varied reasonably, but still followed the evolution of *Se*,*pro f* . This is due in part to climate variations, but also to the fact that a number of sensors failed during this period. Thus, the average F*dn* for certain months was not based on all four sap flow sensors. All individuals from the studied species had a fully developed canopy at the beginning of the sap flow measurements. Therefore, our study period included two transition dates, the EOS of the ongoing growing season (2017–2018) and the SOS of the second growing season (2018–2019). Moreover, leaf fall still occurred after the EOS in the rainy season in 2018, and a SOS with new leaf flushing was detected at the beginning of the next rainy season in December of 2019. All species showed a negative correlation to daily VPD since all are deciduous and lost their leaves, halting transpiration precisely in the periods when VPD was highest.

Despite their approximate correspondence (Figure 6), as both variables exhibit a strong seasonality, the correlation between F*dn* and GCC obtained from the proximal RS was poor (see Table 4, last column; the highest correlation (0.51) was found for *A. colubrina* and *C. nordestinum*, species with high wood density). In fact, F*dn* and F*dnc* have a stronger relationship with all RS indices, especially with NDVI (Table 4). NDVI*modis* data better explained the variation in F*dnc* than NDVI*S*2, Table 4 shows a Pearson correlation coefficient of 0.92 (*p* < 0.01) versus a value of 0.88 (*p* < 0.01).

**Figure 6.** (**a**) Rainfall (mm day−1) and soil water profile relative saturation (*Se*,*pro f* , dimensionless); and normalized near surface species GCC (GCC*ns*,*e*, dimensionless) , and species averaged sap flow density (F*dn*, dimensionless), together with *Se*,*pro f* for (**b**) *C. nordestinum* and *Se*,*pro f* ; (**c**) *S. polyphylla*; (**d**) *A. colubrina*; (**e**) *C. leptophloeos*; (**f**) *A. pyrifolium*.


**Table 4.** Pearson correlation coefficient between F*dn* and NDVI*modis*, NDVI*S*2, GCC*S*2, SAVI*S*<sup>2</sup> and GCC*ns*, respectively.

obs.: \* = *p* < 0.5.

The transition dates (in format DD/MM/YY) calculated for each species were the following: *A. pyrifolium*: EOS = 10/08/18, SOS = 10/01/19; *C. nordestinum*: EOS = 05/08/18, SOS = 08/12/18; *C. leptophloeos*: EOS = 12/06/18, SOS = 11/12/18; *A. colubrina*: EOS = 03/08/18, SOS = 07/12/18. Based on visual inspections of the images, we found that during the period between EOS and SOS, all individuals of the four species were leafless. The development dates for the tower region estimated by the sensors (Sentinel-2 and MODIS) can be seen in Table 5. We found that NDVI*modis* was the index with the highest correlation to species' water use, and its transition dates were the most similar to the GCC*ns* results (Table 5).

**Table 5.** VI Transition dates (day of year/year) and bias in relation to near-surface GCC.


Both the seasonal evolution of transpiration and the phenological response of the vegetation show an asymptotic limit in relation to *Se*,*pro f* (Figure 7a,b), most likely because we are using *θs*,*pro f* and *θr*,*pro f* to normalise the data (please see the Supplementary Materials for an explanation of these parameters and of *θpwp* and *θ f c* used below). During a considerable part of the year, the moisture content of the soil layers that contain most of the roots are below wilting point (*θpwp*), yet above the residual moisture content (*θ f c*). Additionally, not all of the water stored in the soil pores is used for transpiration, as part of it is stored below the root zone or lost to soil evaporation.

**Figure 7.** Scatter plots of (**a**) community level normalized sap flow (F*dnc*) versus *Spro f* (dimensionless), (**b**) NDVI*modis* and NDVI*S*<sup>2</sup> versus *Spro f* , (**c**) NDVI*modis* versus F*dnc*, (**d**) NDVI*S*<sup>2</sup> versus F*dnc*.

#### **4. Discussion**

This study explored the sensitivity of remotely sensed indices to measure water use in the Caatinga. Our findings indicate that (i) the water use is highly dependent on soil moisture conditions; (ii) differences in wood characteristics seem to directly affect differences in when species sap flow ceases; (iii) with increasing resolution, from satellites with low resolutions to cameras that allow monitoring of individual tree species, new sources of uncertainties, previously masked by low spatial resolution, reduce correlations between RS indices and individual and community measurements of water use. Notwithstanding, the high resolution of the cameras has the benefit of allowing isolated events, such as the SOS and EOS, to be detected for different species.

It is remarkable how strongly F*dnc* is related to *Se*,*pro f* , especially when considering daily values (Figure 6), which shows relative saturation together with rainfall, as well as greenness and sap flow per species. The strong seasonality, reflected in the highly variable sap flow values observed for all monitored Caatinga plant species, is evident and in agreement with previous studies [3,4,46]. Seasonality is particularly evident with regards to the timing of leaf shedding (EOS) and the drop in soil water storage (S*w*) (Figure 5d). Moreover, the fact that seasonally varying values of F*dnc* are only very weakly and negatively related to changes in VPD corroborates the idea that this vegetation is relatively independent of the atmospheric demand when considering intra-annual variation. This is not the case for the diurnal variation of sap flow (data not shown), which remains dependent on VPD as shown by Butz et al. (2018). These findings most likely reflect the fact that we considered a single value of F*dnc* and VPD for each day, thus the diurnal dependence (where sap flow increases as VPD increases) between these variables was not considered. The same effect occurs between the intra-annual variation of R*<sup>n</sup>* and F*dn* (Figure 5).

The inverse relationship between VPD and ET is also partially caused by the fact that, during the dry season, all trees shed their leaves and water uptake reduces to near-zero, despite the increase in VPD. Furthermore, during leaf-on periods with low atmospheric humidity, it is the strong negative dependence of stomatal opening on VPD that counteracts the concurrent increase in driving force, as mentioned in the introduction. Although the VPD values tend to be lower during the rainy season, thereby reducing potential transpiration, ET is high due to an increased water availability and related presence of leaves. Moreover, there is a strong relationship between VPD and T*air* (Figure 5c), given the temperature depends on the saturated vapour pressure as calculated by Teten's formula [47]; around 57% of the VPD variation is caused by the variation in T*air*.

F*dnc* appears to be independent of *Se*,*pro f* when it reaches values higher than 0.25 (for reasons explained in the Results section, see Figure 7b). When *Se*,*pro f* was below 0.25, there was a clear reduction in sap flow, which became even more pronounced with values below 0.18. When *Se*,*pro f* values were around 0.16, F*dnc* became negligible, and NDVI reached its minimum (0.34 for NDVI*modis* and 0.26 NDVI*S*2), indicating a minimum NDVI for the monitored Caatinga species. Although F*dnc* was poorly correlated to water availability when *Se*,*pro f* values were >0.25 (indicating that the lack of radiation is a stronger driver), values of *Se*,*pro f* remained below 0.25 for more than 80% of the monitored period, indicating that for at least 80% of the year, water availability was the dominant environmental driver of transpiration. This dependence of F*dnc* on *Se*,*pro f* indirectly explains the high correlation between F*dnc* and NDVI, as shown in Table 4, since NDVI also strongly depends on *Se*,*pro f* (Figure 7b).

In the dry season, when F*dn* was approaching 0, for four of the monitored species (Figure 6) the timing difference seems to be due to differences in wood density, *Wd*. However, for the species with high F*dn*, this difference is subtle. As can be surmised from Figure 6, the order in which F*dn* approached 0 was inversely correlated with the order of *Wd*: *C. leptophloeos* (0.28 g·cm−3) at the end of April; *A. pyrifolium* (0.54 g·cm−3) at the end of June; *A. colubrina* (0.59 g·cm−3) in mid July; *C. nordestinum* (0.66 g·cm−3) in mid to late July. Although this study only compared the *Wd* of four species, our results indicate that low *Wd* is a limiting factor for these plants, particularly considering the duration in which

they can maintain foliage, since the early senescence observed for low *Wd* species may be part of the plant's water conservation strategy as is stomatal control, which ultimately prevents the plant from cavitation [15–17]. Thus, the high *Wd* promoted by thick cell walls plays an important role in protecting plants from cavitation, allowing the plant to continue extracting water even from relatively dry soils.

Since *Wd* is directly related to the capacity of Caatinga plant tissues to store water [3], species with higher wood densities, which have larger stores of water, may be able to survive extremely dry periods. Moreover, the species with the lowest *Wd*, *C. leptophloeos* has the shortest sap flow period (Figure 6) and exhibits a peculiar greenish photosynthetic bark [48], which increases plant survival during the dry season [49]. Unfortunately, our data set did not allow us to gather evidence of a higher leaf water potential maintained over a longer period for the species with lower wood density.

*C. leptophloeos* was not dominant in the monitoring plot and behaved differently, from a phenological perspective, from the other species: leaf flushing began earlier and occured at a slightly faster rate. This species also has a very early onset of senescence (Figure 6f), as was previously reported by [3] and attributed to the species low wood density and high water storage characteristics. Despite the fact that the high storage water capacity increases plant survival during dry periods, its low wood density suggests it cannot extract water from soils with very negative water potentials. Therefore, to avoid cavitation, this species shows an early EOS. Furthermore, it is expected that species with considerably different phenologies and low dominance will have a low correlation with the overall phenological timing of the plot-level vegetation. This explains the low correlations of *C. leptophloeos* F*dn* with the vegetation indices, which reflects the entire plot (Table 4).

The GCC*ns* values of *A. pyrifolium* decreased considerably in the dry period but did not reach their minimum until the onset of the dry season, contrary to the drop in F*dn* and its ranking with regards to *Wd*. However, low GCC values indicate a long leafless period, which was confirmed by the visual inspection of digital imagery and the negligible values of transpiration. The young green leaves of *A. pyrifolium*, with high photosynthetic and transpiration rates, were produced one month after the onset of the rainy season, unlike the other species in which leaf flushing began only a few days after the rains. Moreover, it is worth mentioning that the individuals monitored using GCC were not the same ones monitored with the sap flow sensors, despite the fact that they belong to the same species.

The index adjusted to the soil, SAVI*S*2, showed higher EOS bias (-47 days), compared to NDVI*S*<sup>2</sup> (-42 days) (Table 5), but a considerable lower SOS bias (-18 days) than NDVI*S*<sup>2</sup> (-42 days). Moreover, GCC*ns* presented a higher correlation with SAVI*S*<sup>2</sup> (0.74) than NDVI*S*<sup>2</sup> (0.72) (Supplementary Materials, Table S2). Although the differences may seem small, taking into account that the dates for EOS and SOS extracted from GCC*ns* are the most realistic representation of phenology in this work (since they were confirmed with visual analysis), we can say that SAVI*S*<sup>2</sup> better represents phenology in terms of leaf presence. However, regarding plant water use, SAVI*S*<sup>2</sup> showed a lower correlation with F*dnc*, indicating that the best vegetation index representing phenology is not necessarily the best VI representing plant water use.

F*dnc* exhibited a stronger linear relationship with NDVI derived from MODIS and Sentinel-2 (R<sup>2</sup> 0.92 and R<sup>2</sup> 0.88, respectively) than the other indices (Table 4; Figure 7). A plausible explanation for the lower performance of the NDVI*S*<sup>2</sup> is that with an increase in the resolution, the pixel heterogeneity and number of pixels per image will increase, which means that the Sentinel-2 images will be noisier. Another explanation is that the MODIS algorithm has already undergone numerous revisions, while Sentinel-2 processing may not be mature enough. In the present work, the automatic cloud detection and cirrus correction filter offered by sen2core was not totally effective, requiring a subsequent visual analysis, suggesting that the algorithm still needs improvement. The last and most likely reason is that the higher temporal resolution of MODIS increases the chance of resulting in a better image composition; therefore, the lower temporal resolution of Sentinel-2 has

a higher risk of yielding poor quality images, even when accounting for the filters and quality ratings.

Compared to the near-surface GCC*ns*, the GCC*S*<sup>2</sup> data show a stronger correlation with the sap flow data. Theoretically linked to leaf age, both GCC indices decrease when senescence occurs for all monitored species, as expected. However, GCC*ns* displayed a slight increase before the end of the dry season (Figure 6). This unexpected rise follows the first sporadic rains and produced a temporary disconnect with F*dn* and soil water storage, causing the relatively poor correlations (Table 4). Other likely explanations for these lower performances include: signal contamination by shrub vegetation, wet/dry terrain, difficulty in selecting pixels, and even the wind causing leaf movements. However, an increase in GCC*ns* clearly signals the beginning of leaf flushing, thus it could be used in studies investigating when the growing season begins, the rate of leaf establishment, and how long the growing season lasts. Therefore, near-surface GCC data are important as it allows for individual monitoring of tree species.

Our results regarding the transition dates and water use indicate that the NDVI*modis* provided the best fit and most reliable proxy for the community phenology, but both datasets performed well. The choice of VI is of paramount importance to understand the relation of phenology and plant water use at both the species and community levels [21]. Therefore, detection of the near remote phenology through the use of digital cameras (phenocams) can fill the gap between satellite and on-the-ground observations, improving the detection of which VI better represents the vegetation changes observed on the ground (see [21]). On the other hand, phenocam GCC also enhanced our understanding of the sap flow dynamics in individual plants. The fact that the decrease in sap flow is not coupled with the decrease in GCC*ns* at the end of the rainy season in 2018 (Figure 6) indicates that the leaf drop of most species is preceded by the cessation of sap flow. As expected, both sap flow and GCC*ns* remained low during the dry season. Nevertheless, while GCC*ns* remained high and at approximately constant values after leaf flushing, the sap flow was variable during the 2018 rainy season. Regardless, the strong correlation between F*dnc* and both NDVIs shows that the recorded phenological behaviour of the tree vegetation in the plot, as reflected by fluctuations in vegetation indices such as NDVI, represents the overall profile soil moisture conditions well and its effects on the canopy exchange processes for this Caatinga vegetation.

Our findings compare remote sensing products related to vegetation status at different temporal and spatial scales, with the aim to analyse the water use of Caatinga vegetation. Sentinel-2 and MODIS satellite products also provide data from various other spectral bands that have not been used in our analyses. Bands that could be used to generate other indices, for example, which could be explored with advanced statistical methods, and combined with in-situ data that are complementary to ours or with data from other satellite platforms. Therefore, our study addresses the hypotheses set out at the beginning of the paper, but points to paths in which future work could be further explored.

#### **5. Conclusions**

Given the hypotheses raised, we can conclude the following:


Our results provide meaningful information about the physiological responses of dry forests in the Caatinga area to environmental variables. They also show the ability of different scales of SR products in detecting variations in plant phenology, which is essential information to interpret and analyse the soil water availability and plant flux density. Moreover, our findings identified when to use SR data as a proxy for phenological and physiologal processes in the Caatinga vegetation.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/2072-429 2/13/1/75/s1, Table S1: Soil characteristics estimated: *θs*, *θ f c*, *θwp*, *θr*; the *θmax* and *θmin* observed for each monitored soil layer depth (in cm3 cm<sup>−</sup>3) and; the r2 of the Durner model versus observed water retention curve, for each layer, Figure S1: Scatter plot of measured individuals sapwood depth (mm) and their respective DBH (mm), Table S2: Pearson correlation table—near-surface GCC versus orbital VI. Figure S2: Community sap flow and phenological response: (A) Normalized sap flow for the plot community (F*dnc*, dimensionless), GCC*S*<sup>2</sup> and SAVI*S*2. (B) F*dnc*, NDVI*modis* and NDVI*S*2.

**Author Contributions:** Conceptualization, R.A.P. and L.D.S.B.; Formal analysis, R.A.P.; Funding acquisition, L.D.S.B.; Investigation, R.A.P., D.J.V., R.S. and L.D.S.B.; Methodology, R.A.P., D.M.R., L.P.C.M. and L.D.S.B.; Project administration, L.D.S.B.; Resources, E.S. and L.D.S.B.; Software, R.A.P., D.M.R. and R.S.; Supervision, L.D.S.B.; Validation, R.A.P., R.S., R.L.B.N. and A.V.; Writing—original draft, R.A.P. and L.D.S.B.; Writing—review & editing, R.A.P., R.L.B.N., Í.A.C.C., A.V., L.P.C.M., D.M.R. and T.S.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was financed by the São Paulo Research Foundation (FAPESP) grant #2015/50488- 5, by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior-Brasil (CAPES)-Finance Code 001, and Newton Fund/NERC/FAPESP NE/N012488/1. A Verhoef and R Nobrega were supported by the Newton/NERC/FAPESP Nordeste project: NE/N012488/1. L.P.C.M. receives a research fellowship from the National Council for Scientific and Technological Development (CNPq). DMR receives a FAPESP fellowship (grant FAPESP #2017/17380-1).

**Acknowledgments:** We thank the Government Company of Agricultural Research from the Semiarid (Embrapa semi-árido); the Academic Unit of Serra Talhada (UAST); the Academic Unit of Garanhuns (UAG) from Rural Federal University of Pernambuco (UFRPE); the National Observatory of Water and Carbon Dynamics in the Caatinga Biome-NOWCDCB, for, among others, allowing the use of tools, academic installations, providing technical assistance and labour to carry out the installation of sensors and maintenance of equipment; Luciano Paganucci de Queiroz and his team, for leading the work of the Herbarium at State University of Feira de Santana in the species identification; Jonathan Lloyd for assisting in the management of financial and human resources and for his contributions in developing the methods.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

