*2.1. Study Area*

The study area focuses on Vietnam's southernmost district, Ngoc Hien, Ca Mau province, located in the Southern Mekong Delta between latitude 8◦33–8◦45N and longitude 104◦4245"–105◦354"E, spanning an area of 743 km<sup>2</sup> (See Figure 1). The district has been well-studied for its importance as a major aquaculture hub and its significant reserves of Vietnam's largest and last remaining old-growth mangrove forests, including the internationally acknowledged RAMSAR site of Mui Ca Mau (2012) and UNESCO Biosphere Reserve (2009) [42–44]. The landscape supports both ecologically important mangrove ecosystems and the economic livelihoods based on aquaculture.

#### *2.2. Remote Sensing Data Pre-Processing*

This study makes use of the archives of Landsat's later missions embodied by the Landsat-7 and Landsat-8 multispectral imagery available through GEE's public data catalogue of atmospherically corrected surface reflectance data. We have made use of all available 30 m spatial resolution bands of both missions, this implies: two short-wave infrared (SWIR) bands and four/five visible and near-infrared (VNIR) bands for Landsat-7 and Landsat-8, respectively. The study area is centered within the path-rows of 125–054 and 126–064 with an average 16-day revisit time. The Landsat-7 and -8 Quality Assessment bands and calculated F-mask were used to filter out pixels containing clouds, cirrus, cloud shadow, and atmospheric contamination of the reflectance signal.

Table 1 gives an overview of the available images for each year. Based on the spectral reflectance of all available images annual median composites are compiled that provide consistent cloud-free median images of spectral reflectance [45]. The malfunction of the Scan Line Corrector (SLC) of the Landsat-7 imager has resulted in that on average around a quarter of the data in a Landsat-7 scene is missing from the 31st of May 2003 onwards [46]. These products hence have considerable data gaps, but still maintain the same radiometric and geometric corrections as data collected prior to the SLC failure. The combination of high cloudiness of the region with SLC failure results in limited usability of the years 2003–2012. Therefore, our analysis deploys a cautionary interpretation of these years. Furthermore, years characterized by SLC failure are combined bi-annually to increase data availability, coverage of the region, and to lower uncertainties.

**Figure 1.** Location of study area.

**Table 1.** Overview of used images, per sensor (Landsat-7 and Landsat-8) per scene pathway, and missing pixels and anomalous land use transitions assessed with temporal data fusion. Years impeded by the SLC-malfunction are shaded in grey.


\*= Training year, † = Scan Line Corrector (SLC) malfunction.

#### *2.3. Land Cover Classification*

After pre-processing, the resulting cloud-free median multispectral annual composites are used to characterize land use, and the land use changes over time. The land use classification scheme of our study takes into consideration four dominant land uses within the Ngoc Hien district, namely (1) Dense Mangrove Forest, (2) Sparse Mangroves, (3) Aquaculture/Waterbodies, and (4) Built-up and Barren lands. Dense mangrove forest is defined by a minimum of 30% canopy cover. Vegetated mangrove areas that are 10–30% crown cover are classified as sparse mangroves.

We conducted a supervised classification to develop land use maps. There are several classification algorithms available within GEE, including; Classification and Regression Trees (CART), Random Forest (RF), Naïve Bayes, and Support-Vector Machine (SVM). Our study opted for the commonly used CART classifier which has produced relatively high accuracies when applied to Landsat data in numerous settings [19,23,24,26]. More specifically, several studies have reported the highest accuracy for CART land use classification of coastal wetlands and mangroves using GEE compared to other classifiers [25,27]. Most importantly, we ran trails in the study area for both CART and RF in which the first yielded the highest classification accuracy (94–96% for CART, against 89–94% for RF, respectively). GEE code implementations of both approaches and its validation against test data can be found in the Supplementary Materials Table S1.

Within CART, a decision tree (DT) classifier was instantiated and trained on field data using GEE's default parameters. The CART algorithm runs through a series of nodes that recursively split the input data in such a way that there is a decrease in entropy and an increase in information gain after the split. GEE's CART uses the Gini Impurity Index to decide the input features which will provide the best split at each node. A tabular overview of the exact decision rules for building the model can be found in Supplementary Materials Table S2. One disadvantage of the DT classifier is the considerable sensitivity to the training dataset. A small change to the training data can result in a very different set of subsets and can result in overfitting [19,47]. Nevertheless, our training and validation data relies on extensive fieldwork, including 514 georeferenced points gathered in-situ across the Ngoc Hien district in 2015, subdivided into the four classes; dense mangroves (*n* = 247), sparse mangroves (*n* = 72), waterbodies/aquaculture ponds (*n* = 120), and built-up and barren lands (*n* = 75). We used 70% of the field data for training and the remaining 30% for validation, thereby estimating the classification errors independently.

Following the initial training of the classifier, it is then deployed backward (LS-7) or forward (LS-8) through the time series based on spectral/change information of the surface reflectance data available in the composite datasets. Based on this method, land cover maps are generated from the surface reflectance of pre-processed yearly median composites between 2001 and 2019. The workflow of GEE pre-processing and land use classification is presented schematically in Figure 2. GEE code can be accessed through the URLs published in Supplementary Materials Table S1.

#### *2.4. Post-Classification Optimization through Time Series Temporal Data Fusion*

The longitudinal temporal data of the Landsat archives enabled the use of neighboring time points to cross-validate findings, fill in missing data through temporal data fusion, detect and revise illogical land use changes in the post-classification analysis [33,39–41,46,48].

Gap-filling through consideration of the temporal dimension in years preceding and following a missing value has allowed us to interpolate missing pixels to assure spatial and temporal continuity over the study area. The approach for temporal interpolation follows an adaptation of inverse distance weighting methods applied to discrete land use classification data, the basic assumption is that a temporally distributed variable at short distance is generally more similar than at larger distance [49,50]. The applied approach integrates land use classes of adjacent years weighted by a power (*p* = 1.5) of the distance (*d*) to the year of interpolation, formulated in an equation as:

$$\hat{z}\_{i=0} = \max \{ \sum\_{i=1}^{n} \frac{1}{d\_{z,i}^p} \}$$

in which *i* = 0 indicates the time point to be interpolated with predicted land use (*z*ˆ), and which *i* = *n* indicate the *n* years adjacent where land use (*z*) has been observed. As such, pixels missing valid observations are estimated by taking into account a seven-year pattern and scoring neighboring time points, in which observations nearest in time weight the heaviest. The seven-year time window corresponds with the consecutive years with full data availability (2013–2019). The land use class scored the highest will be used for gap-filling.

**Figure 2.** Data processing chain and workflow of the study separated in a repeated (1) GEE land use classification process and (2) post-classification optimization based on the temporal integration of land use classification output.

Similarly, further optimization of classification results can be achieved by taking into account that land use changes usually occur characterized by a logical transition [40,41]. Land use changes and transitions follow ecological rules [40,41]. For instance, the growth of dense mangrove forests takes at least multiple years. Understanding these land use transitions can help setting rules determined by ecology and feasibility to detect illogical land use transition from remotely sensed time series.

In our study, we have opted for a post-classification approach to detect and revise illogical land use (See Table 2). The assessment of uncertainties and reclassification by the CART classifier based on the temporal exclusion of certain illogical transition rules (e.g., maximum a posteriori (MAP) classification rule in a Bayesian framework [41]) has not been embedded within the GEE platform. Therefore, a post-classification approach that scores the likelihood of land use classes to occur based on temporal context is deployed. Table 2 provides an overview of the anomalies detected. Similar to the gap-filling of missing pixels, illogical land use transitions are revised by distance weighting of neighboring time points (analyzing a seven-year pattern) of the pixel under scrutiny to determine what land cover time

is most probable to be in place. The entire workflow from pre-processing, land cover classification, post-classification optimization is presented schematically in Figure 2's diagram.

**Table 2.** Scheme of logical inter-annual land use changes. Color scheme indicates a classification of mangrove forest land use change trends. The color scheme also applies multi-annually.


We tested the gap-filling algorithm's predictive performance using a k-fold leave-one-out strategy. In these tests, we purposely removed a datapoint to run the gap-filling algorithm to predict the land use based on temporal neighbors. We assessed the percentage of correctly predicted land use classifications. We have conducted this for all available pixels and years in the analysis.

Taken together, temporal integration of GEE's individual land use classification maps into a gap-free sequence of maps that follow logical land use transitions enabled us to analyze the land use trends occurring throughout the available data. Land use changes were grouped into four categories illustrated through the color scheme in Table 2. Only mangrove forest changes were considered in the trend maps (other land use changes remain white). Moreover, we used 150 × 150 m box averages to highlight general trends and reduce uncertainties by spatial regularization.
