### 2.3. Methodology

Figure 2 sets out our workflow, which comprises three main parts: (1) the unsupervised classification of mangrove extent from the Landsat-X images; (2) the processing of the SPOT-7 image to classify mangrove age and its fusion with the Sentinel-1 images; and (3) the processing chain, including image pre-processing, speckle filtering, fusion of the VH and VV layers with the SPOT-7 image, and supervised classification of mangrove types. Minor steps, such as clipping the region of interest, post-classification conversion of the classified image to vector format, confusion matrix (contingency matrix) calculation, band math, and band conversion, are omitted to simplify the figure. Basic remote sensing image processing tasks, such as atmospheric correction [40,41], image resampling (done only for Landsat-2), SAR image pre-processing (radiometric calibration, terrain correction, and data conversion, i.e., selecting a band to export as a single layer), and speckle filtering, are well documented [42,43] and are therefore not described in detail here. The core tasks of classifying mangrove extent, age, and species, and of image fusion, are explained in the following subsections.
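The confusion matrix step is left out of Figure 2, so the following minimal Python sketch recalls what it involves: it tallies reference against classified labels and derives overall, producer's, and user's accuracy. The function names and toy labels are hypothetical illustrations, not the software or data used in this study.

```python
from typing import List, Sequence


def confusion_matrix(reference: Sequence[int], predicted: Sequence[int], n_classes: int) -> List[List[int]]:
    """Rows are reference (ground-truth) classes; columns are classified labels."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for ref, pred in zip(reference, predicted):
        matrix[ref][pred] += 1
    return matrix


def overall_accuracy(matrix: List[List[int]]) -> float:
    # Correctly classified samples (the diagonal) divided by all samples.
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def producers_accuracy(matrix: List[List[int]]) -> List[float]:
    # Per class: correct / reference total (row sum), i.e. 1 minus omission error.
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


def users_accuracy(matrix: List[List[int]]) -> List[float]:
    # Per class: correct / classified total (column sum), i.e. 1 minus commission error.
    n = len(matrix)
    col_sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    return [matrix[c][c] / col_sums[c] if col_sums[c] else 0.0 for c in range(n)]


if __name__ == "__main__":
    # Hypothetical labels for three classes (e.g., three mangrove age groups).
    ref = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
    pred = [0, 0, 1, 1, 1, 1, 2, 2, 0, 2]
    m = confusion_matrix(ref, pred, n_classes=3)
    print(m)                    # [[2, 1, 0], [0, 3, 0], [1, 0, 3]]
    print(overall_accuracy(m))  # 0.8
```

Rows hold reference classes here; some software transposes the matrix, which swaps the roles of producer's and user's accuracy.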

**Table 2.** Summary of remote sensing data used (X refers to the Landsat mission number: 2, 5, or 8; L1TP is Level-1 precision and terrain corrected data; BQA stands for the band quality assessment; MSS is Multispectral Scanner Sensor; TM stands for Thematic Mapper; OLI is Operational Land Imager; Mul and Pan are short for multispectral and panchromatic bands, respectively; GPL is geometric processing level; RPL is radiometric processing level; and GRD is ground-range detected. V and H are vertical and horizontal, respectively, and the paired letters VH and VV denote SAR polarization channels (cross-polarized and co-polarized, respectively)).


**Figure 2.** Flowchart of the methodology used for mapping mangrove extent, age, and species. X refers to the mission number of the Landsat images used; ANN, DT, RF, and SVM stand for artificial neural network, decision tree, random forest, and support vector machine, respectively; Mul and Pan are short for multispectral and panchromatic bands, respectively; GS and PCA indicate the Gram–Schmidt and principal component analysis image fusion methods; V and H are vertical and horizontal, respectively, and the paired letters VH and VV denote Synthetic Aperture Radar (SAR) polarization channels (cross-polarized and co-polarized, respectively).

#### 2.3.1. Mangrove Age Classification

Mangrove age and growth are typically quantified in situ by means of dendrometer techniques [44] and internode analysis [45]. However, few studies have attempted to define classifiers for estimating mangrove age from remotely sensed data. From among the many available methods, we elected to use artificial neural network (ANN), decision tree (DT), random forest (RF), and support vector machine (SVM) classifiers for mangrove age estimation because (1) they are robust supervised image classification methods; (2) as machine learning (ML) approaches, they can model complex class signatures and accept a variety of training data [46]; and (3) they are routinely found to achieve higher accuracies than the maximum likelihood method [47]. Selecting these four methods allowed us to compare their results and identify the best-performing method using the SPOT-7 image and the same training dataset from the field survey (Section 2.2.1).

ANN classification has been used in a wide range of remote sensing applications. The theory and algorithm are explained in detail by Schalkoff (1992), Foody (1996), and Dreiseitl and Stephan (2020) [48–50]. ANN classification is generally achieved with a layered, feedforward network architecture (Figure 3) comprising a set of processing units organized in layers, in which each unit is connected to every unit of the next layer by a weighted channel [50]. The training data are used to compute the difference (error) between the desired and actual network outputs; the error is then propagated backward through the network toward the input layer, and the weights linking the units are adjusted in proportion to the error. This process is repeated until the error reaches an acceptable level, here more than 60% agreement between the classified and ground-truth data. Although the ANN algorithm has some advantages, it has limitations in remotely sensed data classification when dealing with highly heterogeneous land cover types (mixed pixels), and the network can become static when the number of neurons exceeds ten [7]. In this classification, the primary parameters, namely the number of neurons, the maximum number of iterations, and the error change, were set to 3, 300, and 0.1, respectively. The training method was back propagation with a weight gradient term of 0.1 and a momentum term of 0.5.
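To make the training loop above concrete, the following pure-Python sketch implements a feedforward network with one hidden layer of three sigmoid units, trained by online back propagation with a momentum term. It is a minimal illustration under stated assumptions, not the classifier used in this study: we assume the weight gradient term acts as the learning rate (0.1), take the momentum term as 0.5 and the iteration cap as 300, and replace the "error change" criterion with a hypothetical mean-squared-error threshold `tol`. The class name `TinyMLP` and the toy two-band data are also hypothetical.

```python
import math
import random


def sigmoid(x: float, gain: float = 1.0) -> float:
    # Logistic activation with gain parameter lambda (often set to 1).
    return 1.0 / (1.0 + math.exp(-gain * x))


class TinyMLP:
    """One hidden layer and one sigmoid output unit (hypothetical illustration)."""

    def __init__(self, n_inputs: int, n_hidden: int = 3, seed: int = 0):
        rng = random.Random(seed)
        # Each row holds one hidden unit's weights; the last entry is its bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)] for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        # Previous weight changes, kept for the momentum term.
        self.dw_hidden = [[0.0] * (n_inputs + 1) for _ in range(n_hidden)]
        self.dw_out = [0.0] * (n_hidden + 1)

    def forward(self, x):
        xb = list(x) + [1.0]  # append a constant bias input
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_hidden]
        hb = hidden + [1.0]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hb)))
        return xb, hb, out

    def mse(self, X, y) -> float:
        return sum((t - self.forward(x)[2]) ** 2 for x, t in zip(X, y)) / len(X)

    def train(self, X, y, learning_rate=0.1, momentum=0.5, max_iter=300, tol=1e-3):
        for _ in range(max_iter):
            for x, t in zip(X, y):
                xb, hb, out = self.forward(x)
                # Error term of the output unit (squared error, sigmoid derivative).
                delta_out = (t - out) * out * (1.0 - out)
                # Error fed backward to each hidden unit through its output weight.
                delta_hidden = [delta_out * self.w_out[k] * hb[k] * (1.0 - hb[k])
                                for k in range(len(self.w_hidden))]
                # Weight change proportional to the error, plus momentum * previous change.
                for k in range(len(self.w_out)):
                    self.dw_out[k] = learning_rate * delta_out * hb[k] + momentum * self.dw_out[k]
                    self.w_out[k] += self.dw_out[k]
                for k, row in enumerate(self.w_hidden):
                    for i in range(len(row)):
                        self.dw_hidden[k][i] = (learning_rate * delta_hidden[k] * xb[i]
                                                + momentum * self.dw_hidden[k][i])
                        row[i] += self.dw_hidden[k][i]
            if self.mse(X, y) < tol:  # hypothetical stopping rule
                break

    def predict(self, x) -> int:
        return 1 if self.forward(x)[2] >= 0.5 else 0
```

For example, `TinyMLP(n_inputs=2).train(X, y)` on two well-separated clusters of standardized band values should reduce the training error and separate the classes; a multi-class age map would need one output unit per class.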

**Figure 3.** Classification of remote sensing data by an artificial neural network (adapted from Foody 1996), where *W<sub>ij</sub>* is the weight that connects the *j*th unit with its *i*th incoming connection; *O<sub>i</sub>* and *O<sub>j</sub>* are the values of the *i*th incoming connection and the *j*th output connection, respectively; and λ is a gain parameter, which is often set to 1.
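Using the symbols of Figure 3, the standard sigmoid unit and a back-propagation update with momentum can be written as below. This is a sketch of the usual textbook form, assuming a logistic activation; the learning rate η, momentum α, iteration index *t*, and error term δ<sub>j</sub> of unit *j* are our notation and do not appear in the figure. Bias terms are omitted for brevity.

$$
\mathrm{net}_j = \sum_i W_{ij}\,O_i, \qquad O_j = \frac{1}{1 + \exp\left(-\lambda\,\mathrm{net}_j\right)}
$$

$$
\Delta W_{ij}(t) = \eta\,\delta_j\,O_i + \alpha\,\Delta W_{ij}(t-1)
$$

Read this way, the weight gradient term (0.1) and momentum term (0.5) reported above would correspond to η and α, respectively.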

While conventional statistical and neural/connectionist classifiers assign a class membership to each pixel in a single stage, considering all classes simultaneously, the decision tree (DT) classifier solves the label-assignment problem with a multi-stage, or sequential, approach [51]. The labeling process is a chain of simple decisions based on the results of sequential tests rather than a single complex decision. In terms of DT construction, a univariate DT tests a single feature at each internal node, so its splits are orthogonal to the feature axes, whereas the splitting rule of a multivariate DT uses one or more features simultaneously and can differ depending on the complexity of the data and the classification problem. The multivariate DT is considered able to generate more accurate results than the univariate DT [52]. Two key parameters, maximum tree depth and regression accuracy, were set to 7 and 0.01, respectively.
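The sequential, univariate labeling process described above can be sketched in pure Python as a small CART-style tree: each internal node tests one feature against a threshold (an axis-orthogonal split chosen by Gini impurity), and growth stops at a maximum depth of 7. This is an illustrative sketch, not the study's implementation; the Gini criterion and the `min_impurity` stopping value (a hypothetical stand-in for the regression accuracy setting of 0.01) are our assumptions.

```python
from collections import Counter


def gini(labels) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Univariate search: try one feature at a time with an axis-orthogonal threshold."""
    best = None  # (weighted Gini, feature index, threshold)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0  # midpoint keeps both sides non-empty
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best


def build_tree(X, y, depth=0, max_depth=7, min_impurity=0.01):
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or gini(y) <= min_impurity:
        return {"leaf": majority}
    split = best_split(X, y)
    if split is None:  # every sample in this node has identical features
        return {"leaf": majority}
    _, f, thr = split
    left = [i for i, row in enumerate(X) if row[f] <= thr]
    right = [i for i, row in enumerate(X) if row[f] > thr]
    return {
        "feature": f,
        "threshold": thr,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_impurity),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_impurity),
    }


def predict(node, x):
    # A chain of simple decisions: follow one test per level until a leaf.
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]
```

A multivariate DT would replace the single-feature test `x[f] <= thr` with a test on a combination of features, such as a weighted sum.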

The random forest (RF) classifier is a nonparametric ensemble technique proposed by Breiman (2001: 5) [53], which is a "combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest". A random forest contains many decision trees, each built from a random subset of the training data and a random subset of the predictor variables. Because the RF algorithm does not rely on a parametric model for prediction, it differs from traditional statistical methods [54]. Features or feature combinations are selected using bagging, a method that generates a training dataset by randomly drawing N examples with replacement, where N is the size of the original training set [55]. The RF approach is recommended because, compared with other decision tree methods, it has the advantage of using fully grown trees that are not pruned [56]. The parameters set for this method were a maximum tree depth of 10, a regression accuracy of 0.01, and truncate pruned tree (yes).
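The bagging and voting steps above can be sketched in pure Python: every tree is grown on a bootstrap sample of N indices drawn with replacement, considers a random subset of predictor variables at each split (redrawn per node, as in Breiman's formulation), is left unpruned up to a depth of 10, and the forest labels a pixel by majority vote. This is a minimal sketch under our own assumptions (Gini splits, the square-root rule for the feature subset size, and hypothetical function names); it does not model the truncate pruned tree option.

```python
import math
import random
from collections import Counter


def gini(labels) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow_tree(X, y, rng, n_features, depth=0, max_depth=10):
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or gini(y) == 0.0:
        return majority  # a leaf is stored as a bare class label
    best = None
    # Random subset of predictor variables considered at this node.
    for f in rng.sample(range(len(X[0])), n_features):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:  # all sampled features are constant in this node
        return majority
    _, f, thr = best
    L = [i for i, row in enumerate(X) if row[f] <= thr]
    R = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            grow_tree([X[i] for i in L], [y[i] for i in L], rng, n_features, depth + 1, max_depth),
            grow_tree([X[i] for i in R], [y[i] for i in R], rng, n_features, depth + 1, max_depth))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def bootstrap(n, rng):
    # Bagging: draw N indices with replacement from a training set of size N.
    return [rng.randrange(n) for _ in range(n)]


def random_forest(X, y, n_trees=50, n_features=None, max_depth=10, seed=0):
    rng = random.Random(seed)
    n_features = n_features or max(1, int(math.sqrt(len(X[0]))))
    forest = []
    for _ in range(n_trees):
        idx = bootstrap(len(X), rng)
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], rng, n_features, max_depth=max_depth))
    return forest


def predict_forest(forest, x):
    # Each tree casts one vote; the most frequent label wins.
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]
```

Because each bootstrap sample omits roughly a third of the training pixels, individual trees differ, and the majority vote averages out their errors.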

Support vector machine (SVM) [57,58] is a supervised, non-parametric statistical learning technique that provides good classification results from complex and noisy data [59,60]. Derived from statistical learning theory, the SVM classifier separates the classes with a decision surface that maximizes the margin between them. This surface is called the "optimal hyperplane", and the data points nearest to it are called "support vectors" [60]. Because the SVM classifier is time-consuming when processing a large, high-resolution image, it provides a hierarchical, reduced-resolution classification process that shortens processing time without significantly degrading the results. In this study, we selected the radial basis function as the kernel type and set the gamma in the kernel function to 0.25 and the penalty parameter to 100.
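To show how the kernel settings above enter the classifier, the following sketch evaluates the radial basis function kernel, K(x, z) = exp(−γ‖x − z‖²) with γ = 0.25, and the decision function of an already trained two-class SVM, f(x) = Σᵢ αᵢ yᵢ K(xᵢ, x) + b. Training itself (solving the soft-margin quadratic program, where the penalty parameter C = 100 bounds each αᵢ) is not shown; the support vectors, coefficients, and bias in the usage example are hypothetical values chosen for illustration.

```python
import math


def rbf_kernel(x, z, gamma=0.25) -> float:
    # Radial basis function kernel: exp(-gamma * squared Euclidean distance).
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_function(x, support_vectors, dual_coefs, bias, gamma=0.25, C=100.0) -> float:
    # dual_coefs[i] = alpha_i * y_i; in the soft-margin SVM, 0 <= alpha_i <= C.
    if any(abs(c) > C for c in dual_coefs):
        raise ValueError("a dual coefficient exceeds the penalty parameter C")
    return sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)) + bias


def classify(x, support_vectors, dual_coefs, bias, gamma=0.25, C=100.0) -> int:
    # The sign of f(x) tells on which side of the optimal hyperplane x lies.
    return 1 if decision_function(x, support_vectors, dual_coefs, bias, gamma, C) >= 0 else -1
```

For instance, with hypothetical support vectors `[[0.0, 0.0], [2.0, 2.0]]`, coefficients `[-1.0, 1.0]`, and zero bias, a pixel at `[0.0, 0.0]` falls on the negative side and one at `[2.0, 2.0]` on the positive side. Multi-class maps, such as mangrove age groups, are usually built by combining several such two-class decisions.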
