#### 2.3.1. Data Acquisition

The mangrove AGB in the CGBRS was estimated by fusing ALOS-2 PALSAR-2 L-band dual-polarimetric Level 2.1 data, obtained in high-sensitivity mode, with Sentinel-2 (S-2) MSI images. Table 2 presents the S-2 and ALOS-2 PALSAR-2 data at the study site, acquired on 24 and 23 March 2018, respectively, during the dry season.


**Table 2.** Acquired earth observation data for this study.

To pre-process the satellite remote sensing data, we resampled both the multispectral bands of Sentinel-2 and the dual-polarization ALOS-2 PALSAR-2 data to a ground sampling distance (GSD) of 10 m. The satellite images were processed as described in Section 2.3.2. To validate model performance and optimize the hyperparameters for AGB retrieval in the CGBRS, the models were trained and tested against the measured field data. Figure 3 is a flowchart of the satellite-image processing and the generation of mangrove AGB estimation models using the ML techniques in the current study.

**Figure 3.** Flowchart for satellite-image processing and the generation of AGB models based on ML techniques.
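As an illustrative sketch of the resampling step above, a 20 m Sentinel-2 band could be brought to the 10 m GSD with rasterio. The actual preprocessing in this study was performed in the SNAP toolbox; the file name and the choice of bilinear resampling below are assumptions.

```python
import rasterio
from rasterio.enums import Resampling

# Hypothetical 20 m Sentinel-2 band; the study resampled all bands to 10 m.
with rasterio.open("S2_B05_20m.jp2") as src:
    scale = src.res[0] / 10.0  # e.g., 20 m -> 10 m gives a 2x upsampling factor
    data = src.read(
        out_shape=(src.count, int(src.height * scale), int(src.width * scale)),
        resampling=Resampling.bilinear,  # resampling method is an assumption
    )
```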

#### 2.3.2. Satellite Image Processing

Two scenes of ALOS-2 PALSAR-2 Level 2.1 data acquired on 23 March 2018 during the dry season were downloaded from https://auig2.jaxa.jp/ips/home, the website of the Japan Aerospace Exploration Agency (JAXA). The DN (Digital Number) of the ALOS-2 PALSAR-2 imagery was converted to the normalized radar backscattering coefficient (sigma naught) using Equation (1):

$$
\sigma^0\ [\mathrm{dB}] = 10 \cdot \log_{10}\!\left(\mathrm{DN}^2\right) + \mathrm{CF} \tag{1}
$$

where σ⁰ is the backscatter coefficient and CF is the calibration factor; CF = −83 dB for both HH and HV polarizations [44]. Equation (1) converts the DN of each pixel to sigma naught (σ⁰) in decibels (dB).
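A direct implementation of Equation (1) is straightforward; the example DN value below is hypothetical.

```python
import numpy as np

def dn_to_sigma0(dn, cf=-83.0):
    """Convert ALOS-2 PALSAR-2 digital numbers to sigma naught (dB), Equation (1)."""
    dn = np.asarray(dn, dtype=np.float64)
    return 10.0 * np.log10(dn ** 2) + cf

# Example: a DN of 5000 yields roughly -9 dB with CF = -83 dB.
print(dn_to_sigma0(5000))
```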

Two scenes of Sentinel-2 (S-2) Level-1C data acquired on 24 March 2018 during the dry season were retrieved from the Copernicus Open Access Hub of the European Space Agency (ESA). Radiometric and geometric corrections of the S-2 data were made to the UTM/WGS84, Zone 48 North projection at top-of-atmosphere (TOA) reflectance [45]. The S-2 MSI Level-1C data were then processed to Level-2A bottom-of-atmosphere (BOA) reflectance using the Sen2Cor algorithm of ESA (http://step.esa.int/main/third-party-plugins-2/sen2cor/). The S-2 and ALOS-2 PALSAR-2 images were processed with the SNAP toolbox, and the modeling was performed in a Python 3.7 environment using the Scikit-learn library [46].

#### 2.3.3. Transformation of Multispectral and SAR Data

As a commonly employed method in previous mangrove AGB retrievals [13,47,48], image transformation was applied to the multispectral and SAR data of the present study. The image transformation of SAR data combines the polarizations into ratios and differences such as HV/HH, HH/HV, and HH-HV, as suggested in [26]. Meanwhile, the multispectral data were transformed into vegetation indices, as each index is sensitive to mangrove structure and biomass. Table 3 shows the seven vegetation indices chosen for mangrove AGB retrieval at the CGBRS after referring to related studies [49–51]. The 23 predictor variables comprised five variables of ALOS-2 PALSAR-2 data (HV, HH, HV/HH, HH/HV, and HH-HV), 11 multispectral bands of S-2, and seven vegetation indices; these served as the explanatory variables in the mangrove AGB prediction model (Table 3). Figure 4 illustrates image composites of the different sensors and vegetation indices, along with a SAR transformation, in the study area.


**Table 3.** List of vegetation indices used in the current study.

**Figure 4.** Illustrations of input variables in the study area. (**a**) Pseudo color composite of Sentinel-2 (RGB: Bands 8-4-3), (**b**) Pseudo color composite of ALOS-2 PALSAR-2 (RGB: HH-HV-HH/HV), (**c**) NDVI, (**d**) SAR transformation (HH-HV).
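As a minimal sketch of the transformations described above, the SAR band combinations and NDVI can be computed element-wise with NumPy; the toy arrays below are hypothetical stand-ins for the co-registered 10 m bands.

```python
import numpy as np

# Hypothetical 2 x 2 excerpts of the input bands:
hh = np.array([[-8.1, -7.9], [-8.4, -8.0]])      # HH sigma naught (dB)
hv = np.array([[-14.2, -13.8], [-14.5, -14.0]])  # HV sigma naught (dB)
red = np.array([[0.05, 0.06], [0.04, 0.05]])     # S-2 Band 4 reflectance
nir = np.array([[0.35, 0.38], [0.33, 0.36]])     # S-2 Band 8 reflectance

# SAR transformations used as predictor variables:
ratio_hv_hh = hv / hh
ratio_hh_hv = hh / hv
diff_hh_hv = hh - hv

# NDVI, one of the seven vegetation indices in Table 3:
ndvi = (nir - red) / (nir + red)
```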

#### *2.4. Selection of Machine Learning Model*

To identify the best model for AGB retrieval in the CGBRS, we compared the performances of several ML techniques (XGBR, GBR, GPR, RFR, and SVR). The SVR model best predicted the mangrove AGB in a coastal area of North Vietnam [9], whereas the RFR model delivered the best monitoring results for mangrove biomass changes in South Vietnam [10]. Therefore, SVR and RFR were selected for the present study. The other ML algorithms were chosen because they are commonly used for solving regression problems in various fields [40–42].

#### 2.4.1. Gradient Boosting Decision Trees Algorithms

#### a. Gradient Boosting Regression (GBR)

GBR is an ensemble decision-tree method that boosts the performance of weak learners into that of a strong one. Each regression tree in GBR is fitted to the residuals of the preceding trees; the main purpose is to reduce the previous residuals and thereby decrease the model error along the gradient direction. The results of all regression trees are integrated to give the final prediction [52,53]. The GBR model can handle mixed data types and is robust to outliers [54]. As GBR has not been widely applied to mangrove biomass estimation, it was tested in the present study.

The parameters to be determined are the learning rate, the number of trees, the minimum number of samples required at a leaf node, the maximum depth, and the number of features considered for the best split. The hyperparameters of the GBR model were optimized by five-fold cross-validation (CV).
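A minimal sketch of this tuning step with Scikit-learn is shown below; the grid values and placeholder data are illustrative assumptions, not the settings used in the study (see Table 4 for the optimized values).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Placeholder data standing in for the 80% training split (Section 2.5.1):
# 96 plots x 23 predictor variables, with hypothetical AGB targets (Mg/ha).
X_train = np.random.rand(96, 23)
y_train = np.random.rand(96) * 200

param_grid = {
    "learning_rate": [0.01, 0.05, 0.1],
    "n_estimators": [100, 300, 500],
    "min_samples_leaf": [1, 3, 5],
    "max_depth": [2, 3, 5],
    "max_features": ["sqrt", None],
}
search = GridSearchCV(GradientBoostingRegressor(random_state=42),
                      param_grid, cv=5)  # five-fold CV
search.fit(X_train, y_train)
print(search.best_params_)
```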

#### b. Extreme Gradient Boosting Regression (XGBR)

The Extreme Gradient Boosting (XGB) algorithm, proposed by Chen and Guestrin [55], is a GBR variant that develops strong learners through an additive training process. To resolve the drawbacks of weakly supervised learning, the additive learning is divided into two phases: a learning phase fitted to the entire input data, followed by an adjustment to the residuals. The fitting process is repeated until the stopping criteria are met. The algorithm is based on boosting decision trees, which handle both classification and regression tasks in weakly supervised machine learning through the additive training strategy. The XGBR technique alleviates the undesired over-fitting problem.

The XGBR algorithm optimizes the loss function not by the first-order derivative (as in GBR) but by an efficient second-order expansion. To avoid over-fitting, the objective function treats the model complexity as a regularization term added to the cost function [55]. The XGBR model generalizes well, avoiding both over-fitting and under-fitting, and supports parallel computing to reduce computational time.

The parameters of XGBR are those of the GBR algorithm plus an additional parameter, gamma (γ), representing the minimum loss reduction required to further partition a leaf node of the tree. The larger the γ, the more conservative the algorithm. The XGBR model was also optimized by five-fold CV in the Python environment.
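A comparable sketch for XGBR, assuming the xgboost Python package; the grid values and placeholder data are illustrative assumptions.

```python
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import GridSearchCV

X_train = np.random.rand(96, 23)    # placeholder training predictors
y_train = np.random.rand(96) * 200  # hypothetical AGB targets (Mg/ha)

# gamma is the minimum loss reduction required to further partition a
# leaf node; larger values make the algorithm more conservative.
param_grid = {
    "learning_rate": [0.01, 0.1],
    "n_estimators": [100, 300],
    "max_depth": [2, 3],
    "gamma": [0.0, 0.1, 1.0],
}
search = GridSearchCV(XGBRegressor(objective="reg:squarederror"),
                      param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
```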

#### 2.4.2. Support Vector Regression (SVR)

SVM is a supervised learning technique based on the statistical learning theory developed by Vapnik [56]. The method is widely used for classification and regression tasks in computer vision, pattern recognition, and environmental applications. SVR is the SVM variant for regression problems. A nonlinear kernel function in SVR transforms the dataset into a higher-dimensional feature space, where the data can be treated by simple linear regression. In this study, the selected kernel function was the radial basis function (RBF), the kernel most widely adopted for forest AGB estimation in prior studies [29,50].

The SVR model is generally configured by three hyperparameters: epsilon (ε), the regularization parameter (*C*), and the kernel width (γ) of the RBF. In the present study, these parameters were optimized through five-fold CV.
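A minimal sketch of the SVR tuning; feature standardization, the grid values, and the placeholder data are assumptions (scaling is standard practice for RBF-kernel SVR but is not stated in the text).

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

X_train = np.random.rand(96, 23)    # placeholder training predictors
y_train = np.random.rand(96) * 200  # hypothetical AGB targets (Mg/ha)

pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
param_grid = {
    "svr__C": [1, 10, 100],              # regularization parameter C
    "svr__epsilon": [0.01, 0.1, 1.0],    # epsilon-insensitive tube width
    "svr__gamma": [0.01, 0.1, "scale"],  # RBF kernel width
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
```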

#### 2.4.3. Random Forests (RF)

RF [57] is the most common bagging model, applied to both classification and regression problems. For training, RFR creates multiple uncorrelated trees, each from a randomly selected subset of 2/3 of the total samples (in-bag). The remaining 1/3 of the samples (out-of-bag, OOB) are used to estimate the OOB error and validate the method. Each tree is grown from its in-bag samples with *m* features considered when optimizing the split at each node; without pruning, the tree grows to its largest possible extent. The RFR model produces (1) an OOB error and (2) the relative importance of each variable, from which the prediction accuracy and the contribution of each variable can be assessed.

RFR is a high-performance non-parametric method that handles nonlinear data without over-fitting during the training and testing phases. Accordingly, it has been widely employed in remote sensing [58,59]. The RFR requires the number of trees and the number of features *m* considered at each split. In this study, both RFR parameters were optimized by five-fold CV in the Python environment.
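The sketch below shows the two RFR parameters together with the OOB error and variable importances described above; the parameter values and placeholder data are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X_train = np.random.rand(96, 23)    # placeholder training predictors
y_train = np.random.rand(96) * 200  # hypothetical AGB targets (Mg/ha)

# oob_score=True reproduces the out-of-bag assessment; n_estimators and
# max_features (the number of features m per split) are the tuned parameters.
rfr = RandomForestRegressor(n_estimators=500, max_features="sqrt",
                            oob_score=True, random_state=42)
rfr.fit(X_train, y_train)
print("OOB R^2:", rfr.oob_score_)
print("Variable importances:", rfr.feature_importances_)
```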

#### 2.4.4. Gaussian Processes (GP)

Based on non-parametric Bayesian theory, GPs are applicable to both classification and nonlinear regression problems. The GPR model learns the fitting function from a small dataset using various kernels, finding the probability distribution that best describes the data. The input data are assumed to follow a multivariate Gaussian distribution, with noise independent of the data measurements [60]. The mean vector and covariance matrix are estimated from the training data by the mean and covariance functions, respectively, yielding a posterior distribution from which the confidence interval and uncertainty of the predictions can be interpreted. The mean of a GP represents the best estimate from the model, and the variance (σ²) measures the confidence level. GPs are well known as good predictors of biophysical parameters [61].

#### *2.5. Model Evaluation*

#### 2.5.1. Input Data for Model Running

To create the input data for the training models, the 121 sampling plots were divided into a training set (80%) and a testing set (20%) using the well-known Scikit-learn library [46] in the Python programming environment. Because the measured plot size (500 m²) greatly exceeded the image pixel size (10 m), all satellite data were smoothed with a median filter with a window size of 5 × 5 pixels from the SciPy library [62].
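A minimal sketch of these two steps, with hypothetical arrays in place of the real image bands and plot records:

```python
import numpy as np
from scipy.ndimage import median_filter
from sklearn.model_selection import train_test_split

# 5 x 5 median smoothing of each band before extracting plot values:
band = np.random.rand(200, 200)  # placeholder image band
band_smoothed = median_filter(band, size=5)

# 80/20 split of the 121 plot records (X: predictors, y: measured AGB):
X = np.random.rand(121, 23)      # placeholder predictor table
y = np.random.rand(121) * 200    # hypothetical AGB values (Mg/ha)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # random_state is an assumption
```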

#### 2.5.2. Hyperparameter Tuning in XGBR, GBR, RFR, SVR, and GPR

Hyperparameter tuning is often required when optimizing machine learning models. In this work, the parameters of each ML model were optimized by grid search with five-fold CV. The results are listed in Table 4.


**Table 4.** Optimized hyperparameters of the ML applied in this study.

In the GPR, we combined an RBF kernel with a length scale of 100 and a WhiteKernel with a noise level of 1.0. The hyperparameters and kernels were maintained during the training and testing phases.
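A minimal GPR sketch with the stated kernel combination; the placeholder data and the use of Scikit-learn defaults elsewhere are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X_train = np.random.rand(96, 23)    # placeholder training predictors
y_train = np.random.rand(96) * 200  # hypothetical AGB targets (Mg/ha)

# RBF (length scale 100) combined with a WhiteKernel (noise level 1.0),
# as described above.
kernel = RBF(length_scale=100.0) + WhiteKernel(noise_level=1.0)
gpr = GaussianProcessRegressor(kernel=kernel)
gpr.fit(X_train, y_train)

# The predictive mean is the AGB estimate; the standard deviation
# quantifies the confidence of each prediction (Section 2.4.4).
mean, std = gpr.predict(X_train[:3], return_std=True)
```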
