**3. Methods**

The analysis of potentially separable LC classes was conducted using the time series of optical NDVI values and radar polarization bands. A total of seven cloud-free S2 L2A scenes were used to calculate NDVI profiles and identify areas with different vegetation and agricultural characteristics [56].
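As a brief illustration, NDVI for each scene can be derived from the red and near-infrared bands. The following minimal Python sketch assumes the two bands are already loaded as NumPy arrays; the array shapes and the random placeholder data are illustrative only:

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red), with zero denominators masked as NaN."""
    nir = nir.astype(np.float32)
    red = red.astype(np.float32)
    denom = nir + red
    return np.where(denom != 0, (nir - red) / denom, np.nan)

# Placeholder band pairs standing in for seven cloud-free S2 L2A scenes
# (in practice these would be read from the rasters, e.g., bands B08 and B04).
scenes = [(np.random.rand(64, 64), np.random.rand(64, 64)) for _ in range(7)]
ndvi_profile = np.stack([ndvi(nir, red) for nir, red in scenes])  # (7, H, W) time series
```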

#### *3.1. Jeffries–Matusita (JM) Distance*

To evaluate the spectral similarity between the LC classes in the reference dataset used for this research, the Jeffries–Matusita (*JM*) distance was calculated [57,58]. This spectral separability measure quantifies the distance between the distributions of a pair of classes (e.g., *A1* and *A2*); class pairs are then ranked according to this distance, following the equation:

$$JM_{A_1 A_2} = 2\left(1 - e^{-B}\right)\tag{1}$$

where *B* represents the Bhattacharyya distance [56]:

$$B_{A_1 A_2} = \frac{1}{8}\left(\overline{m}_{A_1} - \overline{m}_{A_2}\right)^2 \frac{2}{S_{A_1}^2 + S_{A_2}^2} + \frac{1}{2} \ln \left[ \frac{S_{A_1}^2 + S_{A_2}^2}{2\,S_{A_1} S_{A_2}} \right] \tag{2}$$

where $\overline{m}_{A_i}$ are the mean values of LC classes $A_1$ and $A_2$, and $S_{A_i}$ are their covariance matrices.
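For a single feature (e.g., an NDVI value), Eqs. (1) and (2) reduce to scalar means and standard deviations. The sketch below computes the JM distance for two classes directly from their samples; the class samples are synthetic and purely illustrative:

```python
import numpy as np

def jm_distance(x1: np.ndarray, x2: np.ndarray) -> float:
    """Univariate Jeffries-Matusita distance between two class samples,
    following Eqs. (1)-(2): JM = 2 * (1 - exp(-B))."""
    m1, m2 = x1.mean(), x2.mean()
    s1, s2 = x1.std(ddof=1), x2.std(ddof=1)
    # Bhattacharyya distance for two univariate normal distributions, Eq. (2)
    b = (0.125 * (m1 - m2) ** 2 * 2.0 / (s1 ** 2 + s2 ** 2)
         + 0.5 * np.log((s1 ** 2 + s2 ** 2) / (2.0 * s1 * s2)))
    return 2.0 * (1.0 - np.exp(-b))  # 0 = identical, 2 = fully separable

# Example: NDVI samples of two LC classes (illustrative values)
rng = np.random.default_rng(0)
a1 = rng.normal(0.7, 0.05, 500)   # e.g., dense vegetation
a2 = rng.normal(0.3, 0.08, 500)   # e.g., sparse vegetation
print(jm_distance(a1, a2))        # close to 2 -> well separable
```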

#### *3.2. Random Forest (RF) Classification*

For this research, the RF classifier was chosen for its simple parametrization, built-in feature importance estimation, and short computation time [3]. The optimization of the RF hyperparameters and the feature importance estimation used as input for vegetation mapping are explained in Sections 3.2.1 and 3.2.2, respectively.

#### 3.2.1. Hyperparameter Tuning

RF has several hyperparameters that allow users to control the structure and size of the forest (*ntree*) and its randomness (e.g., the number of variables randomly considered at each split, *mtry*). The default values for the *ntree* and *mtry* parameters are 500 and the square root of the number of input variables, respectively. A grid search approach with cross-validation was therefore used in this research for hyperparameter tuning, and the optimal parameter values were determined as those that produced the highest classification accuracy.
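A minimal sketch of such a grid search is shown below using scikit-learn's `RandomForestClassifier`, where *ntree* maps to `n_estimators` and *mtry* to `max_features`; the candidate grid values and the synthetic dataset are illustrative assumptions, not the values used in this research:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training pixels and their LC class labels.
X_train, y_train = make_classification(n_samples=300, n_features=10, random_state=0)

param_grid = {
    "n_estimators": [250, 500, 750],   # ntree candidates
    "max_features": ["sqrt", "log2"],  # mtry candidates (sqrt = default)
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,                 # k-fold cross-validation
    scoring="accuracy",   # optimum = highest classification accuracy
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```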

#### 3.2.2. Feature Importance and Selection

During the training phase, the RF classifier constructs a bootstrap sample from approximately two-thirds of the training dataset, whereas the remaining samples, which are not included in the training subset, are used for an internal error estimate called the out-of-bag (OOB) error [59]. The random sampling procedure was repeated ten times, allowing average performances with confidence intervals to be computed. Afterwards, the relative importance of each feature can be estimated by evaluating the OOB error of each decision tree when the values of that feature are randomly permuted [60]. In such a way, the mean decrease in accuracy (*MDA*) can be expressed as [24]:

$$MDA_j = \frac{1}{n} \sum_{t=1}^{n} \left(MP_{tj} - M_{tj}\right) \tag{3}$$

where *n* is equal to *ntree*, and $M_{tj}$ and $MP_{tj}$ denote the OOB error of tree $t$ before and after permuting the values of predictor variable $X_j$, respectively [61]. An MDA value of zero indicates that there is no connection between the predictor and the response feature, whereas the larger the positive MDA value, the more important the feature is for the classification.
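A readily available approximation of Eq. (3) is scikit-learn's permutation importance, which measures the accuracy drop after shuffling each feature on a held-out set (rather than on the per-tree OOB samples of Breiman's original MDA); the dataset below is synthetic and illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)

# Accuracy drop after shuffling each feature, averaged over n_repeats
# permutations (analogous to the MDA of Eq. (3), but on a held-out set).
result = permutation_importance(rf, X_te, y_te, n_repeats=10,
                                scoring="accuracy", random_state=0)
for j, (mean, std) in enumerate(zip(result.importances_mean, result.importances_std)):
    print(f"feature {j}: MDA ~ {mean:.4f} +/- {std:.4f}")
```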

Another measure for calculating feature importance is the mean decrease in Gini (MDG), which measures the impurity decrease at each tree node split attributable to a predictor feature, normalized by the number of trees. As with MDA, the higher the MDG, the more important the feature. Related research in the remote sensing community is not united on which measure to use for feature selection with RF in classification tasks. Belgiu and Dragut [62] reported that most studies in their review used MDA, whereas Cánovas-García and Alonso-Sarría [60] obtained the highest accuracy for all of the classification algorithms using MDG. Since the former studies used pixel-based classification, as does this research, whereas the latter used object-based classification, MDA was used as the measure for feature selection.
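For comparison, an impurity-based (Gini) importance is exposed directly by scikit-learn as `feature_importances_` on a fitted forest, which corresponds to MDG up to normalization; again, the data are synthetic and illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

# Gini importance: total impurity decrease contributed by each feature,
# averaged over all trees and normalized to sum to 1 (the MDG analogue).
for j, mdg in enumerate(rf.feature_importances_):
    print(f"feature {j}: MDG = {mdg:.4f}")
```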
