**3. Methodology**

The methodological approach of this study involved the acquisition and pre-processing of satellite data, the selection of training and validation data, supervised classification of the images with the Random Forest (RF) algorithm, evaluation of classification accuracy, and mapping and analysis of the results. The flowchart in Figure 2 illustrates this approach, which is summarized in the three main steps described in the following subsections.

**Figure 2.** Methodological flowchart of the study.

### *3.1. Selection and Pre-Processing of Satellite Images*

Since cloud-free images providing complete coverage of the study area for the target year were difficult to find, image composition was performed. This consisted of filtering all images below a set cloud-cover threshold (Table 1) to create a mosaic of images around each target year. Among the methods frequently used in the literature, several authors have performed image composition based on the temporal aggregation of data, applying statistical parameters (mean, median, and maximum or minimum values) to the pixels of a pre-defined image time series [39,40]. Others simply used all available Landsat images over their study area to compose image time series [41,42]. According to [24], the most popular strategy for selecting input images for an annual cloud-free composite is to use images acquired over three years. In this study, annual composites were targeted from 1985 onwards, at five-year steps, to better capture the timing of disturbances. Unfortunately, some target years suffered from poor image quality, too few images below the set cloud-cover thresholds, and gaps in data coverage of the study area. Faced with these difficulties, only six image composites were created, using all data available in the target years or, occasionally, one or two years on either side of them (i.e., 1985, 1990, 2000, 2005, 2015, and 2020). Image composites were formed by applying a cloud mask, the *QA\_PIXEL* bitmask provided with the data, to the image collections. Cloudy pixels were retained (by removing the mask) when no cloud-free pixels were available to replace them over the entire time period around the target year. These pixels were assigned to a cloud class so that the full extent of the study area could be considered in the subsequent area analyses. We initially composited these images only from the best-available pixels derived from Landsat data [43]. Nevertheless, because some parts of the area remained without data under the constraints of the filters, we calculated the median of all pixels that met the imposed filters.
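As an illustration of this compositing step, the following GEE JavaScript sketch builds a median composite around one target year. The collection ID, date window, cloud-cover threshold, QA\_PIXEL bits, and the `studyArea` asset are assumptions for the example, not the exact parameters used in this study (see Table 1).

```javascript
// Minimal compositing sketch (assumed parameters; adapt to Table 1 and the target year).
var studyArea = ee.FeatureCollection('users/example/study_area');  // hypothetical asset

function maskClouds(img) {
  var qa = img.select('QA_PIXEL');
  // In Collection 2, bit 3 flags cloud and bit 4 flags cloud shadow.
  var clear = qa.bitwiseAnd(1 << 3).eq(0)
      .and(qa.bitwiseAnd(1 << 4).eq(0));
  return img.updateMask(clear);
}

var composite = ee.ImageCollection('LANDSAT/LT05/C02/T1_L2')
    .filterBounds(studyArea)
    .filterDate('1984-01-01', '1986-12-31')      // window of one to two years around 1985
    .filter(ee.Filter.lt('CLOUD_COVER', 30))     // illustrative threshold; see Table 1
    .map(maskClouds)
    .median();                                   // median of all pixels passing the filters
```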

Several vegetation indices were also calculated and added as bands to the image composites to see what improvements they could bring to the classification process. These were NDVI, NDBI, NDWI, and BSI (Table 2).


**Table 2.** Formulas of the used vegetation indices.

Note: *ρ*<sub>R</sub>, *ρ*<sub>G</sub>, *ρ*<sub>B</sub>, *ρ*<sub>NIR</sub>, and *ρ*<sub>SWIR1</sub> represent the reflectance of the red, green, blue, near-infrared, and short-wave infrared bands, respectively.

Since the study area was characterized by major land-cover classes including vegetation (dense dry forest, open forest, and savannah), crops and fallow land, buildings and bare soil, and water bodies, we selected these vegetation indices to better characterize them. NDVI has been widely used for decades to monitor vegetation dynamics in terrestrial ecosystems and remains the most popular index for vegetation assessment [52,53]. Based on the NIR and SWIR bands, NDWI is commonly and successfully used to detect and map surface water bodies [54] and to reduce the effects of terrain illumination differences and atmospheric influences. Furthermore, BSI has been proposed as a more reliable estimator of vegetation status where vegetation covers less than half of an area [51,55]. Ref. [56] showed that combining NDVI, NDWI, and NDBI data could refine several aspects of urban features while removing cloud-related noise in image classifications. Based upon these findings, these indices were combined with the classic Landsat bands, as they are expected to contribute to a more nuanced classification scheme [57]. The resulting image composites were then clipped to the study-area boundary (vector data) to limit processing to this area.
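A sketch of how such index bands can be derived and appended in GEE is shown below. The surface-reflectance band names (Landsat 5/7 naming) and the BSI formulation are assumptions; Table 2 defines the exact expressions used in the study.

```javascript
// Index computation sketch for a Landsat 5/7 surface-reflectance composite (assumed band names).
function addIndices(img) {
  var ndvi = img.normalizedDifference(['SR_B4', 'SR_B3']).rename('NDVI');  // (NIR - R)/(NIR + R)
  var ndwi = img.normalizedDifference(['SR_B4', 'SR_B5']).rename('NDWI');  // (NIR - SWIR)/(NIR + SWIR)
  var ndbi = img.normalizedDifference(['SR_B5', 'SR_B4']).rename('NDBI');  // (SWIR - NIR)/(SWIR + NIR)
  var bsi = img.expression(
      '((SWIR + RED) - (NIR + BLUE)) / ((SWIR + RED) + (NIR + BLUE))', {
        SWIR: img.select('SR_B5'),
        RED: img.select('SR_B3'),
        NIR: img.select('SR_B4'),
        BLUE: img.select('SR_B1')
      }).rename('BSI');                           // one common BSI formulation (assumed)
  return img.addBands(ndvi).addBands(ndwi).addBands(ndbi).addBands(bsi);
}

var compositeWithIndices = addIndices(composite).clip(studyArea);  // clip to the study area
```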

### *3.2. Selection of Training and Validation Data*

For each target year, sample points were selected based on the land cover detected through visual interpretation or from archival high-resolution Google Earth imagery acquired as close as possible to the target years. Reference data were collected from various sources for the different target years. For the year 2020, we used data collected in the field as explained in Section 2.2. Reference data based on high-resolution archive imagery mainly concerned the years 2000, 2005, and 2015. For the years 1985 and 1990, when high-resolution imagery was not available on Google Earth, we relied on land-cover samples collected directly by visual interpretation of the filtered Landsat images for these years. In addition, the pixel values of the added vegetation-index bands were used to guide the selection of samples; sample selection therefore combined visual interpretation with inspection of these additional index bands.

In applying these sample selection methods to the image composites of the target years 1985, 1990, 2000, 2005, 2015, and 2020, a total of 1007, 1102, 1219, 1278, 1372, and 1521 points, respectively, were sampled per composite to serve as reference points. Each group of points represented the different land-cover types. For the six target years, 7499 sample points were thus collected, of which 70% served as training samples during the classification of the composite images and the remaining 30% were used to validate the classification results.
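A minimal way to implement this 70/30 partition in GEE is sketched below, assuming the labelled points for one target year are held in a FeatureCollection with a class property (here hypothetically named `landcover`).

```javascript
// samples: assumed FeatureCollection of labelled reference points for one target year.
var withRandom = samples.randomColumn('random');                       // adds a uniform [0, 1) column
var trainingSet = withRandom.filter(ee.Filter.lt('random', 0.7));      // ~70% for training
var validationSet = withRandom.filter(ee.Filter.gte('random', 0.7));   // ~30% for validation
```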

### *3.3. Image Classification and Evaluation of Accuracy*

Following the identification and pre-processing of images, we classified the image composites with the classic Landsat bands, and then performed a second classification with these same bands plus the vegetation indices to determine the effect of the indices on classification quality. As for the pre-processing, image classifications were performed using JavaScript code in the GEE. For the selection of an appropriate classification method, several supervised machine-learning algorithms have been used in the literature, including Support Vector Machines (SVM), Classification and Regression Trees (CART), Stepwise Multiple Linear Regression (SML), and Random Forests (RF). Of these, CART and RF are the most frequently used for this purpose. Furthermore, RF classifiers yield greater classification accuracy, even when applied to data with higher noise levels [58–60]. This is confirmed by [61], who evaluated 179 classifiers from 17 families on 121 datasets and concluded that RF provided the best classifiers. Therefore, we selected the RF algorithm because it yields results with excellent accuracy and works efficiently on large datasets [62].

The different image composites resulting from the filtering described above were then classified in the GEE using the RF algorithm. The number of decision trees was chosen with reference to the literature, which generally indicates that the greater the number of trees, the better the results. According to [63], it is unclear whether the number of trees should simply be set to the largest computationally manageable value or whether a smaller number might be sufficient or even perform better. [64] compared the performance of RF models with different numbers of trees on 29 datasets and noted that a forest with 512 trees can perform better than one with 1024 trees; they concluded that forest performance does not always improve substantially as the number of trees increases beyond a certain level. While it is commonly thought that tuning hyper-parameters can improve RF performance, [65] acknowledged that the improvement achieved by adding trees decreases as more and more trees are added. Generally, RF works quite well with default hyper-parameter values and, according to these authors, typical default values for the number of trees are 500 and 1000. We therefore used 500 trees in the RF algorithm applied to the image-composite classifications in this study, as this number has been widely used in the literature in various fields, and mainly in land-cover classification, with very good results [60,66–70].
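The sketch below shows one way to apply an RF classifier with 500 trees to a composite in GEE; the property name `landcover`, the `bands` list (see the band lists sketched further below), and the 30 m sampling scale are assumptions carried over from the earlier sketches.

```javascript
// Sample the composite at the training points, train RF (500 trees), and classify.
var trainingData = composite.select(bands).sampleRegions({
  collection: trainingSet,
  properties: ['landcover'],
  scale: 30
});

var rfClassifier = ee.Classifier.smileRandomForest(500).train({
  features: trainingData,
  classProperty: 'landcover',
  inputProperties: bands
});

var classified = composite.select(bands).classify(rfClassifier);
```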

The image classifications for this study were based on seven main land-cover classes (Table 3). The definition of these classes followed the Yangambi land classification system [34], appropriate to the West African context, which was used during the 2016 National Forest Inventory (IFN) [71]. However, to account for the limited capacity of the available images to discriminate between certain land-cover types, some classes were aggregated into broader classes.

**Table 3.** Description of LULC categories used in the classification.


The original spectral bands B1, B2, B3, B4, B5, and B7 from Landsat 5 and 7, together with B2, B3, B4, B5, B6, and B8 from Landsat 8, were used as inputs to the RF model for the first classification. For the second classification, an ensemble combining these same bands with the four aforementioned vegetation indices was used as input, but with the same training and validation samples.
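For illustration, the two input ensembles could be declared as band-name lists like the following (Collection 2 surface-reflectance names for Landsat 5/7 are assumed; the Landsat 8 list would be adapted accordingly).

```javascript
// Classification 1: spectral bands only; classification 2: the same bands plus the indices.
var spectralBands = ['SR_B1', 'SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B7'];  // Landsat 5/7 (assumed names)
var indexBands    = ['NDVI', 'NDBI', 'NDWI', 'BSI'];
var bandsRun1 = spectralBands;
var bandsRun2 = spectralBands.concat(indexBands);
```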

Based upon random selection in the model, 70% of the collected data were used as training samples when classifying the composite images, while 30% were used as validation data for the classification results. The accuracy of the classification performed on each image composite was then evaluated. For each image composite, we calculated the traditional metrics for evaluating image-classification accuracy: the producer accuracy (PA), the user accuracy (UA), the overall accuracy (OA), and Cohen's kappa coefficient (*K*) [72].

The different metrics are defined by the following equations [73]:

$$OA = \frac{1}{N} \sum_{i=1}^{r} n_{ii} \tag{1}$$

$$PA = n_{ii} / n_{i\,col} \tag{2}$$

$$UA = n_{ii} / n_{i\,row} \tag{3}$$

$$K = \frac{N \sum_{i=1}^{r} n_{ii} - \sum_{i=1}^{r} n_{i\,col}\, n_{i\,row}}{N^2 - \sum_{i=1}^{r} n_{i\,col}\, n_{i\,row}} \tag{4}$$

where $n_{ii}$ is the number of correctly classified pixels in class *i*; *N* is the total number of pixels in the confusion matrix; *r* is the number of classes; $n_{i\,col}$ is the column total (reference data) for class *i*; and $n_{i\,row}$ is the row total (predicted classes) for class *i*.
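In GEE, these metrics can be obtained directly from the confusion matrix of the held-out validation points, as sketched below under the same naming assumptions as in the earlier sketches.

```javascript
// Evaluate the classified image against the validation points.
var validated = classified.sampleRegions({
  collection: validationSet,
  properties: ['landcover'],
  scale: 30
});
var matrix = validated.errorMatrix('landcover', 'classification');
print('Overall accuracy (OA):', matrix.accuracy());             // Equation (1)
print('Producer accuracy (PA):', matrix.producersAccuracy());   // Equation (2), per class
print('User accuracy (UA):', matrix.consumersAccuracy());       // Equation (3), per class
print('Kappa (K):', matrix.kappa());                             // Equation (4)
```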

Ref. [74] defines the main classification-accuracy parameters: OA is the ratio of the number of correctly classified pixels to the total number of pixels in the confusion matrix, and Kappa is the proportional reduction in error of the considered classification compared with a completely random classification. According to [73], PA represents the proportion of ground-truth pixels of a class that are correctly classified (its complement is the omission error), while UA is the proportion of pixels assigned to a class that truly belong to that class on the ground (its complement is the commission error).

Following these evaluations of the classification accuracy of the image composites, the results were exported from the GEE for formatting in mapping software. The land-cover maps were finalized in ArcMap 10.6.1, while the land-cover conversion maps were produced using the semi-automatic classification plugin recently developed in Python by [75], which is usable in QGIS 3.6. The annual rate of forest-cover change (r) and the annual deforestation (R), as defined by [76], were also calculated for the periods between the selected target years of this study and between 1985 and 2020, by applying Equations (5) and (6), as follows:

$$r = \left(\frac{1}{t_2 - t_1}\right) \times \ln\left(\frac{A_2}{A_1}\right) \times 100 \tag{5}$$

$$R = \frac{A_2 - A_1}{t_2 - t_1} \tag{6}$$

where $t_1$ is year 1, $t_2$ is year 2, $A_1$ is the forest area in year 1, and $A_2$ is the forest area in year 2.
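As a quick numerical illustration (with hypothetical forest areas, not results of this study), Equations (5) and (6) can be evaluated as follows; the snippet runs as plain JavaScript in the GEE Code Editor.

```javascript
// Hypothetical example: forest area shrinking from 120,000 ha (1985) to 95,000 ha (2020).
var A1 = 120000, A2 = 95000, t1 = 1985, t2 = 2020;
var r = (1 / (t2 - t1)) * Math.log(A2 / A1) * 100;   // Equation (5): annual rate of change (%)
var R = (A2 - A1) / (t2 - t1);                        // Equation (6): annual area change (ha/yr)
print(r.toFixed(2) + ' %/yr', R.toFixed(0) + ' ha/yr');
```

With these illustrative figures, r ≈ −0.67% per year and R ≈ −714 ha per year.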
