Article

Modeling Canopy Height of Forest–Savanna Mosaics in Togo Using ICESat-2 and GEDI Spaceborne LiDAR and Multisource Satellite Data

Centre d’Applications et de Recherches en Télédétection (CARTEL), Département de Géomatique Appliquée, Université de Sherbrooke, 2500 Boulevard de l’Université, Sherbrooke, QC J1K 2R1, Canada
*
Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(1), 85; https://doi.org/10.3390/rs17010085
Submission received: 4 December 2024 / Revised: 17 December 2024 / Accepted: 25 December 2024 / Published: 29 December 2024
(This article belongs to the Special Issue Lidar for Forest Parameters Retrieval)

Abstract

Quantifying forest carbon storage to better manage climate change and its effects requires accurate estimation of forest structural parameters such as canopy height. Variables from remote sensing data and machine learning models are increasingly used for this purpose. This study modeled the canopy height of forest–savanna mosaics in the Sudano–Guinean zone of Togo. Relative heights were extracted from GEDI and ICESat-2 products and combined with optical, radar, and topographic variables for canopy height modeling. We tested four methods: Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost) and Deep Neural Network (DNN). The RF algorithm obtained the best predictions using 98% relative height (RH98). The best-performing result was obtained with variables extracted from GEDI data (r = 0.84; RMSE = 4.15 m; MAE = 2.36 m), compared with those from ICESat-2 (r = 0.65; RMSE = 5.10 m; MAE = 3.80 m). Models that were developed during this study can be applied over large areas in forest–savanna mosaics, enhancing forest dynamics monitoring in line with REDD+ objectives. This study provides valuable insights for future spaceborne LiDAR and other remote sensing data applications in similar complex ecosystems and offers local decision-makers a robust tool for forest management.

1. Introduction

The rapid rise in global greenhouse gas (GHG) emissions since the onset of the Industrial Revolution has led to considerable changes in the Earth’s climate, which have been the subject of much research over recent decades. The Intergovernmental Panel on Climate Change (IPCC) has determined that this increase in anthropogenic GHG emissions is the primary driver of climate change, which can push temperatures beyond the thermal tolerances of many species [1,2]. According to current trends, global warming that is linked to these gases will likely exceed 1.5 °C within several decades despite aggressive emissions reduction strategies [3]. Removing CO2 from the atmosphere by favoring nature-based solutions (protected areas and forests) could contribute greatly to climate change mitigation [4,5,6].
Forests constitute one of the largest reservoirs of terrestrial carbon, thereby playing a vital role in offsetting the aforementioned climate changes and regulating the global carbon balance. Annually, they contribute to about 50% of net terrestrial primary production, store about 45% of the planet’s active carbon, and sequester around 33% of anthropogenic emissions [7,8,9]. Unfortunately, tropical forest ecosystems, which maintain the global ecological equilibrium, are constantly threatened by deforestation and degradation for economic purposes. Timber extraction and forest clearing for other land uses are among the largest sources of anthropogenic carbon emissions [10]. In this context, the implementation of the Paris Agreement on climate change and the 2030 Agenda for Sustainable Development (adopted in 2015) by UN member nations require quantitative studies that would provide essential data for monitoring forest dynamics [11,12,13].
The canopy is the uppermost layer of vegetation formed by branches and leaves of tall trees overlooking the undergrowth in a forest [13]. Canopy height is a key structural parameter for monitoring forest biomass, and its precise estimation is crucial for quantifying biophysical parameters like aboveground biomass, carbon storage, biodiversity and many other parameters to which it is strongly linked [14,15,16]. Traditional manual forest inventories provide accurate information but are labor-intensive and limited in scale. Consequently, remote sensing has been employed with field measurements to estimate canopy height over large spatial extents [16,17]. The remote sensing observations are provided by various platforms. These include, for instance, multi-spectral optical data that are derived from Landsat [18,19], Sentinel 2 [20,21] or SPOT5 [22,23], as well as synthetic aperture radar (SAR) data from Sentinel 1 [24,25], TerraSAR-X [26,27], TanDEM-X [28,29] or ALOS PALSAR [30,31]. Signal saturation is often a substantial limitation when using optical data or short-wavelength radar data (X and C bands), particularly in dense forests [32,33,34,35]. The introduction of LiDAR (Light Detection And Ranging) has enabled notable advances in the estimation of canopy height because of its capacity to detect vertical structures in the forest [16]. The most frequently used applications in forestry are based on telemetry from airborne laser scanning (ALS) [36,37] and terrestrial laser scanning (TLS) [38,39]. However, ALS and TLS exhibit spatiotemporal limitations, given that it is generally difficult to apply them over large areas and on a regular basis due to their high costs of acquisition and signal occultation, particularly in dense forests [40,41,42,43].
Spaceborne LiDAR platforms have made it possible to extend canopy height estimation from local to global spatial scales [13]. The first platform, i.e., Ice, Cloud, and Land Elevation Satellite (ICESat), carried the Geoscience Laser Altimeter System (GLAS). Between 2003 and 2009, this sensor made it possible to estimate the height of forests on a global scale on circular footprints with a diameter of about 60 m and a spacing of about 170 m along the transects [9,44,45,46]. The second satellite, ICESat-2, was launched in September 2018 and carries the Advanced Topographic Laser Altimetry System (ATLAS), which uses photon-counting LiDAR technology. Between 88°S and 88°N, the laser produces three pairs of beams, thereby making it possible to obtain altimeter parameters on Earth’s surface for continuous monitoring of polar glaciers. In parallel with its main mission, this satellite also acquires terrestrial measurements of forest cover and vegetation. For the scientific community, this represents an important database for mapping plant biomass and for estimating carbon inventories at a global level [17,47]. The latest LiDAR instrument, launched into space by NASA in December 2018, is the Global Ecosystem Dynamics Investigation (GEDI) system, which operates aboard the International Space Station (ISS) between 51.6°N and 51.6°S. GEDI is a multi-beam laser altimeter that measures parameters of vertical canopy structures at a very high sampling rate, thereby allowing forest height and wood volume estimation across different types of forest ecosystems, topography, and latitudes [13]. Data that are acquired by GEDI have been increasingly used to estimate forest height and forest biomass [48,49,50]. These data consist of a very large number of samples, which offer great potential for estimating canopy heights in complex savanna and forest mosaics.
ICESat-2 and GEDI LiDAR data are point clouds that permit the height of forest cover to be estimated, but only within ground acquisition footprints rather than in a spatially continuous manner over large areas [51]. In contrast, optical or radar data provide continuous spatial coverage but cannot, alone, allow the direct extraction of vertical profiles of the canopy. Therefore, the complementarity of different types of data can be exploited to map the height of the forest cover [52,53,54,55]. To accomplish this task, machine learning models are being increasingly used to combine vertical LiDAR profiles with spectral or backscatter attributes [16,56]. For example, Li et al. [15] used variables derived from Sentinel 1 and 2 and Landsat-8 images over Northeast China to extrapolate ICESat-2 canopy height from the footprint level to the regional level, using Deep Learning (DL) and Random Forest (RF) models, with r correlations of 0.78 and 0.68, respectively. Zhu et al. [9] used stepwise regression and RF approaches to estimate canopy heights in the United States. They obtained better results with RF using GEDI variables (R2 = 0.93; RMSE = 2.99 m) compared to ICESat-2 (R2 = 0.78; RMSE = 4.62 m). Sothe et al. [57] carried out continuous mapping of the forest cover height of Canada from the combination of GEDI and ICESat-2 data with PALSAR and Sentinel data. They found that both LiDAR products overestimated canopy height compared to ALS data, but GEDI outperformed ICESat-2, with an average difference of 0.9 m vs. 2.9 m and RMSE of 4.2 m vs. 5.2 m, respectively. To map China’s forest canopy heights, Liu et al. [58] used neural network-guided interpolation to merge GEDI and ICESat-2 data. They then compared the height of the forest cover that was interpolated with the GEDI validation footprints (R2 = 0.55; RMSE = 5.32 m), followed by drone-LiDAR validation data (R2 = 0.58, RMSE = 4.93 m) and, finally, with field-collected data (R2 = 0.60; RMSE = 4.88 m).
The aforementioned examples show real potential for using GEDI and ICESat-2 data alone or in combination with other spatial data. However, they also raise several questions, which depend upon the ecosystems that are being considered. Of particular interest is the following: Can the use of GEDI or ICESat-2 data (alone or in combination) with multisource optical or radar satellite observations make it possible to estimate satisfactorily the canopy heights of complex mosaics of forests and savannas in a tropical environment? Our study attempts to answer this question through analyses of the forest–savanna mosaics of the Sudano–Guinean zone of West Africa, particularly in Togo, where research of this type is almost non-existent. Our main objective is to develop models for estimating the height of the canopy in forest–savanna mosaics using a combination of space LiDAR, optical data, and radar data. The specific objectives that we pursued are (1) to analyze the performance of models predicting canopy height using ICESat-2 and GEDI and covariates in these forest ecosystems; (2) to develop canopy height prediction models that are adapted to forest–savanna mosaics; and (3) to generate continuous mapping of canopy height in these forest types from these discontinuous satellite LiDAR data. To accomplish these goals, optical and radar covariates, such as spectral reflectances, vegetation indices, texture, and backscatter variables, were derived from the spatially continuous satellite data and then integrated with those derived from ICESat-2 and GEDI, using machine learning models.

2. Materials and Methods

2.1. Study Area

Our study was conducted in Ecological Zone 4, southwest Togo. The Togolese Republic is a coastal nation in West Africa that is bordered on the north by Burkina Faso, by the Atlantic Ocean (Gulf of Guinea) to the south, by Benin to the east, and by Ghana to the west. The country is subject to a tropical Sudano–Guinean climate, with rainfall varying across four seasons from 1000 to 1600 mm/year in the southern regions. The average temperature is 27 °C [59]. Between 1939 and 1957, the classification of 14.2% of the country’s land area into protected areas (classified forests, national parks and reserves) served to preserve its forest cover. Today, many of these areas have been encroached on by human populations seeking arable land and wood for energy. Vegetation types are composed of Sudano–Guinean forests, which are located in mountainous areas of the country, gallery forests along the main waterways, dry forests or dense tree savannas in the arid northern half, and tree savannas in the south and center. Five ecological zones have been designated, spanning the country.
Located in the southern Togo Mountains, Ecological Zone 4 (6397 km²) is one such subdivision that is characterized by the landscape variability of its ecosystems [60]. Long known as the most heavily forested of the country’s ecological zones, Zone 4 is dominated by interspersed semi-deciduous forest and mosaics of Guinean savanna, with the latter having been degraded in recent years by the combined effects of slash-and-burn agriculture, vegetation fire and logging [61,62]. The natural vegetation of this zone is mainly made up of species such as Cola gigantea, Millettia thonningii, Morinda lucida, Sterculia tragacantha, Antiaris africana, Holarrhena floribunda, and Margaritaria discoidea, while planted agroforestry species include Coffea arabica, Theobroma cacao, Mangifera indica, Albizia zygia, Albizia adiantifolia, Persea americana, Anacardium occidentale, Tectona grandis, and Senna siamea [61]. Over the past three decades, this zone has lost more than 27% of its forest cover. Yet, it remains the most forested and least degraded of the country’s five ecological zones, with 63.59% of forest cover compared with 9.74% to 29.02% in the other ecological zones [63]. These particular ecosystems, sometimes made up of sparse forests or small, low-growing branching trees, present major challenges for studies based on satellite and spaceborne LiDAR data [61,63]. The study area is presented in Figure 1.

2.2. Data Collection and Preprocessing

The data that were collected and organized prior to use in this research come from seven sources (see the methodological flowchart in Section 2.3), separated into two main categories. The first category concerns remote sensing data, notably optical, radar, topographical, and satellite LiDAR sources; the second category consists of dendrometric field data and ancillary data. The general parameters of these data sources are summarized in Table 1, while the variables that are extracted and their extraction methods are described in Section 2.
Spatially discontinuous data from the Global Ecosystem Dynamics Investigation (GEDI) and Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2) missions were downloaded from NASA’s Land Processes Distributed Active Archive Center (LPDAAC) website (https://lpdaac.usgs.gov/, accessed on 10 February 2020). It should be noted that GEDI data are acquired from the International Space Station (ISS). The granules are GEDI data reduced in size from one full ISS orbit to four segments per orbit [66]. The structures of the ground footprints of these satellite LiDAR data are illustrated in Figure 2a,b for ICESat-2 and GEDI, respectively.
Figure A1 (Appendix A) shows elevations of the ground surface (a) and canopy top (b) extracted from photon returns in ATL08 data acquired by ICESat-2 in the southwest of the city of Badou (7°35′8″N, 0°36′33″E). Continuous Sentinel 1, Sentinel 2, and SRTM data were collected from archives of the Google Earth Engine platform. It should be noted that the term “radar” used in this research refers only to data or variables from Sentinel 1 C-band dual-polarization SAR and does not include other types of SAR with longer wavelength bands.
Regarding field data, a field campaign conducted from October 2020 to February 2021 collected dendrometric parameters within 303 rectangular ICESat-2 footprints (17 m by 100 m). In these footprints, we measured total tree height using a clinometer (Suunto Oy, Vantaa, Finland) and tree diameter at breast height (DBH, 1.3 m) using a diameter tape, for all trees with a diameter of 10 cm or greater. We supplemented these data with measurements of 877 plots from the second National Forest Inventory (IFN2). Additionally, we utilized an existing 2020 land use map as auxiliary data to identify the most forested areas.

2.3. Methods

To develop canopy height prediction models, we extracted and processed multisource variables from the collected data. The modeling process incorporated relative canopy heights and other variables derived from satellite LiDAR data (GEDI and ICESat-2), along with variables extracted from spatially continuous data. Following preprocessing, we extracted prediction variables from radar (Sentinel 1), optical (Sentinel 2), and topographical (SRTM) data. These variables include native bands, vegetation indices, texture measurements and topographical variables. Figure 3 illustrates the methodological flowchart of this study, depicting the utilization of these diverse data sources.

2.3.1. Feature Extraction

From Sentinel 1, Sentinel 2, and SRTM data, variables were extracted or calculated to serve as independent variables when predicting canopy height. These consist of native bands, vegetation indices, texture measurements, and topographical variables, chosen from among the most commonly used [67,68]. They included 29 variables for radar data, 28 for optical data, and 3 for topographical data. We used JavaScript code to extract these continuous variables from GEE archives. Table A1 (Appendix B) contains the summary list of the total of 60 variables. Those variables that were not already at 30 m resolution were resampled to 30 m so that they could be used with the other covariates. Formulas used for calculating some variables, together with their respective references, are provided in Table A2 (Appendix C).
From the GEDI L2A granules, we extracted location parameters (latitude, longitude) and relative height values as percentiles (RH50, RH55, RH60, RH65, RH70, RH80, RH85, RH90, RH95, RH98 and RH100), which are frequently used for canopy height prediction. Other variables that are considered useful for modeling forest structure, such as beam type (coverage, full power), data quality indicator, and sensitivity of the waveform to penetrate vegetation, were also extracted. For the ICESat-2 data, we extracted the strong beams from ATL08 products, which represent the best option for detecting ground and canopy photons, according to Neuenschwander and Pitts [69]. We used Matlab R2018b when extracting these ICESat-2 and GEDI variables. Relative heights were also calculated for each of the field plots, given that the field measurements were taken from individual trees.

2.3.2. Dataset Preparation

Since the variables extracted from the data could not be used directly to develop the canopy height models, they were prepared for that purpose. This involved validating the relative heights, filtering the data, and calculating zonal statistics.
  • Validation of satellite LiDAR data
To use the relative heights extracted from ICESat-2 data as predictors of canopy height, we first validated them against field measurements collected within the same ground footprints. Given that the field data were collected on individual trees, we aggregated them by calculating statistics per plot so that they could be compared directly with the values extracted from ICESat-2. For each plot, the statistics calculated from the field data are the minimum, the maximum, the first quartile, the mean, the median and relative heights of 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, and 98%.
The variable that was extracted from the ICESat-2 data and used for this comparison with field data is the relative height at 98% (RH98), designated h_canopy. Once the field statistics had been calculated, they were matched with h_canopy by plot in an Excel table, which was used to establish a correlation matrix. The Pearson correlation values obtained between the field variables and h_canopy allowed us to use it as reference data for modeling canopy height. It should be noted that this validation using field data only concerns ICESat-2, given that we did not have field data from the GEDI footprints to conduct a similar analysis.
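The per-plot aggregation described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function names, the `(plot_id, height)` input layout, and the linear-interpolation percentile rule are all assumptions, since the text does not state how the relative heights were computed from the tree measurements.

```python
from collections import defaultdict
import statistics

def percentile(sorted_values, q):
    """Percentile q (0-100) by linear interpolation (an assumed convention)."""
    if not sorted_values:
        raise ValueError("a plot needs at least one tree")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def plot_statistics(trees, quantiles=(50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 98)):
    """Aggregate (plot_id, tree_height_m) records into per-plot statistics."""
    by_plot = defaultdict(list)
    for plot_id, height in trees:
        by_plot[plot_id].append(height)
    stats = {}
    for plot_id, heights in by_plot.items():
        heights.sort()
        row = {
            "min": heights[0],
            "max": heights[-1],
            "q1": percentile(heights, 25),
            "mean": statistics.fmean(heights),
            "median": percentile(heights, 50),
        }
        for q in quantiles:
            row[f"RH{q}"] = percentile(heights, q)
        stats[plot_id] = row
    return stats

# Hypothetical tree heights (m) for two plots.
example_trees = [("A", 10.0), ("A", 20.0), ("A", 30.0), ("B", 12.0)]
example_stats = plot_statistics(example_trees)
```

Each row of `example_stats` could then be matched by plot with the corresponding h_canopy value before computing correlations.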
  • Data filtering
Since each dataset is characterized by different internal parameters, we applied several preliminary filters to retain only the relevant ones for the analysis [70]. During the extraction of relative heights, other variables were extracted from satellite LiDAR and auxiliary data to use them as filtering parameters to clean the database used for modeling [71].
For ICESat-2, we considered the number of canopy-classified photons (n_ca_photons) and spacecraft orientation (sc_orient). Footprints with few canopy photons were eliminated. The sc_orient parameter (0 or 1) was used to identify strong and weak beams based on satellite ascension and ground track position [72,73]. For GEDI, we utilized beam type (coverage or full power), data quality indicator (Quality_flag), waveform sensitivity to vegetation penetration (Sensitivity) and firing time (delta_time) [74,75,76]. These parameters allowed for footprint selection according to the nine configurations presented in Table 2.
The geographical coordinates extracted from the ICESat-2 and GEDI data were used to display them on a GIS land cover map to identify the land cover classes into which each of their footprints fell. This action made it possible to select footprints that fell into the forest classes and to remove from the database those that fell into other classes, such as “crops and fallow land”, “buildings and bare soil,” and “grassy savanna”. The data filtering stage, therefore, made it possible to remove from the database any values that could have added noise to the modeling process or reduced the quality or performance of the prediction models being developed [72,73]. Filtering in relation to the location of LiDAR data footprints in forest areas was conducted in ArcGIS 10.8.1. Filtering of other parameters that were extracted from LiDAR data was conducted with a code developed for this purpose using Python 3.10.12.
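As an illustration of this filtering stage, the sketch below keeps only GEDI footprints that pass a quality check, a sensitivity threshold, a land cover check and the 2 m to 50 m height range mentioned in Section 2.3.3. The record layout, the forest class names and the sensitivity threshold of 0.95 are hypothetical; the actual configurations are those of Table 2, which is not reproduced here.

```python
# Hypothetical forest class names; the real ones come from the land cover map.
FOREST_CLASSES = {"dense forest", "open forest"}

def keep_gedi_footprint(fp, min_sensitivity=0.95, forest_classes=FOREST_CLASSES):
    """Return True if a footprint record passes all (assumed) filters."""
    return (
        fp["quality_flag"] == 1                   # waveform flagged as valid
        and fp["sensitivity"] >= min_sensitivity  # assumed threshold
        and fp["land_cover"] in forest_classes    # footprint falls in forest
        and 2.0 <= fp["rh98"] <= 50.0             # height range kept in the study
    )

# Made-up footprint records for demonstration only.
footprints = [
    {"quality_flag": 1, "sensitivity": 0.97, "land_cover": "dense forest", "rh98": 18.4},
    {"quality_flag": 0, "sensitivity": 0.98, "land_cover": "dense forest", "rh98": 22.0},
    {"quality_flag": 1, "sensitivity": 0.90, "land_cover": "open forest", "rh98": 9.1},
    {"quality_flag": 1, "sensitivity": 0.99, "land_cover": "grassy savanna", "rh98": 3.2},
    {"quality_flag": 1, "sensitivity": 0.99, "land_cover": "open forest", "rh98": 61.0},
]
kept = [fp for fp in footprints if keep_gedi_footprint(fp)]
```

Only the first record survives in this example; each of the others fails exactly one filter.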
  • Calculation of zonal statistics
To properly integrate the data during modeling, zonal statistics were calculated on the continuous variables from the ICESat-2 and GEDI data footprints. For each variable extracted from the Sentinel 1, Sentinel 2, and SRTM data, the mean, median, and standard deviation of the pixel values were calculated for each footprint of the satellite LiDAR data superimposed upon it. These three calculated statistics were evaluated at the start of modeling to select the one that provided the most accurate canopy height prediction results. Figure A2 (Appendix D) illustrates an example of the GEDI and ICESat-2 footprints superposed over Sentinel 1 for the calculation of zonal statistics.
For each of these variables, the statistics were matched with relative heights and other variables extracted from the satellite LiDAR data to form a table for each variable. These data tables per variable were then grouped by data source (optical, radar and topographic) to provide the databases used in modeling. Once the data had been extracted and prepared, we proceeded to the models’ development.
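A minimal sketch of the zonal statistics step is given below. It assumes that the raster is a nested list of pixel values and that a footprint has already been converted to a rectangular window of pixel rows and columns; both the function name and this simplification are assumptions, and the population standard deviation is used because the text does not say which estimator was applied.

```python
import statistics

def zonal_stats(raster, rows, cols):
    """Mean, median and standard deviation of the pixels inside a window.

    raster: list of rows of pixel values for one covariate band.
    rows, cols: iterables of pixel indices covered by the footprint
    (assumed to have been derived from the footprint geometry beforehand).
    """
    values = [raster[r][c] for r in rows for c in cols]
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "std": statistics.pstdev(values),  # population std, an assumption
    }

# Toy 3 x 3 band and a footprint covering its upper-left 2 x 2 pixels.
band = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]]
footprint_stats = zonal_stats(band, rows=range(0, 2), cols=range(0, 2))
```

Repeating this for every covariate band and every footprint yields the per-variable tables that are then joined to the LiDAR relative heights.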

2.3.3. Modeling

The extracted and prepared variables were used for modeling, which consisted of variable selections, predictive model development, and evaluation. Given that these continuous variables provide different information, we analyzed different scenarios for combining them, as presented in Table 3, to make the best possible choices.
To explore the influence of height classes on modeling, we divided the data into two groups of height classes. Because of their small number, the data with canopy heights ≤ 5 m (]2–5]) formed the first class in these two groups. Group 1 contained seven classes increasing in 5 m increments from the upper bound of the first class (i.e., (]2–5]), (]5–10]), (]10–15]), etc.). Group 2 contained 10 height classes in which the step size or increment was set at 3 m relative to the upper bound of the first class (i.e., (]2–5]), (]5–8]), (]8–11]), etc.). Like the first class, canopy heights > 30 m (]30–50]) constituted a separate class in each of the two groups. When data were filtered (see Section 2.3.2), individuals with heights < 2 m or >50 m were deleted from the database with reference to the field data.
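The two groupings can be reproduced with a short sketch. The helper names are hypothetical, and one assumption should be noted: for the 3 m step, the regular classes stop at 29 m, so the final class is taken here as ]29–50] in order to obtain exactly ten classes without a gap, whereas the text describes the last class as ]30–50].

```python
from bisect import bisect_left

def class_edges(step, lower=2.0, first_upper=5.0, cap=30.0, upper=50.0):
    """Class boundaries: ]lower, first_upper], then `step`-wide classes
    up to `cap`, then one final class reaching `upper`."""
    edges = [lower, first_upper]
    while edges[-1] + step <= cap:
        edges.append(edges[-1] + step)
    edges.append(upper)
    return edges

def height_class(h, edges):
    """Return the left-open, right-closed class ]a, b] containing h,
    or None if h lies outside the retained height range."""
    i = bisect_left(edges, h) - 1
    if i < 0 or i >= len(edges) - 1:
        return None
    return (edges[i], edges[i + 1])

group1 = class_edges(step=5.0)  # seven classes
group2 = class_edges(step=3.0)  # ten classes (last class assumed ]29-50])
```

For instance, `height_class(7.5, group1)` falls in ]5–10], while the same height falls in ]5–8] under `group2`.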
The database created in the previous step is such that $D = \{(x_1, h_1), \ldots, (x_N, h_N)\}$, where $x_i = (x_{i,1}, \ldots, x_{i,P})$, $N$ is the set of sampled GEDI or ICESat-2 footprints, $P \in \mathbb{R}$ represents the set of extracted attributes, and $h_N$ represents the canopy height value that was extracted from the $N$ observations of GEDI or ICESat-2 footprints. With these input data ($D$), one part ($D_T$ = 80%) trained the models, and the other part ($D_t$ = 20%) tested these models to evaluate their performance.
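The 80/20 partition can be illustrated with a simple shuffled split. The text does not say whether the split was purely random or stratified, so the sketch below assumes a seeded random shuffle; the function name and the toy records are hypothetical.

```python
import random

def train_test_split(dataset, test_fraction=0.2, seed=0):
    """Randomly partition a list of (features, height) pairs."""
    indices = list(range(len(dataset)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = round(len(dataset) * test_fraction)
    test = [dataset[i] for i in indices[:n_test]]
    train = [dataset[i] for i in indices[n_test:]]
    return train, test

# Ten made-up footprints: ([covariates], canopy height in m).
toy_dataset = [([float(i), float(i) / 2], 5.0 + i) for i in range(10)]
train_part, test_part = train_test_split(toy_dataset)
```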
  • Features selection
We used feature importance estimation during the feature selection process, which is a crucial step in machine learning workflows, to identify and focus on the most relevant variables contributing to model prediction. Random Forest (RF) constructs decision trees using bootstrap samples of the training data, with each tree using a random subset of features at each split to promote diversity [77]. “Out of bag” (OOB) samples, not used in training individual trees, are essential for estimating feature importance. In RF, feature importance is typically measured by the mean decrease in impurity (MDI) or the mean decrease in accuracy (MDA) [78,79]. The OOB error is calculated before and after randomly permuting each feature for every tree, with the average difference in these errors across all trees providing an estimate of feature importance [80,81].
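The permutation idea behind this importance measure can be sketched in a few lines. The example below is a simplification under stated assumptions: it works with any prediction function and a single evaluation set, rather than with the per-tree out-of-bag samples of a real Random Forest, and all names are hypothetical.

```python
import math
import random

def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Toy model that only uses the first feature, so the second should score zero.
toy_X = [[i, (i * 7) % 5] for i in range(20)]
toy_y = [3 * row[0] for row in toy_X]
toy_importances = permutation_importance(lambda row: 3 * row[0], toy_X, toy_y)
```

A feature that the model ignores leaves the error unchanged when shuffled, which is why its importance is zero in this toy case.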
Additionally, we employed the recent SHAP (SHapley Additive exPlanations) approach for a more interpretable feature importance assessment. SHAP is a game-theoretic method that explains machine learning model predictions, enhancing interpretation [82,83]. It provides a unified, model-agnostic framework for explaining predictions by attributing importance to input features, thus improving model transparency and trustworthiness [84,85,86]. When testing the four modeling algorithms, we applied the SHAP method [87,88], which enabled us to select certain predictors with significantly greater impacts on model performance.
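To make the game-theoretic idea concrete, the sketch below computes exact Shapley values for a small model by enumerating every coalition of features. It is not the SHAP library: absent features are replaced by a single baseline vector rather than averaged over background data, the cost grows exponentially with the number of features, and the function name and toy model are assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline)."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their real value, others the baseline.
        mixed = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Toy linear model: each attribution equals weight * (x_i - baseline_i).
def toy_model(f):
    return 2.0 * f[0] - 1.0 * f[1] + 0.5 * f[2]

phi = shapley_values(toy_model, x=[3.0, 1.0, 4.0], baseline=[1.0, 1.0, 0.0])
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that SHAP relies on.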
  • Development of prediction models
We evaluated four machine learning (ML) algorithms, namely Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost) and Deep Neural Network (DNN). The goal was to assess their relevance in order to select the best model, which was automatically optimized using an automated learning tool. The objective is to construct a model $M_\theta$ according to the equation
$$M_\theta : \hat{h} = f_\theta(x_N) \qquad (1)$$
with $\theta$ representing the parameters and hyperparameters of the model to be learned and $\hat{h}$ the predicted height. This model should be capable of predicting the height that minimizes the cost function
$$\underset{x \in D_T}{\arg\min}\; l(\hat{h}, h) \qquad (2)$$
where $l$ represents the loss function. Equation (2) is a general form of the loss function, but here, we have used the Root-Mean-Square Error (RMSE) as the measure of loss.
A brief presentation of the mathematical formalisms of the algorithms tested is given in Appendix E. Our objective here is to minimize the RMSE by reducing the discrepancy between the predicted and actual values. To achieve this goal, we employed model optimization techniques that enhance the models’ ability to perform effectively on new unseen data. First, we implemented a grid search technique, which explores various parameter combinations to identify the optimal set of hyperparameters [89]. Additionally, we applied K-Fold cross-validation, a robust evaluation strategy that segments the dataset into multiple subsets, across which a comprehensive assessment of the models’ performance is performed. It thereby mitigates the risk of overfitting and enhances the models’ overall reliability and generalizability. The ranges of the selected hyperparameters are shown in Table 4 below.
We applied the grid search method consistently across all four models. While this technique automates the search process, its effectiveness is still largely contingent on the manually defined search space [90,91]. Moreover, these ML models are configured by a set of hyperparameters with values that can substantially affect their performance, which means that we cannot know whether a given technique is truly better or simply better tuned [89,92]. To address this limitation and further streamline the optimization process, we employed two of the most popular Automated Machine Learning (AutoML) tools: TPOT (Tree-based Pipeline Optimization Tool) and AutoGluon. This approach takes into account the particularities of the input data and offers a more sophisticated and hands-off method for fine-tuning hyperparameters [93,94]. These AutoML tools allowed us to compare the performance of their output models with that of the algorithm that offered the most accurate results among the four being tested.
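The combination of grid search and K-Fold cross-validation can be sketched with a self-contained example. A k-nearest-neighbour regressor stands in for the four algorithms of the study purely to keep the code short; the grid, the synthetic data and all function names are assumptions, and the hyperparameter ranges actually searched are those of Table 4.

```python
import itertools
import math
import random

def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def knn_predict(train_X, train_y, query, k):
    """Stand-in model: average height of the k nearest training samples."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return sum(train_y[i] for i in order[:k]) / k

def cross_val_rmse(X, y, k, n_folds=5):
    """Mean RMSE over interleaved folds, each held out once."""
    scores = []
    for f in range(n_folds):
        fold = list(range(f, len(X), n_folds))
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        tX = [X[i] for i in train_idx]
        ty = [y[i] for i in train_idx]
        preds = [knn_predict(tX, ty, X[i], k) for i in fold]
        scores.append(rmse([y[i] for i in fold], preds))
    return sum(scores) / len(scores)

def grid_search(X, y, grid, n_folds=5):
    """Evaluate every combination in `grid` and keep the lowest mean RMSE."""
    results = {}
    for combo in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        results[combo] = cross_val_rmse(X, y, n_folds=n_folds, **params)
    best = min(results, key=results.get)
    return dict(zip(grid.keys(), best)), results[best], results

# Synthetic data: one covariate, noisy linear relation to height.
rng = random.Random(0)
demo_X = [[i / 10] for i in range(40)]
demo_y = [2 * x[0] + rng.gauss(0, 0.3) for x in demo_X]
best_params, best_score, all_scores = grid_search(demo_X, demo_y, {"k": [1, 3, 5, 9]})
```

Because only the listed values of `k` are ever tried, the result can be no better than the best point of the manually defined grid.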
  • Performance evaluation of the developed models
ICESat-2 canopy height was validated using Pearson’s Correlation Coefficient (r) with reference to the measurements before being used as reference data for canopy height prediction modeling. The prediction models and canopy height maps that were subsequently produced were evaluated, comparing one to another and with existing models. The comparisons were made using traditional performance indicators, i.e., correlation r, RMSE (Root-Mean-Square Error) and MAE (Mean Absolute Error) to retain the models that stood out in terms of their performance. Equations (3) to (5) show the mathematical expressions of the three indicators:
$$r = \frac{\sum_{i=1}^{n} (x_i - \hat{x})(y_i - \hat{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \hat{x})^2 \sum_{i=1}^{n} (y_i - \hat{y})^2}}, \qquad (3)$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2} \qquad (4)$$
and
$$MAE = \frac{1}{n} \sum_{i=1}^{n} |x_i - y_i| \qquad (5)$$
where $x_i$ represents the ith observed value of the canopy height data extracted from ICESat-2 or GEDI, and $y_i$ is the ith predicted value; $\hat{x}$ and $\hat{y}$ are, respectively, the means of all $x_i$ and $y_i$, while $n$ represents the total number of canopy height samples from ICESat-2 or GEDI.
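The three indicators translate directly into code. The sketch below follows Equations (3) to (5) term by term; the function names and the sample values are illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between observed (x) and predicted (y) heights."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return num / den

def rmse(x, y):
    """Root-mean-square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def mae(x, y):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

# Made-up observed and predicted canopy heights (m).
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
scores = {
    "r": pearson_r(observed, predicted),
    "rmse": rmse(observed, predicted),
    "mae": mae(observed, predicted),
}
```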
These parameters were used to validate models developed from ICESat-2 data on the basis of data collected in the field from their footprints. To ensure that the developed models can generalize to areas not included in the training process, we performed an independent validation outside the training data. To do this, we applied the models to the plots of the second National Forest Inventory (NFI2), including plots outside the study area, and compared the predicted canopy heights with the in situ data (see Section 3.6). The same evaluation parameters were calculated for this independent validation to evaluate the models’ capacity for generalization. These parameters also enabled us to compare the cartographic products of our study to those of other authors.

2.4. Forest Height Mapping and Comparison with Existing Products

Following the evaluation of our developed models, those obtaining the best performance in predicting canopy height on the satellite LiDAR footprints were retained for cartographic inference. The execution of these models yielded predicted canopy heights only within LiDAR footprints, leaving blank areas between them. To move from spatially discontinuous to continuous data, we first created a stacked multi-band image, using the variables that had contributed the most to prediction, according to variable importance defined in Section 2.3.3. Given that each band in this stacked image was considered a predictor, the trained model uses the values of each pixel across the underlying bands to produce a new height pixel. Repeating this process over all pixels of the stacked image produces the complete canopy height image. Figure A3 (Appendix F) illustrates the cartographic inference that is performed with the models, which were developed from GEDI or ICESat-2 data. The resulting canopy height maps were formatted in ArcGIS.
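The pixel-wise inference described above can be sketched as follows. The stacked image is represented as a nested list indexed as `stack[band][row][col]`, and the model is any function that maps one feature vector to a height; both the layout and the toy model are assumptions made for illustration.

```python
def predict_height_map(stack, predict):
    """Apply a trained model to every pixel of a stacked multi-band image.

    stack: list of bands, each a list of rows of pixel values.
    predict: function taking one feature vector (one value per band)
             and returning a canopy height.
    """
    n_rows, n_cols = len(stack[0]), len(stack[0][0])
    return [
        [predict([band[r][c] for band in stack]) for c in range(n_cols)]
        for r in range(n_rows)
    ]

# Two made-up 2 x 2 predictor bands and a toy stand-in for the trained model.
toy_stack = [
    [[0.1, 0.2], [0.3, 0.4]],  # e.g., a vegetation index band
    [[1.0, 2.0], [3.0, 4.0]],  # e.g., a backscatter band
]
height_map = predict_height_map(toy_stack, lambda f: 10 * f[0] + f[1])
```

The output has the same rows and columns as the input bands, so it can be written back out with the georeferencing of the stacked image.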
To analyze the differences between the two types of LiDAR data that were used, some statistical metrics of the map resulting from cartographic inferences with the GEDI-based model were compared with those obtained using the ICESat-2-based model. These two maps were also compared to similar data existing locally or globally to obtain an understanding of the particularities that are related to our study area. The performance evaluation parameters that are presented in Section 2.3.3 were also used during these comparisons. In summary, from satellite lidar data and multisource satellite data, we extracted variables that were prepared and selected for the development of forest canopy models. These models were then used for continuous canopy height mapping of the study area. These maps were analyzed and compared with each other and with existing maps in the area. The methodology described above enabled us to obtain the results presented in Section 3 below.

3. Results

3.1. Validation of the Reference Data

The correlation matrix presented in Table 5 compares the 98% relative heights extracted from ATL08 data of ICESat-2 with those calculated from field data. In this table, the values that were derived from the field measurements are shown, ranging from the minimum value to the 98% relative height. They are compared with one another and with h_canopy, which is the RH98 height that was extracted from the ICESat-2 data.
The last row of the correlation matrix summarizes the correlations (r) between h_canopy and the field data (Table 5). These positive correlations range from weak to moderate, the last being that with the 98% relative height (r = 0.53; RMSE = 4.85; MAE = 3.84). We used these ICESat-2 data as reference data and this relative height as a predictor of canopy height in the different modeling scenarios examined in this study.

3.2. Selection and Combination of Multisource Variables

The scenarios in which different combinations of variables were applied in preliminary modeling allowed us not only to evaluate the importance of variables but also to test four different algorithms: RF, SVM, XGBoost, and DNN. The data source variables used in the different tested scenarios S1 to S7 can be seen in Table 3. Table 6 indicates, in the form of a heatmap, the performance metrics of the canopy height prediction models for the seven scenarios applied to the four algorithms. At this stage, only ICESat-2 data were used in order to select the scenario and the algorithm to be used in all final modeling of this study, both ICESat-2-based models and GEDI-based models.
The performance metrics for S7 across the four models show weak consistency (Kendall’s W = 0.48, p = 0.23; W = 1 indicates complete agreement among rankings). RF ranked highest, while DNN performed poorest; this poor performance of DNN, despite its higher complexity, is likely linked to the fact that the dataset lacks the feature dimensionality required for its effective training. Among the seven RF scenarios, the three metrics (r, RMSE, MAE) demonstrated strong consistency (W = 0.865, χ²r = 15.57, df = 3, p = 0.0014), which is reflected in the heatmap. RF scenarios ranked from worst to best: S3 (Topographic) ≤ S2 (Radar) ≤ S6 (Radar–Topographic) < S1 (Optical) = S4 (Optical–Radar) < S5 (Optical–Topographic) ≤ S7 (Optical–Radar–Topographic). Figure 4 illustrates variable importance for RF/S7, with optical variables contributing most significantly to the model.
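To make the concordance statistic concrete, the sketch below computes Kendall's W and the associated chi-square statistic, assuming the textbook formulas W = 12S / (m²(n³ − n)) and χ² = m(n − 1)W without tie correction, where m is the number of rankings and n the number of ranked items. The helper names and the scores are hypothetical. Under this relation, three metrics ranking seven scenarios with W = 0.865 give 3 × 6 × 0.865 = 15.57, which agrees with the chi-square value reported above.

```python
def rank(values):
    """Average ranks (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = avg
        pos = end + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W and chi-square for m rankings of the same n items (no tie correction)."""
    m = len(ratings)
    n = len(ratings[0])
    rank_rows = [rank(row) for row in ratings]
    totals = [sum(row[j] for row in rank_rows) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    chi2 = m * (n - 1) * w
    return w, chi2


# Made-up scores for four hypothetical scenarios; errors are negated so that
# "higher is better" holds for every metric before ranking.
r_scores = [0.40, 0.50, 0.55, 0.60]
rmse_scores = [-6.0, -5.6, -5.5, -5.2]
mae_scores = [-4.6, -4.2, -4.3, -4.0]
w, chi2 = kendalls_w([r_scores, rmse_scores, mae_scores])
print(f"W = {w:.3f}, chi-square = {chi2:.2f}")
```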
During modeling, the SHAP method allowed us to select about 20 predictors that exerted the strongest effects on model performance. Indeed, the importance of these features in estimating canopy height was further evaluated. For example, Figure 5 depicts the results for RF and XGBoost algorithms.
Fourteen important features were common to the two algorithms. Leaving aside the features unique to each algorithm, those shared by RF and XGBoost were very consistent in their respective rankings (W = 0.953, χ²r = 24.77, df = 13, p = 0.025). Graphs of other evaluations of the impact of variables in models’ development using the SHAP method are presented in Figure A4 in Appendix G.

3.3. Modeling Canopy Height Using ICESat-2 Data

Variables selected during the preliminary modeling stage and the RF algorithm that resulted in remarkable performances allowed us to obtain the following results. It must be emphasized that only footprints containing more than 50 photons were considered to ensure that representative data were analyzed. Ultimately, 9781 ICESat-2 footprints were distributed in two groups of height classes that were defined in Section 2.3.3. The results of different metrics obtained during the training and testing phases with the RF algorithm are reported in Table 7 below for ICESat-2 data modeling.
When modeling by dividing the data by height class, slightly higher metrics were obtained with Group 2 (r = 0.58; RMSE = 5.33; MAE = 3.92) compared to Group 1 (r = 0.51; RMSE = 5.46; MAE = 4.05), but no group improved model performance over that without a grouping.
Subsequent use of AutoML TPOT (see Section 2.3.3) made it possible to optimize the learning architecture by intelligently exploring thousands of pipelines (i.e., processing chains from preprocessing to modeling). This exploration made it possible to find the appropriate pipeline and automatically choose the appropriate hyperparameters for the model that best suited our data. It should be noted that when running an AutoML model, cross-validation is performed at each iteration during the training phase.
TPOT contributed to an improved performance of the canopy height prediction model that was developed. We refer to this model as rf_icesat-2_rh98, a designation that refers to the RF algorithm, ICESat-2 data, and RH98 from which it was developed. The metrics obtained with this model are r = 0.65; RMSE = 5.10; and MAE = 3.80. The regression curve, indicating the dispersion of heights extracted from ICESat-2 compared to predicted values, is shown in Figure 6. An initial observation of the distribution of predictions on this graph compared with the 1:1 line shows that the model tends to overestimate small canopies (<10 m), to correctly estimate medium canopy heights ([10–20 m]) and to underestimate large canopies (>20 m).
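The over- and underestimation pattern read from the scatterplot can also be quantified as a mean signed error per height class. The sketch below is a minimal illustration under assumed class limits of 10 m and 20 m (taken from the ranges quoted above); the function name `mean_bias_by_class` and the sample heights are hypothetical.

```python
def mean_bias_by_class(observed, predicted, edges=(10.0, 20.0)):
    """Mean of (predicted - observed) within three observed-height classes."""
    low, high = edges
    labels = [f"<{low:g} m", f"{low:g}-{high:g} m", f">{high:g} m"]
    sums = {label: [0.0, 0] for label in labels}
    for obs, pred in zip(observed, predicted):
        if obs < low:
            label = labels[0]
        elif obs <= high:
            label = labels[1]
        else:
            label = labels[2]
        sums[label][0] += pred - obs
        sums[label][1] += 1
    # Positive bias means overestimation, negative bias means underestimation
    return {label: total / count for label, (total, count) in sums.items() if count}


# Made-up reference and predicted heights (m)
observed = [4.0, 8.0, 12.0, 18.0, 25.0, 30.0]
predicted = [7.0, 9.0, 12.0, 17.0, 21.0, 24.0]
bias = mean_bias_by_class(observed, predicted)
print(bias)
```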
AutoGluon was also applied to improve modeling performance. Given that it is a very complex ML system, AutoGluon is computationally very intensive, resource-intensive, and difficult to debug and may make inappropriate assumptions regarding both parameters and data types [58,95,96]. The results that we subsequently obtained (r = 0.64; RMSE = 5.12; MAE = 3.83) were slightly higher than those of a simple RF (see Table 6), yet they remain very close to those obtained with TPOT.

3.4. Modeling Canopy Height from GEDI Data

The same tools and methods that were used with the ICESat-2 data to inform the choice of hyperparameters and selection of variables with strong contributions to modeling were applied to predictions from the GEDI data. Table 8 summarizes performance metrics for nine configurations of canopy prediction models defined during data filtering (Section 2.3.2) and developed from GEDI.
Table 8 presents a heatmap of accuracy metrics for 63 prediction models that had been established, with seven relative heights under the nine configurations (see Table 2). For reference purposes, these models are designated as rf_gedi_configx_rhy, i.e., the model that was established with the Random Forest algorithm, based on GEDI data in configuration x, with relative height y, where x is the configuration number ranging from 1 to 9, and y is the percentage of relative height increasing from 75 to 100% in 5% increments. Through the use of the simple RF algorithm and considering only Pearson coefficients, the rf_gedi_config9_rh98 model attained relatively high performance (r = 0.80) compared to the other models.
The heatmaps for Pearson’s r, RMSE, and MAE metrics show similar trends: Configuration 9 performs best, while Configurations 6, 2, and 1 are worst. Other configurations fall between these extremes. Ranking correlations across configurations for each relative height category revealed strong concordance for Pearson’s correlations (Table 8). Visual assessments of r color ratings aligned with RF configuration rankings. RMSE and MAE rankings also showed consistent ordering from worst to best performance. Despite high concordance, rank ordering varied slightly among metrics, as error estimates increased with relative height percentiles, especially in worst-performing scenarios. The results of different metrics obtained during the training and testing phase with the RF algorithm are reported in Table 9 below for GEDI data modeling.
It should be noted that the results of modeling with the RF algorithm from RH98 in Group 1 (r = 0.73; RMSE = 4.95; MAE = 3.65) and in Group 2 (r = 0.74; RMSE = 4.93; MAE = 3.66) also remain lower than those obtained for the GEDI data without the height classes. The use of AutoML TPOT allowed us to produce a model with much higher performance (r = 0.84; RMSE = 4.15; MAE = 2.36) that was based on RF.
In total, 28,478 GEDI footprints that met the filtering criteria of Configuration 9 produced the most efficient model, rf_gedi_config9_rh98. The regression curve in Figure 7 indicates a strong dispersion of GEDI-based relative heights compared to the predicted values. Here again, according to the distribution of predictions in relation to the 1:1 line, it can be seen that the model slightly overestimates small canopies (<10 m), makes a relatively better estimate of medium canopy heights ([10–25 m]) and underestimates larger canopies (>25 m). However, this distribution is aligned much more closely with the 1:1 line than the scatterplot depicted in Figure 6.
The application of AutoML AutoGluon to the GEDI data resulted in a model with good performance (r = 0.83; RMSE = 4.16; MAE = 2.65) compared to using the RF algorithm alone. The results are comparable to those obtained with AutoML TPOT. Table 10 shows the effects of using AutoML TPOT and AutoGluon in improving the performance of models developed with both types of satellite LiDAR data compared to models developed simply with the RF algorithm.
Both sources of data showed consistent improvement in performance metrics with the application of AutoGluon to the RF model, which was then followed by an improvement with TPOT, with progressively increasing r and progressively decreasing error values with each improvement (W = 0.975, χ²r = 14.62, df = 5, p = 0.012). The mean (±SD) Pearson coefficient for GEDI was 0.823 (±0.021), while that of ICESat-2 was 0.633 (±0.021). RMSE and MAE estimates were consistently lower for GEDI compared to ICESat-2. The three performance metrics differed significantly between the two datasets (1-df directed contrast: Z = 11.57). The application of AutoGluon produced metrics that were slightly lower than those obtained with TPOT, which exhibited the best performance. We then continued analyses of the results of this study with the models developed using the TPOT method.
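The mean (±SD) Pearson coefficient quoted for GEDI can be reproduced from the three r values given in this section (RF alone: 0.80; AutoGluon: 0.83; TPOT: 0.84). The short check below assumes that the sample standard deviation (n − 1 denominator) was used; the variable names are illustrative.

```python
from statistics import mean, stdev

# Pearson r of the three GEDI-based RH98 models reported in this section
gedi_r = {"RF": 0.80, "AutoGluon": 0.83, "TPOT": 0.84}

values = list(gedi_r.values())
gedi_mean = mean(values)
gedi_sd = stdev(values)  # sample standard deviation (assumption)
print(f"{gedi_mean:.3f} (±{gedi_sd:.3f})")  # prints 0.823 (±0.021)
```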

3.5. Forest Canopy Height Mapping from Developed Models

3.5.1. Forest Canopy Height Map Created from the ICESat-2 Based Model

The rf_icesat-2_rh98 is the model that was selected from the analysis of the ICESat-2 data. It was used to produce the continuous canopy height map of the study area at a spatial resolution of 30 m. The map encompassing Ecological Zone 4 is presented in Figure 8.
Regarding the performance of the model used to produce this map, the predicted minimum and maximum canopy heights are, respectively, 4.20 m and 38.75 m, while the predicted mean height (±SD) is 14.26 m (±4.24 m). Figure 8b presents patterns that are very similar to what can be observed in the forested areas, as evidenced by Google Earth high-resolution images (d). Likewise, Figure 8c presents patterns that are very similar to what can be observed in the less forested areas, as evidenced by Google Earth high-resolution images (e).

3.5.2. Forest Canopy Height Map from GEDI-Based Model

After all analyses were completed and improvements were made to the developed models (see Section 3.4), the rf_gedi_config9_rh98 model was retained for the GEDI data. It was used to produce a continuous canopy height map of the study area at a spatial resolution of 30 m. This map is presented in Figure 9 and follows the same format as Figure 8.
This continuous map is likewise the result of predicting canopy height using the model that learned best from GEDI (spatially discontinuous) data, relying upon a combination of continuous multisource variables, which provided the most accurate results when developing the model. Given the performance of the model used to produce this map, the predicted minimum and maximum canopy heights are, respectively, 2.56 m and 44.20 m, while the mean (±SD) height is 11.23 m (±5.17 m). Figure 9b presents patterns that are very similar to what can be observed in the forested areas, as evidenced by Google Earth high-resolution images (d). Likewise, Figure 9c presents patterns that are very similar to what can be observed in the less forested areas, as evidenced by Google Earth high-resolution images (e).

3.6. Comparative Analysis of Developed Models with Existing Products

No similar products from comparable local studies exist for our study area to which our results could be compared. A regression between the field data collected in the ICESat-2 footprints and the values predicted for these same plots made it possible to validate the model that was developed with these data (r = 0.54; RMSE = 3.11; MAE = 2.54). Given that we did not have field data for footprints of the GEDI products to which we could compare predictions of the optimal model established from these data, we used relative heights derived from data collected in the second National Forest Inventory (NFI2) to perform a similar regression. The only maps of canopy height available for the area are global maps that had been created by Lang et al. [97] and Potapov et al. [10]. From these maps, the canopy heights were extracted at the ICESat-2 and GEDI footprints and at the NFI2 plots to compare them with their predicted values. Table 11 reports the results of the different linear regressions between the extracted or existing data and our models’ predictions in the study area and on NFI2 plots. Performing these predictions and estimating correlations between them, even outside of the study area, allowed us to verify that these models can generalize across different geographic areas. For the sake of simplicity, we refer to Lang et al. [97] and Potapov et al. [10] as “Lang” and “Potapov,” respectively, in the following table and figures related to their canopy height mapping products.
This table shows that the predictions of canopy height made with the model based on data extracted from ICESat-2 are more closely correlated with those of the Lang map (r = 0.71, RMSE = 3.38, MAE = 2.55) than they are with those of Potapov (r = 0.62, RMSE = 3.80, MAE = 2.93). Similarly, predictions of canopy height using the model based on data extracted from GEDI are more consistent with Lang’s map (r = 0.65, RMSE = 5.50, MAE = 4.17) than with Potapov’s (r = 0.55, RMSE = 6.04, MAE = 4.64). On the other hand, the heights measured during NFI2 are closer to the canopy height estimates with the GEDI-based model (r = 0.63, RMSE = 3.40, MAE = 2.65) than they are to those predicted with the ICESat-2-based model (r = 0.55, RMSE = 3.65, MAE = 2.98).
In order to better compare the canopy height maps produced during this study, both with each other and with those of other authors, most notably Lang and Potapov, we performed image subtractions and then analyzed the results. Histograms of the maps resulting from the models developed by this study and those of their differences are presented in this section. The maps relating to these image subtractions can be consulted in Figure A5, Figure A6 and Figure A7 in Appendix H, while the related discussions are presented in Section 4. Figure 10 presents the histograms of the GEDI-based map, the ICESat-2-based map, and the map resulting from differences between these two maps.
The analysis of the histogram of the map that is based on GEDI (Figure 10a) reveals that more than 300,000 pixels have heights less than 5 m, while ICESat-2 (Figure 10b) shows fewer than 50,000 pixels in that range. This implies that the prediction model based on GEDI is much more sensitive to shorter canopies than ICESat-2. This response is also observed in the histogram of the resulting difference map (Figure 10c), which indicates that most pixel values are negative. The analysis of this last histogram further reveals that the model developed from ICESat-2 data overestimates heights most of the time compared to that developed from GEDI data. Nevertheless, the average of the deviations is relatively small, and it should be noted in this histogram (Figure 10c) that the curve of the normal distribution is completely flattened in the tails, tending towards 0. This means that pixels with large positive or negative deviations are very few and that the source maps from which these difference maps are derived are relatively close to one another. The results presented above meet the objectives of this study, and the models developed could be used to estimate the canopy height of forest–savanna mosaics in the Sudano–Guinean zone of West Africa. Nevertheless, some aspects that deserve to be nuanced, explained, or deepened are discussed in Section 4 below.
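The image subtraction and histogram analysis described above can be sketched in a few lines. This is a minimal illustration with made-up 2 × 2 height maps; it assumes the difference is computed as GEDI-based minus ICESat-2-based heights, which is consistent with mostly negative values when the ICESat-2-based map is the taller one. The function names are hypothetical.

```python
import math


def difference_map(map_a, map_b):
    """Pixel-wise difference (map_a - map_b) of two equally sized height maps."""
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(map_a, map_b)]


def histogram(values, bin_width=5.0):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = {}
    for v in values:
        lower = math.floor(v / bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Made-up canopy heights (m) on the same 2 x 2 grid
gedi_map = [[3.0, 8.0], [15.0, 22.0]]
icesat2_map = [[6.0, 9.0], [14.0, 25.0]]

diff = difference_map(gedi_map, icesat2_map)      # assumed order: GEDI - ICESat-2
flat = [d for row in diff for d in row]
share_negative = sum(1 for d in flat if d < 0) / len(flat)
diff_hist = histogram(flat, bin_width=2.0)
print(diff, share_negative, diff_hist)
```

The mean and standard deviation of `flat` would then summarize how close the two source maps are, as discussed for Figure 10c.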

4. Discussion

This study combined multi-spectral, radar, and topographic data with spaceborne lidar data to develop canopy height prediction models in the forest–savanna mosaics of West Africa’s Sudano–Guinean zone. Variable importance in model development depends upon the method that is used for the assessment and the algorithm that is used for this purpose. For example, Figure 5 reveals that in predicting responses with the RF model (Figure 5a), the 10 most important covariates (in order) are swir2, swir1, slope, elevation, ndbi, ndii, vari, s1vv, mndwi, and blue. In contrast, for predictions made with the XGBoost model (Figure 5b), the 10 most important covariates (in order) are swir1, slope, ndbi, swir2, elevation, vari, nirnarrow, rededge2, blue, and arvi.
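The overlap between the two top-10 lists quoted above can be checked directly. The sketch below uses only the covariate names and orderings given in the text; the variable names are illustrative.

```python
# Top-10 covariates, in order of importance, as listed in the text
rf_top10 = ["swir2", "swir1", "slope", "elevation", "ndbi",
            "ndii", "vari", "s1vv", "mndwi", "blue"]
xgb_top10 = ["swir1", "slope", "ndbi", "swir2", "elevation",
             "vari", "nirnarrow", "rededge2", "blue", "arvi"]

# Covariates present in both lists, kept in RF order
shared = [name for name in rf_top10 if name in xgb_top10]
rf_only = [name for name in rf_top10 if name not in xgb_top10]
xgb_only = [name for name in xgb_top10 if name not in rf_top10]

# Rank of each shared covariate in (RF, XGBoost), 1 = most important
rank_pairs = {name: (rf_top10.index(name) + 1, xgb_top10.index(name) + 1)
              for name in shared}

print(shared)
print(rf_only, xgb_only)
print(rank_pairs)
```

Seven of the ten covariates are shared, and the three that differ are a radar and two index variables for RF versus two spectral bands and one index for XGBoost.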
The results of this study reveal that optical and topographic data contributed most in the models’ development, while radar data contributed minimally. Our results are consistent with Xi et al. [17], who also noted that vegetation indices and topographic information from Sentinel-2 and SRTM data, respectively, contributed much more effectively to the establishment of canopy height prediction models compared to texture measurements and backscatter variables from Sentinel-1 radar data. These results are also supported by the study by Luo et al. [98], who concluded that Sentinel-2-derived variables significantly contributed to the canopy height estimation model, unlike backscatter coefficients and textural parameters derived from Sentinel-1. Radar backscatter signature depends on the spatial structure, dielectric properties, size, and geometry of the parts of the vegetation with which the signal interacts [99,100]. As our study area is characterized by small or generally spaced trees, a significant proportion of the total forest backscatter received by the radar sensor can be attributed to the forest floor. This situation may weaken or bias the sensitivity to the forest parameters in the C-Band Sentinel data considered [101,102]. The use of L- or P-Band radar data may eventually provide better contributions to canopy height estimation within complex tropical forest–savanna mosaics, such as our study area [97,103,104]. However, this is out of the scope of the current study and could be evaluated in forthcoming research.
In this research, GEDI data produced better models than ICESat-2, based on performance metrics and relative height RH98. In situ heights from NFI2 aligned more closely with GEDI model estimates. GEDI’s superiority over ICESat-2 is consistent with previous studies, even in different ecosystems. This is exemplified by Zhu et al. [9], whose GEDI models (RMSE: 3.61–4.23 m) outperformed ICESat-2 (RMSE: 4.76–10.23 m) when using relative height RH98, as in this research. This is also the case for Liu et al. [13], Liu et al. [58] and Zhu et al. [105], who likewise reported better estimates with GEDI. A possible reason for the better performance of GEDI is that ICESat-2 data, by averaging multiple laser pulses (about 143) over 100 m (Figure 2a), introduce imprecision, while GEDI data are more densely sampled with direct ground footprints (Figure 2b). This study collected ground data only in ICESat-2 footprints, limiting direct validation of GEDI estimates, even if the predictions of our developed models on the NFI2 plots outside the study area allowed us to form an idea of their scope for generalization. Future work should include ground data collection over GEDI footprints for improved validation in forest–savanna mosaics.
To compare our cartographic products with existing ones, we analyzed histograms of differences between our GEDI-based map and those of Lang and Potapov (Figure A8, Appendix I). The Lang model appears to slightly overestimate canopy heights compared to ours, while Potapov’s underestimates. These differences could be due to varying algorithms: we used RF, Lang et al. [97] used deep convolutional neural networks, and Potapov et al. [10] employed bagging regression trees. Future studies should investigate the effects of algorithm choice on model quality. The standard deviation of differences between our model and Lang’s is lower than with Potapov’s, suggesting our results are closer to Lang’s. However, both comparisons show relatively low mean deviations and normal distribution curves flattened at the tails, indicating overall similarity between the maps. Limited pixels with large deviations are likely due to image edge effects.
Histogram analysis showed that models sensitive to shorter vegetation perform better in our forest–savanna mosaics study area, characterized by small, scattered trees. The GEDI-based model outperformed ICESat-2, likely due to its higher footprint density and ability to estimate both tall and short canopies. Studies by Liu et al. [58] in China, Zhu et al. [9] in the United States, and Sothe et al. [57] in Canada showed good performance in more homogeneous forests, warranting further investigation into ecosystem effects on model performance. The choice of dependent variable significantly impacts model accuracy. We selected RH98 as optimal, aligning with Lang et al. [83], while Potapov et al. [10] used RH95. NFI2 data were closer to Lang’s map and our results than Potapov’s. Our study, along with Lang et al. [83] and Ngo et al. [92], confirms RH98 as the most appropriate metric for GEDI-based canopy height modeling.
Developing accurate canopy height prediction models ultimately requires considering the spatiotemporal coherence of independent covariates, ecosystem characteristics, and learning algorithms. Our study distinctively combines optical, radar, topographic, and satellite LiDAR data for forest–savanna mosaics canopy height estimation in the Sudano–Guinean zone in Togo, providing a foundation for future improvements in forest biomass modeling in such ecosystems.

5. Conclusions

In our quest for enhanced accuracy in estimating canopy height within the complex forest–savanna mosaics of the Sudano–Guinean zone in Togo, this study has made significant strides through innovative data integration and advanced modeling techniques. Our research leveraged a comprehensive approach, combining ICESat-2 and GEDI spaceborne LiDAR data with optical, radar, and topographic information. Among the four machine learning algorithms evaluated (RF, SVM, XGBoost, and DNN), Random Forest emerged as the most efficient in predicting canopy height across our study area. Notably, our GEDI-based model outperformed the ICESat-2-derived model, providing valuable insights for future remote sensing applications in similar ecosystems.
Our findings underscore the superiority of integrated data approaches over isolated data types in canopy height estimation. Optical and topographic data proved particularly influential, while radar data showed limited sensitivity. This observation opens avenues for future research to explore different radar data variables and their potential in forest structure analysis. Interestingly, our research revealed that prediction models based on grouping data by height classes did not improve performance compared to models without such classifications. These findings challenge assumptions and provide direction for future methodological approaches in canopy height modeling.
This study’s focus on the less-studied Sudano–Guinean zone of Togo adds valuable insights to the existing body of research on forest–savanna mosaics. By providing comprehensive performance metrics and comparing our results with existing global canopy height maps, we have contextualized our findings within the broader field of remote sensing and forest monitoring. Looking ahead, there is a clear need to assess the effects of ecosystem characteristics on the quality of GEDI-based models, particularly in sparse and patchy vegetation typical of our study area.
In the short term, our results offer local decision-makers a valuable tool for forest management, favoring the use of GEDI data for canopy height estimation. Long-term implications include the pressing need for field dendrometric parameter estimation within GEDI footprints to further validate and refine our models. The validation of GEDI data and models in this eco-climatic zone is crucial for developing robust tools for estimating aboveground biomass and understanding forest dynamics and carbon fluxes. This research not only advances the methodology of forest canopy height estimation but also provides a foundation for adapting forest management practices to meet REDD+ requirements, contributing significantly to the evolving field of remote sensing in forestry and ecology, particularly in the challenging context of West African forest–savanna mosaics.

Author Contributions

Conceptualization: A.K. and K.G.; Methodology: A.K. and K.G.; Data collection: A.K.; Data processing: A.K., K.G. and G.A.F.K.; Modeling: A.K. and G.A.F.K.; Preparation of the initial draft: A.K.; Supervision: K.G.; Revision and Editing: A.K., K.G. and G.A.F.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Programme Canadien de Bourse de la Francophonie, through Global Affairs Canada (Government of Canada), under project number P-008649, as well as the Natural Sciences and Engineering Research Council of Canada (NSERC Discovery grants RGPIN-2018-06101, RGPIN-2024-05199, and NSERC CREATE 543360-2020).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

We thank all authors and reviewers for their guidance and contributions to the writing of this manuscript. We also thank students of the INFA (Tové), together with staff of the Laboratory of Botany and Plant Ecology (Faculty of Sciences, University of Lomé), for their invaluable assistance during the collection of the field data. W.F.J. Parsons translated the manuscript into English.

Conflicts of Interest

The authors declare no conflicts of interest. The funder had no role in the study design, data collection, analysis, interpretation, writing of the manuscript, or in the decision to publish the results.

Appendix A

Figure A1. Elevation of the soil surface (a) and the canopy surface (b), and photon returns. The counts of the different returned photons in the selected area are displayed (in square brackets) below the graphs in the image under the heading Confidence.

Appendix B

Table A1. Covariates extracted from Sentinel 1 (starting with S1), Sentinel 2, and SRTM data (the last three).
| Covariates | Description | Covariates | Description |
|---|---|---|---|
| S1vv | Vertical transmit, Vertical receive polarization | green | Sentinel2 B3 |
| S1vh | Vertical transmit, Horizontal receive polarization | red | Sentinel2 B4 |
| S1diff | Band difference between VV and VH | rededge1 | Sentinel2 B5 |
| S1mdpsvi | Modified Dual Polarimetric SAR Vegetation Index | rededge2 | Sentinel2 B6 |
| S1npdi | Normalized Polarization Difference Index | rededge3 | Sentinel2 B7 |
| S1prod | Band product between VV and VH | nir | Sentinel2 B8 |
| S1rept | Band ratio between VV and VH | nirnarrow | Sentinel2 B8A |
| S1rvi | Ratio Vegetation Index | swir1 | Sentinel2 B11 |
| S1sum | Band sum between VV and VH | swir2 | Sentinel2 B12 |
| S1vhasm | VH GLCM Angular Second Moment | arvi | Atmospherically Resistant Vegetation Index |
| S1vhcont | VH GLCM Contrast | bsi | Bare Soil Index |
| S1vhcorr | VH GLCM Correlation | evi | Enhanced Vegetation Index |
| S1vhdiss | VH GLCM Dissimilarity | gndvi | Green Normalized Difference Vegetation Index |
| S1vhener | VH GLCM Energy | mndwi | Modified Normalized Difference Water Index |
| S1vhent | VH GLCM Entropy | msavi | Modified Soil Adjusted Vegetation Index |
| S1vhhomo | VH GLCM Inverse Difference Moment | mtvi | Modified Triangular Vegetation Index |
| S1vhmax | VH GLCM Maximum | ndbi | Normalized Difference Built-up Index |
| S1vhmean | VH GLCM Mean | ndii | Normalized Difference Infrared Index |
| S1vhvar | VH GLCM Variance | ndvi | Normalized Difference Vegetation Index |
| S1vvasm | VV GLCM Angular Second Moment | osavi | Optimized Soil Adjusted Vegetation Index |
| S1vvcont | VV GLCM Contrast | rdvi | Renormalized Difference Vegetation Index |
| S1vvcorr | VV GLCM Correlation | rvi | Ratio Vegetation Index |
| S1vvdiss | VV GLCM Dissimilarity | savi | Soil Adjusted Vegetation Index |
| S1vvener | VV GLCM Energy | sipi | Structure Insensitive Pigment Index |
| S1vvent | VV GLCM Entropy | sr | Simple Ratio |
| S1vvhomo | VV GLCM Inverse Difference Moment | vari | Visible Atmospherically Resistant Index |
| S1vvmax | VV GLCM Maximum | vsi | Vegetation Structure Index |
| S1vvmean | VV GLCM Mean | aspect | SRTM aspect |
| S1vvvar | VV GLCM Variance | elevation | SRTM elevation |
| blue | Sentinel2 B2 | slope | SRTM slope |

Appendix C

Table A2. List of co-variables used in this study.
| No. | Feature Abbrev. | Description | Native Band/Formula | References |
|---|---|---|---|---|
| 1 | S1vv | Vertical transmit, vertical receive channel backscattering coefficient (dB) | VV | [106] |
| 2 | S1vh | Vertical transmit, horizontal receive channel backscattering coefficient (dB) | VH | [106] |
| 3 | S1diff | Band difference between VV and VH | $VV - VH$ | [107] |
| 4 | S1mdpsvi | Modified Dual Polarimetric SAR Vegetation Index | $\left[\left(\sigma^{0}_{VV}\right)^{2} + \sigma^{0}_{VV}\,\sigma^{0}_{VH}\right] / \sqrt{2}$ | [108] |
| 5 | S1npdi | Normalized Polarization Difference Index | $(VV - VH)/(VV + VH)$ | [109] |
| 6 | S1prod | Band product between VV and VH | $VV \times VH$ | [107] |
| 7 | S1rept | Band ratio between VV and VH | $VV / VH$ | [16] |
| 8 | S1rvi | Ratio Vegetation Index | 4 × VH/(VV + VH) | [107] |
| 9 | S1sum | Band sum between VV and VH | $VV + VH$ | [110] |
| 10 | S1vhasm | VH GLCM * Angular Second Moment | $\sum_{i,j=0}^{n-1} \rho_{i,j}^{2}$ | [111] |
| 11 | S1vhcont | VH GLCM Contrast | $\sum_{i,j=0}^{n-1} P_{i,j}\,(i - j)^{2}$ | [111] |
| 12 | S1vhcorr | VH GLCM Correlation | $\sum_{i,j=0}^{n-1} P_{i,j}\,\dfrac{(i - \mu_i)(j - \mu_j)}{\sqrt{\sigma_i^{2}\,\sigma_j^{2}}}$ | [111] |
| 13 | S1vhdiss | VH GLCM Dissimilarity | $\sum_{i,j=0}^{n-1} P_{i,j}\,\lvert i - j \rvert$ | [111] |
| 14 | S1vhener | VH GLCM Energy | $\sum_{i,j=0}^{n-1} \rho_{i,j}^{2}$ | [111] |
| 15 | S1vhent | VH GLCM Entropy | $\sum_{i,j=0}^{n-1} P_{i,j}\,(-\ln P_{i,j})$ | [111] |
| 16 | S1vhhomo | VH GLCM Homogeneity | $\sum_{i,j=0}^{n-1} \dfrac{P_{i,j}}{1 + (i - j)^{2}}$ | [111] |
| 17 | S1vhmax | VH GLCM Maximum | $\max P_{i,j}$ | [111] |
| 18 | S1vhmean | VH GLCM Mean | $\sum_{i,j=0}^{n-1} i\,P_{i,j}$; $\sum_{i,j=0}^{n-1} j\,P_{i,j}$ | [111] |
| 19 | S1vhvar | VH GLCM Variance | $\sum_{i,j=0}^{n-1} P_{i,j}\,(i,j - \mu_{i,j})^{2}$ | [111] |
| 20 | S1vvasm | VV GLCM Angular Second Moment | $\sum_{i,j=0}^{n-1} \rho_{i,j}^{2}$ | [111] |
| 21 | S1vvcont | VV GLCM Contrast | $\sum_{i,j=0}^{n-1} P_{i,j}\,(i - j)^{2}$ | [111] |
| 22 | S1vvcorr | VV GLCM Correlation | $\sum_{i,j=0}^{n-1} P_{i,j}\,\dfrac{(i - \mu_i)(j - \mu_j)}{\sqrt{\sigma_i^{2}\,\sigma_j^{2}}}$ | [111] |
| 23 | S1vvdiss | VV GLCM Dissimilarity | $\sum_{i,j=0}^{n-1} P_{i,j}\,\lvert i - j \rvert$ | [111] |
| 24 | S1vvener | VV GLCM Energy | $\sum_{i,j=0}^{n-1} \rho_{i,j}^{2}$ | [111] |
| 25 | S1vvent | VV GLCM Entropy | $\sum_{i,j=0}^{n-1} P_{i,j}\,(-\ln P_{i,j})$ | [111] |
| 26 | S1vvhomo | VV GLCM Homogeneity | $\sum_{i,j=0}^{n-1} \dfrac{P_{i,j}}{1 + (i - j)^{2}}$ | [111] |
| 27 | S1vvmax | VV GLCM Maximum | $\max P_{i,j}$ | [111] |
| 28 | S1vvmean | VV GLCM Mean | $\sum_{i,j=0}^{n-1} i\,P_{i,j}$; $\sum_{i,j=0}^{n-1} j\,P_{i,j}$ | [111] |
| 29 | S1vvvar | VV GLCM Variance | $\sum_{i,j=0}^{n-1} P_{i,j}\,(i,j - \mu_{i,j})^{2}$ | [111] |
| 30 | blue | Blue band | B2 | [112] |
| 31 | green | Green band | B3 | [112] |
| 32 | red | Red band | B4 | [112] |
| 33 | rededge1 | Red edge 1 band | B5 | [112] |
| 34 | rededge2 | Red edge 2 band | B6 | [112] |
| 35 | rededge3 | Red edge 3 band | B7 | [112] |
| 36 | nir | Near-infrared (NIR) band | B8 | [112] |
| 37 | nirnarrow | Near-infrared narrow (NIR-narrow) band | B8A | [112] |
| 38 | swir1 | Short-wave infrared 1 (SWIR1) band | B11 | [112] |
| 39 | swir2 | Short-wave infrared 2 (SWIR2) band | B12 | [112] |
| 40 | arvi | Atmospherically Resistant Vegetation Index | [NIR − (2 × Red − Blue)]/[NIR + (2 × Red − Blue)] | [113] |
| 41 | bsi | Bare Soil Index | $\dfrac{(SWIR2 + Red) - (NIR + Blue)}{(SWIR2 + Red) + (NIR + Blue)}$ | [114] |
| 42 | evi | Enhanced Vegetation Index | 2.5 × (NIR − Red)/(NIR + 6 × Red − 7.5 × Blue + 1) | [113] |
| 43 | gndvi | Green Normalized Difference Vegetation Index | (NIR − Green)/(NIR + Green) | [16] |
| 44 | mndwi | Modified Normalized Difference Water Index | (Green − SWIR)/(Green + SWIR) | [115] |
| 45 | msavi | Modified Soil Adjusted Vegetation Index | $\left[2\,NIR + 1 - \sqrt{(2\,NIR + 1)^{2} - 8\,(NIR - Red)}\right] / 2$ | [116] |
| 46 | mtvi | Modified Triangular Vegetation Index | 1.2 × [1.2 × (NIR − Green) − 2.5 × (Red − Green)] | [67] |
| 47 | ndbi | Normalized Difference Built-up Index | $(SWIR1 - NIR)/(SWIR1 + NIR)$ | [117] |
| 48 | ndii | Normalized Difference Infrared Index | $(NIR - SWIR)/(NIR + SWIR)$ | [118] |
| 49 | ndvi | Normalized Difference Vegetation Index | (NIR − Red)/(NIR + Red) | [113] |
| 50 | osavi | Optimized Soil Adjusted Vegetation Index | $(1 + 0.16) \times (NIR - Red)/(NIR + Red + 0.16)$ | [119] |
| 51 | rdvi | Renormalized Difference Vegetation Index | $(NIR - Red)/\sqrt{NIR + Red}$ | [120] |
| 52 | rvi | Ratio Vegetation Index | Red/NIR | [121] |
| 53 | savi | Soil Adjusted Vegetation Index | 1.5 × (NIR − Red)/(NIR + Red + 0.5) | [122] |
| 54 | sipi | Structure Insensitive Pigment Index | (NIR − Blue)/(NIR − Red) | [67] |
| 55 | sr | Simple Ratio | NIR/Red | [123] |
| 56 | vari | Visible Atmospherically Resistant Index | (Green − Red)/(Green + Red − Blue) | [124] |
| 57 | vsi | Vegetation Structure Index | NDVI/(1 − NIR) | [125] |
| 58 | aspect | Aspect | | [126] |
| 59 | elevation | Elevation | | [126] |
| 60 | slope | Slope | | [126] |
* GLCM: Gray-Level Co-occurrence Matrix.
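As a rough illustration of how a few of the Table A2 formulas translate into code, the sketch below evaluates NDVI, SAVI, MSAVI and the Sentinel-1 RVI for a single pixel. The function names and the reflectance and backscatter values are hypothetical. Converting backscatter from dB to linear power before forming the radar ratio is an assumption, not something the table specifies, and MSAVI is written in its standard closed form.

```python
import math


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index; soil_factor = 0.5 gives the 1.5 multiplier."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def msavi(nir, red):
    """Modified SAVI in its standard closed form (no soil factor to choose)."""
    a = 2 * nir + 1
    return (a - math.sqrt(a * a - 8 * (nir - red))) / 2


def db_to_linear(value_db):
    """Convert backscatter from decibels to linear power (assumed preprocessing)."""
    return 10 ** (value_db / 10)


def s1_rvi(vv, vh):
    """Sentinel-1 Ratio Vegetation Index: 4 * VH / (VV + VH), linear units."""
    return 4 * vh / (vv + vh)


# Hypothetical values for one vegetated pixel (illustrative only).
nir, red = 0.40, 0.08
vv_db, vh_db = -8.0, -14.0

optical = {"ndvi": ndvi(nir, red), "savi": savi(nir, red), "msavi": msavi(nir, red)}
radar_rvi = s1_rvi(db_to_linear(vv_db), db_to_linear(vh_db))

for name, value in optical.items():
    print(f"{name}: {value:.3f}")
print(f"S1rvi: {radar_rvi:.3f}")
```

In practice the same expressions would be applied band-wise to whole rasters rather than to scalars.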

Appendix D

Figure A2. Illustration of the overlay of GEDI and ICESat-2 data onto the VH band from Sentinel-1 to calculate zonal statistics.

Appendix E

  • Random Forest
RF is a learning algorithm that constructs a series of decision trees, each trained on samples drawn at random with or without replacement. It uses the decision trees $t_j(P, \Theta_j)$ as base learners, where $j$ is the number of base trees. Given the training database $D$ that is defined above, for a particular realization $\theta_k$ of $\Theta_j$ (with $k = 1, \ldots, j$), the trained decision tree is denoted by $\hat{t}(p, \theta_k, D)$. This formulation follows Breiman [127]; the random realization $\theta_k$ implicitly introduces randomness in two ways. First, “bagging” fits each tree to a randomly drawn sample of the original database. Second, when a node is split, the best split is sought among a randomly selected subset of $p$ predictors, drawn independently at each node, rather than among all predictors.
Decision trees are then constructed without being pruned; the predictions of the resulting trees are combined by averaging, with each tree given equal weight, as presented in Equation (A1).
$$f(x) = \frac{1}{J} \sum_{j=1}^{J} t_j(p) \tag{A1}$$
Although the RF algorithm has demonstrated its prediction ability, three parameters can be adjusted to improve its performance, depending on the situations and applications. These are the number of predictive variables that are randomly selected at each node (p), the number of trees in the random forest (J), and the size of the tree [128].
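The two sources of randomness and the averaging step can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the implementation used in the study: each "tree" is a one-split regression stump, so the per-node feature subset reduces to one subset per tree, and the names `fit_stump`, `fit_forest`, and `n_candidates` as well as the data are hypothetical.

```python
import random
import statistics


def fit_stump(rows, targets, feature_ids):
    """Fit a one-split regression tree using only the candidate features."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue
            lm, rm = statistics.fmean(left), statistics.fmean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:  # no valid split: fall back to the sample mean
        mean = statistics.fmean(targets)
        return lambda row: mean
    _, f, threshold, lm, rm = best
    return lambda row: lm if row[f] <= threshold else rm


def fit_forest(rows, targets, n_trees=50, n_candidates=1, seed=0):
    """Bagging plus random feature subsets, one stump per bootstrap sample."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]           # bootstrap sample
        feats = rng.sample(range(n_features), n_candidates)  # random predictors
        trees.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feats))
    return trees


def predict(trees, row):
    """Equal-weight average of the individual tree predictions."""
    return statistics.fmean([tree(row) for tree in trees])


# Hypothetical data: feature 0 drives canopy height, feature 1 is uninformative.
rows = [[0.1, 5.0], [0.2, 3.0], [0.3, 4.0], [0.6, 5.0], [0.7, 3.0], [0.8, 4.0]]
heights = [2.0, 3.0, 2.5, 14.0, 15.0, 16.0]

forest = fit_forest(rows, heights)
low = predict(forest, [0.15, 4.0])
high = predict(forest, [0.75, 4.0])
print(f"low-index pixel: {low:.2f} m, high-index pixel: {high:.2f} m")
```

The three tuning parameters named above map onto `n_candidates` (predictors tried per split), `n_trees` (number of trees), and tree size, which is fixed at a single split in this sketch.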
  • Support Vector Machine
The SVM algorithm is a supervised, non-parametric statistical learning technique that was initially designed to solve binary classification problems and was later extended to regression problems [129]. The objective here is to model a function that predicts canopy height $\hat{h}$ while maximizing the margin of the hyperplane, i.e., the margin between the predicted value ($\hat{h}$) and the actual value ($h$), for all the training data. In the case of a non-linear support vector regression approach, this involves applying a transformation ($\phi: \mathbb{R}^{n} \to \mathbb{R}^{\alpha}$ such that $x \mapsto \phi(x)$) from the space of the input data to a higher-dimensional space in which the data can be better predicted. Note that $n$ and $\alpha$ are, respectively, the dimensions of the input space and of the transformed space, with $\alpha > n$. A kernel function such as a radial basis function is generally used to solve the non-linearity problem [130,131]. Linear regression in the $\alpha$-dimensional space is then written as follows:
$$f(x) = \langle w, \phi(x) \rangle + b \tag{A2}$$
where $w = (w_1, \ldots, w_N) \in \mathbb{R}^{n}$ is the hyperplane coefficient vector, and $b \in \mathbb{R}$ is a scalar denoting bias.
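The role of the transformation $\phi$ and of the kernel can be checked numerically. The sketch below is illustrative only: it swaps the radial basis function for a degree-2 polynomial kernel, because that kernel has a finite explicit feature map (from $\mathbb{R}^2$ to $\mathbb{R}^3$, so $\alpha > n$) that can be written out. The coefficient vector `w`, the bias `b`, and the input points are hypothetical values, not fitted ones.

```python
import math


def phi(x):
    """Explicit degree-2 feature map from R^2 to R^3."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: <x, z>^2."""
    return dot(x, z) ** 2


def predict(w, b, x):
    """Linear regression in the transformed space: f(x) = <w, phi(x)> + b."""
    return dot(w, phi(x)) + b


x, z = (1.0, 2.0), (3.0, 0.5)

# The kernel returns the inner product in the transformed space
# without ever building phi(x) explicitly.
explicit = dot(phi(x), phi(z))
implicit = poly_kernel(x, z)
print(explicit, implicit)

w, b = (0.5, 0.0, 1.0), 2.0  # hypothetical coefficients
print(predict(w, b, x))
```

With a radial basis function the transformed space is infinite-dimensional, so only the kernel form can be evaluated in practice.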
  • Extreme Gradient Boosting
The XGBoost algorithm is a machine learning ensemble model, which is an efficient and scalable implementation of the Gradient Boosting Machine algorithm [132]. As an ensemble learning approach, XGBoost uses multiple decision trees to achieve optimal prediction performance. The output of the model predicted by this approach has the same decision rules as a classical decision tree model. Let K be the number of trees used; the output prediction is the sum of the scores predicted by the K trees, as shown in Equation (A3):
$$\hat{h} = \sum_{k=1}^{K} f_k(x_m), \quad f_k \in F. \tag{A3}$$
XGBoost adopts the same gradient boosting as the Gradient Boosting Machine (GBM) algorithm [133] but provides a small improvement to the objective function by regularizing it, as presented in Equation (A4):
$$L(\theta) = \sum_{i} \ell(\hat{h}_i, h_i) + \sum_{k} \Omega(f_k). \tag{A4}$$
Here, $L$ is the total objective function, $\theta$ represents the hyperparameters of the model, and $\ell$ is a differentiable convex loss function that measures the distance between the prediction $\hat{h}_i$ and the true value $h_i$. The second term represents regularization, which reduces the variation in the output of the new tree. Detailed information regarding the XGBoost model can be found in the literature [134,135].
In order for the model to work efficiently, XGBoost also stores data in in-memory units for parallel learning, thereby allowing it to handle larger datasets and to run much more rapidly [134].
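The additive prediction and the regularized objective can be made concrete with a small numerical sketch. It is illustrative only and rests on assumptions that the text above does not state: the loss is taken to be squared error, and the penalty is assumed to have the form $\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_j w_j^2$ (with $T$ leaves and leaf weights $w_j$), which is the form given in the XGBoost literature. The two stumps, the data, and the values of `gamma` and `lam` are hypothetical.

```python
def predict(trees, x):
    """Additive ensemble: sum the scores of all K trees."""
    return sum(tree(x) for tree in trees)


def omega(leaf_weights, gamma, lam):
    """Assumed complexity penalty: gamma * T + 0.5 * lam * sum(w^2)."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(predictions, targets, trees_leaf_weights, gamma, lam):
    """Training loss (squared error assumed) plus the penalty of every tree."""
    loss = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    penalty = sum(omega(w, gamma, lam) for w in trees_leaf_weights)
    return loss + penalty


# Two hypothetical stumps and their leaf weights.
trees = [
    lambda x: 10.0 if x[0] > 0.5 else 4.0,
    lambda x: 1.0 if x[1] > 0.0 else -1.0,
]
leaf_weights = [[4.0, 10.0], [-1.0, 1.0]]

samples = [(0.2, 1.0), (0.9, -1.0)]
targets = [5.5, 8.0]

predictions = [predict(trees, x) for x in samples]
total = objective(predictions, targets, leaf_weights, gamma=0.1, lam=0.01)
print(predictions, round(total, 4))
```

Raising `gamma` or `lam` increases the penalty on trees with many leaves or large leaf weights, which is how the regularization term discourages overly complex new trees.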
  • Deep Neural Network
A DNN is a neural network that is organized into several hidden, densely connected layers, which characterize the input–output parameters of the network [136]. DNNs were selected because this architecture offers a high capacity for finding relationships between variables and for learning from data representations [137]. This capacity stems from the fact that each layer is continuously updated by repetitive learning, which is referred to as “backpropagation,” to find the appropriate weights and biases [137]. Backpropagation is carried out until the difference between the predicted value and the real value (the error) is minimized. From this perspective, the output ($O_j$) of DNN layer $j$ is defined according to Equation (A5), considering the input $X$ ($X = \{x_1, \ldots, x_N\}$), the activation function ($\sigma$), the weight matrix ($w$), and the bias vector ($b$):
$$O_j = \sigma\left(\sum_{N} X \cdot w_N + b_N\right) \tag{A5}$$
The accuracy of this output depends on careful adjustment of hyperparameters, such as the activation function, the learning rate, the number of neurons in each hidden layer, the number of hidden layers, the batch size, and the number of epochs, i.e., the number of forward–backward passes through the dataset [47,138,139,140].
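The forward pass through densely connected layers can be sketched as follows. This is a minimal illustration, not the network used in the study: the layer sizes, weights, and biases are hypothetical, ReLU is an assumed choice of activation function, and training by backpropagation is omitted.

```python
def relu(z):
    """Rectified linear unit, an assumed choice of activation function."""
    return max(0.0, z)


def identity(z):
    """Linear output activation, as commonly used for regression."""
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(sum_n x_n * w_n + b) per neuron."""
    return [
        activation(sum(x * w for x, w in zip(inputs, neuron_w)) + b)
        for neuron_w, b in zip(weights, biases)
    ]


# Hypothetical input vector with three predictors.
x = [0.5, -1.0, 2.0]

# Hidden layer with three neurons (one row of weights per neuron).
hidden_w = [[0.2, 0.4, 0.1], [-0.5, 0.3, 0.8], [-1.0, 0.0, 0.0]]
hidden_b = [0.3, -0.2, 0.0]

# Output layer with a single neuron predicting canopy height.
output_w = [[1.5, 2.0, 3.0]]
output_b = [0.5]

hidden = dense(x, hidden_w, hidden_b, relu)
output = dense(hidden, output_w, output_b, identity)
print(hidden, output)
```

The third hidden neuron receives a negative pre-activation and is set to zero by ReLU, which is the kind of non-linearity that lets stacked layers represent relationships a single linear model cannot.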

Appendix F

Figure A3. Illustration of map inference from GEDI- or ICESat-2-based models.

Appendix G

Figure A4. Determining the importance of variables using the SHAP method: beeswarm (a,c) and heatmap (b,d) plots for the Random Forest (a,b) and XGBoost (c,d) models.

Appendix H. Difference Between Canopy Height Maps by Subtraction

Figure A5. Map of canopy height based on ICESat-2 (a) and that based on GEDI (b), and the difference between them (GEDI − ICESat-2) (c).
Figure A6. Map of canopy heights based on GEDI (a) and Lang (b), and their difference (GEDI − Lang) (c).
Figure A7. Map of canopy heights based on GEDI (a) and Potapov (b), and their difference (GEDI − Potapov) (c).

Appendix I. Comparing Our Models' Maps with the Lang and Potapov Maps

Figure A8. Histograms of difference maps between GEDI/Lang (a) and GEDI/Potapov (b) maps.

References

  1. Van Houtan, K.S.; Tanaka, K.R.; Gagné, T.O.; Becker, S.L. The Geographic Disparity of Historical Greenhouse Emissions and Projected Climate Change. Sci. Adv. 2021, 7, eabe4342. [Google Scholar] [CrossRef] [PubMed]
  2. Xu, X.; Huang, A.; Belle, E.; De Frenne, P.; Jia, G. Protected Areas Provide Thermal Buffer against Climate Change. Sci. Adv. 2022, 8, eabo0119. [Google Scholar] [CrossRef]
  3. Moore, J.W.; Schindler, D.E. Getting Ahead of Climate Change for Ecological Adaptation and Resilience. Science 2022, 376, 1421–1426. [Google Scholar] [CrossRef] [PubMed]
  4. Babiker, M.; Berndes, G.; Blok, K.; Cohen, B.; Cowie, A.; Geden, O.; Ginzburg, V.; Leip, A.; Smith, P.; Sugiyama, M.; et al. Cross-Sectoral Perspectives (Chapter 12). In IPCC, 2022: Climate Change 2022: Mitigation of Climate Change. Contribution of Working Group III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; Shukla, A.R., Skea, J., Slade, R., Al Khourdajie, A., van Diemen, R., McCollum, D., Pathak, M., Some, S., Vyas, P., Fradera, R., et al., Eds.; Cambridge University Press: Cambridge, UK; New York, NY, USA, 2022; pp. 1245–1354. ISBN 978-1-009-15792-6. [Google Scholar]
  5. Fischer, H.W.; Chhatre, A.; Duddu, A.; Pradhan, N.; Agrawal, A. Community Forest Governance and Synergies among Carbon, Biodiversity and Livelihoods. Nat. Clim. Chang. 2023, 13, 1340–1347. [Google Scholar] [CrossRef]
  6. Lamb, W.F.; Gasser, T.; Roman-Cuesta, R.M.; Grassi, G.; Gidden, M.J.; Powis, C.M.; Geden, O.; Nemet, G.; Pratama, Y.; Riahi, K.; et al. The Carbon Dioxide Removal Gap. Nat. Clim. Chang. 2024, 14, 644–651. [Google Scholar] [CrossRef]
  7. Bonan, G.B. Forests and Climate Change: Forcings, Feedbacks, and the Climate Benefits of Forests. Science 2008, 320, 1444–1449. [Google Scholar] [CrossRef]
  8. Le Quéré, C.; Andrew, R.M.; Friedlingstein, P.; Sitch, S.; Pongratz, J.; Manning, A.C.; Korsbakken, J.I.; Peters, G.P.; Canadell, J.G.; Jackson, R.B.; et al. Global Carbon Budget 2017. Earth Syst. Sci. Data 2018, 10, 405–448. [Google Scholar] [CrossRef]
  9. Zhu, X.; Nie, S.; Wang, C.; Xi, X.; Lao, J.; Li, D. Consistency Analysis of Forest Height Retrievals between GEDI and ICESat-2. Remote Sens. Environ. 2022, 281, 113244. [Google Scholar] [CrossRef]
  10. Potapov, P.; Li, X.; Hernandez-Serna, A.; Tyukavina, A.; Hansen, M.C.; Kommareddy, A.; Pickens, A.; Turubanova, S.; Tang, H.; Silva, C.E.; et al. Mapping Global Forest Canopy Height through Integration of GEDI and Landsat Data. Remote Sens. Environ. 2021, 253, 112165. [Google Scholar] [CrossRef]
  11. Herold, M.; Carter, S.; Avitabile, V.; Espejo, A.B.; Jonckheere, I.; Lucas, R.; McRoberts, R.E.; Næsset, E.; Nightingale, J.; Petersen, R.; et al. The Role and Need for Space-Based Forest Biomass-Related Measurements in Environmental Management and Policy. Surv. Geophys. 2019, 40, 757–778. [Google Scholar] [CrossRef]
  12. Chen, J.; Yan, F.; Lu, Q. Spatiotemporal Variation of Vegetation on the Qinghai–Tibet Plateau and the Influence of Climatic Factors and Human Activities on Vegetation Trend (2000–2019). Remote Sens. 2020, 12, 3150. [Google Scholar] [CrossRef]
  13. Liu, A.; Cheng, X.; Chen, Z. Performance Evaluation of GEDI and ICESat-2 Laser Altimeter Data for Terrain and Canopy Height Retrievals. Remote Sens. Environ. 2021, 264, 112571. [Google Scholar] [CrossRef]
  14. Hurtt, G.; Zhao, M.; Sahajpal, R.; Armstrong, A.; Birdsey, R.; Campbell, E.; Dolan, K.; Dubayah, R.; Fisk, J.P.; Flanagan, S.; et al. Beyond MRV: High-Resolution Forest Carbon Modeling for Climate Mitigation Planning over Maryland, USA. Environ. Res. Lett. 2019, 14, 045013. [Google Scholar] [CrossRef]
  15. Li, W.; Niu, Z.; Shang, R.; Qin, Y.; Wang, L.; Chen, H. High-Resolution Mapping of Forest Canopy Height Using Machine Learning by Coupling ICESat-2 LiDAR with Sentinel-1, Sentinel-2 and Landsat-8 Data. Int. J. Appl. Earth Obs. Geoinf. 2020, 92, 102163. [Google Scholar] [CrossRef]
  16. Zhang, N.; Chen, M.; Yang, F.; Yang, C.; Yang, P.; Gao, Y.; Shang, Y.; Peng, D. Forest Height Mapping Using Feature Selection and Machine Learning by Integrating Multi-Source Satellite Data in Baoding City, North China. Remote Sens. 2022, 14, 4434. [Google Scholar] [CrossRef]
  17. Xi, Z.; Xu, H.; Xing, Y.; Gong, W.; Chen, G.; Yang, S. Forest Canopy Height Mapping by Synergizing ICESat-2, Sentinel-1, Sentinel-2 and Topographic Information Based on Machine Learning Methods. Remote Sens. 2022, 14, 364. [Google Scholar] [CrossRef]
  18. de Bem, P.P.; de Carvalho Junior, O.A.; Fontes Guimarães, R.; Trancoso Gomes, R.A. Change Detection of Deforestation in the Brazilian Amazon Using Landsat Data and Convolutional Neural Networks. Remote Sens. 2020, 12, 901. [Google Scholar] [CrossRef]
  19. Hemati, M.; Hasanlou, M.; Mahdianpari, M.; Mohammadimanesh, F. A Systematic Review of Landsat Data for Change Detection Applications: 50 Years of Monitoring the Earth. Remote Sens. 2021, 13, 2869. [Google Scholar] [CrossRef]
  20. Grabska, E.; Hostert, P.; Pflugmacher, D.; Ostapowicz, K. Forest Stand Species Mapping Using the Sentinel-2 Time Series. Remote Sens. 2019, 11, 1197. [Google Scholar] [CrossRef]
  21. Hemmerling, J.; Pflugmacher, D.; Hostert, P. Mapping Temperate Forest Tree Species Using Dense Sentinel-2 Time Series. Remote Sens. Environ. 2021, 267, 112743. [Google Scholar] [CrossRef]
  22. Nguyen, T.T.H.; Pham, T.A.; Luong, T.P. Estimate Tropical Forest Stand Volume Using SPOT 5 Satellite Image. IOP Conf. Ser. Earth Environ. Sci. 2021, 652, 012016. [Google Scholar] [CrossRef]
  23. Peerbhay, K.; Adelabu, S.; Lottering, R.; Singh, L. Mapping Carbon Content in a Mountainous Grassland Using SPOT 5 Multispectral Imagery and Semi-Automated Machine Learning Ensemble Methods. Sci. Afr. 2022, 17, e01344. [Google Scholar] [CrossRef]
  24. De Petris, S.; Sarvia, F.; Borgogno-Mondino, E. Uncertainties and Perspectives on Forest Height Estimates by Sentinel-1 Interferometry. Earth 2022, 3, 479–492. [Google Scholar] [CrossRef]
  25. Ge, S.; Su, W.; Gu, H.; Rauste, Y.; Praks, J.; Antropov, O. Improved LSTM Model for Boreal Forest Height Mapping Using Sentinel-1 Time Series. Remote Sens. 2022, 14, 5560. [Google Scholar] [CrossRef]
  26. Persson, H.; Fransson, J.E.S. Forest Variable Estimation Using Radargrammetric Processing of TerraSAR-X Images in Boreal Forests. Remote Sens. 2014, 6, 2084–2107. [Google Scholar] [CrossRef]
  27. Vastaranta, M.; Niemi, M.; Karjalainen, M.; Peuhkurinen, J.; Kankare, V.; Hyyppä, J.; Holopainen, M. Prediction of Forest Stand Attributes Using TerraSAR-X Stereo Imagery. Remote Sens. 2014, 6, 3227–3246. [Google Scholar] [CrossRef]
  28. Lei, Y.; Treuhaft, R.; Gonçalves, F. Automated Estimation of Forest Height and Underlying Topography over a Brazilian Tropical Forest with Single-Baseline Single-Polarization TanDEM-X SAR Interferometry. Remote Sens. Environ. 2021, 252, 112132. [Google Scholar] [CrossRef]
  29. Bao, J.; Zhu, N.; Chen, R.; Cui, B.; Li, W.; Yang, B. Estimation of Forest Height Using Google Earth Engine Machine Learning Combined with Single-Baseline TerraSAR-X/TanDEM-X and LiDAR. Forests 2023, 14, 1953. [Google Scholar] [CrossRef]
  30. Chen, W.; Zheng, Q.; Xiang, H.; Chen, X.; Sakai, T. Forest Canopy Height Estimation Using Polarimetric Interferometric Synthetic Aperture Radar (PolInSAR) Technology Based on Full-Polarized ALOS/PALSAR Data. Remote Sens. 2021, 13, 174. [Google Scholar] [CrossRef]
  31. Sa, R.; Nei, Y.; Fan, W. Combining Multi-Dimensional SAR Parameters to Improve RVoG Model for Coniferous Forest Height Inversion Using ALOS-2 Data. Remote Sens. 2023, 15, 1272. [Google Scholar] [CrossRef]
  32. Sinha, S.; Jeganathan, C.; Sharma, L.K.; Nathawat, M.S. A Review of Radar Remote Sensing for Biomass Estimation. Int. J. Environ. Sci. Technol. 2015, 12, 1779–1792. [Google Scholar] [CrossRef]
  33. Zhao, P.; Lu, D.; Wang, G.; Wu, C.; Huang, Y.; Yu, S. Examining Spectral Reflectance Saturation in Landsat Imagery and Corresponding Solutions to Improve Forest Aboveground Biomass Estimation. Remote Sens. 2016, 8, 469. [Google Scholar] [CrossRef]
  34. Naik, P.; Dalponte, M.; Bruzzone, L. Prediction of Forest Aboveground Biomass Using Multitemporal Multispectral Remote Sensing Data. Remote Sens. 2021, 13, 1282. [Google Scholar] [CrossRef]
  35. Ahmad, A.; Gilani, H.; Ahmad, S.R. Forest Aboveground Biomass Estimation and Mapping through High-Resolution Optical Satellite Imagery—A Literature Review. Forests 2021, 12, 914. [Google Scholar] [CrossRef]
  36. Gaveau, D.L.A.; Hill, R.A. Quantifying Canopy Height Underestimation by Laser Pulse Penetration in Small-Footprint Airborne Laser Scanning Data. Can. J. Remote Sens. 2003, 29, 650–657. [Google Scholar] [CrossRef]
  37. Wilkes, P.; Jones, S.D.; Suarez, L.; Mellor, A.; Woodgate, W.; Soto-Berelov, M.; Haywood, A.; Skidmore, A.K. Mapping Forest Canopy Height Across Large Areas by Upscaling ALS Estimates with Freely Available Satellite Data. Remote Sens. 2015, 7, 12563–12587. [Google Scholar] [CrossRef]
  38. Liu, G.; Wang, J.; Dong, P.; Chen, Y.; Liu, Z. Estimating Individual Tree Height and Diameter at Breast Height (DBH) from Terrestrial Laser Scanning (TLS) Data at Plot Level. Forests 2018, 9, 398. [Google Scholar] [CrossRef]
  39. Tian, J.; Dai, T.; Li, H.; Liao, C.; Teng, W.; Hu, Q.; Ma, W.; Xu, Y. A Novel Tree Height Extraction Approach for Individual Trees by Combining TLS and UAV Image-Based Point Cloud Integration. Forests 2019, 10, 537. [Google Scholar] [CrossRef]
  40. Wulder, M.A.; White, J.C.; Nelson, R.F.; Næsset, E.; Ørka, H.O.; Coops, N.C.; Hilker, T.; Bater, C.W.; Gobakken, T. Lidar Sampling for Large-Area Forest Characterization: A Review. Remote Sens. Environ. 2012, 121, 196–209. [Google Scholar] [CrossRef]
  41. Esteban, J.; McRoberts, R.E.; Fernández-Landa, A.; Tomé, J.L.; Næsset, E. Estimating Forest Volume and Biomass and Their Changes Using Random Forests and Remotely Sensed Data. Remote Sens. 2019, 11, 1944. [Google Scholar] [CrossRef]
  42. Lang, N.; Schindler, K.; Wegner, J.D. Country-Wide High-Resolution Vegetation Height Mapping with Sentinel-2. Remote Sens. Environ. 2019, 233, 111347. [Google Scholar] [CrossRef]
  43. Morin, D.; Planells, M.; Baghdadi, N.; Bouvet, A.; Fayad, I.; Le Toan, T.; Mermoz, S.; Villard, L. Improving Heterogeneous Forest Height Maps by Integrating GEDI-Based Forest Height Information in a Multi-Sensor Mapping Process. Remote Sens. 2022, 14, 2079. [Google Scholar] [CrossRef]
  44. Simard, M.; Pinto, N.; Fisher, J.B.; Baccini, A. Mapping Forest Canopy Height Globally with Spaceborne Lidar. J. Geophys. Res. Biogeosciences 2011, 116, G04021. [Google Scholar] [CrossRef]
  45. Baghdadi, N.; le Maire, G.; Fayad, I.; Bailly, J.S.; Nouvellon, Y.; Lemos, C.; Hakamada, R. Testing Different Methods of Forest Height and Aboveground Biomass Estimations from ICESat/GLAS Data in Eucalyptus Plantations in Brazil. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 290–299. [Google Scholar] [CrossRef]
  46. Fayad, I.; Baghdadi, N.; Bailly, J.-S.; Barbier, N.; Gond, V.; Hajj, M.E.; Fabre, F.; Bourgine, B. Canopy Height Estimation in French Guiana with LiDAR ICESat/GLAS Data Using Principal Component Analysis and Random Forest Regressions. Remote Sens. 2014, 6, 11883–11914. [Google Scholar] [CrossRef]
  47. Narine, L.L.; Popescu, S.C.; Malambo, L. Synergy of ICESat-2 and Landsat for Mapping Forest Aboveground Biomass with Deep Learning. Remote Sens. 2019, 11, 1503. [Google Scholar] [CrossRef]
  48. Qi, W.; Lee, S.-K.; Hancock, S.; Luthcke, S.; Tang, H.; Armston, J.; Dubayah, R. Improved Forest Height Estimation by Fusion of Simulated GEDI Lidar Data and TanDEM-X InSAR Data. Remote Sens. Environ. 2019, 221, 621–634. [Google Scholar] [CrossRef]
  49. Tsao, A.; Nzewi, I.; Jayeoba, A.; Ayogu, U.; Lobell, D.B. Canopy Height Mapping for Plantations in Nigeria Using GEDI, Landsat, and Sentinel-2. Remote Sens. 2023, 15, 5162. [Google Scholar] [CrossRef]
  50. Alvites, C.; O’Sullivan, H.; Francini, S.; Marchetti, M.; Santopuoli, G.; Chirici, G.; Lasserre, B.; Marignani, M.; Bazzato, E. High-Resolution Canopy Height Mapping: Integrating NASA’s Global Ecosystem Dynamics Investigation (GEDI) with Multi-Source Remote Sensing Data. Remote Sens. 2024, 16, 1281. [Google Scholar] [CrossRef]
  51. Xing, Y.; Huang, J.; Gruen, A.; Qin, L. Assessing the Performance of ICESat-2/ATLAS Multi-Channel Photon Data for Estimating Ground Topography in Forested Terrain. Remote Sens. 2020, 12, 2084. [Google Scholar] [CrossRef]
  52. Lin, X.; Xu, M.; Cao, C.; Dang, Y.; Bashir, B.; Xie, B.; Huang, Z. Estimates of Forest Canopy Height Using a Combination of ICESat-2/ATLAS Data and Stereo-Photogrammetry. Remote Sens. 2020, 12, 3649. [Google Scholar] [CrossRef]
  53. Jiang, F.; Zhao, F.; Ma, K.; Li, D.; Sun, H. Mapping the Forest Canopy Height in Northern China by Synergizing ICESat-2 with Sentinel-2 Using a Stacking Algorithm. Remote Sens. 2021, 13, 1535. [Google Scholar] [CrossRef]
  54. Guo, Q.; Du, S.; Jiang, J.; Guo, W.; Zhao, H.; Yan, X.; Zhao, Y.; Xiao, W. Combining GEDI and Sentinel Data to Estimate Forest Canopy Mean Height and Aboveground Biomass. Ecol. Inform. 2023, 78, 102348. [Google Scholar] [CrossRef]
  55. Wu, Z.; Yao, F.; Zhang, J.; Ma, E.; Yao, L.; Dong, Z. Genetic Programming Guided Mapping of Forest Canopy Height by Combining LiDAR Satellites with Sentinel-1/2, Terrain, and Climate Data. Remote Sens. 2024, 16, 110. [Google Scholar] [CrossRef]
  56. Zhang, L.; Shao, Z.; Liu, J.; Cheng, Q. Deep Learning Based Retrieval of Forest Aboveground Biomass from Combined LiDAR and Landsat 8 Data. Remote Sens. 2019, 11, 1459. [Google Scholar] [CrossRef]
  57. Sothe, C.; Gonsamo, A.; Lourenço, R.B.; Kurz, W.A.; Snider, J. Spatially Continuous Mapping of Forest Canopy Height in Canada by Combining GEDI and ICESat-2 with PALSAR and Sentinel. Remote Sens. 2022, 14, 5158. [Google Scholar] [CrossRef]
  58. Liu, X.; Su, Y.; Hu, T.; Yang, Q.; Liu, B.; Deng, Y.; Tang, H.; Tang, Z.; Fang, J.; Guo, Q. Neural Network Guided Interpolation for Mapping Canopy Height of China’s Forests by Integrating GEDI and ICESat-2 Data. Remote Sens. Environ. 2022, 269, 112844. [Google Scholar] [CrossRef]
  59. PANA. Plan d’Action National d’Adaptation Au Changement Climatique; Ministère de l’Environnement et des Ressources Forestières (MERF): Lomé, Togo, 2009; p. 113. [Google Scholar]
  60. Ern, H. Die Vegetation Togos. Gliederung, Gefährdung, Erhaltung. Willdenowia 1979, 9, 295–312. [Google Scholar]
  61. MEDDPN. Analyse Cartographique de l’occupation Des Zones Agroécologiques et Bassins de Concentration Des Populations Au Togo, Folega F., Consultant Sous Ordre de La Coordination Nationale Sur Les Changements Climatiques; Ministère de l’Environnement, du Développement Durable et la protection de la Nature (MEDDPN): Lomé, Togo, 2019; p. 66. [Google Scholar]
  62. Atakpama, W.; Amegnaglo, K.B.; Afelu, B.; Folega, F.; Batawila, K.; Akpagana, K. Biodiversité et biomasse pyrophyte au Togo. VertigO-La Rev. Électronique Sci. L’environnement 2019, 19-3. [Google Scholar] [CrossRef]
  63. Kombate, A.; Folega, F.; Atakpama, W.; Dourma, M.; Wala, K.; Goïta, K. Characterization of Land-Cover Changes and Forest-Cover Dynamics in Togo between 1985 and 2020 from Landsat Images Using Google Earth Engine. Land 2022, 11, 1889. [Google Scholar] [CrossRef]
  64. MEDDPN. Niveau de Référence pour les Forêts (NRF) du Togo; Ministère de l’Environnement, du Développement Durable et la protection de la Nature (MEDDPN): Lomé, Togo, 2020; p. 80. [Google Scholar]
  65. Ravina da Silva, M.; Merkovic, M. Forest Carbon Partnership Facility-Republic of Togo: R-Package. In Proceedings of the P30 Meeting 2021, Lomé, Togo, 14 December 2021. [Google Scholar]
  66. Dubayah, R.; Hofton, M.; Blair, J.; Armston, J.; Tang, H.; Luthcke, S. GEDI L2A Elevation and Height Metrics Data Global Footprint Level V002. [GEDI02_A]. NASA EOSDIS Land Processes DAAC. Available online: https://lpdaac.usgs.gov/products/gedi02_av002/ (accessed on 8 August 2022).
  67. Xue, J.; Su, B. Significant Remote Sensing Vegetation Indices: A Review of Developments and Applications. J. Sens. 2017, 2017, e1353691. [Google Scholar] [CrossRef]
  68. Fotso Kamga, G.A.; Bitjoka, L.; Akram, T.; Mengue Mbom, A.; Rameez Naqvi, S.; Bouroubi, Y. Advancements in Satellite Image Classification: Methodologies, Techniques, Approaches and Applications. Int. J. Remote Sens. 2021, 42, 7662–7722. [Google Scholar] [CrossRef]
  69. Neuenschwander, A.; Pitts, K. The ATL08 Land and Vegetation Product for the ICESat-2 Mission. Remote Sens. Environ. 2019, 221, 247–259. [Google Scholar] [CrossRef]
  70. Milenković, M.; Reiche, J.; Armston, J.; Neuenschwander, A.; De Keersmaecker, W.; Herold, M.; Verbesselt, J. Assessing Amazon Rainforest Regrowth with GEDI and ICESat-2 Data. Sci. Remote Sens. 2022, 5, 100051. [Google Scholar] [CrossRef]
  71. Wang, R.; Lu, Y.; Lu, D.; Li, G. Improving Extraction of Forest Canopy Height through Reprocessing ICESat-2 ATLAS and GEDI Data in Sparsely Forested Plain Regions. GIScience Remote Sens. 2024, 61, 2396807. [Google Scholar] [CrossRef]
  72. Pang, S.; Li, G.; Jiang, X.; Chen, Y.; Lu, Y.; Lu, D. Retrieval of Forest Canopy Height in a Mountainous Region with ICESat-2 ATLAS. For. Ecosyst. 2022, 9, 100046. [Google Scholar] [CrossRef]
  73. Lahssini, K.; Baghdadi, N.; le Maire, G.; Fayad, I. Influence of GEDI Acquisition and Processing Parameters on Canopy Height Estimates over Tropical Forests. Remote Sens. 2022, 14, 6264. [Google Scholar] [CrossRef]
  74. Bruening, J.; May, P.; Armston, J.; Dubayah, R. Precise and Unbiased Biomass Estimation from GEDI Data and the US Forest Inventory. Front. For. Glob. Chang. 2023, 6, 1149153. [Google Scholar] [CrossRef]
  75. East, A.; Hansen, A.; Jantz, P.; Currey, B.; Roberts, D.W.; Armenteras, D. Validation and Error Minimization of Global Ecosystem Dynamics Investigation (GEDI) Relative Height Metrics in the Amazon. Remote Sens. 2024, 16, 3550. [Google Scholar] [CrossRef]
  76. Moudrý, V.; Prošek, J.; Marselis, S.; Marešová, J.; Šárovcová, E.; Gdulová, K.; Kozhoridze, G.; Torresani, M.; Rocchini, D.; Eltner, A.; et al. How to Find Accurate Terrain and Canopy Height GEDI Footprints in Temperate Forests and Grasslands? Earth Space Sci. 2024, 11, e2024EA003709. [Google Scholar] [CrossRef]
  77. Probst, P.; Wright, M.N.; Boulesteix, A.-L. Hyperparameters and Tuning Strategies for Random Forest. WIREs Data Min. Knowl. Discov. 2019, 9, e1301. [Google Scholar] [CrossRef]
  78. Louppe, G.; Wehenkel, L.; Sutera, A.; Geurts, P. Understanding Variable Importances in Forests of Randomized Trees. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; Curran Associates, Inc.: Red Hook, NY, USA, 2013; Volume 26. [Google Scholar]
  79. Scornet, E. Trees, Forests, and Impurity-Based Variable Importance in Regression. Ann. L’institut Henri Poincaré Probab. Stat. 2023, 59, 21–52. [Google Scholar] [CrossRef]
  80. Janitza, S.; Celik, E.; Boulesteix, A.-L. A Computationally Fast Variable Importance Test for Random Forests for High-Dimensional Data. Adv. Data Anal. Classif. 2018, 12, 885–915. [Google Scholar] [CrossRef]
  81. Hwang, S.-W.; Chung, H.; Lee, T.; Kim, J.; Kim, Y.; Kim, J.-C.; Kwak, H.W.; Choi, I.-G.; Yeo, H. Feature Importance Measures from Random Forest Regressor Using Near-Infrared Spectra for Predicting Carbonization Characteristics of Kraft Lignin-Derived Hydrochar. J. Wood Sci. 2023, 69, 1. [Google Scholar] [CrossRef]
  82. Mangalathu, S.; Hwang, S.-H.; Jeon, J.-S. Failure Mode and Effects Analysis of RC Members Based on Machine-Learning-Based SHapley Additive exPlanations (SHAP) Approach. Eng. Struct. 2020, 219, 110927. [Google Scholar] [CrossRef]
  83. Ekanayake, I.U.; Meddage, D.P.P.; Rathnayake, U. A Novel Approach to Explain the Black-Box Nature of Machine Learning in Compressive Strength Predictions of Concrete Using Shapley Additive Explanations (SHAP). Case Stud. Constr. Mater. 2022, 16, e01059. [Google Scholar] [CrossRef]
  84. Gebreyesus, Y.; Dalton, D.; Nixon, S.; De Chiara, D.; Chinnici, M. Machine Learning for Data Center Optimizations: Feature Selection Using Shapley Additive exPlanation (SHAP). Future Internet 2023, 15, 88.
  85. Chen, C.; Liu, Y.; Li, Y.; Chen, D. Explainable Artificial Intelligence Framework for Urban Global Digital Elevation Model Correction Based on the SHapley Additive Explanation-Random Forest Algorithm Considering Spatial Heterogeneity and Factor Optimization. Int. J. Appl. Earth Obs. Geoinf. 2024, 129, 103843.
  86. Pallissier-Tanon, A.; Ciais, P.; Schwartz, M.; Fayad, I.; Xu, Y.; Ritter, F.; Truchis, A.; Leban, J.-M. Combining Satellite Images with National Forest Inventory Measurements for Monitoring Post-Disturbance Forest Height Growth. Front. Remote Sens. 2024, 5, 1432577.
  87. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
  88. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat. Mach. Intell. 2020, 2, 56–67.
  89. Bischl, B.; Binder, M.; Lang, M.; Pielok, T.; Richter, J.; Coors, S.; Thomas, J.; Ullmann, T.; Becker, M.; Boulesteix, A.-L.; et al. Hyperparameter Optimization: Foundations, Algorithms, Best Practices, and Open Challenges. WIREs Data Min. Knowl. Discov. 2023, 13, e1484.
  90. Probst, P.; Boulesteix, A.-L.; Bischl, B. Tunability: Importance of Hyperparameters of Machine Learning Algorithms. J. Mach. Learn. Res. 2019, 20, 1–32.
  91. Lounici, K.; Meziani, K.; Riu, B. Optimizing Generalization on the Train Set: A Novel Gradient-Based Framework to Train Parameters and Hyperparameters Simultaneously. arXiv 2020, arXiv:2006.06705.
  92. Naik, P.; Dalponte, M.; Bruzzone, L. Automated Machine Learning Driven Stacked Ensemble Modeling for Forest Aboveground Biomass Prediction Using Multitemporal Sentinel-2 Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 3442–3454.
  93. Lankford, S. Effective Tuning of Regression Models Using an Evolutionary Approach: A Case Study. In Proceedings of the 2020 3rd Artificial Intelligence and Cloud Computing Conference; Association for Computing Machinery, New York, NY, USA, 15 March 2021; pp. 102–108.
  94. Gaber, M.; Kang, Y.; Schurgers, G.; Keenan, T. Using Automated Machine Learning for the Upscaling of Gross Primary Productivity. Biogeosciences 2024, 21, 2447–2472.
  95. Masood, A. Automated Machine Learning: Hyperparameter Optimization, Neural Architecture Search, and Algorithm Selection with Cloud Platforms; Packt Publishing Ltd.: Birmingham, UK, 2021.
  96. Wang, X.; Tang, Y.; Guo, T.; Sang, B.; Wu, J.; Sha, J.; Zhang, K.; Qian, J.; Tang, M. Couler: Unified Machine Learning Workflow Optimization in Cloud. In Proceedings of the 2024 IEEE 40th International Conference on Data Engineering (ICDE), Utrecht, The Netherlands, 13–16 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 5224–5237.
  97. Lang, N.; Kalischek, N.; Armston, J.; Schindler, K.; Dubayah, R.; Wegner, J.D. Global Canopy Height Regression and Uncertainty Estimation from GEDI LIDAR Waveforms with Deep Ensembles. Remote Sens. Environ. 2022, 268, 112760.
  98. Luo, Y.; Qi, S.; Liao, K.; Zhang, S.; Hu, B.; Tian, Y. Mapping the Forest Height by Fusion of ICESat-2 and Multi-Source Remote Sensing Imagery and Topographic Information: A Case Study in Jiangxi Province, China. Forests 2023, 14, 454.
  99. Liu, D.; Du, Y.; Sun, G.; Yan, W.-Z.; Wu, B.-I. Analysis of InSAR Sensitivity to Forest Structure Based on Radar Scattering Model. Prog. Electromagn. Res. 2008, 84, 149–171.
  100. Zadbagher, E.; Marangoz, A.M.; Becek, K. Characterizing and Estimating Forest Structure Using Active Remote Sensing: An Overview. Adv. Remote Sens. 2023, 3, 38–46.
  101. Craig Dobson, M.; Ulaby, F.T.; Pierce, L.E. Land-Cover Classification and Estimation of Terrain Attributes Using Synthetic Aperture Radar. Remote Sens. Environ. 1995, 51, 199–214.
  102. Wang, Y.; Day, J.L.; Davis, F.W. Sensitivity of Modeled C- and L-Band Radar Backscatter to Ground Surface Parameters in Loblolly Pine Forest. Remote Sens. Environ. 1998, 66, 331–342.
  103. Garestier, F.; Dubois-Fernandez, P.C.; Guyon, D.; Le Toan, T. Forest Biophysical Parameter Estimation Using L- and P-Band Polarimetric SAR Data. IEEE Trans. Geosci. Remote Sens. 2009, 47, 3379–3388.
  104. Cazcarra-Bes, V.; Tello-Alonso, M.; Fischer, R.; Heym, M.; Papathanassiou, K. Monitoring of Forest Structure Dynamics by Means of L-Band SAR Tomography. Remote Sens. 2017, 9, 1229.
  105. Zhu, X.; Nie, S.; Zhu, Y.; Chen, Y.; Yang, B.; Li, W. Evaluation and Comparison of ICESat-2 and GEDI Data for Terrain and Canopy Height Retrievals in Short-Stature Vegetation. Remote Sens. 2023, 15, 4969.
  106. Torres, R.; Snoeij, P.; Geudtner, D.; Bibby, D.; Davidson, M.; Attema, E.; Potin, P.; Rommen, B.; Floury, N.; Brown, M.; et al. GMES Sentinel-1 Mission. Remote Sens. Environ. 2012, 120, 9–24.
  107. Alvarez-Mozos, J.; Villanueva, J.; Arias, M.; Gonzalez-Audicana, M. Correlation Between NDVI and Sentinel-1 Derived Features for Maize. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 6773–6776.
  108. dos Santos, E.P.; Da Silva, D.D.; do Amaral, C.H. Vegetation Cover Monitoring in Tropical Regions Using SAR-C Dual-Polarization Index: Seasonal and Spatial Influences. Int. J. Remote Sens. 2021, 42, 7581–7609.
  109. Huang, W.; Min, W.; Ding, J.; Liu, Y.; Hu, Y.; Ni, W.; Shen, H. Forest Height Mapping Using Inventory and Multi-Source Satellite Data over Hunan Province in Southern China. For. Ecosyst. 2022, 9, 100006.
  110. Nasirzadehdizaji, R.; Balik Sanli, F.; Abdikan, S.; Cakir, Z.; Sekertekin, A.; Ustuner, M. Sensitivity Analysis of Multi-Temporal Sentinel-1 SAR Parameters to Crop Height and Canopy Coverage. Appl. Sci. 2019, 9, 655.
  111. Tavus, B.; Kocaman, S.; Gokceoglu, C. Flood Damage Assessment with Sentinel-1 and Sentinel-2 Data after Sardoba Dam Break with GLCM Features and Random Forest Method. Sci. Total Environ. 2022, 816, 151585.
  112. Drusch, M.; Del Bello, U.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P.; et al. Sentinel-2: ESA’s Optical High-Resolution Mission for GMES Operational Services. Remote Sens. Environ. 2012, 120, 25–36.
  113. Zhou, J.; Zhou, Z.; Zhao, Q.; Han, Z.; Wang, P.; Xu, J.; Dian, Y. Evaluation of Different Algorithms for Estimating the Growing Stock Volume of Pinus Massoniana Plantations Using Spectral and Spatial Information from a SPOT6 Image. Forests 2020, 11, 540.
  114. Vaudour, E.; Gomez, C.; Lagacherie, P.; Loiseau, T.; Baghdadi, N.; Urbina-Salazar, D.; Loubet, B.; Arrouays, D. Temporal Mosaicking Approaches of Sentinel-2 Images for Extending Topsoil Organic Carbon Content Mapping in Croplands. Int. J. Appl. Earth Obs. Geoinf. 2021, 96, 102277.
  115. Du, Y.; Zhang, Y.; Ling, F.; Wang, Q.; Li, W.; Li, X. Water Bodies’ Mapping from Sentinel-2 Imagery with Modified Normalized Difference Water Index at 10-m Spatial Resolution Produced by Sharpening the SWIR Band. Remote Sens. 2016, 8, 354.
  116. Gilabert, M.A.; González-Piqueras, J.; García-Haro, F.J.; Meliá, J. A Generalized Soil-Adjusted Vegetation Index. Remote Sens. Environ. 2002, 82, 303–310.
  117. Xi, Y.; Thinh, N.X.; Li, C. Preliminary Comparative Assessment of Various Spectral Indices for Built-up Land Derived from Landsat-8 OLI and Sentinel-2A MSI Imageries. Eur. J. Remote Sens. 2019, 52, 240–252.
  118. Sothe, C.; Almeida, C.M.d.; Liesenberg, V.; Schimalski, M.B. Evaluating Sentinel-2 and Landsat-8 Data to Map Sucessional Forest Stages in a Subtropical Forest in Southern Brazil. Remote Sens. 2017, 9, 838.
  119. Leolini, L.; Moriondo, M.; Rossi, R.; Bellini, E.; Brilli, L.; López-Bernal, Á.; Santos, J.A.; Fraga, H.; Bindi, M.; Dibari, C.; et al. Use of Sentinel-2 Derived Vegetation Indices for Estimating fPAR in Olive Groves. Agronomy 2022, 12, 1540.
  120. Segarra, J.; González-Torralba, J.; Aranjuelo, Í.; Araus, J.L.; Kefauver, S.C. Estimating Wheat Grain Yield Using Sentinel-2 Imagery and Exploring Topographic Features and Rainfall Effects on Wheat Performance in Navarre, Spain. Remote Sens. 2020, 12, 2278.
  121. Solymosi, K.; Kövér, G.; Romvári, R. The Development of Vegetation Indices: A Short Overview. Acta Agrar. Kaposvariensis 2019, 23, 75–90.
  122. Urban, M.; Schellenberg, K.; Morgenthal, T.; Dubois, C.; Hirner, A.; Gessner, U.; Mogonong, B.; Zhang, Z.; Baade, J.; Collett, A.; et al. Using Sentinel-1 and Sentinel-2 Time Series for Slangbos Mapping in the Free State Province, South Africa. Remote Sens. 2021, 13, 3342.
  123. Kumar, Y.; Babu, S.; Singh, S. Vegetation Cover and Carbon Pool Loss Assessment Due to Extreme Weather Induced Disaster in Mandakini Valley, Western Himalaya. Environ. Conserv. J. 2020, 21, 49–62.
  124. Sun, H.; Wang, Q.; Wang, G.; Lin, H.; Luo, P.; Li, J.; Zeng, S.; Xu, X.; Ren, L. Optimizing kNN for Mapping Vegetation Cover of Arid and Semi-Arid Areas Using Landsat Images. Remote Sens. 2018, 10, 1248.
  125. Sharma, R.C. Vegetation Structure Index (VSI): Retrieving Vegetation Structural Information from Multi-Angular Satellite Remote Sensing. J. Imaging 2021, 7, 84.
  126. Liu, Y.; Gong, W.; Xing, Y.; Hu, X.; Gong, J. Estimation of the Forest Stand Mean Height and Aboveground Biomass in Northeast China Using SAR Sentinel-1B, Multispectral Sentinel-2A, and DEM Imagery. ISPRS J. Photogramm. Remote Sens. 2019, 151, 277–289.
  127. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  128. Kelkar, K.M.; Bakal, J.W. Hyper Parameter Tuning of Random Forest Algorithm for Affective Learning System. In Proceedings of the 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, 20–22 August 2020; pp. 1192–1195.
  129. Wu, J.; Yang, H. Linear Regression-Based Efficient SVM Learning for Large-Scale Classification. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 2357–2369.
  130. Yang, L.; Shami, A. On Hyperparameter Optimization of Machine Learning Algorithms: Theory and Practice. Neurocomputing 2020, 415, 295–316.
  131. Valkenborg, D.; Rousseau, A.-J.; Geubbelmans, M.; Burzykowski, T. Support Vector Machines. Am. J. Orthod. Dentofac. Orthop. 2023, 164, 754–757.
  132. Kavzoglu, T.; Teke, A. Predictive Performances of Ensemble Machine Learning Algorithms in Landslide Susceptibility Mapping Using Random Forest, Extreme Gradient Boosting (XGBoost) and Natural Gradient Boosting (NGBoost). Arab. J. Sci. Eng. 2022, 47, 7367–7385.
  133. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
  134. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  135. Dairu, X.; Shilong, Z. Machine Learning Model for Sales Forecasting by Using XGBoost. In Proceedings of the 2021 IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 15–17 January 2021; pp. 480–483.
  136. Rithani, M.; Kumar, R.P.; Doss, S. A Review on Big Data Based on Deep Neural Network Approaches. Artif. Intell. Rev. 2023, 56, 14765–14801.
  137. Han, W.; Lee, D.; Lee, J.-S.; Lim, D.S.; Yoon, H.-K. Prediction of Flowability and Strength in Controlled Low-Strength Material through Regression and Oversampling Algorithm with Deep Neural Network. Case Stud. Constr. Mater. 2024, 20, e03192.
  138. Astola, H.; Seitsonen, L.; Halme, E.; Molinier, M.; Lönnqvist, A. Deep Neural Networks with Transfer Learning for Forest Variable Estimation Using Sentinel-2 Imagery in Boreal Forest. Remote Sens. 2021, 13, 2392.
  139. Park, S.-H.; Jung, H.-S.; Lee, S.; Kim, E.-S. Mapping Forest Vertical Structure in Sogwang-Ri Forest from Full-Waveform Lidar Point Clouds Using Deep Neural Network. Remote Sens. 2021, 13, 3736.
  140. Qin, Y.; Wu, B.; Lei, X.; Feng, L. Prediction of Tree Crown Width in Natural Mixed Forests Using Deep Learning Algorithm. For. Ecosyst. 2023, 10, 100109.
Figure 1. Location of Ecological Zone 4 within Togo.
Figure 2. Illustrative diagram of the structure of ICESat-2 (a) and GEDI (b) footprints.
Figure 3. Flowchart of the research methodology.
Figure 4. Importance of variables with the RF module (see definitions of variable abbreviations in Appendix B).
Figure 5. Importance of features evaluated with SHAP with respect to height prediction with RF (a) and XGBoost (b) algorithms (see definitions of variable abbreviations in Appendix B).
Figure 6. Predicted vs. observed values when modeling canopy height with ICESat-2 data.
Figure 7. Predicted vs. observed values when modeling canopy height with GEDI data.
Figure 8. Map of canopy heights estimated from the ICESat-2-based model in the study area (a), with a zoom-in at small scale on a forested area (b) and a zoom-in at a larger scale on a less forested area (c). Panels (d) and (e) show the Google Earth high-resolution images corresponding to the extents of (b) and (c), respectively.
Figure 9. Map of canopy heights estimated from the GEDI-based model in the study area (a), with a zoom-in at small scale on a forested area (b) and a zoom-in at a larger scale on a less forested area (c). Panels (d) and (e) show the Google Earth high-resolution images corresponding to the extents of (b) and (c), respectively.
Figure 10. Histograms of GEDI-based (a) and ICESat-2-based (b) maps, and the difference between GEDI/ICESat-2 maps (c).
Table 1. Data used in the research.

| Data Source | Type of Data | Year | Spatial Resolution | Brief Description |
|---|---|---|---|---|
| GEDI | Satellite LiDAR | 2020 | 25 m diameter | GEDI02_A granules containing relative canopy heights and other variables |
| ICESat-2 | Satellite LiDAR | 2020 | 17 m × 100 m | ATL08 products containing relative canopy heights and other variables |
| Sentinel-1 | Radar | 2020 | 10 m × 10 m | Synthetic aperture radar (SAR) images from the Sentinel-1A satellite |
| Sentinel-2 | Optical | 2020 | 10 m × 10 m, 20 m × 20 m | Multi-spectral images from the Sentinel-2A satellite |
| SRTM | Altimetry | 2000 | 30 m × 30 m | Digital Terrain Model |
| Field plots and NFI2 plots | Dendrometry | 2020, 2021 | 17 m × 100 m and 40 m diameter | Individual tree height and diameters at breast height (DBH) |
| Land use map | Cartography | 2020 | 30 m × 30 m | Existing land use map based on Landsat 8 data |
Notes: SRTM, Shuttle Radar Topography Mission; NFI2, National Forest Inventory 2, which ran from 2020 to 2021 and concluded with the establishment of National Forest Reference Levels and REDD+ standards [64,65]. The first comprehensive National Forest Inventory (NFI1) ran from 2015 to 2016.
Table 2. Selection configurations for GEDI data footprints for modeling.

| Configurations | Sensitivity | Quality_flag | Beam Type | Acquisition Time |
|---|---|---|---|---|
| Config1 | All beams | All beams | All beams | All beams |
| Config2 | ≥0 | 1 | Power | Day |
| Config3 | ≥0 | 1 | Power | Night |
| Config4 | ≥0 | 1 | Coverage | Day |
| Config5 | ≥0 | 1 | Coverage | Night |
| Config6 | ≥0.9 | 1 | Power | Day |
| Config7 | ≥0.9 | 1 | Power | Night |
| Config8 | ≥0.9 | 1 | Coverage | Day |
| Config9 | ≥0.9 | 1 | Coverage | Night |

Note: Config1 to Config9 = Configuration 1 to Configuration 9.
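As an illustration only, the footprint selection in Table 2 could be expressed as a small filter. This is a minimal sketch, not the authors' implementation: the `Footprint` record, the `select_footprints` helper, the use of a negative solar elevation to mean "night", and the mapping of beam names to the Power and Coverage groups are all assumptions made for this example.

```python
from dataclasses import dataclass

# Assumption: beam names follow the GEDI product convention, split here
# into full-power and coverage groups for the purpose of this sketch.
POWER_BEAMS = {"BEAM0101", "BEAM0110", "BEAM1000", "BEAM1011"}
COVERAGE_BEAMS = {"BEAM0000", "BEAM0001", "BEAM0010", "BEAM0011"}


@dataclass(frozen=True)
class Footprint:
    """Hypothetical record holding the fields needed by Table 2."""
    beam: str
    sensitivity: float
    quality_flag: int
    solar_elevation: float  # degrees; negative is treated as night (assumption)


@dataclass(frozen=True)
class Config:
    min_sensitivity: float
    beam_type: str  # "Power" or "Coverage"
    time: str       # "Day" or "Night"


# Config2 to Config9 as listed in Table 2; Config1 keeps every footprint.
CONFIGS = {
    "Config2": Config(0.0, "Power", "Day"),
    "Config3": Config(0.0, "Power", "Night"),
    "Config4": Config(0.0, "Coverage", "Day"),
    "Config5": Config(0.0, "Coverage", "Night"),
    "Config6": Config(0.9, "Power", "Day"),
    "Config7": Config(0.9, "Power", "Night"),
    "Config8": Config(0.9, "Coverage", "Day"),
    "Config9": Config(0.9, "Coverage", "Night"),
}


def select_footprints(footprints, name):
    """Return the footprints that satisfy the named configuration."""
    if name == "Config1":
        return list(footprints)
    cfg = CONFIGS[name]
    beams = POWER_BEAMS if cfg.beam_type == "Power" else COVERAGE_BEAMS
    want_night = cfg.time == "Night"
    return [
        fp for fp in footprints
        if fp.quality_flag == 1
        and fp.sensitivity >= cfg.min_sensitivity
        and fp.beam in beams
        and (fp.solar_elevation < 0) == want_night
    ]
```

Keeping the thresholds in a lookup table mirrors the layout of Table 2, so a configuration can be changed without touching the filtering logic.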
Table 3. Different scenarios for combining LiDAR variables with other multisource variables.

| Scenarios | Variables Combinations * | Number of Variables |
|---|---|---|
| S1 | Optical | 28 |
| S2 | Radar | 29 |
| S3 | Topographic | 3 |
| S4 | Optical + Radar | 57 |
| S5 | Optical + Topographical | 31 |
| S6 | Radar + Topographical | 32 |
| S7 | Optical + Radar + Topographical | 60 |

Note: S1 to S7 = Scenario 1 to Scenario 7 (* data source variables used in the scenarios).
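The variable counts in Table 3 follow from combining the three variable groups (28 optical, 29 radar, 3 topographic). The sketch below enumerates every non-empty combination and sums the group sizes; the `build_scenarios` helper and the `GROUP_SIZES` dictionary are names introduced for this illustration and are not taken from the study.

```python
from itertools import combinations

# Group sizes as reported for S1, S2 and S3 in Table 3.
GROUP_SIZES = {"Optical": 28, "Radar": 29, "Topographic": 3}


def build_scenarios(group_sizes):
    """Map S1..S7 to (groups, number of variables) for all non-empty combinations."""
    names = list(group_sizes)
    scenarios = {}
    index = 1
    # Single groups first, then pairs, then the full set.
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            n_variables = sum(group_sizes[group] for group in combo)
            scenarios[f"S{index}"] = (combo, n_variables)
            index += 1
    return scenarios


if __name__ == "__main__":
    for label, (groups, count) in build_scenarios(GROUP_SIZES).items():
        print(label, " + ".join(groups), count)
```

With three groups there are 2³ − 1 = 7 non-empty combinations, which is why the table stops at S7.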
Table 4. Hyperparameter search space.

| Models | Hyperparameters | Search Range |
|---|---|---|
| RF | max_depth | {10, 20, 30, 40} |
| RF | n_estimators | {100, 200, 500} |
| SVM | C (Regularization parameter) | {0.01, 0.1, 1, 10} |
| SVM | Gamma | {0.001, 0.01, 0.1, 1, 10} |
| XGBoost | eta | {0.01, 0.1, 0.2, 0.3} |
| XGBoost | n_estimators | {100, 200, 500} |
| XGBoost | max_depth | {3, 4, 5, 6, 7, 8} |
| DNN | Number of Layers | {2, 4, 6} |
| DNN | Neurons per Layer | {16, 32, 64, 128} |
| DNN | Batch Size | {16, 32, 64, 128} |
| DNN | Learning Rate | {0.001, 0.01, 0.1} |
| DNN | Dropout Rate | {0.2, 0.5, 0.7} |
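To give a sense of the size of the search space in Table 4, the sketch below enumerates every combination per model. It assumes an exhaustive grid purely for illustration; the search strategy actually used is not stated in this part of the article. The dictionary keys for the DNN (`n_layers`, `neurons_per_layer`, and so on) and the helper names are hypothetical.

```python
from itertools import product

# Values copied from Table 4; DNN key names are illustrative labels only.
SEARCH_SPACE = {
    "RF": {
        "max_depth": [10, 20, 30, 40],
        "n_estimators": [100, 200, 500],
    },
    "SVM": {
        "C": [0.01, 0.1, 1, 10],
        "gamma": [0.001, 0.01, 0.1, 1, 10],
    },
    "XGBoost": {
        "eta": [0.01, 0.1, 0.2, 0.3],
        "n_estimators": [100, 200, 500],
        "max_depth": [3, 4, 5, 6, 7, 8],
    },
    "DNN": {
        "n_layers": [2, 4, 6],
        "neurons_per_layer": [16, 32, 64, 128],
        "batch_size": [16, 32, 64, 128],
        "learning_rate": [0.001, 0.01, 0.1],
        "dropout_rate": [0.2, 0.5, 0.7],
    },
}


def iter_grid(space):
    """Yield one dict per hyperparameter combination (Cartesian product)."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def grid_size(space):
    """Count the combinations an exhaustive search would evaluate."""
    return sum(1 for _ in iter_grid(space))


if __name__ == "__main__":
    for model, space in SEARCH_SPACE.items():
        print(model, grid_size(space))
```

Under this assumption the DNN grid is by far the largest, which is one reason automated or randomized search is often preferred over an exhaustive sweep for neural networks.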
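Table 5. ICESat-2 data validation correlation matrix.

| | Min. | 1st Qu. | Med | Mean | Max. | RH50 | RH55 | RH60 | RH65 | RH70 | RH75 | RH80 | RH85 | RH90 | RH95 | RH98 | h_canopy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Min. | 1 | | | | | | | | | | | | | | | | |
| 1st Qu. | 0.76 | 1 | | | | | | | | | | | | | | | |
| Med | 0.65 | 0.92 | 1 | | | | | | | | | | | | | | |
| Mean | 0.64 | 0.88 | 0.95 | 1 | | | | | | | | | | | | | |
| Max. | 0.23 | 0.40 | 0.48 | 0.67 | 1 | | | | | | | | | | | | |
| RH50 | 0.65 | 0.92 | 1.00 | 0.95 | 0.48 | 1 | | | | | | | | | | | |
| RH55 | 0.61 | 0.89 | 0.99 | 0.96 | 0.50 | 0.99 | 1 | | | | | | | | | | |
| RH60 | 0.58 | 0.87 | 0.98 | 0.96 | 0.52 | 0.98 | 0.99 | 1 | | | | | | | | | |
| RH65 | 0.56 | 0.83 | 0.95 | 0.96 | 0.54 | 0.95 | 0.97 | 0.99 | 1 | | | | | | | | |
| RH70 | 0.53 | 0.80 | 0.93 | 0.96 | 0.56 | 0.93 | 0.95 | 0.97 | 0.99 | 1 | | | | | | | |
| RH75 | 0.50 | 0.77 | 0.91 | 0.95 | 0.58 | 0.91 | 0.93 | 0.95 | 0.98 | 0.99 | 1 | | | | | | |
| RH80 | 0.47 | 0.73 | 0.87 | 0.94 | 0.63 | 0.87 | 0.90 | 0.92 | 0.95 | 0.97 | 0.98 | 1 | | | | | |
| RH85 | 0.45 | 0.69 | 0.83 | 0.94 | 0.68 | 0.83 | 0.86 | 0.89 | 0.92 | 0.93 | 0.95 | 0.98 | 1 | | | | |
| RH90 | 0.42 | 0.65 | 0.78 | 0.91 | 0.75 | 0.78 | 0.81 | 0.83 | 0.86 | 0.88 | 0.90 | 0.94 | 0.97 | 1 | | | |
| RH95 | 0.37 | 0.56 | 0.68 | 0.85 | 0.83 | 0.68 | 0.71 | 0.73 | 0.75 | 0.78 | 0.80 | 0.84 | 0.88 | 0.94 | 1 | | |
| RH98 | 0.31 | 0.49 | 0.59 | 0.78 | 0.92 | 0.59 | 0.62 | 0.64 | 0.66 | 0.69 | 0.71 | 0.76 | 0.81 | 0.87 | 0.96 | 1 | |
| h_canopy | 0.11 | 0.23 | 0.32 | 0.41 | 0.49 | 0.32 | 0.33 | 0.34 | 0.36 | 0.39 | 0.41 | 0.42 | 0.45 | 0.50 | 0.52 | 0.53 | 1 |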
Table 6. Accuracy metrics of the prediction models for the seven scenarios.

| Metric | RF (S1) | RF (S2) | RF (S3) | RF (S4) | RF (S5) | RF (S6) | RF (S7) | SVM (S7) | XGBoost (S7) | DNN (S7) |
|---|---|---|---|---|---|---|---|---|---|---|
| r | 0.53 | 0.26 | 0.28 | 0.56 | 0.57 | 0.46 | 0.62 | 0.53 | 0.57 | 0.57 |
| RMSE | 5.72 | 6.25 | 6.57 | 5.52 | 5.40 | 9.96 | 5.28 | 5.50 | 5.21 | 5.68 |
| MAE | 4.23 | 4.70 | 4.88 | 4.15 | 3.92 | 4.44 | 4.00 | 4.08 | 4.06 | 4.11 |
Note: This table is presented as a heatmap in which greener cells mark larger values of r (stronger correlations) and smaller values of RMSE and MAE (lower errors); redder cells mark the opposite.
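For readers who want to reproduce the three accuracy metrics reported in the tables, the standard definitions of Pearson's r, RMSE, and MAE can be sketched as follows. The function names and the toy height values are illustrative and are not taken from the study's data.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    covariance = sum((o - mean_obs) * (p - mean_pred)
                     for o, p in zip(observed, predicted))
    spread_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed))
    spread_pred = math.sqrt(sum((p - mean_pred) ** 2 for p in predicted))
    return covariance / (spread_obs * spread_pred)


def rmse(observed, predicted):
    """Root-mean-square error, in the units of the inputs (metres here)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error, in the units of the inputs (metres here)."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


if __name__ == "__main__":
    # Toy canopy heights in metres, for illustration only.
    obs = [10.0, 12.0, 14.0, 16.0]
    pred = [11.0, 13.0, 15.0, 17.0]
    print(pearson_r(obs, pred), rmse(obs, pred), mae(obs, pred))
```

Note that r measures only linear association: a model with a constant bias can still reach a high r, which is why RMSE and MAE are reported alongside it.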
Table 7. Cross-validation metrics for ICESat-2 data modeling.

| Metrics | Training * | Testing |
|---|---|---|
| r | 0.58 | 0.62 |
| RMSE | 5.43 | 5.28 |
| MAE | 4.02 | 4.00 |

* Evaluation was conducted using five-fold cross-validation.
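The five-fold cross-validation mentioned in the footnote can be illustrated with a simple index splitter. This is a generic sketch: the shuffling, the seed, and the `k_fold_indices` name are assumptions, since the exact splitting protocol is not described in this part of the article.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle once so that folds are not tied to the original sample order.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        validation = folds[held_out]
        train = [idx
                 for fold_id, fold in enumerate(folds)
                 if fold_id != held_out
                 for idx in fold]
        yield train, validation


if __name__ == "__main__":
    for train, validation in k_fold_indices(23, k=5):
        print(len(train), len(validation))
```

Each sample is used for validation exactly once, and the metrics from the five held-out folds are typically averaged to give the cross-validated score.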
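Table 8. Canopy height prediction models from GEDI data.

| Metric | Relative Height | Config1 | Config2 | Config3 | Config4 | Config5 | Config6 | Config7 | Config8 | Config9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Pearson Correlation Coefficient (r) | RH75 | 0.55 | 0.60 | 0.67 | 0.59 | 0.76 | 0.59 | 0.69 | 0.67 | 0.77 |
| | RH80 | 0.57 | 0.54 | 0.69 | 0.67 | 0.78 | 0.56 | 0.69 | 0.67 | 0.78 |
| | RH85 | 0.56 | 0.61 | 0.69 | 0.66 | 0.76 | 0.58 | 0.69 | 0.65 | 0.77 |
| | RH90 | 0.56 | 0.61 | 0.71 | 0.62 | 0.77 | 0.54 | 0.70 | 0.63 | 0.77 |
| | RH95 | 0.58 | 0.58 | 0.70 | 0.61 | 0.77 | 0.58 | 0.70 | 0.64 | 0.77 |
| | RH98 | 0.58 | 0.61 | 0.70 | 0.67 | 0.77 | 0.58 | 0.71 | 0.68 | 0.80 |
| | RH100 | 0.59 | 0.59 | 0.69 | 0.69 | 0.73 | 0.59 | 0.69 | 0.65 | 0.77 |
| Root-Mean-Square Error (RMSE) | RH75 | 6.04 | 5.06 | 4.83 | 4.21 | 3.91 | 5.22 | 5.00 | 3.68 | 3.84 |
| | RH80 | 6.17 | 5.75 | 5.03 | 3.91 | 4.01 | 5.66 | 5.09 | 3.83 | 4.01 |
| | RH85 | 6.61 | 5.62 | 5.32 | 4.25 | 4.39 | 5.87 | 5.27 | 4.33 | 4.28 |
| | RH90 | 6.90 | 5.97 | 5.57 | 4.48 | 4.51 | 5.90 | 5.45 | 4.48 | 4.53 |
| | RH95 | 6.88 | 6.33 | 5.91 | 4.73 | 4.62 | 6.50 | 5.85 | 4.89 | 4.69 |
| | RH98 | 7.19 | 6.70 | 6.10 | 4.56 | 4.71 | 6.83 | 6.09 | 4.48 | 4.42 |
| | RH100 | 7.23 | 6.59 | 6.18 | 4.45 | 5.09 | 6.67 | 6.17 | 4.75 | 4.90 |
| Mean Absolute Error (MAE) | RH75 | 3.91 | 3.70 | 3.30 | 3.04 | 2.72 | 3.80 | 3.40 | 2.61 | 2.65 |
| | RH80 | 4.07 | 4.20 | 3.51 | 2.80 | 2.84 | 4.20 | 3.53 | 2.89 | 2.83 |
| | RH85 | 4.41 | 4.26 | 3.75 | 3.13 | 3.11 | 4.37 | 3.74 | 3.12 | 3.07 |
| | RH90 | 4.63 | 4.51 | 4.03 | 3.35 | 3.30 | 4.48 | 3.95 | 3.31 | 3.24 |
| | RH95 | 4.77 | 4.89 | 4.35 | 3.54 | 3.36 | 4.87 | 4.29 | 3.53 | 3.42 |
| | RH98 | 4.95 | 5.03 | 4.55 | 3.36 | 3.43 | 5.24 | 4.51 | 3.40 | 3.15 |
| | RH100 | 5.03 | 5.12 | 4.64 | 3.34 | 3.83 | 5.16 | 4.61 | 3.53 | 3.52 |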
Note: This table is presented as a heatmap in which greener cells mark larger values of r (stronger correlations) and smaller values of RMSE and MAE (lower errors); redder cells mark the opposite. Rank ordering of the nine configurations based on three Random Forest performance metrics. Pearson r: 6 = 1 < 2 < 4 = 8 < 3 = 7 < 5 = 9 (W = 0.951, χ²r = 53.26, df = 8, p < 0.0001); RMSE: 1 < 6 = 2 < 3 = 7 < 5 = 4 ≤ 9 = 8 (W = 0.847, χ²r = 47.41, df = 8, p < 0.0001); MAE: 6 ≤ 2 = 1 < 3 = 7 < 4 = 5 < 8 < 9 (W = 0.912, χ²r = 51.08, df = 8, p < 0.0001).
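The rank statistics in the note to Table 8 can be cross-checked with the standard relation between Kendall's coefficient of concordance W and the Friedman chi-square, χ²r = m(k − 1)W, with k − 1 degrees of freedom. The sketch below assumes that W is Kendall's W, that the k = 9 configurations are the ranked treatments, and that the m = 7 relative heights (RH75 to RH100) act as blocks; the note does not state this explicitly, so these are assumptions of the example.

```python
def friedman_chi2_from_w(w, n_blocks, n_treatments):
    """Friedman chi-square implied by Kendall's W: chi2 = m * (k - 1) * W."""
    return n_blocks * (n_treatments - 1) * w


N_BLOCKS = 7      # RH75 ... RH100, assumed to be the blocks
N_TREATMENTS = 9  # Config1 ... Config9, so df = 8

# (W, chi-square) pairs as reported in the note to Table 8.
REPORTED = {
    "r": (0.951, 53.26),
    "RMSE": (0.847, 47.41),
    "MAE": (0.912, 51.08),
}


def check_reported(tolerance=0.05):
    """Compare the implied chi-square with the reported one for each metric."""
    results = {}
    for metric, (w, chi2_reported) in REPORTED.items():
        implied = friedman_chi2_from_w(w, N_BLOCKS, N_TREATMENTS)
        results[metric] = (implied, chi2_reported,
                           abs(implied - chi2_reported) <= tolerance)
    return results


if __name__ == "__main__":
    for metric, (implied, reported, ok) in check_reported().items():
        print(metric, round(implied, 2), reported, ok)
```

Under these assumptions the implied values agree with the reported ones to within the rounding of W, which supports reading the rows of the table as the blocks of the test.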
Table 9. Cross-validation metrics for GEDI data modeling.

| Metrics | Training * | Testing |
|---|---|---|
| r | 0.74 | 0.80 |
| RMSE | 5.06 | 4.42 |
| MAE | 3.73 | 3.15 |

* Evaluation was conducted using five-fold cross-validation.
Table 10. Effects of AutoML TPOT and AutoGluon on model performance.

| Data | Models | r | RMSE | MAE |
|---|---|---|---|---|
| ICESat-2 | RF | 0.62 | 5.28 | 4.00 |
| ICESat-2 | AutoGluon (RF) | 0.64 | 5.12 | 3.83 |
| ICESat-2 | TPOT (RF) | 0.65 | 5.10 | 3.80 |
| GEDI | RF | 0.80 | 4.42 | 3.15 |
| GEDI | AutoGluon (RF) | 0.83 | 4.16 | 2.65 |
| GEDI | TPOT (RF) | 0.84 | 4.15 | 2.36 |
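The gains in Table 10 are easier to compare as relative error reductions with respect to the baseline RF. The short sketch below computes them from the tabulated values; the `relative_reduction` helper and the dictionary layout are introduced here for illustration.

```python
# RMSE and MAE values (in metres) copied from Table 10.
BASELINE_RF = {
    "ICESat-2": {"RMSE": 5.28, "MAE": 4.00},
    "GEDI": {"RMSE": 4.42, "MAE": 3.15},
}
TPOT_RF = {
    "ICESat-2": {"RMSE": 5.10, "MAE": 3.80},
    "GEDI": {"RMSE": 4.15, "MAE": 2.36},
}


def relative_reduction(before, after):
    """Percentage by which an error metric decreased relative to its baseline."""
    return 100.0 * (before - after) / before


def tpot_gains():
    """Relative RMSE and MAE reductions of TPOT (RF) over the baseline RF."""
    return {
        sensor: {
            metric: relative_reduction(BASELINE_RF[sensor][metric],
                                       TPOT_RF[sensor][metric])
            for metric in ("RMSE", "MAE")
        }
        for sensor in BASELINE_RF
    }


if __name__ == "__main__":
    for sensor, gains in tpot_gains().items():
        print(sensor, {metric: round(value, 1) for metric, value in gains.items()})
```

Expressed this way, the improvement from automated tuning is larger for MAE than for RMSE in both datasets, and larger for GEDI than for ICESat-2.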
Table 11. Correlations between extracted or existing data versus predicted data.

| No. | Regression Data | r | RMSE | MAE |
|---|---|---|---|---|
| 1 | ICESat-2_Data/Field_data | 0.53 | 4.85 | 3.84 |
| 2 | ICESat-2_Model/Field_data | 0.54 | 3.11 | 2.54 |
| 3 | ICESat-2_Data/Lang | 0.60 | 3.66 | 2.80 |
| 4 | ICESat-2_Model/Lang | 0.71 | 3.38 | 2.55 |
| 5 | ICESat-2_Data/Potapov | 0.52 | 3.15 | 2.39 |
| 6 | ICESat-2_Model/Potapov | 0.62 | 3.80 | 2.93 |
| 7 | ICESat-2_Model/NFI2 | 0.55 | 3.65 | 2.98 |
| 8 | GEDI_Data/Lang | 0.64 | 3.90 | 2.94 |
| 9 | GEDI_Model/Lang | 0.65 | 5.50 | 4.17 |
| 10 | GEDI_Data/Potapov | 0.54 | 4.11 | 3.15 |
| 11 | GEDI_Model/Potapov | 0.55 | 6.04 | 4.64 |
| 12 | GEDI_Model/NFI2 | 0.63 | 3.40 | 2.65 |
| 13 | Lang/NFI2 | 0.64 | 3.96 | 3.09 |
| 14 | Potapov/NFI2 | 0.46 | 4.21 | 3.28 |
Note. Models are set in boldface for the highest correlations using the ICESat-2 and GEDI datasets versus field measurements, NFI2, and Lang and Potapov datasets.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kombate, A.; Fotso Kamga, G.A.; Goïta, K. Modeling Canopy Height of Forest–Savanna Mosaics in Togo Using ICESat-2 and GEDI Spaceborne LiDAR and Multisource Satellite Data. Remote Sens. 2025, 17, 85. https://doi.org/10.3390/rs17010085