Generating Wall-to-Wall Canopy Height Information from Discrete Data Provided by Spaceborne LiDAR System

Doyog, Nova D.; Lin, Chinsu

doi:10.3390/f15030482

Open AccessArticle

Generating Wall-to-Wall Canopy Height Information from Discrete Data Provided by Spaceborne LiDAR System

by

Nova D. Doyog

^1,2

and

Chinsu Lin

^1,*

¹

Department of Forestry and Natural Resources, College of Agriculture, National Chiayi University, 300 University Road, Chiayi 600355, Taiwan

²

College of Forestry, Benguet State University, La Trinidad 2601, Philippines

^*

Author to whom correspondence should be addressed.

Forests 2024, 15(3), 482; https://doi.org/10.3390/f15030482

Submission received: 31 January 2024 / Revised: 29 February 2024 / Accepted: 2 March 2024 / Published: 5 March 2024

(This article belongs to the Special Issue Applications of Laser Scanning and Satellite Images in Forest Mensuration—Series II)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

Provision of multi-temporal wall-to-wall canopy height information is one of the initiatives to combat deforestation and is necessary in strategizing forest conversion and reforestation initiatives. This study generated wall-to-wall canopy height information of the subtropical forest of Lishan, Taiwan, using discrete data provided by spaceborne LiDARs, wall-to-wall passive and active remote sensing imageries, topographic data, and machine learning (ML) regression models such as gradient boosting (GB), k-nearest neighbor (k-NN), and random forest (RF). ICESat-2- and GEDI-based canopy height data were used as training data, and medium-resolution passive satellite image (Sentinel-2) data, active remote sensing data such as synthetic aperture radar (SAR), and topographic data were used as regressors. The ALS-based canopy height was used to validate the models’ performance using root mean square error (RMSE) and percentage RMSE (PRMSE) as validation criteria. Notably, GB displayed the highest accuracy among the regression models, followed by k-NN and then RF. Using the GEDI-based canopy height as training data, the GB model can achieve optimum accuracy with an RMSE/PRMSE of 8.00 m/31.59%, k-NN can achieve an RMSE/PRMSE of as low as 8.05 m/31.78%, and RF can achieve optimum RMSE/PRMSE of 8.16 m/32.24%. If using ICESat-2 data, GB can have an optimum RMSE/PRMSE of 13.89 m/54.86%; k-NN can have an optimum RMSE/PRMSE of 14.32 m/56.56%, while RF can achieve an RMSE/PRMSE of 14.72 m/58.14%. Additionally, integrating Sentinel-1 with Sentinel-2 data improves the accuracy of canopy height modeling. Finally, the study underlined the crucial relevance of correct canopy height estimation for sustainable forest management, as well as the potential ramifications of poor-quality projections on a variety of biological and environmental factors.

Keywords:

integrated RS- and ML-based canopy height modeling; GEDI; ICESat-2; SAR; challenges in canopy height modeling

1. Introduction

Canopy height is a very important input to modeling global, national, and local forest productivity dynamics including aboveground biomass (AGB), carbon, and volume as they are allometrically related [1,2,3,4,5,6,7,8,9,10,11,12,13,14]. Multi-temporal canopy height measurements can be compared to identify areas where the forest canopy has been significantly altered or removed. The collection of sufficient canopy height information is, therefore, a need in combating deforestation and in strategizing forest conversion and reforestation initiatives. However, large-scale and wall-to-wall collection of canopy height information cannot be attained by traditional forest inventory techniques that often rely on field measurements and manual data collection because of several challenges and limitations [15,16,17].

A good alternative to traditional inventory is remote sensing (RS) technology. However, as generally practiced, the traditional method is still integrated with RS technology. The RS technology provides broader spatial coverage, rapid data acquisition, and detailed information about forest vertical structure [18,19]. It offers multispectral and hyperspectral data, enabling detailed analysis of forest structure and precise measurements of canopy height that are crucial for understanding forest dynamics, carbon accounting, and planning sustainable forest management practices and monitoring forest changes [20]. RS enables efficient multi-temporal monitoring that enables the detection and tracking of deforestation and forest degradation. Furthermore, multi-temporal monitoring also allows the assessment of the impact of deforestation and degradation on carbon emissions, which is vital for designing and evaluating climate change mitigation strategies [21]. This technology provides a wealth of information about forests, including their spatial distribution, vegetation structure, and health status.

There are two types of RS technology depending on the source of illumination. The sun-dependent type is referred to as passive RS, and those operating using their source of illumination are active RS. The latest development in RS technology is that Sentinel satellites have been launched, providing passive data (Sentinel-2) and active data (Sentinel-1). In 2018, two spaceborne LiDAR sensors (Advanced Topographic Laser Altimeter System (ATLAS) mounted in ICESat-2 and GEDI attached to the International Space Station (ISS)) were launched. The ATLAS is a photon-counting LiDAR, while the GLAS and GEDI are full-waveform types.

Both passive and active RS have good qualities useful in forest dynamics monitoring; however, they have several limitations. Passive RS sensors such as Moderate Resolution Imaging Spectroradiometer (MODIS), Landsat, Sentinel-2, Planetscope, and Worldview provide images of the surface reflectance of Earth’s surface. Passive RS data are wall-to-wall, have global-wide coverage, and are less expensive compared to active RS sensors. However, they have limited sensitivity to the vertical forest canopy structure because the signal from the sensor may not be able to penetrate through the canopy to detect the vertical structure of the vegetation. Nevertheless, such an issue can be solved with the inclusion of vegetation indices in the canopy height modeling. The vegetation indices are based on the reflectance properties of vegetation and can be used to indirectly estimate vegetation density, biomass, and canopy height.

Saturation problems are also being dealt with when using optical data. Saturation can occur when the reflectance values of vegetation pixels reach the maximum value that can be recorded by the sensor [22,23] This can result in a loss of information and accuracy in the estimation of canopy height. Several approaches can be used to mitigate saturation problems in canopy height estimation, including the use of multiple spectral bands [24] and the inclusion of ancillary data like topographic features. Optical imagery provides measurements in multiple spectral bands, which can be used to estimate vegetation properties such as leaf area index and biomass. By combining information from multiple bands, it may be possible to estimate canopy height more accurately even in areas with high reflectance values.

Active RS can penetrate cloud cover and vegetation [25] and provide its illumination during data collection, thus continuously collecting data in good and bad weather conditions and day and night. The penetration ability of active sensors enables measurement of the vegetation’s vertical structure [26], which is one of the disadvantages of passive sensors. It is the main reason why passive imageries are integrated with active data, especially when the objective is canopy height estimation. However, active RS also faces limitations. Spaceborne LiDARs such as ICESat-2 and GEDI have almost total Earth coverage but are discrete data. TLS, UAV-LiDAR, and ALS can provide continuous data but have limited coverage due to various factors such as cost, employed methodology of data collection, and the operational capabilities of the equipment.

One active RS method, particularly Sentinel-1, provides wall-to-wall and almost around-the-globe coverage data. The Sentinel-1 mission comprises a constellation of two polar-orbiting satellites, operating day and night, enabling them to acquire imagery regardless of the weather. Sentinel-1 carries a single C-band SAR instrument operating at a center frequency of 5.405 gigahertz (GHz). It provides dual-polarization capability, very short revisit times, and rapid product delivery and offers reliable repeated wide-area monitoring. The C-band Sentinel-1 signal is unlikely to penetrate dense canopies [27,28]. Therefore, forest backscatter can be assumed to be unaffected by variations in soil moisture and surface roughness, which can cause large and abrupt changes in the backscatter over other land covers, and forests should behave as fairly time-invariant targets [29,30,31].

Aside from limited canopy penetrating potential, especially in dense vegetation, Sentinel-1 imageries also face challenges such as saturation and failure to detect vegetation understory. Data integration can reduce the impacts of such challenges. For instance, the effectiveness of synergizing Sentinel-1 and Sentinel-2 data for canopy height mapping has been explored in a few studies [32,33,34]. However, research in tropical or subtropical ecosystems is still lacking [35]. In addition, [36] discovered that combining multispectral (Landsat) and SAR data yielded greater accuracy in predicting forest structure measurements compared to using a single sensor alone.

The use of ancillary data like topographic data, such as elevation, slope, and aspect, could also affect canopy height estimation. A series of studies have shown that the accuracy of DEM and tree parameters generally decreases as the slope gradient increases. For instance, Ref. [37] reported that, when the slope increased from 15.6° to 37.6°, the vertical RMSE of tree height extraction increased from 0.576 m to 0.901 m. The studies of [38] and [39] showed that increasing slope gradient resulted in reduced tree height because LiDAR inversion and DEM errors led to forest volume errors. Ref. [40] suggested that, as the slope gradient increased, models neglecting slopes would overestimate tree height and could be corrected by a slope coefficient that would allow a more accurate estimation of tree height. Table 1 summarizes the advantages and disadvantages as well as the sensitivity of RS data in canopy height modeling.

One of the developing tools in forest resources inventory includes machine learning (ML) technologies. ML is a form of artificial intelligence in which a computer is algorithmically trained to perform a task such as regression or classification [41,42]. ML regression techniques play a significant role in canopy height modeling. They can enhance the accuracy and efficiency of the models by learning complex relationships between input data and canopy height. In addition, they can integrate various data sources, such as LiDAR point clouds, RS imagery, topographic data, and ground truth measurements. These models can handle multi-modal data and extract meaningful features from them. Once trained, ML models can be applied to new areas or time periods, provided that the input data are available. This transferability can save time and resources when modeling canopy height in different locations or monitoring changes over time. ML offers a wide range of regression algorithms like gradient boosting (GB), k-nearest neighbor (k-NN), and random forest (RF).

GB combines multiple weak predictive models to create a strong predictive model. It is a type of ensemble learning method that belongs to the boosting family of algorithms. GB iteratively builds a sequence of weak prediction models, typically decision trees, and combines their predictions to form a more accurate and robust final prediction. Each weak model is trained on the residuals or errors of the previous models, with the goal of improving the overall prediction in subsequent iterations. This process continues iteratively until a predefined stopping criterion is met, such as reaching a specified number of weak models or achieving a desired level of accuracy. To combine the predictions of the weak models, GB typically uses a weighted average approach, where each model’s prediction is weighted according to its performance or contribution to the overall error reduction. This weighting ensures that more emphasis is placed on the models that provide better predictions.

On the other hand, k-NN regression models represent a simple yet powerful ML algorithm. It is a non-parametric algorithm that makes predictions based on the similarity of data points in the feature space. The underlying idea behind k-NN is that similar data points tend to have similar labels or target values. k-NN regression exploits this assumption by averaging the target values of the nearest neighbors. k-NN does not have an explicit training phase. Instead, it stores the training data points and their corresponding labels or target values. During prediction, the algorithm searches for the k-nearest neighbors of a given test data point from the training data. To determine the similarity between data points, k-NN typically uses a distance metric. The k in k-NN represents the number of nearest neighbors considered for making predictions. It is a hyperparameter that needs to be predefined before applying the algorithm. The choice of k depends on the dataset and the problem at hand. Smaller values of k may lead to more flexible decision boundaries but increase the risk of overfitting, while larger values of k may provide smoother decision boundaries but risk oversimplifying the model. During prediction, k-NN finds the k-nearest neighbors of a test data point based on the chosen distance metric. It calculates the distance between the test point and all the training points and selects the k points with the smallest distances. For regression, it computes the average (or weighted average) of the target values of the k-nearest neighbors.

The RF developed by [43] is a popular ML algorithm that belongs to the ensemble learning family. It combines multiple decision trees to create a powerful predictive model. Each decision tree is trained on a random subset of the training data, and the final prediction is determined by aggregating the predictions of individual trees. The RF algorithm is widely used in various domains due to its robustness, accuracy, and interpretability. RF builds a collection of decision trees and aggregates their predictions to provide an accurate and robust regression model.

Integrating RS and ML techniques for canopy height modeling is efficient and cost-effective. However, despite the advancement of the technology for the collection of canopy height information and the existence of global-wide canopy height data coverage, wall-to-wall canopy height information is still insufficient. Therefore, taking advantage of available data is necessary in order to come up with efficient, effective, accurate, and flexible canopy height modeling. Thus, this study was conducted to generate wall-to-wall canopy height information by integrating medium-resolution passive satellite images such as Sentinel-2, active RS data such as SAR and LiDAR, satellite-derived indices such as vegetation, water, soil, biophysical, and backscatter indices, and ancillary data such as elevation, aspect, and slope. Specifically, it aimed to assess the performance of the ML regression models and spaceborne LiDAR discrete data and assess the effect of integrating Sentinel-1 and -2 features in canopy height modeling.

2. Materials and Methods

2.1. Study Site

Figure 1 shows the geographical location of the study site. Lishan is located in the Heping district of Taichung, Taiwan, and is part of the Jade Mountains. It has an average altitude of 2001 meters above sea level (masl), with the lowest point at 1386 masl and the highest point at 3088 masl. The standard deviation (SD) of the elevation within the study site is 326.79 masl. The temperature in Lishan varies throughout the year, ranging from 24 degrees Celsius (°C) during the summer to −4 °C during the winter.

The forest cover of Lishan is characterized by conifers, a mix of pine, conifer, and broadleaf, and broadleaf forests. Specifically, this site has a great deal of Taiwan red pine (Pinus taiwanensis) plantations, which were originally managed for wood production and thus account for a major part of the coniferous forest. The mixed species covering Lishan Forest indicates heterogenous forest canopy height, making Lishan Forest suitable for the exploration of generating flexible wall-to-wall forest canopy height information using discrete data from spaceborne LiDARs, wall-to-wall active and passive RS imageries, and topographic data. In addition, the fluctuating elevation of Lishan can represent forest scenarios from 1386 to 3088 masl. Lishan Forest helps in the absorption of cardon dioxide (CO₂) emitted to the atmosphere and, therefore, helps in mitigating the prevailing impact of climate change. Moreover, it serves as the source of water for irrigation for agricultural areas and for domestic use. Due to tourism, agricultural activities, and natural phenomena, Lishan Forest is threatened by forest disturbances, such as fire, erosion, built-up expansion, and agricultural expansion. For instance, an average of 526 ha per year were destroyed by fire within and around the area from 1963 to 2019 [44]. With the important role and with the prevailing threats, monitoring and protection of Lishan Forest is deemed necessary.

2.2. Active and Passive Remote Sensing Imageries

2.2.1. Sentinel-1

Four different types of Sentinel-1 products are available for download at the European Space Agency’s (ESA) Copernicus Hub (https://scihub.copernicus.eu/). It is either Single Looking Complex (SLC), Ground Range Detected (GRD), Ocean (OCN), or the raw SAR data from the unprocessed Instrument Source Packets (ISP). SLC products preserve phase information and are processed at the natural pixel spacing, while the GRD products contain the detected amplitude and are multi-looked to reduce the impact of speckles.

Sentinel-1 SLC and GRD products were downloaded on 17 April 2023 from the ESA Copernicus Hub (https://scihub.copernicus.eu/). A total of 6 Sentinel images (3 images have both the SLC and GRD products while 3 images are SLC products only) were downloaded. The SLC products were used for the interferometric coherence retrieval, while the GRD products were used for the backscatter coefficients. The dates of acquisition of the images that were used in this study are 2021/01/14 and 2021/01/20. The technical description of the Sentinel-1 images is summarized in Table 2 for the SLC products and Table 3 for the GRD products.

The GRD products were subjected to a radiometric terrain correction (RTC) process in the Alaska Satellite Facility Distributed Active Archive Centers (ASF DAAC) HyP3 2023 using the hyp3_gamma plugin version 6.1.0 running GAMMA release 20220630. The RTC products are projected to World Geodetic System 84/Universal Transverse Mercator zone 51N (WGS 84/UTM zone 51N), with a pixel spacing of 10 m. Further, a speckle filter has been applied to the RTC images. The Enhanced Lee filter technique was used to remove speckles while preserving edges. A dampening factor of 1 was used with a box size of 7 × 7 pixels and 30 looks. From the radiometrically corrected GRD products, the backscatter coefficients of the VV and VH polarizations were retrieved. The process of retrieving the backscatter of the GRD Sentinel-1 product is summarized in Figure 2. Both the backscatter in VV and HV polarization were used in this study.

The backscatter coefficient has already been proven as useful input for modeling vegetation parameters such as LAI, canopy water content, and biomass [45,46,47]. However, to increase the backscatter data dimensionality, vegetation indices were generated such as the sum of the dual polarization of Sentinel-1, the difference, ratio, Radar Vegetation Index (RVI), and Normalized Difference Index (NDI). The formula used to derive the vegetation indices is summarized in Table 4.

The SLC Sentinel products were paired and used for interferometric coherence retrieval. The interferometric coherence was generated in the ASF DAAC HyP3 2023 using the hyp3_gamma plugin version 6.1.0 running GAMMA release 20220630. Files are projected to WGS84/UTM zone 51N, and the pixel spacing is 40 m. The process of generating interferometric coherence is summarized in Figure 3. The spectra of the two images used for InSAR must overlap well enough to generate interferometric fringes. Areas without common overlap must be filtered out prior to InSAR generation. The correlation indicates the accuracy of the phase information or the visibility of fringes. Areas with low correlation will have a noisier phase. The magnitude of the correlation is commonly referred to as coherence. The interferometric coherence ranges from 0 to 1, of which, the larger the number, the higher the coherence or the better the correlation.

2.2.2. Sentinel-2

The Sentinel-2 products are freely downloadable at the Copernicus open-access hub (https://scihub.copernicus.eu/). The Level-2A Sentinel-2 imagery with acquisition date of 30 January 2023 and with product identifier number of S2B_MSIL2A_20230130T022939_N0509_R046_T51QUG_20230130T044448 was downloaded freely at the Copernicus open-access hub (https://scihub.copernicus.eu/) on 14 March 2023. It has a descending orbit direction, illumination azimuth angle/zenith angle of 151.38/46.63, and cloud cover (%)/cloud shadow (%) of 49.69/1.32.

The pansharpening technique was performed on the Level 2A product to enhance the spatial resolution while retaining the spectral attributes since satellite images exhibit usually either high spectral resolution and low spatial resolution, or low spectral resolution and high spatial resolution [49]. The Sen2Res plugin in Sentinel Application Platform (SNAP) software version 9.0.0 was used in this study to enhance the spatial resolution of the 20 m and 60 m bands to 10 m.

Vegetation indices [50,51] can also be used to estimate vegetation properties and reduce the effects of saturation. These indices are calculated using combinations of spectral bands and can provide information on vegetation cover and structure. Table 5 provides the summary of the vegetation indices that were integrated with the surface reflectance of Sentinel-2 in canopy height estimation in this study.

The biophysical indices such as LAI, Fraction of Absorbed Photosynthetically Active Radiation (FAPAR), fraction vegetation cover (FVC), leaf chlorophyll content (CAB), and Canopy Water Content (CWC) were also used as additional features. The biophysical indices were calculated in SNAP software using the biophysical processor S2 plugin. The input data include spectral information (8 bands: B3–B7, B8a, and B11–B12), as well as directional information (cosine of sun zenith angle, view zenith angle, and relative azimuth angle.

2.3. Ancillary Data

In this study, the elevation data of Taiwan that were generated and distributed freely by the Ministry of the Interior were used. The DEM data were downloaded at https://data.gov.tw/dataset/35430 on 21 January 2023. They have a 20 m spatial resolution and a coordinate system of TWD97. From the DEM data, the aspect and slope were generated using the spatial analyst tool in ArcGIS. The Planar method was used for both the aspect and slope. The inclination of the slope was calculated in degrees and ranges from 0 to 90. The aspect is expressed in positive degrees from 0 to 359.9, measured clockwise from north. For consistency with the spatial resolution of Sentinel-2 data, the elevation, aspect, and slope were resampled to 10 m × 10 m using the bilinear as the resampling technique.

In summary, canopy height modeling involved layerstacking various data sources, including Sentinel-2, with their distinct surface reflectance, derived spectral indices, biophysical features, SAR data backscatter coefficients, backscatter indices, and topographic features. Three layerstacked datasets were created: Sentinel-1 data, Sentinel-2 data, and Sentinel-1&2 data.

2.4. RS-Based Canopy Height Data

2.4.1. ALS-Based Canopy Height

In December 2018, ALS point cloud data were collected using the Riegl LMS-Q780 lidar system [43]. The data collection involved operating flight altitudes ranging from 3400 to 4000 m and a laser pulse repetition rate of 240 to 270 kilohertz (kHz). The original point cloud data had a density of approximately 2.5 ground points and 15 canopy points per square meter. To process the point cloud data, several steps were performed. First, denoising and classification techniques were applied to remove noise and classify the points into ground and canopy points. This step helps separate the vegetation canopy from the ground surface. Next, a DEM and a digital surface model (DSM) were generated from the classified point cloud data using a linear interpolation technique. The resolution of the DEM and DSM was set to 1 m, meaning that each pixel in the models represented a 1 m by 1 m area on the ground. After generating the DEM and DSM, the canopy height model (CHM) was derived by subtracting the DEM from the spike-free DSM. The CHM represents the vertical height of the vegetation canopy above the ground surface.

Finally, pitfall removal was performed in the CHM. This process helps identify and remove false spikes or irregularities in the CHM, ensuring more accurate measurements of object height. The 1 m × 1 m CHM data were aggregated to 10 m, 17 m, and 25, to be consistent with the Sentinel-2 image, ICESat-2, and GEDI data, respectively. The 10 m ALS data were used during the validation of the predicted canopy height products, the 17 m were used during the analysis of the agreement of ALS and ICESat-2 data, and the 25 m were used when the GEDI training data were assessed.

Overall, the process involved denoising and classifying the original point cloud data, generation of the DEM and DSM, deriving a CHM, and performing pitfall removal on the CHM for object height determination. The maximum canopy height was used instead of the stand height to be consistent with the ICESat-2 and GEDI data. A total of 7824 ALS-based canopy height data were collected for data validation purposes.

2.4.2. GEDI-Based Canopy Height

For this study, the version 2 GEDI L2A .h5 file with acquisition dates of 2019 to 2022 was downloaded at https://searcanopyheight.earthdata.nasa.gov/search on 16 March 2023. Following the tutorial provided by the Land Processes Distributed Active Archive Center (LP DAAC) (https://lpdaac.usgs.gov/resources/e-learning/getting-started-gedi-l2a-version-2-data-python/ accessed on 16 March 2023), the GEDI L2A .h5 file was filtered and converted to a GeoJSON file. The quality flag, degrade flag, and sensitivity were used as the filtering criteria. All the footprints that have a degrade flag of 1, a quality flag of 0, and a sensitivity of <95 were removed from the data. As a result, a total of 14,612 footprints were filtered and used in this study as training data. The maximum height that is equivalent to RH95 was used. Accordingly, the difference between the elevation of the signal start and signal end is not reliable to define maximum height because of the high signal-to-noise ratio. Because the spaceborne LiDAR data were generated based on the height above the ellipsoid, the ICESt-2 and GEDI data were converted first to orthometric data.

2.4.3. ICESat-2-Based Canopy Height

The ICESat-2 mission, which was launched in September 2018 and is the successor of the ICESat mission, which was active from January 2003 to February 2010, collects altimetry data of Earth’s ice, land, and vegetation [63]. The sole instrument on ICESat-2 is the ATLAS instrument, which is a photon-counting system that operates 6 beams, 3 strong and weak pairs of green wavelength (532 nm) lasers, and achieves a ground footprint of 17 m diameter [64,65]. The algorithm for the ATL08 extracts terrain and canopy heights from the ATL03 in 100 m segments along the satellite’s ground track [66,67]. The algorithm also labels the photons as either noise, ground, canopy, or top of canopy and calculates a number of relative height metrics for each 100 m segment. The result is a suite of metrics for each segment that represent canopy heights relative to the terrain surface, height variability metrics, and a series of quality flags [65].

The ATL08 segments over the Lishan area acquired from 2018-12-29 to 2022-09-21 (version 5) [67] were downloaded from https://www.openaltimetry.org/ on 22 March 2023. OpenAltimetry is a cyberinfrastructure platform that was established for the discovery, access, and visualization of data from NASA’s ICESat and ICESat-2 missions and is a product of a collaboration between the National Snow and Ice Data Center (NSIDC), University NAVSTAR Consortium (UNAVCO), the Scripps Institution of Oceanography, and San Diego Supercomputer Center at University of California San Diego.

A total of 3279 ATL08 footprints were downloaded. Among the different height metrics, the RH98 is recommended for ICESat-2 to represent the top canopy height rather than the maximum of canopy photons owing to the signal-to-noise uncertainty at the top of the canopy [67].

2.5. Machine Learning-Based Canopy Height Modeling

2.5.1. Gradient Boosting (GB) Regression

Some popular implementations of GB algorithms include XGBoost, LightGBM, and CatBoost, which provide efficient and optimized implementations for better performance and scalability. The Extreme Gradient Boosting (XGBoost) package in R software version 4.1.3 was used in this study [68]. The package includes an efficient linear model solver and tree-learning algorithms.

The XGBoost model’s hyperparameters, including nrounds, max_depth, eta, gamma, colsample_bytree, min_child_weight, and subsample, were tuned to determine the optimal values of the parameters. Hyperparameter tuning was performed using the caret package in R. The values used for the nrounds were 50, 100, 150, 200, 250, 500; max_depth = 1, 2, 3, 4, 5, 10; eta = 0.05, 0.1, 0.3, 0.4; gamma = 0, 0.05, 0.5, 1, 2, 3; colsample_bytree = 0.4, 0.6, 0.8, 1.0; min_child_weight = 1, 2, 3, 4, 5; and subsample = 0.5, 0.75, 1.0. Cross-validation (CV) was performed using the repeated CV method (10-fold with 5 repeats). RMSE, mean absolute error (MAE), and R-squared (R²) were used among the accuracy measures. As a result, the final values used for the GB model were nrounds = 500, max_depth = 4, eta = 0.025, gamma = 0.05, colsample_bytree = 0.8, min_child_weight = 3 and subsample = 0.5.

2.5.2. k-Nearest Neighbors (k-NN)-Regression

k-NN is known for its simplicity, interpretability, and effectiveness in certain scenarios. However, it can be computationally expensive for large datasets since it requires calculating distances between each test point and all training points. Additionally, k-NN assumes that all features have equal importance. The k-NN regression was performed in R using k = 30 and Euclidean distance as the distance metrics. The horizontal distance of reference and target plots was also considered, and 1 km was used for this study.

2.5.3. Random Forest (RF) Regression

RF provides additional functionalities such as variable importance analysis, which can help identify the most influential features in the regression model. RF regression was performed using the randomForest package in R version 4.1.3, which was created in March 2022. The parameters ntree and mtry were used. The ntree parameter pertains to the number of trees to be built and mtry pertains to the number of variables that are used. Values of ntree = 1000 and mtry = 37 were used as they provide the highest accuracy during the tuning process.

2.6. Assessing the Effects of Different Factors of Canopy Height Modeling Using Full-Factorial ANOVA Design

Many factors affect canopy height modeling, and such factors have different impacts on the predicted canopy height. The factors that were identified to be analyzed in this study include characteristics of satellite data, types of regression models, and composition of training data. Some of the factors may have a positive impact on the canopy height modeling, and some may have a negative impact. Further, the interaction of the factors may positively or negatively affect the influence of the other factors. Therefore, the interaction effects of the factors must be considered, not only the main effect. Thus, the effects of these factors were assessed using the full-factorial ANOVA design.

The 3-way factorial ANOVA model, as summarized in Equation (1), offers the advantage of concurrently examining both the main and interaction effects of factors in experimental designs involving multiple factors, without sacrificing degrees of freedom, thus enabling the researchers to examine the combined effects of all factors on the dependent variable in a single analysis. The main effects of the independent variable are crucial for understanding the individual contributions of factors to the variability in the dependent variable, while the interaction effects of independent variables are fundamental for gaining a comprehensive understanding of the relationships between variables, improving the accuracy and applicability of research findings, and guiding effective decision making in various fields. By addressing the co-effects of factors at various levels simultaneously, the full factorial model facilitates the derivation of more robust conclusions, thereby providing evidence for the presumed impact of one or more independent variables on the dependent variables. Further, full-factorial ANOVA design offers comprehensive exploration and therefore enhances the understanding of how each factor independently and interactively contributes to the observed outcomes.

y_{i j k l m} = μ + α_{i} + β_{j} + γ_{k} + {(α β)}_{i j} + {(α γ)}_{i k} + {(β γ)}_{j k} + {(α β γ)}_{i j k} + ε_{i j k}

(1)

In Equation (1),

μ

is the overall canopy height mean;

α

,

β

, and

γ

are the main effects of the factors. Any items with two and three factors are the interaction effect of the corresponding factors on the canopy height. Furthermore,

ε

is the random error term. Footnotes i, j, and k denote the levels of related factors, and m is the number of observations. Parameterization of the model is completed by the weighted least squares method.

2.7. Determination of Appropriate Factors for Canopy Height Estimation

The factor levels that were examined were Sentinel-1, Sentinel-2, and Sentinel-1&2 for satellite data; GB, k-NN, and RF for the regression model; and GEDI and ICESat-2 for the training data. Overall, a total of 18 combinations (resulting from 3, 3, and 2 factor levels) were tested. The predicted values for canopy height were then compared to the ALS-based canopy height, and models exhibiting acceptable canopy height bias were identified as suitable for canopy height estimation. For this purpose, Duncan’s post hoc test was utilized to quantify specific differences between pairs of factor means and to determine the most appropriate combinations of factor levels for accurate canopy height estimation. This test is considered moderately conservative, emphasizing precise significance levels for multiple comparisons. The sampling schemes displaying the least and greatest deviation from ALS-based canopy height were categorized as the best and worst, respectively. Furthermore, the cost of inaccurate canopy height prediction in both the best and worst canopy height models was assessed based on the bias distribution perspective.

2.8. Performance Evaluation of the Canopy Height Modeling

2.8.1. Determining the Minimum Number of Test Samples for Accuracy Validation

Only the portion of the study site corresponding to the ALS data (as illustrated in Figure 1) was utilized for accuracy assessment given the unavailability of ground truth data for the entire study area. The exclusion of non-ALS data led to a total of 2,091,516 pixels with a spatial resolution of 10 m. To determine the minimum independently collected sample size (n) required for evaluating estimation accuracy, random sampling without replacement from finite populations was employed. This approach, previously utilized in developing a stand structure decomposition algorithm for analyzing the dynamics of a Mongolian boreal forest [69] involved applying Equation (2). In this equation, N represents the total pixels of the study site with ALS data; E denotes the allowable error, expressed as a percentage of the mean; CV indicates the coefficient of variation of the data; and t is the two-tailed Student’s t statistic at the confidence level of α.

n = {[{(E / (t_{α / 2, v} \times C V))}^{2} + 1 / N]}^{- 1}

(2)

According to the formula in Equation (2), to achieve a sampling rate meeting a 5% significance level and a 10% error rate, the recommended minimum sample size is 25,428 pixels, resulting in a sampling intensity of 1.22%. This prescribed minimum number of test samples was replicated 10 times.

2.8.2. Evaluation Criteria for the Accuracy Assessment

Using the ALS-based canopy height, the ML-based canopy height products were evaluated. A total of 9560 samples were collected for validation data because it is the minimum number of samples needed to cover the whole range of the dataset and satisfy an error rate of 10% as provided by the formula for the minimum number of sampling size. The sample data were collected randomly (Figure 1c). The evaluation criteria that were used were RMSE and percentage RMSE (PRMSE). RMSE is a widely used evaluation metric in regression tasks to measure the average magnitude of the differences between predicted values and actual values. It provides a measure of the model’s accuracy by calculating the square root of the average of the squared differences between predicted and actual values. It provides an intuitive understanding of how well the predictions of a regression model fit the observed data. It has the same units as the target variable, making it interpretable in the original scale of the problem. However, it is worth noting that RMSE can be sensitive to outliers since it squares the errors.

PRMSE is a combination of RMSE and a percentage-based metric. PRMSE calculates RMSE as a percentage of the range of the target variable and provides a relative measure of the error. By expressing RMSE as a percentage of the range, PRMSE allows for better comparison across different datasets or target variables with varying scales. It provides a normalized measure of the error that is easier to interpret and compare. Finally, the overall flowchart involved in this study is summarized in Figure 4.

3. Results

3.1. Data Distribution of the Retrieved Spaceborne-LiDAR-Based Canopy Height

Table 6 shows the descriptive statistics of the GEDI- and ICESat-2-based canopy height that were retrieved specifically to serve as training data for the different ML-based regression models. Within the study site, a total of 14,910 GEDI footprints were retrieved. The GEDI-based canopy height has maximum and minimum values of 89.77 m and 1.90 m, respectively, and a mean of 26.94 m.

Based on the frequency distribution (%), the frequency of the GEDI data ranges from 0.57% to 18.52%; 18.52% of the height ranges from 25 to 30 m; 17.92% ranges from 20 to 25 m; 13.78% ranges from 30 to 35 m; and 12.86% ranges from 15 to 20 m. The other height values have a frequency of <10%. Overall, 44.51% have a height higher than the average height, which is 26.94 m, and 55.49% have a height lower than the average height.

On the other hand, the ICESat-2-based canopy height has a maximum and minimum height of 149.34 m and 2.27 m, respectively, and an average height of 33.53 m. Based on the frequency distribution (%), the frequency of the ICESat-2-based canopy height ranges from 0.10% to 14.98%, and 14.98% of the footprints have a maximum height of 30–35 m; 14.38% range from 35 to 40 m; 13.62% range from 25 to 30; and 11.76% range from 40 to 45 m. All the remaining height interval values are <10%. Overall, 48.95% of the footprints have a height higher than the average height, which is 33.53 m, and 51.05% are below the average height.

As observed, the maximum canopy heights of GEDI and ICESat-2 are extremely high. However, the high canopy height values were still included after visual inspection using the generated high-resolution orthophotos because they were located in forest areas. The generated extremely high canopy height values of GEDI and ICESat-2 were probably because of the site location, canopy density, and undulating terrain. As recalled, Lishan is located in a high-elevation mountainous region of Taiwan composed of varied species indicating a heterogenous canopy structure and with a complex topography. According to [66,70], complex terrain conditions with dense forest canopy cover influence the quality of the digital terrain and CHM data. In addition, Refs. [71,72] mentioned in their work that spaceborne-LiDAR-based products have high uncertainty in mountainous regions.

3.2. Agreement of ALS-, GEDI-, and ICESat-2-Based Canopy Height

Following the locations of the collected GEDI footprints, the ALS-based canopy height was extracted for comparison. A total of 4400 GEDI footprints are located within the boundary of the ALS-based canopy height. When the GEDI- and ALS-based canopy heights were compared as shown in Figure 5a, it is observed that the frequency of ALS data is distributed mainly from heights of 0–30 m (95.84%). The remaining frequency totaling 4.16% is observed at heights of >30 m. GEDI, on the other hand, is concentrated mainly at heights 15–40 m with a frequency of 75.29%. The remaining 24.71% of the data have heights of <15 m and >40 m. The average height of the ALS-based canopy height is 16.94 m, with an SD of 8.39 m and ranging from 0 to 40.69 m. The corresponding GEDI footprints have a mean height of 27.21 ± 11.17 m and range from 0 to 89.77 m.

The same process was performed for the ICESat-2 data, as shown in Figure 5b. A total of 2410 ICESat-2 footprints overlapped with the ALS-based canopy height data. The result shows the same pattern as that of the comparison of GEDI- and ALS-based canopy height, of which a high frequency of 92.09% of ALS-based canopy height is observed at 30 m and below while the remaining 7.91% have heights ranging from 30 to 47.53 m. The ALS-based canopy height ranges from 0.02 to 47.53 m and has a mean of 19.10 ± 7.46 m. The ICESat-2 -based canopy height ranges from 0 to 149.83 m with an average of 34.14 ± 15.21 m.

Figure 6 shows the 1:1 relationship or overall agreement between the ALS-based canopy height and the spaceborne-LiDAR-based canopy height and the descriptive statistics of the spaceborne-LiDAR-based canopy height bias. A low level of overall agreement was observed for both the spaceborne LiDARs regarding ALS-based canopy height. In Figure 6a, an R² of 0.28 was observed for the linear regression of ALS-based canopy height and GEDI-based canopy height (n = 4400), and 44.92% of the GEDI-based canopy height underestimated the ALS-based canopy height, while the remaining 55.08% overestimated the ALS-based canopy height. In Figure 6b, an R² value of 0.03 was observed for the linear regression of ALS-based canopy height and ICESat-2-based canopy height (n = 2410), indicating a poor correlation. Moreover, 37.36% of the ICESat-2-based canopy height underestimated the ALS-based canopy height, while the remaining 62.64% overestimated the ALS-based canopy height.

The descriptive statistics of the canopy height bias of GEDI-based canopy height and ICESat-2-based canopy height that were integrated in Figure 6 show that the minimum and maximum bias of the GEDI-based canopy height are 63.01 m and −52.16 m, respectively, while the average + SD canopy height bias is −1.17 + 10.92 m. On the other hand, the minimum and maximum canopy height bias of the ICESat-2-based canopy height are −42.10 m and 103.32 m, respectively, while the mean + SD canopy height bias is 5.37 + 15.08 m. The GEDI-based canopy height has a lower maximum and average canopy height bias than ICESat-2 but has a higher minimum canopy height bias than ICESat-2.

3.3. ML-Based Canopy Height Modeling

The three ML regression models, GB, k-NN, and RF, with GEDI and ICESat-2-based canopy height as training data were built for each of the three layerstacked datasets (Sentinel-1&2, Sentinel-2, and Sentinel-1). Figure 7 shows the spatial distribution of the ML-based predicted canopy height, while Table 7 shows the corresponding descriptive statistics. Overall, the spatial distribution shows that low canopy height values are observed in the middle part of the study site, which is known to be agriculture and water areas, and at some of the edges of the study site, which, if viewed on Google Earth, are grassland areas. Visually comparing the different maps in Figure 7, the different wall-to-wall canopy height maps are similar. However, the descriptive statistics that are shown in Table 7 tell otherwise. It could be observed that regression model bias occurred for GB and RF, as indicated by the negative values in Sentinel-1 data for both ICESat-2 and GEDI. This indicates that RF and GB were not able to capture the true relationship between the features of the Sentinel-1 data and canopy height. Negative values were merely observed in areas with low canopy height. As recalled, the minimum values of the GEDI- and ICESat-2-based canopy height are 1.90 m and 2.27 m, respectively. One of the possible factors of non-establishment of the true relationship of the features of Sentinel-1 and canopy height is the non-representation of canopy height values that are less than 1.90 m and 2.27 m. One possible reason is the data quality of the Sentinel-1 data. As can be observed, model bias was not observed with Sentinel-2 and Sentinel-1&2. This indicates that the features of Sentinel-1 are not sufficient to establish a relationship with canopy height, thus requiring other variables for canopy height modeling. The negative values were no longer observed when Sentinel-1 was integrated with Sentinel-2. In comparison with the k-NN model, it did not produce negative canopy height values. As recalled, the k-NN regression model considers its nearest neighbors; therefore, if there are no negative values provided in the training data, no negative values will be generated.

Neglecting the negative values, the GB-based canopy height has a minimum canopy height value ranging from 0.06 m to 0.66 m for GEDI and 2.86 m to 5.89 m for ICESat-2. The maximum canopy height ranges from 66.71 m to 70.15 m for GEDI and 86.25 m to 107.16 m for ICESat-2. On the other hand, the mean canopy height values range from 26.44 m to 26.53 m for GEDI and 37.54 m to 37.75 m for ICESat-2. The difference in the mean canopy height values of GEDI and ICESat-2 ranged from 11.10 m to 11.22 m.

For the k-NN regression model, all the predicted canopy heights of the different datasets have a minimum value of 0. This also shows that the predicted canopy height values of ICESat-2 are higher than GEDI, as shown by the maximum and mean canopy height values. The maximum canopy height values for all the satellite data are 99.64 m for GEDI and 146.84 m for ICESat-2. For mean canopy height, different values were observed in all the datasets. The mean canopy height of the GEDI-based model ranges from 26.50 m to 26.64 m, and ICESat-2 ranges from 37.51 m to 37.96 m. The difference in the mean canopy height values of GEDI and ICESat-2 ranges from 11.01 m to 11.32 m.

The precision of the accuracy of the RF-based canopy height modeling was analyzed through testing several configurations. The results revealed that the best configuration for canopy height modeling is the one that uses all the metrics: the Sentinel-2 dataset includes surface reflectance, spectral indices, biophysical features, and topographical features; the Sentinel-1 dataset includes backscatter coefficient, SAR indices, and topographical features; the Sentinel-1&2 dataset includes surface reflectance, spectral indices, biophysical features, backscatter coefficient, SAR indices, and topographical features. Therefore, all the metrics of each of the three layerstacked datasets were used for the wall-to-wall canopy height modeling of the study site. Figure 8 shows the importance of each metric of each dataset based on the percentage increase in mean square error (%IncMSE). Variable importance is one of the unique features of RF models. Variable importance refers to the measure of how much each input variable contributes to the model’s predictive performance. It could be observed that the top three features with the most importance are LAI, MNDWI, and elevation for Sentinel-1&2 × ICESat-2; LAI, elevation, and coherence for Sentinel-1&2 × GEDI; LAI, MNDWI, and elevation for Sentinel-2 × ICESat-2; elevation, slope, and B4 for Sentinel-2 × GEDI; slope, elevation, and sigmaVV for Sentinel-1 × ICESat-2; and slope, elevation, and coherence for Sentinel-1 × GEDI. In general, the topographic features are all included in the top three features with the highest importance, indicating that topographic features are needed for canopy height modeling. It could be observed that the same variables have different importance depending on the dataset. This is because the variable importance of RF is affected by sample size, data distribution, and range. It could be recalled that the sample sizes of the collected ICESat-2- and GEDI-based canopy heights are not the same. Also, the data distributions and ranges of the two datasets are not the same.

From the spatial distribution (Figure 7), it could be observed that the areas with low canopy height values are very distinct and are observed in the middle and at the lower and upper left edges of the study site, just like in GB and k-NN. Through visual comparison of the predicted canopy height, the canopy heights of Sentinel-2 and Sentinel-1&2 using ICESat-2 are different from the other predicted canopy heights. Water areas are indistinguishable from agriculture areas and grassland areas.

Based on the descriptive statistics (Table 7), it could be observed that the minimum values of Sentinel-1 are negative values, indicating overfitting of the model for both GEDI and ICESat-2. Neglecting the negative values, the minimum canopy height of the RF-based predicted canopy height ranges from 2.22 m to 2.24 m for GEDI and 5.59 m to 6.09 m for ICESat-2. For the maximum canopy height, the GEDI-based canopy height ranges from 71.77 m to 73.28 m and the ICESat-2-based ranges from 103.69 m to 105.52 m. The mean canopy height of GEDI ranges from 26.71 to 26.86 m and the ICESat-2-based canopy height ranges from 38.40 m to 38.90 m. Obtaining the range of the difference in the means of GEDI- and ICESat-2-based data resulted in a difference ranging from 11.69 m to 12.04 m.

In general, it could be observed that the predicted canopy height values produced by the three models are extremely high. This could probably be an after-effect of using extremely high values during the training of the models. As recalled, the canopy height data that were used for training the models also have extremely high maximum values. This indicates that the training data should be accurate in order for the models to produce accurate results.

3.4. Correlation of the ML-Based Predicted Canopy Height with the ALS-Based Canopy Height

Finally, after generating the wall-to-wall ML-based canopy height values based on different datasets, they were compared with the ALS-based canopy height values. The result of bivariate Pearson correlation analysis showed a positive significant correlation that ranges from 0.38 to 0.64 (Table 8). The highest correlation (0.64) is observed for both the Sentinel-1&2 and Sentinel-2 datasets, with GEDI as the training data and with GB as the regression model. It could be observed that the training data of the top five datasets with the highest correlation are GEDI, satellite images are Sentinel-1&2 and Sentinel-2, and regression models are GB and k-NN. On the other hand, the two datasets with the least correlation with ALS-based canopy height have a satellite image of Sentinel-1, training data ICESat-2, and the regression models are k-NN and RF, indicating that the training data have the most important role as compared with satellite image and regression models in canopy height modeling.

3.5. Performance of the ML-Based Predicted Canopy Height Based on RMSE (m) and PRMSE (%)

With accuracy measures such as RMSE(m) and PRMSE (%), the performance of the wall-to-wall ML-based canopy height was validated against the ALS-based canopy height. The ALS-based canopy height only covers a portion of the whole study site. Therefore, for the validation, only the area with ALS-based canopy height was compared with the wall-to-wall ML-based canopy height.

The factors affecting RMSE (m) and PRMSE (%), such as the characteristics of the satellite image, the composition of training data, and the type of regression model, were considered in the analysis. The main and interaction effects of the different factors were analyzed using the three-way full-factorial ANOVA model. The F-value, p-value, and the partial eta squared (ηp2) were used to present the interpretation of the results of the ANOVA model. Although there is concern regarding prevalent misuse and misconceptions regarding using p-values [73], the values were consistent with the F-values and ηp2. As shown in Table 9 and Table 10, the ANOVA revealed a significant main effect for the regression model (RMSE: F-value = 2782.22, p-value =< 0.001; PRMSE: F-value = 1701, p-value =< 0.001), satellite data (RMSE: F-value = 193.09, p-value =< 0.001; PRMSE: F-value = 118.07, p-value < 0.001), and training data (RMSE: F-value = 1,011,356.76, p-value < 0.001; PRMSE: F-value = 618,324.81, p-value < 0.001). The result indicates that the RMSE and PRMSE of ML-based canopy height modeling are statistically different and therefore significantly depend on the regression model, satellite data, and training data. The ηp2 value, which provides an overview of the magnitude of the effect of the independent variables, indicates that the training data have the most substantial influence on the RMSE of canopy height modeling, with a value of 1, followed by the regression model and satellite data, with values of 0.97 and 0.70, respectively. The result of ηp2 for PRMSE is the same as for RMSE.

The ANOVA model also revealed a statistically significant difference between the RMSE and PRMSE of the interaction between the two factors at a 0.05 probability level. Based on the F-value of the interactions with significant value, the two-level interaction between the regression model × training data has the highest value (RMSE: F-value = 1173.73; PRMSE: F-value = 717.61), followed by satellite data × training data (RMSE: F-value = 983.93; PRMSE: F-value = 601.54) and regression model × satellite data (RMSE: F-value = 89.72; PRMSE: F-value = 54.86).

The interaction of the full factors (regression model × satellite data × training data) also showed highly significant influence on RMSE and PRMSE, with an F-value and p-value of 155.06 and <0.001, respectively, for RMSE and 94.8 and <0.001, respectively, for PRMSE.

Duncan’s post hoc test was used for multiple comparisons of the factor levels, and the result is shown in Figure 9. The factor levels were grouped into several subsets based on the significant differences of the means. Figure 9a is the result of the comparison of the RMSE and PRMSE of the regression models. It shows that the RMSE and PRMSE of the GB, k-NN, and RF are statistically significant different from each other at a 0.05 level of significance with a p-value of <0.001.

The average RMSE/PRMSE of the levels under the regression models, GB, k-NN, and RF, are significantly different, with values of 11.08 m/43.75%, 11.36 m/44.86%, and 11.64 m/45.79%, respectively. The result indicates that the difference in the RMSE of the GB and k-NN is 0.28 m, while the k-NN and RF have an RMSE difference of 0.28 m. On the other hand, the GB and RF have a difference of 0.56 m. A slight difference in PRMSE was also observed between the three regression models. The difference in the PRMSE of GB and k-NN is 1.11%, while k-NN and RF and GB and RF have differences of 0.93% and 2.04%, respectively. The amount of improvement in the accuracy of the canopy height modeling using the three regression models indicates that the use of different regression models did not affect the accuracy of the canopy height modeling much. Nevertheless, among the regression models, GB provided the highest accuracy, followed by k-NN. The RF model has the least accuracy among the three.

Figure 9b is the result of the comparison of the RMSE and PRMSE of the satellite data. Accordingly, the RMSE and PRMSE of Sentinel-1&2 and Sentinel-2 are not statistically significantly different from each other but are significantly different from Sentinel-1 at a 0.05 level of significance with a p-value of <0.001. The RMSE/PRMSE values of Sentinel-1&2, Sentinel-2, and Sentinel-1 are 11.31 m/44.69, 11.32 m/44.70, and 11.44 m/45.21, respectively.

Figure 9c is the result of the pairwise comparisons of the RMSE and PRMSE of the training data. According to the pairwise comparison, the RMSE and PRMSE of ICESat-2 and GEDI are statistically significantly different, with a p-value of <0.001. The RMSE/PRMSE of GEDI are 8.24 m/34.54%, while ICESat-2 is 14.18 m/57.19%. The RMSE is decreased by 5.94 m when GEDI is used as training data instead of ICESat-2. On the other hand, the PRMSE is decreased by as much as 22.65%.

Finally, the RMSE and PRMSE of the predicted canopy height were determined based on the interactions of the satellite data, training data, and regression model (Table 11). The levels of the three factors resulted in a total of 18 combinations. The 18 combinations were divided by Duncan’s test post hoc test into 16 homogeneous subsets for RMSE and 14 for PRMSE. The RMSE of the full interactions ranged from 8.00 m to 15.09 m, with an average of 11.36 m, and 50.0% of the full interactions have values that are below the average RMSE and 50.0% have values above the average RMSE value. PRMSE ranged from 31.59% to 59.62%, with an average of 44.86%, and 50% of the datasets have PRMSE values below the average and 50% have values that are above the average PRMSE. The dataset with the highest and lowest accuracy has a difference of 7.10 m based on RMSE and 28.03% based on PRMSE.

To summarize, the characteristics of satellite data, composition of training data, and type of regression model play an important role in the accuracy of canopy height modeling in a critical way based on the three-way full-factorial ANOVA design. On average, when considering a particular factor with the most appropriate method or parameters, the recommended datasets are Sentinel-1&2 for satellite data, GEDI for training data, and GB for the regression model because they have the highest accuracy, outperforming the other data combinations. In contrast, the most unfavorable would be using Sentinel-2 for satellite data, ICESat-2 for training data, and RF for the regression model.

3.6. Agreement of GEDI and ICESat-2 in Canopy Height Modeling

The agreement of the predicted canopy height products using ICESat-2 and GEDI as training data was determined by obtaining the difference between the two products. Figure 10 shows the spatial and frequency distribution of the agreement. Negative values indicate that the GEDI-based canopy height product has a lower prediction value than the ICESat-2-based canopy height product, and 0 values indicate high agreement between the GEDI and ICESat-2 products. On the other hand, values farthest from 0 indicate the lowest agreement. The agreement of the two products using the GB regression model ranges from −70 to 26 m. The majority of the agreement, equivalent to 91.57%, is within −20 to 0 m, indicating high agreement. The agreement when using the k-NN regression models ranges from −118 to 55 m, of which the major frequency is observed from −20 to 0m. The total frequency is 95.44%, indicating that the agreement is high. On the other hand, when using the RF regression model, the agreement between the two products ranges from −79 to 33 m, a lower range compared to k-NN. The majority of the agreement is observed from −20 to 0 m, totaling 89.48%. Generally, low agreement is observed in lower-right and upper-left corners of study sites. Compared with the satellite image, areas with low agreement are observed in boundaries of low and high vegetation, areas with low vegetation, and also in shaded areas of the satellite image.

3.7. Comparison of the Spatial Distribution of the Canopy Height Prediction Bias of GEDI and ICESat-2

Using the Sentinel-1&2 data, the spatial distribution of the canopy height prediction bias produced by GEDI and ICESat-2 was visualized (Figure 11). Through visual inspection, it can be observed that high bias values are located toward the center of the study site. Upon inspection with the satellite image, it is observed that the center of the site is a non-forested area and includes water, indicating low canopy height values. Figure 12 shows the behavior of the canopy height prediction and prediction bias throughout the observed canopy height. In general, observed canopy heights with low values were over-predicted, as indicated by positive percentage prediction bias, while those observed canopy heights with higher values were under-predicted, as indicated by negative percentage prediction bias. The observed canopy heights with low values have high prediction bias. When GEDI-based canopy height is used as training data, the GB regression model has a prediction bias ranging from −45.45 m to 51.52 m with a mean + SD of −3.30 + 8.83 m; the k-NN regression model has a bias ranging from −59.04 m to 52.3 m with a mean + SD of −3.45 + 8.96 m; while the RF has a bias ranging from −48.03 m to 50.43 m and a mean + SD of −3.69 + 8.99. When using ICESat-2, the canopy height prediction bias of the GB regression model ranges from −65.36 m to 42.49 m with a mean + SD of −13.11 + 9.99; the k-NN has a bias ranging from −91.08 m to 48.03 m with a mean + SD of −14.07 + 9.88 m; while the RF has a bias ranging from −77.30 m to 40.47 m with a mean + SD of −14.64 + 10.33 m. Based on frequency, when using GEDI-based canopy height as training data, the majority of the prediction bias of the three regression models is within −20 m–10 m, around 90%. When using the ICESat-2-based canopy height as training data, the majority of the prediction bias of the regression models increased and is between −10 and −65.36, −91.08, and −77.30 m for GB, k-NN, and RF, respectively. The prediction bias of the three regression models is higher when ICESat-2 data are used as training data. Further, Figure 12 shows the correlation of the predicted canopy height and observed canopy height and the correlation of the prediction bias with the observed canopy height. In general, the GEDI-based canopy height has higher correlation with the ALS data than the ICESat-based canopy height. In addition, it is observed that the observed canopy height with low values has the highest positive bias. This indicates that the regression models tend to over-predict low canopy height values and under-predict high canopy height values.

4. Discussion

4.1. Effects of Integrating SAR Data with Optical data in Canopy Height Modeling

Figure 13 depicts the improvement in the accuracy of Sentinel-1 when it was integrated into Sentinel-2 data during canopy height modeling using the GB regression model. Based on RMSE, the accuracy of the predicted canopy height was improved from 8.37 m to 8.00 m, and, based on PRMSE, the accuracy was improved to 31.59% from 33.06%. Accuracy percentage increases of 4.62% (RMSE) and 4.65% (PRMSE) were observed.

Figure 14 shows the spatial distribution and the descriptive statistics of the canopy height prediction bias. Based on the descriptive statistics, the mean canopy height bias values, which are 4.03 m and 3.92 m for Sentinel-1 and Sentinel-2, respectively, were decreased to 3.45 m when the features of the two datasets were integrated in canopy height modeling. This indicates that integrating Sentinel-1 data with Sentinel-2 data helps to improve canopy height accuracy.

Combining Sentinel-1 and Sentinel-2 data for canopy height modeling can offer several advantages due to the complementary nature of these sensors. Both Sentinel-1 and Sentinel-2 are part of the ESA’s Copernicus program, but they operate in different parts of the electromagnetic spectrum and provide distinct types of information.

Sentinel-1 is a SAR sensor that operates in the microwave range. It can penetrate through clouds and provide information about the structure of vegetation. Combining SAR data with optical data from Sentinel-2, which operates in the visible and NIR range, allows for more comprehensive characterization of vegetation structure. This is particularly useful in areas with frequent cloud cover as SAR can provide data even in cloudy conditions, thus improving sensitivity to vegetation structure. Sentinel-1 has a shorter revisit time compared to Sentinel-2, which means it can capture changes in vegetation structure more frequently. Combining the two datasets enables the creation of more temporally dynamic canopy height models, providing insights into seasonal variations and growth patterns, therefore enhancing temporal resolution.

SAR data from Sentinel-1 can be sensitive to the vertical structure of vegetation, allowing for more accurate estimation of canopy height. Integrating this information with the spectral data from Sentinel-2 can lead to a more robust and accurate model for canopy height. With this, it is understood that there will be improved accuracy in canopy height estimation.

In summary, combining Sentinel-1 and Sentinel-2 features in canopy height modeling can provide a more comprehensive and accurate representation of vegetation structure, especially in challenging environmental conditions. However, careful data integration and processing are crucial for realizing the full benefits of this multi-sensor approach.

4.2. Effects of Topography on Canopy Height Modeling

Using the canopy height data generated from the interaction of Sentinel-1&2, GB, and GEDI, the correlations of canopy height with topographic data were emphasized. Figure 15 shows the scatterplot of the predicted canopy height and topographic data such as slope, elevation, and aspect. The R² of the of the canopy height and slope is 0.48. On the other hand, the relationship of elevation and aspect to canopy height is observed to have a very low R², indicating that no particular relationship is observed.

To summarize, the effect of slope on predicted canopy height shows that Sentinel-1, GB, and GEDI have the highest R² compared to the other levels of the same factor.

Topographic features such as aspect, elevation, and slope can significantly influence canopy height prediction models. Integrating these features into the modeling process can enhance the accuracy and interpretability of the predictions. Aspect refers to the direction a slope faces (e.g., north, south, east, or west). It influences the amount of sunlight a location receives and, consequently, can affect vegetation growth. In canopy height prediction, an aspect can impact the microclimate and water availability, affecting vegetation growth patterns. In areas with different aspects, the amount and intensity of sunlight may vary, leading to variations in canopy structure. Including aspects in the model can capture these spatial variations and improve height predictions.

Elevation represents the height of a location above a reference point (usually sea level), and it influences temperature, precipitation, and atmospheric pressure. During canopy height modeling, vegetation tends to exhibit altitude-dependent variations in growth. Different elevations may have different ecological zones and vegetation types. Elevation can indirectly influence canopy height by affecting climate and soil conditions. Models that incorporate elevation information can better capture these variations in vegetation structure.

Slope represents the steepness or incline of the terrain. It affects water drainage, soil erosion, and sunlight exposure. As to the influence on canopy height prediction, slope can influence the distribution of water, nutrients, and sunlight, which in turn affects vegetation growth. Steeper slopes may have different vegetation types and heights compared to flatter areas. Including slope as a feature in canopy height models helps to account for these variations and improves the model’s predictive accuracy.

Combining these three topographic features provides a more comprehensive characterization of the terrain. The combined effects of aspect, elevation, and slope allow for a more nuanced understanding of the relationships between terrain characteristics and vegetation structure. For example, certain aspects combined with specific elevations might create microclimates that are favorable or unfavorable for certain types of vegetation. Models that consider these combined effects can provide more accurate predictions of canopy height. When incorporating topographic features into canopy height prediction models, it is essential to consider the scale of analysis, potential interactions between features, and the availability of accurate topographic data. Additionally, careful validation and testing of model performance are crucial to ensuring the reliability of predictions across different landscapes and conditions.

4.3. Limitations of Generating Wall-to-Wall Canopy Height Information Using Discrete RS Data, Medium-Resolution Passive and Active Satellite Images, and ML Regression Techniques

The integration of RS with ML techniques for canopy height modeling presents various challenges. While this approach offers valuable insights, it also involves complexities that need to be addressed. Some of the challenges associated with integrated RS- and ML-based canopy height modeling are data integration and heterogeneity, lack of ground truth data, complexity of vegetation structure, model overfitting, computational intensity, interpretability, transferability across regions, temporal dynamics, scale dependency, availability and accessibility of data, and handling noisy and incomplete data.

Data integration and heterogeneity: Integrating data from different remote sensing sources (e.g., LiDAR, SAR, or optical imagery) requires careful data fusion. These sources may have different resolutions, scales, and data formats, making it challenging to create a homogeneous dataset for modeling. For this study, the data that were used, like Sentinel-1, Sentinel-2, and topographic data, have a different spatial resolution, like 40 m, 10 m, and 20 m, respectively. During the analysis, the 40 m and 20 m data were rescaled to 10 m, which may have had a negative impact on the original values. For the LiDAR data, the collected ALS data had 1 m resolution but needed to be rescaled to 10 m, 40 m, 17 m, and 25 m to be consistent with the Sentinel-2, Sentinel-1, ICESat-2, and GEDI data, respectively. Again, this could have impaired the data quality.
Lack of ground truth data: Obtaining accurate and sufficient ground truth data for training ML models is often a challenge, especially at a large scale. Ground-based measurements for canopy height across diverse landscapes may be limited or expensive to collect. In this study site, wider coverage was used regarding canopy height prediction. However, in the validation, only the coverage of the ground truth regarding the ALS data was validated due to the availability of the ground truth. It is noted that, when the ALS data were collected, a significant budget was allotted. In modeling, it is better that wider coverage is validated for reliability purposes.
Model overfitting: ML models may overfit the training data, capturing noise and specific patterns that do not generalize well to new data. Balancing model complexity and generalization is crucial to avoid overfitting. Model overfitting in this study was observed in the RF and GB models.

Addressing these challenges involves a combination of advances in algorithm development, careful preprocessing of remote sensing data, collaboration across interdisciplinary teams, and efforts to improve data availability and quality. Additionally, ongoing research in the field is essential to develop innovative solutions and improve the overall robustness of integrated RS- and ML-based canopy height modeling, especially when discrete data are regressed to generate wall-to-wall canopy height.

This study has successfully generated wall-to-wall canopy height information using the discrete data from spaceborne LiDARs, medium-resolution passive and active satellite images, satellite-derived indices, and ancillary data. However, it is limited to general canopy height modeling and was not able to explore the challenges involving the diverse characteristics of vegetation, such as variations in tree species, forest types, and growth stages. Moreover, the generated wall-to-wall canopy height captured the physical pattern of the canopy structure of Lishan Forest when studied visually with real-time high-resolution photos but was not able to significantly address the high canopy height estimation bias. These limitations of the study can serve as baseline information for future research on generating high-accuracy canopy height modeling.

5. Conclusions

This study used discrete data that are provided by spaceborne LiDARs such as GEDI and ICESat-2 to generate wall-to-wall canopy height information using ML-based regression models and satellite images and ancillary data. Evaluation of the results demonstrated that the interaction of the GB regression model, GEDI, and Sentinel-1&2 in canopy height modeling had the best accuracy. Meanwhile, the interaction of RF, ICESat-2, and Sentinel-2 had the lowest accuracy. Moreover, the comparison of the performance of the GEDI and ICESat-2 data in canopy height modeling showed that GEDI data have better agreement with the ALS-based canopy height than ICESat-2. The predicted canopy height has more or less similar appearance to the satellite image when visualized but has low quality when being measured based on RMSE and PRMSE. It is noted that the best accuracy that was attained was 8.00 m and 31.59% based on RMSE and PRMSE, respectively.

This study also concludes that the main and interaction effects of the factors such as characteristics of satellite data, type of regression model, and composition of training data have a significant effect on the accuracy of the predicted canopy height. Furthermore, from the result of Duncan’s post hoc test for the levels of the factors, it can be concluded that regression models such as GB, k-NN, and RF have significant differences as well as training data such as GEDI and ICESat-2. In contrast, the levels of satellite data such as Sentinel-2 and Sentinel-1&2 have no significant difference with each other but are significantly different from Sentinel-1.

This study also concludes that integrating Sentinel-1 data with Sentinel-2 contributes to enhancing canopy height accuracy. The RMSE improved from 8.37 m to 8.00 m, and the PRMSE improved from 33.06% to 31.59% (GEDI-based canopy height). On the other hand, the mean canopy height bias of Sentinel-1, initially 4.03 m, decreased to 3.45 m when integrated with Sentinel-2.

Accurate prediction of canopy height is vital for effective forest management, offering valuable insights into forest structure. It informs decisions on timber harvesting, biodiversity conservation, and ecosystem health. Precise canopy height predictions also improve estimates of biomass and carbon sequestration, essential for understanding forests’ role in climate change mitigation. Knowledge of vegetation height and structure assists in assessing wildfire risk, allowing for the identification of high-fuel-load areas and prioritizing fire management efforts. In contrast, poor-quality canopy height prediction can lead to inaccurate estimates of biomass and carbon sequestration, misguide wildfire risk assessments, and impact decisions on green space allocation, tree planting, and landscape design. It may compromise biodiversity assessments, affect crop management, and influence hydrological models, leading to unreliable assessments of water-related processes. Additionally, it can hinder disaster management, misinform decisions about ecologically valuable areas, and affect the reliability of studies in ecology, climatology, and environmental science.

Author Contributions

Conceptualization and methodology, C.L. and N.D.D.; resources, C.L.; formal analysis, N.D.D.; investigation, visualization, and writing—draft preparation, N.D.D. and C.L.; writing—review and editing, N.D.D. and C.L.; Supervision and Funding acquisition, C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Ministry of Science and Technology, Taiwan, through project MOST 111-2221-E-415-006-MY3.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Hyyppä, J.; Hyyppä, H.; Inkinen, M.; Engdahl, M.; Linko, S.; Zhu, Y.H. Accuracy comparison of various remote sensing data sources in the retrieval of forest stand attributes. For. Ecol. Manag. 2000, 128, 109–120. [Google Scholar] [CrossRef]
Drake, J.B.; Dubayah, R.O.; Clark, D.B.; Knox, R.G.; Blair, J.B.; Hofton, M.A.; Chazdon, R.L.; Weishampel, J.F.; Prince, S. Estimation of tropical forest structural characteristics using large-footprint lidar. Remote Sens. Environ. 2002, 79, 305–319. [Google Scholar] [CrossRef]
Drake, J.B.; Dubayah, R.O.; Knox, R.G.; Clark, D.B.; Blair, J.B. Sensitivity of large-footprint lidar to canopy structure and biomass in a neotropical rainforest. Remote Sens. Environ. 2002, 81, 378–392. [Google Scholar] [CrossRef]
Lefsky, M.A.; Cohen, W.B.; Harding, D.J.; Parker, G.G.; Acker, S.A.; Gower, S.T. Lidar remote sensing of above-ground biomass in three biomes. Glob. Ecol. Biogeogr. 2002, 11, 393–399. [Google Scholar] [CrossRef]
Houghton, R.A.; Hall, F.; Goetz, S.J. Importance of biomass in the global carbon cycle. J. Geophys. Res. 2009, 114, 1–13. [Google Scholar] [CrossRef]
Zhou, S.; Punt, A.; Deng, R.; Dichmont, C.; Ye, Y.; Bishop, J. Modified hierarchical Bayesian biomass dynamics models for assessment of short-lived invertebrates: A comparison for tropical tiger prawns. Mar. Freshw. Res. 2009, 60, 1298–1308. [Google Scholar] [CrossRef]
Dubayah, R.O.; Sheldon, S.L.; Clark, D.B.; Hofton, M.A.; Blair, J.B.; Hurtt, G.C.; Chazdon, R.L. Estimation of Tropical Forest Height and Biomass Dynamics Using Lidar Remote Sensing at La Selva, Costa Rica: Forest Dynamics using LiDAR. J. Geophys. Res. 2010, 115, G00E09. [Google Scholar] [CrossRef]
Simard, M.; Pinto, N.; Fisher, J.B.; Baccini, A. Mapping Forest canopy height globally with spaceborne lidar. J. Geophys. Res. 2011, 116, G04021. [Google Scholar] [CrossRef]
Asner, G.P.; Mascaro, J.; Muller-Landau, H.C.; Vieilledent, G.; Vaudry, R.; Rasamoelina, M.; Hall, J.S.; Van Breugel, M. A universal airborne LiDAR approach for tropical forest carbon mapping. Oecologia. 2012, 168, 1147–1160. [Google Scholar] [CrossRef] [PubMed]
Carreiras, J.M.; Quegan, S.; Le Toan, T.; Minh, D.H.T.; Saatchi, S.S.; Carvalhais, N.; Reichstein, M.; Scipal, K. Coverage of high biomass forests by the ESA BIOMASS mission under defense restrictions. Remote Sens. Environ. 2017, 196, 154–162. [Google Scholar] [CrossRef]
Jucker, T.; Caspersen, J.; Chave, J.; Antin, C.; Barbier, N.; Bongers, F.; Dalponte, M.; van Ewijk, K.Y.; Forrester, D.I.; Haeni, M.; et al. Allometric equations for integrating remote sensing imagery into forest monitoring programmes. Glob. Chang. Biol. 2017, 23, 177–190. [Google Scholar] [CrossRef]
Tang, H.; Armston, J.; Hancock, S.; Marselis, S.; Goetz, S.; Dubayah, R. Characterizing global forest canopy cover distribution using spaceborne lidar. Remote Sens. Environ. 2019, 231, 111262. [Google Scholar] [CrossRef]
Narine, L.L.; Popescu, S.C.; Malambo, L. Using ICESat-2 to Estimate and Map Forest Aboveground Biomass: A First Example. Remote Sens. 2020, 12, 1824. [Google Scholar] [CrossRef]
Wang, G.; Wang, N.; Guo, W. Modelling Forest Aboveground Biomass Based on GF-3 Dual-Polarized and WorldView-3 Data: A Case Study in Datong National Wetland Park, China. Math. Probl. Eng. 2021, 2021, 9925940. [Google Scholar] [CrossRef]
Stojanova, D.; Panov, P.; Gjorgjioski, V.; Kobler, A.; Džeroski, S. Estimating vegetation height and canopy cover from remotely sensed data with machine learning. Ecol. Inform. 2010, 5, 256–266. [Google Scholar] [CrossRef]
Masek, J.G.; Hayes, D.J.; Hughes, M.J.; Healey, S.P.; Turner, D.P. The role of remote sensing in process-scaling studies of managed forest ecosystems. For. Ecol. Manag. 2015, 355, 109–123. [Google Scholar] [CrossRef]
García, M.; Saatchi, S.; Ustin, S.; Balzter, H. Modelling Forest canopy height by integrating airborne LiDAR samples with satellite Radar and multispectral imagery. Int. J. Appl. Earth Obs. Geoinf. 2018, 66, 159–173. [Google Scholar] [CrossRef]
Chen, Q.; Gong, P.; Baldocchi, D.; Tian, Y.Q. Estimating basal area and stem volume for individual trees from lidar data. Photogramm. Eng. Remote Sens. 2007, 73, 1355–1365. [Google Scholar] [CrossRef]
Popescu, S.C.; Zhao, K.; Neuenschwander, A.; Lin, C. Satellite lidar vs. small footprint airborne lidar: Comparing the accuracy of aboveground biomass estimates and forest structure metrics at footprint level. Remote Sens. Environ. 2011, 115, 2786–2797. [Google Scholar] [CrossRef]
Abad-Segura, E.; González-Zamar, M.D.; Vázquez-Cano, E.; López-Meneses, E. Remote Sensing Applied in Forest Management to Optimize Ecosystem Services: Advances in Research. Forests 2020, 11, 969. [Google Scholar] [CrossRef]
Gao, Y.; Skutsch, M.; Paneque-Gálvez, J.; Ghilardi, A. Remote sensing of forest degradation: A review. Environ. Res. Lett. 2020, 15, 103001. [Google Scholar] [CrossRef]
Zhao, P.; Lu, D.; Wang, G.; Wu, C.; Huang, Y.; Yu, S. Examining Spectral Reflectance Saturation in Landsat Imagery and Corresponding Solutions to Improve Forest Aboveground Biomass Estimation. Remote Sens. 2016, 8, 469. [Google Scholar] [CrossRef]
XMura, M.; Bottalico, F.; Giannetti, F.; Bertani, R.; Giannini, R.; Mancini, M.; Orlandini, S.; Travaglini, D.; Chirici, G. Exploiting the capabilities of the Sentinel-2 multi spectral instrument for predicting growing stock volume in forest ecosystems. Int. J. Appl. Earth Obs. Geoinf. 2018, 66, 126–134. [Google Scholar] [CrossRef]
Gitelson, A.A.; Kaufman, Y.J.; Merzlyak, M.N. Use of a green channel in remote sensing of global vegetation from EOS-MODIS. Remote Sens. Environ. 1996, 58, 289–298. [Google Scholar] [CrossRef]
Tian, L.; Wu, X.; Tao, Y.; Li, M.; Qian, C.; Liao, L.; Fu, W. Review of Remote Sensing-Based Methods for Forest Aboveground Biomass Estimation: Progress, Challenges, and Prospects. Forests 2023, 14, 1086. [Google Scholar] [CrossRef]
Fernández-Guisuraga, J.M.; Suárez-Seoane, S.; Calvo, L. Radar and multispectral remote sensing data accurately estimate vegetation vertical structure diversity as a fire resilience indicator. Remote Sens. Ecol. 2023, 9, 117–132. [Google Scholar] [CrossRef]
Ulaby, F.; Long, D.; Blackwell, W.; Elachi, C.; Fung, A.; Ruf, C.; Sarabandi, K.; Zyl, J.; Zebker, H. Microwave Radar and Radiometric Remote Sensing; Artech: Morristown, NJ, USA, 2014. [Google Scholar]
Link, M.; Entekhabi, D.; Jagdhuber, T.; Ferrazzoli, P.; Guerriero, L.; Baur, M.; Ludwig, R. Vegetation Effects on Covariations of L-Band Radiometer and C-Band/L-Band Radar Observations. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 357–360. [Google Scholar]
Hawkins, R.K.; Touzi, R.; Livingstone, C.E. Calibration and Use of CV-580 Polarimetric SAR Data. In Proceedings of the Fourth International Airborne Remote Sensing Conference and Exhibition/21st Canadian Symposium on Remote Sensing, Ottawa, ON, Canada, 21–24 June 1999; Volume 2, pp. 32–40. [Google Scholar] [CrossRef]
Baghdadi, N.; Zribi, M.; Paloscia, S.; Verhoest, N.E.C.; Lievens, H.; Baup, F.; Mattia, F. Semi-Empirical Calibration of the Integral Equation Model for Co-Polarized L-Band Backscattering. Remote Sens. 2015, 7, 13626–13640. [Google Scholar] [CrossRef]
El Hajj, M.; Baghdadi, N.; Zribi, M.; Belaud, G.; Cheviron, B.; Courault, D.; Charron, F. Soil moisture retrieval over irrigated grassland using X-band SAR data. Remote Sens. Environ. 2016, 176, 202–218. [Google Scholar] [CrossRef]
Liu, Y.; Gong, W.; Xing, Y.; Hu, X.; Gong, J. Estimation of the Forest Stand Mean Height and Aboveground Biomass in Northeast China Using SAR Sentinel-1B, Multispectral Sentinel-2A, and DEM Imagery. ISPRS J. Photogramm. Remote Sens. 2019, 151, 277–289. [Google Scholar] [CrossRef]
Li, W.; Niu, Z.; Shang, R.; Qin, Y.; Wang, L.; Chen, H. High-Resolution Mapping of Forest Canopy Height Using Machine Learning by Coupling ICESat-2 LiDAR with Sentinel-1, Sentinel-2 and Landsat-8 Data. Int. J. Appl. Earth Obs. Geoinf. 2020, 92, 102163. [Google Scholar] [CrossRef]
Morin, D.; Planells, M.; Baghdadi, N.; Bouvet, A.; Fayad, I.; Le Toan, T.; Mermoz, S.; Villard, L. Improving Heterogeneous Forest Height Maps by Integrating GEDI-Based Forest Height Information in a Multi-Sensor Mapping Process. Remote Sens. 2022, 14, 2079. [Google Scholar] [CrossRef]
Ghosh, S.M.; Behera, M.D.; Paramanik, S. Canopy Height Estimation Using Sentinel Series Images through Machine Learning Models in a Mangrove Forest. Remote Sens. 2020, 12, 1519. [Google Scholar] [CrossRef]
Moghaddam, M.; Dungan, J.L.; Acker, S. Forest Variable Estimation from Fusion of SAR and Multispectral Optical Data. IEEE Trans. Geosci. Remote Sens. 2002, 40, 2176–2187. [Google Scholar] [CrossRef]
Takahashi, T.; Yamamoto, K.; Senda, Y.; Tsuzuku, M. Estimating individual tree heights of sugi (Cryptomeria japonica D. Don) plantations in mountainous areas using small-footprint airborne LiDAR. J. For. Res. 2005, 10, 135–142. [Google Scholar] [CrossRef]
Hodgson, M.E.; Bresnahan, E. Accuracy of airborne LiDAR-derived elevation: Empirical assessment and error budget. Photogramm. Eng. Remote Sens. 2004, 70, 331–339. [Google Scholar] [CrossRef]
Gatziolis, D.; Fried, J.S.; Monleon, V.S. Challenges to estimating tree height via LiDAR in closed-canopy forests: A parable from western Oregon. For. Sci. 2010, 56, 139–155. [Google Scholar]
Breidenbach, J.; Koch, B.; Kändler, G.; Kleusberg, A. Quantifying the influence of slope, aspect, crown shape, and stem density on the estimation of tree height at plot level using LiDAR and InSAR data. Int. J. Remote Sens. 2008, 29, 1511–1536. [Google Scholar] [CrossRef]
Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: New York, NY, USA, 2009; ISBN 978-0-387-84857-0. [Google Scholar]
Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117. [Google Scholar] [CrossRef]
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Lin, C.; Ma, S.E.; Huang, L.P.; Chen, C.I.; Lin, P.T.; Yang, Z.K.; Lin, K.T. Generating a baseline map of surface fuel loading using stratified random sampling inventory data through cokriging and multiple linear regression methods. Remote Sens. 2021, 13, 1561. [Google Scholar] [CrossRef]
Hosseini, M.; McNairn, H.; Merzouki, A.; Pacheco, A. Estimation of Leaf Area Index (LAI) in corn and soybeans using multi-polarization C- and L-band radar data. Remote Sens. Environ. 2015, 170, 77–89. [Google Scholar] [CrossRef]
Fieuzal, R.; Baup, F. Estimation of leaf area index and crop height of sunflowers using multi-temporal optical and SAR satellite data. Int. J. Remote Sens. 2016, 37, 2780–2809. [Google Scholar] [CrossRef]
Hosseini, M.; McNairn, H. Using multi-polarization C- and L-band synthetic aperture radar to estimate biomass and soil moisture of wheat fields. Int. J. Appl Earth Obs. Geoinf. 2017, 58, 50–64. [Google Scholar] [CrossRef]
Nasirzadehdizaji, R.; Balik Sanli, F.; Abdikan, S.; Cakir, Z.; Sekertekin, A.; Ustuner, M. Sensitivity Analysis of Multi-Temporal Sentinel-1 SAR Parameters to Crop Height and Canopy Coverage. Appl. Sci. 2019, 9, 655. [Google Scholar] [CrossRef]
Ranchin, T.; Aiazzi, B.; Alparone, L.; Baronti, S.; Wald, L. Image Fusion: The ARSIS Concept and Some Successful Implementation Schemes. ISPRS J. Photogramm. Remote Sens. 2003, 58, 4–18. [Google Scholar] [CrossRef]
Huete, A.R.; Liu, H.Q.; Batchily, K.; Van Leeuwen, W. A comparison of vegetation indices over a global set of TM images for EOS-MODIS. Remote Sens. Environ. 1997, 59, 440–451. [Google Scholar] [CrossRef]
Mutanga, O.; Adam, E.; Cho, M.A. High density biomass estimation for wetland vegetation using WorldView-2 imagery and random forest regression algorithm. Int. J. Appl. Earth Obs. Geoinf. 2012, 18, 399–406. [Google Scholar] [CrossRef]
Kaufman, Y.J.; Tanre, D. Atmospherically Resistant Vegetation Index (ARVI) for EOS-MODIS. IEEE Trans. Geosci. Remote Sens. 1992, 30, 261–270. [Google Scholar] [CrossRef]
Pinty, B.; Verstraete, M. GEMI: A Non-Linear Index to Monitor Global Vegetation from Satellites. Vegetation 1992, 101, 15–20. [Google Scholar] [CrossRef]
Gitelson, A.A.; Merzlyak, M.N. Remote sensing of chlorophyll concentration in higher plant leaves. Adv. Space Res. 1998, 22, 689–692. [Google Scholar] [CrossRef]
Crippen, R.E. Calculating the vegetation index faster. Remote Sens. Environ. 1990, 34, 71–73. [Google Scholar] [CrossRef]
Qi, J.; Chehbouni, A.; Huete, A.R.; Kerr, Y.H.; Sorooshian, S. A modified soil adjusted vegetation index. Remote Sens. Environ. 1994, 48, 119–126. [Google Scholar] [CrossRef]
Rouse, J.W.; Haas, R.H.; Schell, J.A.; Deering, D.W. Monitoring Vegetation Systems in the Great Plains with ERTS (Earth Resources Technology Satellite). In Proceedings of the 3rd Earth Resources Technology Satellite Symposium, Washington, DC, USA, 10–14 December 1973; pp. 309–317. [Google Scholar]
Blackburn, G.A. Spectral indices for estimating photosynthetic pigment concentrations: A test using senescent tree leaves. Int. J. Remote Sens. 1998, 19, 657–675. [Google Scholar] [CrossRef]
Richardson, A.J.; Wiegand, C.L. Distinguishing vegetation from soil background information. Photogramm. Eng. Remote Sens. 1977, 43, 1541–1552. [Google Scholar]
Huete, A.R. A soil-adjusted vegetation index (SAVI). Remote Sens. Environ. 1988, 25, 259–309. [Google Scholar] [CrossRef]
Clevers, J.G.P.W. Application of a weighted infrared-red vegetation index for the estimation of leaf area index. Remote Sens. Environ. 1988, 25, 53–69. [Google Scholar] [CrossRef]
Xu, H. Modification of normalized difference water index (NDWI) to enhance open water features in remotely sensed imagery. Int. J. Remote Sens. 2006, 27, 3025–3033. [Google Scholar] [CrossRef]
Neumann, T.A.; Martino, A.J.; Markus, T.; Bae, S.; Bock, M.R.; Brenner, A.C.; Brunt, K.M.; Cavanaugh, J.; Fernandes, S.T.; Hancock, D.W.; et al. The Ice, Cloud, and Land Elevation Satellite-2 mission: A global geolocated photon product derived from the Advanced Topographic Laser Altimeter System. Remote Sens. Environ. 2019, 233, 111325. [Google Scholar] [CrossRef] [PubMed]
Magruder, L.A.; Brunt, K.M.; Alonzo, M. Early ICESat-2 on-orbit Geolocation Validation Using Ground-Based Corner Cube Retro-Reflectors. Remote Sensing. 2020, 12, 3653. [Google Scholar] [CrossRef]
Neuenschwander, A.; Guenther, E.; White, J.C.; Duncanson, L.; Montesano, P. Validation of ICESat-2 terrain and canopy heights in boreal forests. Remote Sens. Environ. 2020, 251, 112110. [Google Scholar] [CrossRef]
Neuenschwander, A.; Pitts, K. The ATL08 land and vegetation product for the ICESat-2 Mission. Remote Sens. Environ. 2019, 221, 247–259. [Google Scholar] [CrossRef]
Neuenschwander, A.L.; Pitts, K.L.; Jelley, B.P.; Robbins, J.; Klotz, B.; Popescu, S.C.; Nelson, R.F.; Harding, D.; Pederson, D.; Sheridan, R. ATLAS/ICESat-2 L3A Land and Vegetation Height; Version 5 [Data Set]; NASA National Snow and Ice Data Center Distributed Active Archive Center: Boulder, CO, USA, 2021. [CrossRef]
Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar] [CrossRef]
Lin, C.; Tsogt, K.; Zandraabal, T. A decompositional stand structure analysis for exploring stand dynamics of multiple attributes of a mixed-species forest. For. Eco. Manag. 2016, 378, 111–121. [Google Scholar] [CrossRef]
Dong, J.; Ni, W.; Zhang, Z.; Sun, G. Performance of ICESat-2 ATL08 product on the estimation of forest height by referencing to small footprint LiDAR data. Natl. Remote Sens. Bull. 2021, 25, 1294–1307. [Google Scholar] [CrossRef]
Zhu, X.; Nie, S.; Wang, C.; Xi, X.; Hu, Z. A ground elevation and vegetation height retrieval algorithm using micro-pulse photon-counting Lidar data. Rem. Sens. 2018, 10, 1962. [Google Scholar] [CrossRef]
Pang, S.; Li, G.; Jiang, X.; Chen, Y.; Lu, Y.; Lu, D. Retrieval of forest canopy height in a mountainous region with ICESat-2 ATLAS. For. Ecosyst. 2022, 9, 100046. [Google Scholar] [CrossRef]
Wasserstein, R.L.; Lazar, N.A. The ASA Statement on p-Values: Context, Process, and Purpose. Am. Stat. 2016, 70, 129–133. [Google Scholar] [CrossRef]

Figure 1. The geographical location of the study site and training and validation data; the location of the study site within Taiwan (a); the Sentinel-2 image of the study site with an acquisition date of 2022-08-23 (b); the locations of the GEDI and ICESat-2 plots (training data) and the ALS data (validation data) (c); and the magnified view of 2 ha × 2 ha plot (d).

Figure 2. Flowchart of retrieving the backscatter of SAR and generation of backscatter indices.

Figure 3. Flowchart of retrieving the interferometric coherence.

Figure 4. The overall procedure for the integrated RS- and ML-regression-techniques-based canopy height modeling.

Figure 5. Comparison of the spaceborne-LiDAR-based canopy height with the ALS-based canopy height; (a) GEDI and ALS, (b) ICESat-2 and ALS.

Figure 6. The scatterplot between the ALS-based canopy height and GEDI-based canopy height (a) and between the ALS-based canopy height and ICESat-2-based canopy height (b) and the descriptive statistics of the canopy height bias of the GEDI-based canopy height and ICESat-2-based canopy height. The solid line indicates the trendline of the equation while the dotted lines indicate the 1:1 line.

Figure 7. The spatial distribution of the k-NN-based canopy height using the spaceborne LiDARs as training data and satellite images acquired in different seasons.

Figure 8. Percentage increase in mean square error (%IncMSE) of each predictor of the RF-based canopy height modeling of each dataset.

Figure 9. The RMSE and PRMSE of the different factors of the canopy height modeling. (a) for the regression models, (b) for the satellite data, and (c) for the training data.

Figure 10. Spatial and frequency distribution of the agreement of the canopy height product produced using ICESat-2 and GEDI data.

Figure 11. Spatial distribution of the canopy height prediction bias (m) produced by the different ML-based regression models.

Figure 12. The correlation between the predicted canopy height and canopy height prediction bias and the observed canopy height values. The solid red line indicates the trendline of the equation.

Figure 13. Accuracy of the canopy height prediction before and after the integration of Sentinel-1 features to optical remote sensing data features.

Figure 14. Spatial distribution and descriptive statistics of the canopy prediction bias before and after the integration of Sentinel-1 to optical data in canopy height estimation.

Figure 15. Correlation of the predicted canopy height and topographic data.

Table 1. Properties and sensitivity to forest vertical structure of the distinct remote sensing sensors.

Remote Sensing Sensor	Distinct Properties	Sensitivity to Forest Vertical Structure
Spaceborne LiDAR (ICESat-2 and GEDI)	Discrete data; global-wide coverage, provided freely	High sensitivity
TLS, UAV-LiDAR system, ALS	Wall-to-wall data but with limited coverage, costly, limited operational capabilities	High sensitivity
Optical sensors such as Sentinel-2, Landsat, etc.	Wall-to-wall data; global-wide coverage; provided freely	Very low sensitivity
SAR sensors like Sentinel-1	Wall-to-wall; global-wide coverage; provided freely	Limited sensitivity

Table 2. Technical characteristics of Sentinel-1 images used in this study for retrieval of interferometric coherence (SLC product).

Granules/Product ID	Pass Direction	Orbit Number	Baseline
Granules/Product ID	Pass Direction	Orbit Number	Perpendicular	Temporal
Reference: S1B_IW_SLC_1SDV_20210114T215134_20210114T215204_025156_02FEC4_6FDD	Descending	25,156	186	6
Secondary: S1A_IW_SLC_1SDV_20210120T215221_20210120T215251_036227_043FC3_1F47	Descending	36,227	186	6

Table 3. Technical characteristics of Sentinel-1 images used in this study for retrieval of backscatter coefficient (GRD product).

Granule	S1B_IW_GRDH_1SDV_20210114T215134_20210114T215203_025156_02FEC4_FAD7
Mission	S1B
Beam Mode	IW
Start Date of Acquisition	2021/01/14
Start Time of Acquisition	21:51:34
Stop Time of Acquisition	21:52:04
Season	Winter
Polarization	Dual; VV + VH; Primary Polarization: V
Orbit Type:	Precise
Orbit number	25156
Path/Frame	105/509
Flight direction	Descending
Pixel Spacing	10 m

Table 4. SAR-based indices for the development of canopy height model.

	Equation	Reference
Sum	VV + VH	[48]
Difference	VV − VH
Ratio	VV/VH
NDI	NDI = (VV − VH)/(VV + VH)
RVI	RVI = 4 × VH/(VV + VH)

Table 5. List of spectral indices that were integrated with the surface reflectance of Sentinel-2 in canopy height estimation in this study.

Spectral index	Reference	Value indication
Atmospherically resistant vegetation index (ARVI)	[52]	Vegetation Index; range for an ARVI is −1 to 1, where green vegetation generally falls between values of 0.20 to 0.80.
Global Environmental Monitoring Index (GEMI)	[53]	Positive values indicate the presence of vegetation (with greater values indicating healthier vegetation). Negative values indicate a lack of vegetation (water, rock, or soil).
Green Normalized Differential Vegetation Index (GNDVI)	[54]	Values range from −1 to 1; −1 to 0 indicates the presence of water or bare soil.
Infrared Percentage Vegetation Index (IPVI)	[55]	Vegetation Index; IPVI is functionally equivalent to NDVI and RVI, but it only ranges in value from 0.0 to 1.0
Second Modified Soil Adjusted Vegetation Index (MSAVI2)	[56]	Positive values indicate the presence of vegetation (with greater values indicating healthier vegetation). Negative values indicate a lack of vegetation (water, rock, or soil).
Normalized Differential Vegetation Index (NDVI)	[57]	Range from −1.0 to 1.0, with negative values indicating clouds and water, positive values near zero indicating bare soil, and higher positive values of NDVI ranging from sparse vegetation (0.1–0.5) to dense green vegetation (0.6 and above).
Pigment Specific Simple Ratio Algorithm (PSSRa)	[58]	Vegetation Index
Perpendicular Vegetation Index (PVI)	[59]	Values ranged from −1.0 to 1.0; similar to the DVI
Soil Adjusted Vegetation Index (SAVI)	[60]	Negative values are either water or urban areas and positive values are vegetation. The higher the values, the denser or healthier the vegetation
Weighted Difference Vegetation Index (WDVI)	[61]	Can be used for estimating the LAI of green vegetation.
Modified Normalized Difference Water Index (MNDWI)	[62]	The value ranges from −1 to +1. Negative values represent non-water bodies. Positive values >0.5 are water bodies.

Table 6. Descriptive statistics of the GEDI- and ICESat-2-based canopy height that was used for training the ML-based regression models.

	Canopy Height (m)				N
	Min	Max	Mean	SD	N
GEDI	1.90	89.77	26.94	12.45	14,910
ICESat-2	1.96	149.96	34.92	19.42	9059

Table 7. Descriptive statistics of the k-NN-based canopy height using the spaceborne LiDARs as training data and satellite images acquired in different seasons.

		Sentinel-1		Sentinel-2		Sentinel-1&2
		ICESat-2	GEDI	ICESat-2	GEDI	ICESat-2	GEDI
GB	Min	−3.79	−4.52	2.86	0.66	5.89	0.06
	Max	86.25	66.71	107.16	68.56	100.63	70.15
	Mean	37.54	26.44	37.75	26.53	37.66	26.53
	SD	7.05	5.80	6.97	6.05	6.73	6.18
k-NN	Min	0	0	0	0	0	0
	Max	146.84	99.64	146.84	99.64	146.84	99.64
	Mean	37.70	26.64	37.51	26.54	37.96	26.50
	SD	5.93	5.21	5.97	5.53	6.18	5.67
RF	Min	−1.30 × 10⁻¹⁴	−2.40 × 10⁻¹⁴	5.59	2.22	6.09	2.24
	Max	103.69	72.55	105.16	71.77	105.52	73.28
	Mean	38.40	26.71	38.87	26.80	38.90	26.86
	SD	7.64	6.15	6.85	6.16	6.76	6.17

Table 8. Correlation of the ML-based predicted canopy height with the ALS-based canopy height.

Regression Model	Satellite Date	Training Data	Correlation with ALS-Based Canopy Height
GB	Sentinel-1&2	GEDI	0.64 **
GB	Sentinel-2	GEDI	0.64 **
k-NN	Sentinel-1&2	GEDI	0.63 **
k-NN	Sentinel-2	GEDI	0.63 **
RF	Sentinel-1&2	GEDI	0.62 **
RF	Sentinel-2	GEDI	0.62 **
GB	Sentinel-1	GEDI	0.52 **
k-NN	Sentinel-1&2	ICESat-2	0.51 **
k-NN	Sentinel-2	ICESat-2	0.50 **
GB	Sentinel-1&2	ICESat-2	0.49 **
RF	Sentinel-1	GEDI	0.49 **
GB	Sentinel-2	ICESat-2	0.49 **
k-NN	Sentinel-1	GEDI	0.47 **
RF	Sentinel-1&2	ICESat-2	0.44 **
GB	Sentinel-1	ICESat-2	0.42 **
RF	Sentinel-2	ICESat-2	0.41 **
k-NN	Sentinel-1	ICESat-2	0.40 **
RF	Sentinel-1	ICESat-2	0.38 **

** Significant at 0.05 level of significance.

Table 9. The main and interaction effects of the regression model, satellite data, and training data to the RMSE of the canopy height modeling as revealed by 3-way full factorial ANOVA design.

Source	Type III Sum of Squares	df	Mean Square	F	p-Value	Partial Eta Squared
Corrected Model	1772.63a	17	104.27	60,153.05	<0.001	1.00
Intercept	23,219.30	1	23,219.30	13,394,830.97	<0.001	1.00
Regression model (RM)	9.65	2	4.82	2782.22	<0.001	0.97
Satellite data (SD)	0.67	2	0.34	193.09	<0.001	0.70
Training data (TD)	1753.14	1	1753.14	1,011,356.76	<0.001	1.00
RM × SD	0.62	4	0.16	89.72	<0.001	0.69
RM × TD	4.07	2	2.04	1173.73	<0.001	0.94
SD × TD	3.41	2	1.71	983.93	<0.001	0.92
RM × SD × TD	1.08	4	0.27	155.06	<0.001	0.79
Error	0.28	162	0.002
Total	24,992.21	180
Corrected Total	1772.91	179

Table 10. The main and interaction effects of the regression model, satellite data, and training data to the PRMSE of the canopy height modeling as revealed by 3-way full factorial ANOVA design.

Source	Type III Sum of Squares	df	Mean Square	F	p-Value	Partial Eta Squared
Corrected Model	27,659.84a	17	1627.05	36,776.46	<0.001	1.00
Intercept	36,2307.60	1	362,307.60	8,189,297.55	<0.001	1.00
RM	150.51	2	75.25	1701.00	<0.001	0.96
SD	10.45	2	5.22	118.07	<0.001	0.59
TD	27,355.68	1	27,355.68	618,324.81	<0.001	1.00
RM × SD	9.71	4	2.43	54.86	<0.001	0.58
RM × TD	63.50	2	31.75	717.61	<0.001	0.90
SD × TD	53.23	2	26.61	601.54	<0.001	0.88
RM × SD × TD	16.78	4	4.19	94.80	<0.001	0.70
Error	7.17	162	0.04
Total	389,974.60	180
Corrected Total	27,667.01	179

Table 11. The mean RMSE and PRMSE of the predicted canopy height are based on the interaction of satellite data, training data, and regression model.

Regression Model	Satellite Data	Training Data	Mean RMSE (m)	Mean PRMSE (%)
GB	Sentinel-1&2	GEDI	8.00 ± 0.03 a	31.59 ± 0.12 a
k-NN	Sentinel-1&2	GEDI	8.05 ± 0.03 b	31.78 ± 0.15 b
GB	Sentinel-2	GEDI	8.06 ± 0.03 b	31.84 ± 0.14 b
k-NN	Sentinel-2	GEDI	8.11 ± 0.04 c	32.04 ± 0.15 c
RF	Sentinel-1&2	GEDI	8.16 ± 0.03 d	32.24 ± 0.14 d
RF	Sentinel-2	GEDI	8.21 ± 0.04 e	32.42 ± 0.17 d
GB	Sentinel-1	GEDI	8.37 ± 0.02 f	33.06 ± 0.13 e
k-NN	Sentinel-1	GEDI	8.52 ± 0.03 g	33.66 ± 0.16 f
RF	Sentinel-1	GEDI	8.65 ± 0.03 h	34.18 ± 0.14 g
GB	Sentinel-1&2	ICESat-2	13.89 ± 0.06 i	54.86 ± 0.26 h
GB	Sentinel-2	ICESat-2	14.05 ± 0.06 j	55.51 ± 0.28 i
GB	Sentinel-1	ICESat-2	14.08 ± 0.05 j	55.62 ± 0.26 i
k-NN	Sentinel-1	ICESat-2	14.32 ± 0.05 k	56.56 ± 0.28 j
k-NN	Sentinel-2	ICESat-2	14.37 ± 0.04 l	56.78 ± 0.22 k
RF	Sentinel-1	ICESat-2	14.72 ± 0.05 m	58.14 ± 0.26 l
k-NN	Sentinel-1&2	ICESat-2	14.76 ± 0.04 n	58.32 ± 0.22 l
RF	Sentinel-1&2	ICESat-2	15.02 ± 0.05 o	59.32 ± 0.26 m
RF	Sentinel-2	ICESat-2	15.09 ± 0.05 p	59.62 ± 0.29 n

Note: The letters indicate the groupings based on the average RMSE (m) and RMPSE (%) of the factors. Models with the same groupings indicate no significant difference.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Doyog, N.D.; Lin, C. Generating Wall-to-Wall Canopy Height Information from Discrete Data Provided by Spaceborne LiDAR System. Forests 2024, 15, 482. https://doi.org/10.3390/f15030482

AMA Style

Doyog ND, Lin C. Generating Wall-to-Wall Canopy Height Information from Discrete Data Provided by Spaceborne LiDAR System. Forests. 2024; 15(3):482. https://doi.org/10.3390/f15030482

Chicago/Turabian Style

Doyog, Nova D., and Chinsu Lin. 2024. "Generating Wall-to-Wall Canopy Height Information from Discrete Data Provided by Spaceborne LiDAR System" Forests 15, no. 3: 482. https://doi.org/10.3390/f15030482

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Generating Wall-to-Wall Canopy Height Information from Discrete Data Provided by Spaceborne LiDAR System

Abstract

1. Introduction

2. Materials and Methods

2.1. Study Site

2.2. Active and Passive Remote Sensing Imageries

2.2.1. Sentinel-1

2.2.2. Sentinel-2

2.3. Ancillary Data

2.4. RS-Based Canopy Height Data

2.4.1. ALS-Based Canopy Height

2.4.2. GEDI-Based Canopy Height

2.4.3. ICESat-2-Based Canopy Height

2.5. Machine Learning-Based Canopy Height Modeling

2.5.1. Gradient Boosting (GB) Regression

2.5.2. k-Nearest Neighbors (k-NN)-Regression

2.5.3. Random Forest (RF) Regression

2.6. Assessing the Effects of Different Factors of Canopy Height Modeling Using Full-Factorial ANOVA Design

2.7. Determination of Appropriate Factors for Canopy Height Estimation

2.8. Performance Evaluation of the Canopy Height Modeling

2.8.1. Determining the Minimum Number of Test Samples for Accuracy Validation

2.8.2. Evaluation Criteria for the Accuracy Assessment

3. Results

3.1. Data Distribution of the Retrieved Spaceborne-LiDAR-Based Canopy Height

3.2. Agreement of ALS-, GEDI-, and ICESat-2-Based Canopy Height

3.3. ML-Based Canopy Height Modeling

3.4. Correlation of the ML-Based Predicted Canopy Height with the ALS-Based Canopy Height

3.5. Performance of the ML-Based Predicted Canopy Height Based on RMSE (m) and PRMSE (%)

3.6. Agreement of GEDI and ICESat-2 in Canopy Height Modeling

3.7. Comparison of the Spatial Distribution of the Canopy Height Prediction Bias of GEDI and ICESat-2

4. Discussion

4.1. Effects of Integrating SAR Data with Optical data in Canopy Height Modeling

4.2. Effects of Topography on Canopy Height Modeling

4.3. Limitations of Generating Wall-to-Wall Canopy Height Information Using Discrete RS Data, Medium-Resolution Passive and Active Satellite Images, and ML Regression Techniques

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI