Article

Forest Canopy Height Retrieval and Analysis Using Random Forest Model with Multi-Source Remote Sensing Integration

1
College of Marine Science, Shanghai Ocean University, Shanghai 201306, China
2
Shanghai Estuary Marine Surveying and Mapping Engineering Technology Research Center, Shanghai 201306, China
3
Nansha Islands Coral Reef Ecosystem National Observation and Research Station, Guangzhou 510300, China
*
Author to whom correspondence should be addressed.
Sustainability 2024, 16(5), 1735; https://doi.org/10.3390/su16051735
Submission received: 22 November 2023 / Revised: 25 January 2024 / Accepted: 25 January 2024 / Published: 20 February 2024

Abstract

Forest canopy height is an important indicator of the forest ecosystem, and accurate large-scale assessment of canopy height is of great significance for forest resource quantification and carbon sequestration. Remote sensing makes it possible to retrieve canopy height over large forest areas. This study proposes a new method for estimating forest canopy height based on remote sensing. In this method, products from two different types of space-borne lidar, GEDI and ICESat-2, are each combined with Landsat 9 imagery and SRTM terrain data. Two forest canopy height-retrieval models based on multi-source remote sensing integration are obtained using a random forest regression (RFR) algorithm. The study, conducted at a forest site in the northeastern United States, synthesized various remote sensing data sets to produce a robust canopy height model. First, we extracted relative canopy height products, multispectral features, and topographic data from GEDI, ICESat-2, Landsat 9, and SRTM data, respectively. The importance of each variable was then assessed with the random forest algorithm. Next, the random forest regression algorithm was used to combine the selected variables and construct the forest canopy height model. Validation with airborne laser scanning (ALS) data shows that the GEDI and ICESat-2 models using a single data source achieve better accuracy than the Landsat 9 model. Notably, the combined models performed best: the combination of GEDI, Landsat 9, and SRTM data (R = 0.92, MAE = 1.91 m, RMSE = 2.78 m, and rRMSE = 12.64%) and the combination of ICESat-2, Landsat 9, and SRTM data (R = 0.89, MAE = 1.84 m, RMSE = 2.54 m, and rRMSE = 10.75%). Compared with the least accurate Landsat 9 model, R increased by 29.58% and 93.48%, MAE decreased by 44.64% and 46.20%, RMSE decreased by 42.80% and 49.40%, and rRMSE decreased by 42.86% and 51.60%, respectively. 
These results fully evaluate and discuss the practical performance and benefits of multi-source data retrieval of forest canopy height by combining space-borne lidar data with Landsat 9 data, which is of great significance for understanding forest structure and dynamics. The study provides a reliable methodology for estimating forest canopy height and valuable insights into forest resource management and its contribution to global climate change.

1. Introduction

In recent years, the negative terrestrial ecological impacts of climate change have become more apparent and are very likely to intensify over the coming decades [1,2,3]. Forests constitute a crucial segment of land-based ecosystems and are instrumental in sustaining the cycle of carbon on a global scale. They play a key role in counteracting the effects of global warming, enhancing environmental conditions, and maintaining the equilibrium of natural ecosystems [4,5,6]. The measurement of canopy height in forests is a critical indicator of the overall structure of a forest and its capacity to store carbon. Large-area and high-resolution forest canopy height data are fundamental to regional and global forest carbon stock estimates. It is an important part of assessing the global carbon cycle and estimating forest carbon storage, biomass, and forest dynamics [7,8,9,10]. Traditional methods of measuring forest canopy height rely on field measurements [11]. Manual surveys based on resource inventories can provide accurate and high-resolution data, but they are resource-intensive, requiring extensive human, material, and time resources [10]. However, these methods are only applicable to small forest areas, posing challenges in obtaining forest canopy height data for large regions. Conversely, remote sensing technology has advanced, allowing for the rapid, accurate, continuous, dynamic, long-term, and non-destructive estimation of forest canopy height through the extraction of horizontal structural information from remote sensing data over large areas [12,13]. Various methods, including optical remote sensing, microwave remote sensing, and lidar, can be used for forest canopy height retrieval. These methods significantly improve the efficiency and accuracy of forest canopy height retrieval without causing damage to forest ecosystems [14].
The Landsat series of Earth observation satellites is the longest-running program in Earth observation, continuously monitoring the global land surface on a moderate scale. Millions of high-quality medium-resolution multispectral data have been acquired. These data are stored at receiving stations in the United States and around the world for research on global change [15]. The most recent addition to the Landsat program, Landsat 9, was launched in September 2021 by the National Aeronautics and Space Administration (NASA) and the United States Geological Survey (USGS) [16]. Landsat 9 is equipped with enhanced sensors, including the Thermal Infrared Sensor-2 (TIRS-2) and Operational Land Imager-2 (OLI-2). Unlike its predecessor, Landsat 8, the instrument provides radiometric and geometric data superior to those on the previous generation of Landsat satellites. Landsat 9 OLI-2 has a higher radiometric resolution, increasing the quantization from Landsat 8 from 12 bits to 14 bits, enabling the sensor to detect finer differences, especially in darker areas such as water or dense forests, and has higher imaging capabilities to capture horizontal structure data of large-scale forest canopy [17,18,19]. Compared to Landsat 8 thermal infrared sensors (TIRS), the TIRS-2 also significantly reduces stray light, resulting in improved atmospheric correction and more accurate surface temperature measurements. In addition, it contains abundant spectral features and vegetation indices, which can effectively reduce the resource consumption of field measurements. It is an important data source for estimating forest canopy height [16,19]. However, it faces challenges in acquiring vertical structural characteristics of forests due to poor penetration, saturation, low data quality, and substantial errors in estimating forest canopy height. Consequently, these factors impact the accuracy of the estimation [19,20]. 
However, space-borne lidar has a higher orbital altitude, wider observation range, and high penetration. These data provide a solution to the challenges faced by Landsat satellite image data. In addition, it also provides reliable data support for large-scale terrain elevation and forest canopy height inversion [21]. In 2018, NASA embarked on a groundbreaking endeavor by launching two laser altimeter missions: ICESat-2 (Ice, Cloud, and Land Elevation Satellite 2) and GEDI (Global Ecosystem Dynamics Investigation) [22]. GEDI, a cutting-edge multibeam waveform lidar, is tasked with measuring the global tree height and canopy density through its unique eight laser beam ground trajectories [23]. The system is the world’s first spaceborne lidar altimeter system for high-resolution measurement of three-dimensional vertical structure and topography of tropical and temperate forest vegetation [24,25,26]. ICESat-2 boasts the Advanced Topographic Laser Altimeter System (ATLAS), incorporating pioneering micropulse multibeam photon counting lidar technology, a first for satellite applications. The primary scientific objective is to gauge vegetation canopy height accurately [27]. ICESat-2 and GEDI have unique advantages and represent the most advanced level of the space-borne laser altimeter. They have significantly expanded our capabilities in understanding and studying forest biomass, carbon, and water cycles and have opened new avenues for biodiversity conservation [23,27,28].
Previous studies have successfully utilized ICESat-2 and GEDI data to estimate forest height [29,30,31]. However, due to the characteristics of multispectral data and space-borne lidar data, researchers have been exploring the combination and application of multiple sensors and measurements to retrieve forest parameters [32]. Space-borne lidar data directly provide precise information about the vertical vegetation structure [33]. When combined with optical remote sensing data or radar data, this combination enables the large-scale and comprehensive estimation of forest attributes, including height [20,34]. Qi et al. [35] explored the efficacy of simulating GEDI data for enhancing TDX InSAR data height estimation in temperate forests, mountainous coniferous forests, and tropical rainforests. The results demonstrated that combining both data types led to more accurate forest canopy height estimation than using just one data source [35]. Potapov et al. [36] and Tyukavina et al. [37] emphasized the significance of incorporating Landsat-based surface phenology indices into forest structure modeling. Potapov et al. [38] utilized a superpixel machine learning algorithm to generate a global forest canopy height map at a resolution of 30 m by integrating GEDI and Landsat 8 data. The findings indicated that, in comparison to ALS data, the RMSE and MAE were 9.07 m and 6.36 m, respectively. Wang et al. [39] integrated WorldView-2 stereo images with Landsat 7 data and employed six machine learning methods to estimate forest canopy height. The results demonstrated that the gradient-enhanced regression method provided the most accurate estimation of CHM, with the R of 0.64 and RMSE of 3.1 m. Additionally, the study highlighted the limitations of using solely very high-resolution stereo images for CHM estimation and suggested that integrating CHM from WorldView-2 stereo images with satellite vegetation indices could enhance the accuracy of CHM estimation. Ren et al. 
[40] demonstrated the accuracy of GEDI lidar data and multitemporal Sentinel-2 images in estimating forest diversity over large areas using four machine learning regression algorithms. In a similar vein, Li et al. [9] integrated data from the ICESat-2 lidar with observations from Sentinel-1, Sentinel-2, and Landsat 8 satellites. They employed deep learning techniques and random forest models to generate a high-resolution CHM. Furthermore, Xi et al. [41] combined ICESat-2 lidar data with Sentinel-1 and Sentinel-2 data and used random forest and gradient-boosting decision tree methods to develop a forest canopy height estimation model. Likewise, Guan et al. [42] employed GEDI and Landsat 8 data to generate canopy height maps for estimating forest age. Zhu et al. [43] combined GEDI lidar data with Landsat 8 and Landsat 9 data and developed a forest canopy height estimation model using a BP neural network. Their results show that the Landsat 9 model is superior to the Landsat 8 model, and that the combined GEDI and Landsat 9 model performs best. These studies collectively demonstrate the feasibility and effectiveness of combining space-borne lidar data with multispectral data for forest parameter retrieval using machine learning methods. Despite the advancements in space-borne lidar technology represented by GEDI and ICESat-2, research on the synergistic application of the most recent Landsat 9 data with these lidar systems is limited. Although the combination of GEDI and Landsat 9 data has been discussed in the literature [43], the temporal consistency processing of the Landsat 9 images in that work still needs to be optimized. In addition, the Landsat 9 feature variables extracted there cover only some vegetation indices, without a more comprehensive selection and analysis of Landsat 9 feature variables. 
To date, there have been few studies on the potential integration of ICESat-2 and Landsat 9 data to assess forest canopy height. GEDI and ICESat-2 stand at the forefront of space-borne lidar, yet the real-world performance and benefits of combining these data with Landsat 9 for detailed forest canopy height retrieval have not been fully evaluated or discussed. These gaps highlight the great opportunity to enhance forest canopy height research and improve understanding of forest structure and dynamics by optimizing data processing, assessment, and analysis, integrating advanced data sources, and using superior machine learning methods.
Therefore, this study employs GEDI data, ICESat-2 data, Landsat 9 OLI-2 data, and SRTM terrain factors as the primary data sources. Various combinations of data sources are employed and compared to individual data sources. The random forest regression algorithm is employed to model the retrieval of forest canopy height within the study area. The canopy height model (CHM) extracted from airborne lidar is integrated as reference data. Lastly, the accuracy and precision of models using single and multiple data sources are compared and analyzed. In this study, the superiority and reliability of forest canopy height retrieval results of the random forest regression model based on multi-source data integration were fully evaluated by combining Landsat 9 data with different models of space-borne lidar data. It is proved that it is possible to obtain forest canopy height with low cost, high accuracy, and high efficiency in a large area, which provides a reference for regional scale forest remote sensing monitoring.

2. Materials and Methods

2.1. Study Area

This research concentrates on a forested area located in the northeastern United States. The chosen study area was Harvard Forest (HARV) in Worcester County, Massachusetts (42°32′24″ N, 72°10′12″ W) (see Figure 1). HARV serves as a representative rural/wild region in the Northeast, with elevations ranging from 160 to 415 m and an average elevation of 348 m. Its climate is classified as temperate–continental–humid. The main vegetation in this region is an Eastern deciduous temperate forest, primarily featuring red maple (Acer rubrum) and white pine (Pinus strobus) [44,45]. This study involved data collection from the selected study area, as well as the development and testing of relevant methodologies.

2.2. Research Data

2.2.1. NEON Airborne LiDAR Data

The airborne lidar data used in this study were obtained from the National Ecological Observatory Network (NEON). NEON’s airborne observation platform is equipped with an imaging spectrometer, an airborne lidar, and a high-resolution color camera. This initiative aims to gather a range of data, including multispectral images, lidar measurements, and varied ecological indicators, to investigate the impacts of climatic shifts, alterations in land utilization, and the proliferation of invasive species on ecosystems at a continental scale [46,47,48,49,50]. The airborne lidar-specific parameters can be found in Table 1 [22]. The NEON lidar-gridded data products used in this study were primarily acquired in August 2022 and can be downloaded from the NEON field sites portal (neonscience.org) [51]. The reference forest canopy height was taken as the 90th percentile of the canopy height model (CHM) values within each 25 m × 25 m grid cell [36].
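The aggregation of the CHM to 25 m × 25 m reference cells can be sketched as follows. This is a minimal illustration rather than the processing chain used in the study: it assumes the CHM is available as a NumPy array with a 1 m pixel size (so a cell spans 25 × 25 pixels), and the function name `block_percentile` and the synthetic raster are hypothetical.

```python
import numpy as np

def block_percentile(chm, block=25, q=90):
    """Aggregate a fine-resolution CHM to coarse cells using the q-th percentile.

    Assumes `chm` is a 2-D array whose pixel size is 1/block of the target
    cell size (e.g. 1 m pixels for 25 m cells); this is an assumption.
    """
    rows, cols = chm.shape
    r, c = rows // block, cols // block
    # Trim edges so the array divides evenly into block x block windows.
    trimmed = chm[: r * block, : c * block]
    # Rearrange so that all pixels of one window lie along the last axis.
    windows = trimmed.reshape(r, block, c, block).swapaxes(1, 2).reshape(r, c, -1)
    # NaN-aware percentile so no-data pixels do not contaminate a cell.
    return np.nanpercentile(windows, q, axis=-1)

# Synthetic 100 m x 100 m CHM at 1 m resolution (illustrative values only).
rng = np.random.default_rng(0)
chm_1m = rng.uniform(0.0, 30.0, size=(100, 100))
ref_height = block_percentile(chm_1m)  # 4 x 4 grid of reference heights
```

Each output value is the 90th percentile of the 625 CHM pixels falling in the corresponding cell.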

2.2.2. GEDI Data

Table 2 [22,24,52] lists the main parameters of the GEDI system. The GEDI system consists of three lasers. Two of the lasers operate independently at full power, while the third laser is split into two beams, giving four beams in total. These beams are then dithered to produce eight ground tracks of footprints on the surface of the Earth. The lidar records the complete waveform of each footprint [24], as shown in Figure 2. For temporal consistency with the airborne lidar data, this study downloaded the latest GEDI L2A version 2 products from January 2022 to December 2022 via the NASA Earth Data website (https://search.earthdata.nasa.gov/) (accessed on 12 October 2023) [53,54].
We filter out invalid GEDI footprint data by utilizing parameters inherent in GEDI L2A data to ensure high-quality footprint points are selected for training and validation. From the GEDI L2A product, we filtered the GEDI footprint within the study area based on latitude and longitude. This filtering process is guided by the following parameters: quality_flag = 1, rx_assess_flag = 0, rx_algrunflag_flag = 1, degrade_flag = 0, and sensitivity ≥0.9 as a method of selecting the desired footprint [52]. The optimal algorithm was analyzed to obtain the final valid GEDI dataset by evaluating the data for the six algorithms (see Table 3) within the GEDI L2A footprint and determining the common valid footprint among these algorithms [29,31,54].
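The flag-based screening described above can be sketched with pandas. This is an illustrative sketch only: it assumes the footprints have already been read into a flat table whose column names mirror the flags exactly as written in the text, and these names should be checked against the actual fields of the GEDI L2A product. The function name `filter_gedi` and the sample rows are hypothetical.

```python
import pandas as pd

def filter_gedi(df, min_sensitivity=0.9):
    """Keep footprints that pass the quality criteria listed in the text.

    Column names are assumed to match the flag names used in the text.
    """
    mask = (
        (df["quality_flag"] == 1)
        & (df["rx_assess_flag"] == 0)
        & (df["rx_algrunflag_flag"] == 1)
        & (df["degrade_flag"] == 0)
        & (df["sensitivity"] >= min_sensitivity)
    )
    return df[mask].reset_index(drop=True)

# Four made-up footprints; only the first satisfies every criterion.
footprints = pd.DataFrame(
    {
        "quality_flag": [1, 1, 0, 1],
        "rx_assess_flag": [0, 0, 0, 1],
        "rx_algrunflag_flag": [1, 1, 1, 1],
        "degrade_flag": [0, 0, 0, 0],
        "sensitivity": [0.95, 0.85, 0.97, 0.99],
        "rh98": [21.4, 18.2, 25.0, 19.9],
    }
)
valid = filter_gedi(footprints)
```

The same mask could be applied per algorithm group before intersecting the results to obtain the footprints that are valid under all six algorithms.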

2.2.3. ICESat-2 Data

ICESat-2/ATLAS includes 21 standard data products, ATL01 to ATL21, which are divided into five levels: Level 0, Level 1, Level 2, Level 3A, and Level 3B [27,55]. Among them, the data products related to forest vegetation include ATL03 and ATL08. ATL03 is the global geolocated photon data product, and ATL08 is a land and vegetation data product, derived from ATL03 through official processing, that provides terrain elevation, canopy height, and relative height indicators every 100 m along the track. In addition, ATL08 can be associated with ATL03 products, which makes it possible to calculate ground elevation, canopy height, and relative height indicators at different scales [41,56,57,58,59].
ATL03 and ATL08 version 6 data products are used in this study. To avoid canopy height retrieval errors caused by acquisition time differences, the acquisition period was matched to that of the airborne lidar, GEDI, and Landsat 9 data, running from 1 January 2022 to 31 December 2022. The data come from the National Snow and Ice Data Center of the United States (https://nsidc.org/data/atl08/versions/6) (accessed on 12 October 2023). Details are shown in Table 4. To facilitate subsequent analysis, we extracted a set of relative height indicators from the ATL08 data: RH25, RH50, RH60, RH75, RH85, RH90, RH98, and RH100. For ATL03 data, we obtained signal photons by matching them to the corresponding ATL08 data products and extracted forest height metrics from these signal photons for each 25 m segment.
To ensure the consistent analysis of elevation data from various datasets, we utilized NOAA’s VDatum, which can be accessed at https://vdatum.noaa.gov/vdatumweb/ (accessed on 12 October 2023), to align the vertical datum of the GEDI and ICESat-2 data with that of NEON. It is worth noting that canopy height is measured relative to the ground and is therefore unaffected by the chosen vertical datum [22].

2.2.4. Landsat 9 Data

Because discrete GEDI and ICESat-2 footprints do not provide continuous coverage, optical remote sensing data are needed for spatially continuous (wall-to-wall) forest height estimation. Medium-resolution Landsat 9 data improve on the previous generation. They offer good availability, wide coverage, and high accuracy, but they have not yet been widely used for wall-to-wall forest height estimation. This study mainly uses Landsat 9 optical remote sensing data. A total of 73 Landsat 9 Level 1T images covering the study area from January to December 2022 were acquired from the Google Earth Engine (GEE) platform (http://developers.google.cn/) (accessed on 12 October 2023).
The GEE cloud removal algorithm was applied to the images, and an annual synthesis of time series images was performed to generate Landsat 9 data covering the study area in 2022 through median processing [60] (see Figure 3). Based on spectral characteristics, texture characteristics, and the vegetation index, the canopy spectral reflectance information is provided using seven bands of coastal, blue, green, red, NIR, SWIR1, and SWIR2 after image processing of Landsat 9 data (see Table 5) [61]. Six kinds of feature variables are extracted: the single-band factor, multi-band factor, vegetation index, principal component analysis (PCA) factor, minimum noise fraction (MNF) transform, and texture factor [62,63,64,65]. Further details of all Landsat 9-related variables extracted in this study are summarized in Table 6.
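The idea behind the annual median synthesis can be illustrated outside GEE with NumPy. This is a sketch under stated assumptions, not the GEE workflow used in the study: it assumes the images are stacked into a (time, rows, cols) array for one band and that a boolean cloud mask of the same shape is available. The function name `median_composite` and the toy values are hypothetical.

```python
import numpy as np

def median_composite(stack, cloud_mask):
    """Per-pixel median over time, ignoring observations flagged as cloudy.

    stack:      array of shape (time, rows, cols) for a single band
    cloud_mask: boolean array of the same shape, True where cloudy
    """
    # Replace cloudy observations with NaN so they are skipped by nanmedian.
    masked = np.where(cloud_mask, np.nan, stack.astype(float))
    return np.nanmedian(masked, axis=0)

# Three acquisitions of a 1 x 2 pixel scene (illustrative reflectances).
stack = np.array([[[0.1, 0.2]], [[0.9, 0.3]], [[0.2, 0.4]]])
clouds = np.zeros_like(stack, dtype=bool)
clouds[1, 0, 0] = True  # the bright 0.9 value is a cloud
composite = median_composite(stack, clouds)
```

Taking the median rather than the mean makes the composite less sensitive to residual clouds or shadows that the mask misses.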

2.2.5. Auxiliary Data

Due to factors such as sunlight, soil moisture, and topographical changes, plant growth varies with slope and aspect, resulting in different canopy heights. To account for the effects of slope, aspect, and elevation on canopy height, we introduced ancillary data from the Shuttle Radar Topography Mission (SRTM) [66,67]. SRTM Digital Elevation Model (DEM) data are available in two resolutions: 30 m and 90 m [68]. Topographic parameters such as elevation, slope, and aspect were extracted from the 30 m SRTM data, obtained in 2022 from http://earthexplorer.usgs.gov/ (accessed on 12 October 2023). To ensure the consistency of extracted variables, this study uses the central coordinates of GEDI and ICESat-2 footprints as the reference for variable extraction, which resolves the resolution differences among the SRTM, GEDI, ICESat-2, and NEON data sources [69]. Using these parameters as input variables of the random forest model mitigates the effect of terrain on the estimated forest height.

2.3. Methods

In this study, feature indices were extracted from the GEDI, ICESat-2, Landsat 9, and SRTM data (see Table 6). A variable importance analysis was then carried out on these indices to obtain the final set of feature variables (see Table 7). Using the random forest regression algorithm, we established three models based on a single data source, one model based on the combination of GEDI, Landsat 9, and SRTM data, and one model based on the combination of ICESat-2, Landsat 9, and SRTM data. Finally, the accuracy of the six inversion models was verified using airborne lidar data.

2.3.1. Importance Analysis of Feature Variables

The random forest algorithm can reduce the dimensionality of the feature variables by screening them, thereby improving the computational efficiency of machine learning [43]. Numerous studies have shown the effectiveness of random forests in predicting forest attributes with remote sensing data [70]. In this study, the random forest method was used to analyze the importance of each feature variable: importance scores were calculated, the contribution of each variable to each tree in the random forest ensemble was evaluated, and the variables accounting for the top 90% of the cumulative contribution rate were retained (the blue bars in Figure 4). The retained variables were then used in the subsequent forest canopy height inversion.
After screening the feature variables by their importance scores, forest canopy height retrieval experiments were carried out with the different space-borne lidar data products. Based on GEDI L2A products, a single-source GEDI model, a single-source Landsat 9 model, and a combined GEDI and Landsat 9 model were established. Based on ICESat-2/ATLAS products, a single-source ICESat-2 model, a single-source Landsat 9 model, and a combined ICESat-2 and Landsat 9 model were established (see Table 7). These two experiments evaluate the actual performance and benefit of combining Landsat 9 with different space-borne lidar data products for detailed forest canopy height retrieval, so that the results of multi-source remote sensing integration can be rigorously analyzed and discussed.
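The screening by cumulative importance can be sketched with scikit-learn. This is a minimal sketch on synthetic data, assuming impurity-based importances (`feature_importances_`) stand in for the importance scores used in the study. The function name, the feature names `f0` to `f4`, and the hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def select_by_cumulative_importance(X, y, names, threshold=0.90, seed=0):
    """Rank features by random forest importance and keep the smallest
    leading set whose cumulative importance reaches `threshold`."""
    rf = RandomForestRegressor(n_estimators=100, random_state=seed)
    rf.fit(X, y)
    importances = rf.feature_importances_      # non-negative, sums to 1
    order = np.argsort(importances)[::-1]      # most important first
    cumulative = np.cumsum(importances[order])
    # Index of the first feature at which the cumulative share >= threshold.
    n_keep = int(np.searchsorted(cumulative, threshold)) + 1
    n_keep = min(n_keep, len(names))
    return [names[i] for i in order[:n_keep]]

# Synthetic example: the target depends mainly on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 10.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)
names = ["f0", "f1", "f2", "f3", "f4"]
selected = select_by_cumulative_importance(X, y, names)
```

The returned list is ordered from most to least important, which makes it straightforward to compare with a ranked bar chart of importance scores.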

2.3.2. Construction of the Forest Canopy Height-Retrieval Model

The random forest algorithm is capable of handling a large number of input variables, missing data, non-linear relationships, and data with non-Gaussian distributions. Random forest regression trees are a machine learning algorithm that has been successfully employed for modeling the structure and biomass of forests at regional and global scales [32,70]. In this study, we implemented a random forest regression tree ensemble method to estimate forest height.
The random forest regression model in the study [71] was trained with the RandomForestRegressor estimator of the Scikit-learn library [72] in Python. To minimize the incidental error caused by a single split into training and test sets, the final screened data set of the study area was split into training and validation sets using ten-fold cross-validation. In this scheme, the data set is divided into ten subsets; in each iteration, one subset serves as the validation set and the remaining nine as the training set, with a different validation subset in each iteration. The evaluation scores of all iterations are averaged to obtain a more robust evaluation score for the model. The ten-fold cross-validation was used to find the optimal parameters (see Table 8) [73,74] and to build the optimal model.
On this basis, the random forest canopy height inversion models were constructed using single and combined data sources. Table 8 lists the final parameter configurations for each model.
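A ten-fold cross-validated parameter search of this kind can be sketched with scikit-learn as follows. The data are synthetic and the parameter grid is purely illustrative; neither reproduces the variables of Table 7 or the configurations of Table 8.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for the screened feature matrix and reference heights.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = 20.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Hypothetical search grid; the study's tuned values are given in Table 8.
param_grid = {"n_estimators": [50, 100], "max_depth": [5, None]}

# Ten folds: each subset is used once for validation, nine for training.
cv = KFold(n_splits=10, shuffle=True, random_state=42)

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=cv,
    scoring="neg_root_mean_squared_error",  # scores averaged over the folds
)
search.fit(X, y)

best_model = search.best_estimator_   # refit on all data with the best params
mean_cv_rmse = -search.best_score_    # mean RMSE across the ten folds
```

`GridSearchCV` refits the best configuration on the full data set by default, so `best_model` can be used directly for prediction.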

2.4. Accuracy Assessment

Utilizing the DTM and CHM data in NEON as reference data, terrain and canopy height were evaluated.
The correlation coefficient (R), root-mean-square error (RMSE), mean absolute error (MAE), and relative root-mean-square error (rRMSE) are calculated to measure the accuracy of each dataset in the model.
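$$R = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - x_i \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - x_i)^2}{n}}$$
$$\mathrm{rRMSE} = \frac{100\%}{\bar{y}} \sqrt{\frac{\sum_{i=1}^{n} (y_i - x_i)^2}{n}}$$
where $n$ represents the total number of samples, $y_i$ denotes the verification value, $x_i$ is the prediction value, $\bar{y}$ represents the average of the verification values, and $\bar{x}$ represents the average of the prediction values.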
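The four accuracy measures can be computed in a few lines of NumPy. The sketch below follows the standard definitions of R, MAE, RMSE, and rRMSE with the verification values as `y_ref` and the predictions as `x_pred`. The function name and the three sample values are hypothetical.

```python
import numpy as np

def accuracy_metrics(y_ref, x_pred):
    """Return R, MAE, RMSE and rRMSE (%) for predictions against references."""
    y = np.asarray(y_ref, dtype=float)
    x = np.asarray(x_pred, dtype=float)
    r = np.corrcoef(x, y)[0, 1]              # Pearson correlation coefficient
    mae = np.mean(np.abs(y - x))             # mean absolute error
    rmse = np.sqrt(np.mean((y - x) ** 2))    # root-mean-square error
    rrmse = 100.0 * rmse / np.mean(y)        # RMSE relative to mean reference
    return {"R": r, "MAE": mae, "RMSE": rmse, "rRMSE": rrmse}

# Illustrative heights in metres (not data from the study).
metrics = accuracy_metrics(y_ref=[10.0, 20.0, 30.0], x_pred=[12.0, 18.0, 33.0])
```

Because rRMSE is normalized by the mean reference height, it allows comparison between samples with different average canopy heights.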

3. Results

3.1. Utilizing GEDI L2A Product for Forest Canopy Height Retrieval

In this study area, 5398 valid footprints screened from GEDI L2A products were used for accuracy validation to assess the ability of GEDI L2A data to estimate forest canopy height accurately. In addition to the six algorithms, GEDI L2A provides a default optimal algorithm for relative canopy height [75]. Using the default algorithm, we obtained a set of relative height metrics (RH90, RH92, RH94, RH96, RH98, and RH100) for each footprint in the study area and assessed their correlation with the airborne lidar RH90. Table 9 shows the results: the retrieval accuracy of forest canopy height increases gradually from RH90 to RH98 and then decreases from RH98 to RH100. RH98 of GEDI L2A is therefore the most suitable metric for retrieving canopy height, which agrees with the GEDI canopy height metric selected in the literature [76].
We then extracted the RH98 values produced by each of the six algorithm groups in the GEDI products and examined their correlation with the airborne reference forest height RH90. The results are shown in Figure 5. Among the algorithm groups, a4 and a5 have the worst canopy height inversion accuracy, with RMSE values as high as 9.56 m. The a2 algorithm has the highest accuracy (R = 0.78, MAE = 3.19 m, RMSE = 4.94 m, and rRMSE = 22.60%). Therefore, data from the a2 algorithm were used for the subsequent construction of the inversion model.

3.2. Utilizing the ICESat-2 ATL08 Product for Forest Canopy Height Retrieval

We extracted a set of relative height metrics from each footprint in the study area in ICESat-2 ATL08 data: RH75, RH85, RH90, RH98, and RH100. For ATL03 data, we obtained signal photons by matching them to the corresponding ATL08 data products, extracted forest height indicators for each 25 m segment from these signal photons, and assessed their correlation with the airborne lidar reference forest height RH90. Figure 6 shows the accuracy of ICESat-2/ATLAS-derived RH indicators versus reference forest heights. Of all relative height indicators, RH90 performed best in estimating forest height. Therefore, RH90 derived from ICESat-2 was selected as the forest height index. Figure 7 shows a scatter plot of the RH90 index versus reference forest height for ICESat-2/ATLAS day and night data. The results show that the data collected at night have a strong ability to estimate forest height. In contrast, ICESat-2 data collected during the day are unsuitable for forest height retrieval because of their low accuracy. Therefore, the inversion model was constructed using the night acquisition data.

3.3. Utilizing the Random Forest Regression Model for Forest Canopy Height Retrieval

After preprocessing different space-borne lidar data products in the study area, we use a random forest regression algorithm to construct a forest canopy height-retrieval model for different space-borne lidar data products in the study area.
Firstly, 5398 valid laser footprints were screened from GEDI data. The input feature variables and parameters selected to optimize the efficiency and accuracy of model training are given in Table 6 and Table 7. In the experimental area, there are two models based on single data sources, the GEDI model and the Landsat 9 model, and one model based on a combination of multiple data sources, the GEDI and Landsat 9 model. The inversion results are shown in Figure 8. Among the models using a single data source, GEDI outperformed Landsat 9, with R higher by 0.07, MAE lower by 0.69 m, and RMSE lower by 0.51 m. This is because GEDI data provide relative heights of the forest canopy structure, which were more important than the Landsat 9 feature variables in the variable importance analysis. The combined GEDI, Landsat 9, and SRTM model achieved markedly higher accuracy. Compared to the least accurate Landsat 9 model (R = 0.71, MAE = 3.45 m, RMSE = 4.86 m, and rRMSE = 22.12%), the combined model based on GEDI, Landsat 9, and SRTM data (R = 0.92, MAE = 1.91 m, RMSE = 2.78 m, and rRMSE = 12.64%) increased R by 29.58% and reduced MAE by 44.64%, RMSE by 42.80%, and rRMSE by 42.86%. Compared with the optimal model in the literature [43], the MAE of the combined model based on GEDI, Landsat 9, and SRTM data in this study is improved by 13.18%, and the RMSE is improved by 10.61%. Overall, accuracy has been greatly improved.
Secondly, 1690 valid laser footprints were screened from ICESat-2/ATLAS data. The input feature variables and parameters selected to optimize the efficiency and accuracy of model training are given in Table 6 and Table 7. In the experimental area, there are two models based on single data sources, the ICESat-2 model and the Landsat 9 model, and one model based on a combination of multiple data sources, the ICESat-2 and Landsat 9 model. The inversion results are shown in Figure 9. Among the models using a single data source, ICESat-2 outperformed Landsat 9, with R higher by 0.24, MAE lower by 0.88 m, and RMSE lower by 0.98 m. This is because ICESat-2 data provide relative heights of the forest canopy structure, which were more important than the Landsat 9 feature variables in the variable importance analysis. The combined ICESat-2, Landsat 9, and SRTM model achieved markedly higher accuracy. Compared to the least accurate Landsat 9 model (R = 0.46, MAE = 3.42 m, RMSE = 5.02 m, and rRMSE = 22.21%), the combined model based on ICESat-2, Landsat 9, and SRTM data (R = 0.89, MAE = 1.84 m, RMSE = 2.54 m, and rRMSE = 10.75%) increased R by 93.48% and reduced MAE by 46.20%, RMSE by 49.40%, and rRMSE by 51.60%. Compared with the optimal model in the literature [43], the MAE of the combined model based on ICESat-2, Landsat 9, and SRTM data in this study is improved by 16.36%, and the RMSE is improved by 18.33%. Overall, accuracy has been greatly improved.
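The percentage changes quoted in this section are relative changes with respect to the Landsat 9 baseline, that is, |combined − baseline| / baseline × 100. The short check below recomputes them from the reported metric values; the helper name `relative_change` is hypothetical.

```python
def relative_change(baseline, combined):
    """Relative change of a metric with respect to the baseline, in percent."""
    return abs(combined - baseline) / baseline * 100.0

# (Landsat 9 baseline, combined model) pairs as reported in the text.
gedi = {"R": (0.71, 0.92), "MAE": (3.45, 1.91),
        "RMSE": (4.86, 2.78), "rRMSE": (22.12, 12.64)}
icesat2 = {"R": (0.46, 0.89), "MAE": (3.42, 1.84),
           "RMSE": (5.02, 2.54), "rRMSE": (22.21, 10.75)}

gedi_change = {k: round(relative_change(*v), 2) for k, v in gedi.items()}
icesat2_change = {k: round(relative_change(*v), 2) for k, v in icesat2.items()}

print(gedi_change)     # R 29.58, MAE 44.64, RMSE 42.8, rRMSE 42.86
print(icesat2_change)  # R 93.48, MAE 46.2, RMSE 49.4, rRMSE 51.6
```

For R the change is an increase, whereas for MAE, RMSE, and rRMSE it is a reduction in error.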

4. Discussion

We evaluated different space-borne lidar data products against reference canopy heights (the 90th percentile of airborne lidar canopy height data). For GEDI data, we compared the reference canopy height with the heights that the default algorithm derives from the GEDI waveform return metrics. We analyzed six relative height (RH) measurements from RH90 to RH100 and found that RH98 had the highest correlation with the reference canopy height. This finding is consistent with the study in [76]. It differs, however, from the results of Potapov et al. [38], who identified RH95 as the most representative measure of true forest height. A possible explanation for this discrepancy is that Potapov et al. included not only forests but also other land cover types such as shrubs, grasslands, wetlands, cropland, and tundra in their canopy height estimates, resulting in an underestimation of forest height. The GEDI L2A version 2 product provides six different algorithms, which aim to improve the accuracy of GEDI L2A data by adjusting three elements of the waveform signal processing (smoothing width, start threshold, and end threshold) [75]. Comparing the RH98 data with the reference canopy height data shows that Algorithm 4 and Algorithm 5 have insufficient accuracy in canopy height inversion, while Algorithm 2 has the highest accuracy. Table 5 shows that Algorithm 4 has the highest waveform start and end thresholds, while Algorithm 5 has the lowest. It can therefore be inferred that both too-short and too-long waveform lengths reduce the accuracy of canopy height retrieval.
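The selection of a representative RH metric described above amounts to ranking candidate metrics by their correlation with the reference height. The following is a minimal Python sketch of that comparison, assuming Pearson correlation is the criterion; the footprint values are invented for illustration and are not data from the study, and the helper names `pearson_r` and `best_metric` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_metric(candidates, reference):
    """Return the candidate name with the highest correlation, plus all scores."""
    scores = {name: pearson_r(values, reference) for name, values in candidates.items()}
    return max(scores, key=scores.get), scores


# Invented toy values in metres: one reference height per footprint and
# three candidate relative height metrics for the same footprints.
reference = [12.0, 18.5, 22.0, 25.5, 30.0]
candidates = {
    "rh90": [9.0, 17.0, 17.5, 24.0, 24.5],
    "rh95": [10.5, 17.5, 19.5, 24.5, 27.0],
    "rh98": [11.8, 18.9, 21.6, 25.9, 29.7],
}

best, scores = best_metric(candidates, reference)
print(best, {name: round(r, 3) for name, r in scores.items()})
```

The same ranking can be repeated per algorithm setting group (a1 to a6) by passing the RH98 values from each group as the candidates.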
For ICESat-2/ATLAS data, we extracted five forest height indicators for each 25 m segment from the signal photons obtained by matching ATL03 to the corresponding ATL08 data products and validated them against the reference canopy heights. RH90 showed the highest correlation with the reference canopy heights. We then assessed the correlation between the RH90 indicator and the reference canopy heights separately for ICESat-2/ATLAS data collected during the daytime and at night in the study area. The results show that the data collected at night estimate forest height well. In contrast, the ICESat-2 data collected during the day are unsuitable for forest height retrieval because of their low accuracy, which may be explained by the fact that data collected at night are not affected by excessive solar noise [77].
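The day and night comparison above can be sketched as a grouped accuracy assessment. This is a minimal Python illustration under stated assumptions: each segment carries a `night_flag` field with 1 for night and 0 for day (ATL08 provides a flag of this name, but the encoding should be checked against the data dictionary of the release in use), rRMSE is taken as RMSE divided by the mean reference height, and the segment values are invented rather than taken from the study.

```python
import math


def accuracy_metrics(predicted, observed):
    """Return MAE and RMSE (metres) and rRMSE (percent of mean observed height)."""
    n = len(observed)
    errors = [p - o for p, o in zip(predicted, observed)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_observed = sum(observed) / n
    return {"MAE": mae, "RMSE": rmse, "rRMSE": 100.0 * rmse / mean_observed}


def evaluate_by_acquisition_time(segments):
    """Group segments by night flag and score each group separately."""
    groups = {"night": [], "day": []}
    for seg in segments:
        groups["night" if seg["night_flag"] == 1 else "day"].append(seg)
    return {
        name: accuracy_metrics([s["rh90"] for s in segs],
                               [s["reference"] for s in segs])
        for name, segs in groups.items()
        if segs  # skip empty groups to avoid dividing by zero
    }


# Invented toy segments (heights in metres), not data from the study.
segments = [
    {"night_flag": 1, "rh90": 20.5, "reference": 21.0},
    {"night_flag": 1, "rh90": 15.2, "reference": 15.0},
    {"night_flag": 1, "rh90": 27.4, "reference": 28.0},
    {"night_flag": 0, "rh90": 14.0, "reference": 19.0},
    {"night_flag": 0, "rh90": 25.0, "reference": 22.0},
    {"night_flag": 0, "rh90": 18.0, "reference": 26.0},
]

results = evaluate_by_acquisition_time(segments)
for name, metrics in results.items():
    print(name, {k: round(v, 2) for k, v in metrics.items()})
```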
Multivariate analysis was performed on the feature variables of the different space-borne lidar data products in the study area using a random forest model to determine the relative importance of each variable and to select the final set of variables for the model. The results show that topographic factors are key variables for canopy height inversion (see Figure 10). This finding is consistent with the discussion in the literature [22]. Terrain and slope are identified as the primary factors affecting terrain height-retrieval accuracy, while canopy height emerges as the foremost error factor in canopy height retrieval. Terrain factors were therefore added in subsequent model retrieval to improve accuracy. The Landsat satellite series does not measure forest canopy height directly. However, relevant feature variables can be derived from its multispectral data and, when combined with other related variables, used to estimate forest canopy height. The accuracy of such a model hinges on the presence and quality of reference data, the temporal and spatial coherence of the optical data, and the geographic scale of its application [38]. The literature [69] confirms the significant role of vegetation indices in estimating forest canopy height, establishing a relationship between the two using Landsat remote sensing imagery. In this study, we used a random forest multivariate analysis to select the feature variables obtained from the multispectral data.
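The importance ranking described above can be illustrated with scikit-learn's `RandomForestRegressor` and its impurity-based `feature_importances_` attribute. This is only a sketch under stated assumptions: the four predictors and the synthetic relationship between them and canopy height are invented for illustration, and the study's actual variables and settings (Table 6 and Table 7) are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 400

# Synthetic, hypothetical predictors: a lidar relative height, a terrain
# slope, a vegetation index, and a pure-noise variable.
rh98 = rng.uniform(5, 35, n)
slope = rng.uniform(0, 30, n)
ndvi = rng.uniform(0.2, 0.9, n)
noise_band = rng.normal(size=n)

# Invented response in metres, dominated by the lidar height.
canopy_height = 0.9 * rh98 - 0.1 * slope + 2.0 * ndvi + rng.normal(0, 1.0, n)

feature_names = ["rh98", "slope", "ndvi", "noise_band"]
X = np.column_stack([rh98, slope, ndvi, noise_band])

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, canopy_height)

# Impurity-based importances sum to 1; sort from most to least important.
ranking = sorted(zip(feature_names, model.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Variables with low scores are candidates for removal before the final model is fitted. Impurity-based importances can favor continuous or high-cardinality variables, so a permutation-based check is a common complement.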
Six models were established based on different space-borne lidar data products using a random forest regression algorithm to retrieve canopy height in the experimental area. For both the GEDI and ICESat-2/ATLAS data, the multivariate analysis in Figure 4 shows that the importance scores of the Landsat 9 feature variables are lower than those of the GEDI and ICESat-2 relative heights. Accordingly, the accuracy of canopy height retrieved using the Landsat 9 model is lower than that of the GEDI model and the ICESat-2 model. This may be because canopy height is the main error factor in canopy height retrieval, and multispectral data cannot measure it directly.
In this study, we used random forest regression models to estimate forest canopy height from two integrations of multi-source remote sensing data: (1) space-borne lidar GEDI L2A data with Landsat 9 optical imagery and topographic factors, and (2) space-borne lidar ICESat-2 data with Landsat 9 optical imagery and topographic factors. The integrated inversion results are satisfactory. Compared with the tree height inversion results obtained with a BP neural network for multi-source remote sensing data integration in the literature [43], the RMSE accuracy of the two integrations in this study is improved by 10.61% and 18.33%, respectively. These results highlight the advantage of the random forest regression model for forest canopy height inversion in this study. However, the random forest regression model is sensitive to its parameters: larger decision tree depths make it prone to overfitting, while smaller depths may result in underfitting. Parameter optimization should therefore be strengthened in future research.
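One common way to address the depth trade-off mentioned above is a cross-validated grid search over `max_depth`. The following sketch uses scikit-learn's `GridSearchCV`; the synthetic data, the candidate depths, and the 5-fold setting are assumptions chosen for illustration and are not the parameters used in this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Synthetic, hypothetical data: four predictors and a nonlinear response.
X = rng.uniform(0, 1, size=(300, 4))
y = 20 + 10 * X[:, 0] + 3 * np.sin(6 * X[:, 1]) + rng.normal(0, 1, 300)

# Shallow trees risk underfitting, unrestricted trees (None) risk overfitting.
param_grid = {"max_depth": [2, 5, 10, None], "n_estimators": [50]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,                                   # 5-fold cross-validation
    scoring="neg_root_mean_squared_error",  # higher (less negative) is better
)
search.fit(X, y)

best_depth = search.best_params_["max_depth"]
best_rmse = -search.best_score_  # flip the sign to get a positive RMSE
print("best max_depth:", best_depth, "cross-validated RMSE:", round(best_rmse, 2))
```

Scoring on held-out folds rather than on the training data is what exposes overfitting at large depths, since a deep forest can fit the training footprints closely while generalizing poorly.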

5. Conclusions

This study evaluated the effectiveness of multi-source remote sensing data integration and machine learning methods for improving canopy height retrieval. In boreal temperate forest ecosystems, Landsat 9 data were combined separately with GEDI and ICESat-2 space-borne lidar data to generate forest canopy height inversion models using the random forest regression algorithm. In addition, the influence of various factors on the retrieval accuracy of canopy height was analyzed.
The analysis of data from different space-borne lidar products shows the following. (1) The accuracy of canopy height estimation from ICESat-2 and GEDI data varies with the data acquisition scheme. Among the six algorithms for GEDI L2A version 2 data, the a2 algorithm estimates canopy height with the best accuracy. ICESat-2 ATL08 version 6 data acquired at night are more accurate than data acquired during the day. (2) A random forest algorithm was used to rank the importance of the feature variables. Terrain factors were found to be important factors affecting the accuracy of canopy height retrieval, and canopy height had the greatest influence on canopy height retrieval. (3) The random forest model integrating multi-source remote sensing data is consistently more accurate than the single-source models, and the Landsat 9 model is less accurate than the GEDI model and the ICESat-2 model. The RMSE of the GEDI, Landsat 9, and SRTM model is 2.78 m; compared with the GEDI model and the Landsat 9 model, its RMSE accuracy is improved by 35.95% and 42.80%, respectively. The RMSE of the ICESat-2, Landsat 9, and SRTM model is 2.54 m; compared with the ICESat-2 model and the Landsat 9 model, its RMSE accuracy is improved by 37.13% and 49.40%, respectively.
In summary, this study is a preliminary exploration of methods for integrating optical remote sensing Landsat 9 data with two different space-borne lidar products, GEDI and ICESat-2. The results provide insight into the practical performance and benefits of using Landsat 9 data in combination with other data for detailed forest canopy height retrieval, and they highlight the benefits of combining multispectral data and topographic factors when analyzing forest ecosystems. In particular, they show the effectiveness of combining each space-borne lidar product separately with Landsat 9 data using advanced machine learning algorithms. Nevertheless, the study has limitations, notably in site selection. Future research should focus on broader, more diverse global site selection and data accumulation, ideally incorporating GEDI and ICESat-2 data alongside other datasets such as Sentinel-3 and GaoFen-7 data. This expanded approach would likely yield more comprehensive insights for regional-scale forest monitoring.

Author Contributions

Conceptualization, W.Z. and Y.L.; methodology, W.Z. and Y.L.; software, Y.L.; validation, N.H., X.Z. and Z.Z.; investigation, K.L. and Z.Q.; formal analysis, W.Z. and K.L.; resources, Y.L.; data curation, Y.L.; writing—original draft preparation, Y.L. and W.Z.; writing—review and editing, Y.L. and K.L.; visualization, Y.L.; supervision, W.Z.; project administration, W.Z.; funding acquisition, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (grant no. 42371441) and the scientific innovation program project by the Shanghai Committee of Science and Technology (grant no. 20dz1206501).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data used in this research for remote sensing are openly available and free of charge. You can access the GEDI data at https://www.earthdata.nasa.gov (accessed on 12 October 2023). Landsat 9 data and SRTM data can be obtained through the USGS Earth Resources Observation and Science (EROS) Center at https://earthexplorer.usgs.gov (accessed on 12 October 2023). Airborne Laser Scan data from the National Ecological Observatory Network (NEON) in the United States can be accessed at https://data.neonscience.org/data-products/DP3.30024.001 (accessed on 12 October 2023).

Acknowledgments

All authors are grateful to NASA and NEON for providing the GEDI, SRTM, and Landsat 9 data and to the editors and reviewers for their valuable comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Le Quéré, C.; Andrew, R.M.; Canadell, J.G.; Sitch, S.; Korsbakken, J.I.; Peters, G.P.; Manning, A.C.; Boden, T.A.; Tans, P.P.; Houghton, R.A. Global carbon budget 2016. Earth Syst. Sci. Data 2016, 8, 605–649.
  2. Malhi, Y.; Franklin, J.; Seddon, N.; Solan, M.; Turner, M.G.; Field, C.B.; Knowlton, N. Climate change and ecosystems: Threats, opportunities and solutions. Philos. Trans. R. Soc. B Biol. Sci. 2020, 375, 20190104.
  3. Meir, P.; Cox, P.; Grace, J. The influence of terrestrial ecosystems on climate. Trends Ecol. Evol. 2006, 21, 254–260.
  4. Rustad, L.E. The response of terrestrial ecosystems to global climate change: Towards an integrated approach. Sci. Total Environ. 2008, 404, 222–235.
  5. Sun, Y.; Wang, C.; Chen, H.Y.H.; Liu, Q.; Ge, B.; Tang, B. A global meta-analysis on the responses of C and N concentrations to warming in terrestrial ecosystems. Catena 2022, 208, 105762.
  6. Wang, Y.-S.; Gu, J.-D. Ecological responses, adaptation and mechanisms of mangrove wetland ecosystem to global climate change and anthropogenic activities. Int. Biodeterior. Biodegrad. 2021, 162, 105248.
  7. Baccini, A.; Walker, W.; Carvalho, L.; Farina, M.; Sulla-Menashe, D.; Houghton, R.A. Tropical forests are a net carbon source based on aboveground measurements of gain and loss. Science 2017, 358, 230–234.
  8. Lagomasino, D.; Fatoyinbo, T.; Lee, S.-K.; Simard, M. High-resolution forest canopy height estimation in an African blue carbon ecosystem. Remote Sens. Ecol. Conserv. 2015, 1, 51–60.
  9. Li, W.; Niu, Z.; Shang, R.; Qin, Y.; Wang, L.; Chen, H. High-resolution mapping of forest canopy height using machine learning by coupling ICESat-2 LiDAR with Sentinel-1, Sentinel-2 and Landsat-8 data. Int. J. Appl. Earth Obs. Geoinf. 2020, 92, 102163.
  10. Vaglio Laurin, G.; Ding, J.; Disney, M.; Bartholomeus, H.; Herold, M.; Papale, D.; Valentini, R. Tree height in tropical forest as measured by different ground, proximal, and remote sensing instruments, and impacts on above ground biomass estimates. Int. J. Appl. Earth Obs. Geoinf. 2019, 82, 101899.
  11. Swinfield, T.; Lindsell, J.A.; Williams, J.V.; Harrison, R.D.; Agustiono; Habibi; Gemita, E.; Schönlieb, C.B.; Coomes, D.A. Accurate Measurement of Tropical Forest Canopy Heights and Aboveground Carbon Using Structure From Motion. Remote Sens. 2019, 11, 928.
  12. Liu, C.; Wang, S. Estimating tree canopy height in densely forest-covered mountainous areas using Gedi spaceborne full-waveform data. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, V-1-2022, 25–32.
  13. Wang, Q.; Ni-Meister, W. Forest Canopy Height and Gaps from Multiangular BRDF, Assessed with Airborne LiDAR Data. Remote Sens. 2019, 11, 2566.
  14. He, X.Y.; Ren, C.Y.; Chen, L.; Wang, Z.; Zheng, H. The Progress of Forest Ecosystems Monitoring with Remote Sensing Techniques. Sci. Geogr. Sin. 2018, 38, 997–1011.
  15. Roy, D.P.; Wulder, M.A.; Loveland, T.R.; Woodcock, C.E.; Allen, R.G.; Anderson, M.C.; Helder, D.; Irons, J.R.; Johnson, D.M.; Kennedy, R.; et al. Landsat-8: Science and product vision for terrestrial global change research. Remote Sens. Environ. 2014, 145, 154–172.
  16. Showstack, R. Landsat 9 Satellite Continues Half-Century of Earth Observations: Eyes in the sky serve as a valuable tool for stewardship. Biosci. J. 2022, 72, 226–232.
  17. Hemati, M.; Hasanlou, M.; Mahdianpari, M.; Mohammadimanesh, F. A Systematic Review of Landsat Data for Change Detection Applications: 50 Years of Monitoring the Earth. Remote Sens. 2021, 13, 2869.
  18. Wulder, M.A.; Roy, D.P.; Radeloff, V.C.; Loveland, T.R.; Anderson, M.C.; Johnson, D.M.; Healey, S.; Zhu, Z.; Scambos, T.A.; Pahlevan, N.; et al. Fifty years of Landsat science and impacts. Remote Sens. Environ. 2022, 280, 113195.
  19. Masek, J.G.; Wulder, M.A.; Markham, B.; McCorkel, J.; Crawford, C.J.; Storey, J.; Jenstrom, D.T. Landsat 9: Empowering open science and applications through continuity. Remote Sens. Environ. 2020, 248, 111968.
  20. Healey, S.P.; Yang, Z.; Gorelick, N.; Ilyushchenko, S. Highly Local Model Calibration with a New GEDI LiDAR Asset on Google Earth Engine Reduces Landsat Forest Height Signal Saturation. Remote Sens. 2020, 12, 2840.
  21. David, R.M.; Rosser, N.J.; Donoghue, D.N.M. Improving above ground biomass estimates of Southern Africa dryland forests by combining Sentinel-1 SAR and Sentinel-2 multispectral imagery. Remote Sens. Environ. 2022, 282, 113232.
  22. Liu, A.; Cheng, X.; Chen, Z. Performance evaluation of GEDI and ICESat-2 laser altimeter data for terrain and canopy height retrievals. Remote Sens. Environ. 2021, 264, 112571.
  23. Wake, S.; Ramos-Izquierdo, L.A.; Eegholm, B.; Dogoda, P.; Denny, Z.; Hersh, M.; Mulloney, M.; Thomes, W.J.; Ott, M.N.; Jakeman, H. Optical system design and integration of the Global Ecosystem Dynamics Investigation Lidar. In Proceedings of the Infrared Remote Sensing and Instrumentation XXVII, San Diego, CA, USA, 12–14 August 2019; pp. 99–111.
  24. Dubayah, R.; Blair, J.B.; Goetz, S.; Fatoyinbo, L.; Hansen, M.; Healey, S.; Hofton, M.; Hurtt, G.; Kellner, J.; Luthcke, S.; et al. The Global Ecosystem Dynamics Investigation: High-resolution laser ranging of the Earth’s forests and topography. Sci. Remote Sens. 2020, 1, 100002.
  25. Hancock, S.; Armston, J.; Hofton, M.; Sun, X.; Tang, H.; Duncanson, L.I.; Kellner, J.R.; Dubayah, R. The GEDI Simulator: A Large-Footprint Waveform Lidar Simulator for Calibration and Validation of Spaceborne Missions. Earth Space Sci. 2019, 6, 294–310.
  26. Patterson, P.L.; Healey, S.P.; Ståhl, G.; Saarela, S.; Holm, S.; Andersen, H.-E.; Dubayah, R.O.; Duncanson, L.; Hancock, S.; Armston, J.; et al. Statistical properties of hybrid estimators proposed for GEDI—NASA’s global ecosystem dynamics investigation. Environ. Res. Lett. 2019, 14, 065007.
  27. Abdalati, W.; Zwally, H.J.; Bindschadler, R.; Csatho, B.; Farrell, S.L.; Fricker, H.A.; Harding, D.; Kwok, R.; Lefsky, M.; Markus, T.; et al. The ICESat-2 Laser Altimetry Mission. Proc. IEEE 2010, 98, 735–751.
  28. Coyle, D.B.; Paul, R.S.; Furqan, L.C.; Erich, F.; Demetrios, P. The Global Ecosystem Dynamics Investigation (GEDI) Lidar laser transmitter. In Proceedings of the Infrared Remote Sensing and Instrumentation XXVII, San Diego, CA, USA, 12–14 August 2019; p. 111280L.
  29. Adam, M.; Urbazaev, M.; Dubois, C.; Schmullius, C. Accuracy Assessment of GEDI Terrain Elevation and Canopy Height Estimates in European Temperate Forests: Influence of Environmental and Acquisition Parameters. Remote Sens. 2020, 12, 3948.
  30. Dhargay, S.; Lyell, C.S.; Brown, T.P.; Inbar, A.; Sheridan, G.J.; Lane, P.N.J. Performance of GEDI Space-Borne LiDAR for Quantifying Structural Variation in the Temperate Forests of South-Eastern Australia. Remote Sens. 2022, 14, 3615.
  31. Lahssini, K.; Baghdadi, N.; le Maire, G.; Fayad, I. Influence of GEDI Acquisition and Processing Parameters on Canopy Height Estimates over Tropical Forests. Remote Sens. 2022, 14, 6264.
  32. Caughlin, T.T.; Rifai, S.W.; Graves, S.J.; Asner, G.P.; Bohlman, S.A. Integrating LiDAR-derived tree height and Landsat satellite reflectance to estimate forest regrowth in a tropical agricultural landscape. Remote Sens. Ecol. Conserv. 2016, 2, 190–203.
  33. Ceccherini, G.; Girardello, M.; Beck, P.S.A.; Migliavacca, M.; Duveiller, G.; Dubois, G.; Avitabile, V.; Battistella, L.; Barredo, J.I.; Cescatti, A. Spaceborne LiDAR reveals the effectiveness of European Protected Areas in conserving forest height and vertical structure. Commun. Earth Environ. 2023, 4, 97.
  34. Gu, C.; Clevers, J.G.P.W.; Liu, X.; Tian, X.; Li, Z.; Li, Z. Predicting forest height using the GOST, Landsat 7 ETM+, and airborne LiDAR for sloping terrains in the Greater Khingan Mountains of China. ISPRS-J. Photogramm. Remote Sens. 2018, 137, 97–111.
  35. Qi, W.; Lee, S.-K.; Hancock, S.; Luthcke, S.; Tang, H.; Armston, J.; Dubayah, R. Improved forest height estimation by fusion of simulated GEDI Lidar data and TanDEM-X InSAR data. Remote Sens. Environ. 2019, 221, 621–634.
  36. Potapov, P.; Tyukavina, A.; Turubanova, S.; Talero, Y.; Hernandez-Serna, A.; Hansen, M.C.; Saah, D.; Tenneson, K.; Poortinga, A.; Aekakkararungroj, A.; et al. Annual continuous fields of woody vegetation structure in the Lower Mekong region from 2000–2017 Landsat time-series. Remote Sens. Environ. 2019, 232, 111278.
  37. Tyukavina, A.; Baccini, A.; Hansen, M.C.; Potapov, P.V.; Stehman, S.V.; Houghton, R.A.; Krylov, A.M.; Turubanova, S.; Goetz, S.J. Aboveground carbon loss in natural and managed tropical forests from 2000 to 2012. Environ. Res. Lett. 2015, 10, 074002.
  38. Potapov, P.; Li, X.; Hernandez-Serna, A.; Tyukavina, A.; Hansen, M.C.; Kommareddy, A.; Pickens, A.; Turubanova, S.; Tang, H.; Silva, C.E.; et al. Mapping global forest canopy height through integration of GEDI and Landsat data. Remote Sens. Environ. 2021, 253, 112165.
  39. Wang, J.; Liu, D.; Quiring, S.M.; Qin, R. Estimating canopy height change using machine learning by coupling WorldView-2 stereo imagery with Landsat-7 data. Int. J. Remote Sens. 2023, 44, 631–645.
  40. Ren, C.; Jiang, H.; Xi, Y.; Liu, P.; Li, H. Quantifying Temperate Forest Diversity by Integrating GEDI LiDAR and Multi-Temporal Sentinel-2 Imagery. Remote Sens. 2023, 15, 375.
  41. Xi, Z.; Xu, H.; Xing, Y.; Gong, W.; Chen, G.; Yang, S. Forest Canopy Height Mapping by Synergizing ICESat-2, Sentinel-1, Sentinel-2 and Topographic Information Based on Machine Learning Methods. Remote Sens. 2022, 14, 364.
  42. Guan, X.; Yang, X.; Yu, Y.; Pan, Y.; Dong, H.; Yang, T. Canopy-Height and Stand-Age Estimation in Northeast China at Sub-Compartment Level Using Multi-Resource Remote Sensing Data. Remote Sens. 2023, 15, 3738.
  43. Zhu, W.; Yang, F.; Qiu, Z.; He, N.; Zhu, X.; Li, Y.; Xu, Y.; Lu, Z. Enhancing Forest Canopy Height Retrieval: Insights from Integrated GEDI and Landsat Data Analysis. Sustainability 2023, 15, 10434.
  44. Raup, H.M. Some Problems in Ecological Theory and their Relation to Conservation. J. Anim. Ecol. 1964, 33, 19–28.
  45. SanClements, M.; Lee, R.H.; Ayres, E.D.; Goodman, K.; Jones, M.; Durden, D.; Thibault, K.; Zulueta, R.; Roberti, J.; Lunch, C.; et al. Collaborating with NEON. Biosci. J. 2020, 70, 107.
  46. Hutsler, T.; Pricope, N.G.; Gao, P.; Rother, M.T. Detecting Woody Plants in Southern Arizona Using Data from the National Ecological Observatory Network (NEON). Remote Sens. 2023, 15, 98.
  47. Johnson, B.R.; Kuester, M.A.; Kampe, T.U.; Keller, M. National ecological observatory network (NEON) airborne remote measurements of vegetation canopy biochemistry and structure. In Proceedings of the 2010 IEEE International Geoscience and Remote Sensing Symposium, Honolulu, HI, USA, 25–30 July 2010; pp. 2079–2082.
  48. Kampe, T.; Johnson, B.; Kuester, M.; Keller, M. NEON: The first continental-scale ecological observatory with airborne remote sensing of vegetation canopy biochemistry and structure. In Proceedings of the Remote Sensing and Modeling of Ecosystems for Sustainability VI, San Diego, CA, USA, 2 August 2009.
  49. Scholl, V.M.; Cattau, M.E.; Joseph, M.B.; Balch, J.K. Integrating National Ecological Observatory Network (NEON) Airborne Remote Sensing and In-Situ Data for Optimal Tree Species Classification. Remote Sens. 2020, 12, 1414.
  50. Wang, C.; Jia, D.; Lei, S.; Numata, I.; Tian, L. Accuracy Assessment and Impact Factor Analysis of GEDI Leaf Area Index Product in Temperate Forest. Remote Sens. 2023, 15, 1535.
  51. NEON (National Ecological Observatory Network). Elevation—LiDAR (DP3.30024.001), RELEASE-2023. Available online: https://data.neonscience.org/data-products/DP3.30024.001/RELEASE-2023 (accessed on 12 October 2023).
  52. Rishmawi, K.; Huang, C.; Schleeweis, K.; Zhan, X. Integration of VIIRS Observations with GEDI-Lidar Measurements to Monitor Forest Structure Dynamics from 2013 to 2020 across the Conterminous United States. Remote Sens. 2022, 14, 2320.
  53. Dubayah, R.; Armston, J.; Healey, S.P.; Bruening, J.M.; Patterson, P.L.; Kellner, J.R.; Duncanson, L.; Saarela, S.; Ståhl, G.; Yang, Z.; et al. GEDI launches a new era of biomass inference from space. Environ. Res. Lett. 2022, 17, 095001.
  54. Oliveira, P.V.; Zhang, X.; Peterson, B.; Ometto, J.P. Using simulated GEDI waveforms to evaluate the effects of beam sensitivity and terrain slope on GEDI L2A relative height metrics over the Brazilian Amazon Forest. Sci. Remote Sens. 2023, 7, 100083.
  55. Markus, T.; Neumann, T.; Martino, A.; Abdalati, W.; Brunt, K.; Csatho, B.; Farrell, S.; Fricker, H.; Gardner, A.; Harding, D.; et al. The Ice, Cloud, and land Elevation Satellite-2 (ICESat-2): Science requirements, concept, and implementation. Remote Sens. Environ. 2017, 190, 260–273.
  56. Lin, X.; Xu, M.; Cao, C.; Dang, Y.; Bashir, B.; Xie, B.; Huang, Z. Estimates of Forest Canopy Height Using a Combination of ICESat-2/ATLAS Data and Stereo-Photogrammetry. Remote Sens. 2020, 12, 3649.
  57. Mulverhill, C.; Coops, N.C.; Hermosilla, T.; White, J.C.; Wulder, M.A. Evaluating ICESat-2 for monitoring, modeling, and update of large area forest canopy height products. Remote Sens. Environ. 2022, 271, 112919.
  58. Neuenschwander, A.; Pitts, K. The ATL08 land and vegetation product for the ICESat-2 Mission. Remote Sens. Environ. 2019, 221, 247–259.
  59. Narine, L.L.; Popescu, S.C.; Malambo, L. Using ICESat-2 to Estimate and Map Forest Aboveground Biomass: A First Example. Remote Sens. 2020, 12, 1824.
  60. Wang, C.; Jia, M.; Chen, N.; Wang, W. Long-Term Surface Water Dynamics Analysis Based on Landsat Imagery and the Google Earth Engine Platform: A Case Study in the Middle Yangtze River Basin. Remote Sens. 2018, 10, 1635.
  61. Gerardo, R.; de Lima, I.P. Comparing the Capability of Sentinel-2 and Landsat 9 Imagery for Mapping Water and Sandbars in the River Bed of the Lower Tagus River (Portugal). Remote Sens. 2023, 15, 1927.
  62. Jolliffe, I.T.; Cadima, J. Principal component analysis: A review and recent developments. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2016, 374, 20150202.
  63. López-Serrano, P.M.; Cárdenas Domínguez, J.L.; Corral-Rivas, J.J.; Jiménez, E.; López-Sánchez, C.A.; Vega-Nieva, D.J. Modeling of Aboveground Biomass with Landsat 8 OLI and Machine Learning in Temperate Forests. Forests 2020, 11, 11.
  64. Trier, Ø.D.; Salberg, A.-B.; Haarpaintner, J.; Aarsten, D.; Gobakken, T.; Næsset, E. Multi-sensor forest vegetation height mapping methods for Tanzania. Eur. J. Remote Sens. 2018, 51, 587–606.
  65. Zhang, Y.; Wang, R. Estimation of aboveground biomass of vegetation based on landsat 8 OLI images. Heliyon 2022, 8, e11099.
  66. Rennó, C.D.; Nobre, A.D.; Cuartas, L.A.; Soares, J.V.; Hodnett, M.G.; Tomasella, J.; Waterloo, M.J. HAND, a new terrain descriptor using SRTM-DEM: Mapping terra-firme rainforest environments in Amazonia. Remote Sens. Environ. 2008, 112, 3469–3481.
  67. Rodríguez, E.; Morris, C.S.; Belz, J.E. A Global Assessment of the SRTM Performance. Photogramm. Eng. Remote Sens. 2006, 72, 249–260.
  68. Yang, L.; Meng, X.; Zhang, X. SRTM DEM and its application advances. Int. J. Remote Sens. 2011, 32, 3875–3896.
  69. Pascual, C.; Cohen, W.; García-Abril, A.; Arroyo, L.A.; Valbuena, R.; Martí-Fernández, S.; Manzanera, J.A.; Hill, R.; Rosette, J.; Suárez, J. Mean height and variability of height derived from lidar data and Landsat images relationship. In Proceedings of the SilviLaser 2008, 8th International Conference on LiDAR Applications in Forest Assessment and Inventory, Edinburgh, UK, 17–19 September 2008; pp. 517–525.
  70. Guerra-Hernández, J.; Narine, L.L.; Pascual, A.; Gonzalez-Ferreiro, E.; Botequim, B.; Malambo, L.; Neuenschwander, A.; Popescu, S.C.; Godinho, S. Aboveground biomass mapping by integrating ICESat-2, SENTINEL-1, SENTINEL-2, ALOS2/PALSAR2, and topographic information in Mediterranean forests. GISci. Remote Sens. 2022, 59, 1509–1533.
  71. Teoh, T.T.; Rong, Z. Regression. In Artificial Intelligence with Python; Teoh, T.T., Rong, Z., Eds.; Springer: Singapore, 2022; pp. 163–181.
  72. Bisong, E. Introduction to Scikit-learn. In Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners; Bisong, E., Ed.; Apress: Berkeley, CA, USA, 2019; pp. 215–229.
  73. Malakouti, S.M.; Menhaj, M.B.; Suratgar, A.A. The usage of 10-fold cross-validation and grid search to enhance ML methods performance in solar farm power generation prediction. Clean. Eng. Technol. 2023, 15, 100664.
  74. Liu, X.W.; Long, Z.L.; Zhang, W.; Yang, L.M. Key feature space for predicting the glass-forming ability of amorphous alloys revealed by gradient boosted decision trees model. J. Alloys Compd. 2022, 901, 163606.
  75. Liu, L.; Wang, C.; Nie, S.; Zhu, X.; Xi, X.; Wang, J. Analysis of the influence of different algorithms of GEDI L2A on the accuracy of ground elevation and forest canopy height. J. Univ. Chin. Acad. Sci. 2021, 39, 502–511.
  76. Lang, N.; Kalischek, N.; Armston, J.; Schindler, K.; Dubayah, R.; Wegner, J.D. Global canopy height regression and uncertainty estimation from GEDI LIDAR waveforms with deep ensembles. Remote Sens. Environ. 2022, 268, 112760.
  77. Neuenschwander, A.; Guenther, E.; White, J.C.; Duncanson, L.; Montesano, P. Validation of ICESat-2 terrain and canopy heights in boreal forests. Remote Sens. Environ. 2020, 251, 112110.
Figure 1. The locations of study sites in the US and GEDI and ICESat-2/ATLAS ground tracks. (a) Study sites in Worcester County, Massachusetts, and (b) study site in Harvard Forest, Worcester County, Massachusetts, with coverage of satellite ground tracks.
Figure 2. GEDI multibeam sampling mode.
Figure 3. Annual image synthesis of Landsat 9 image dataset.
Figure 4. Importance ranking of the feature variables extracted for the different space-borne lidar data. (a) Importance ranking of GEDI feature variables. (b) Importance ranking of Landsat 9 feature variables corresponding to GEDI. (c) Importance ranking of all feature variables of GEDI and Landsat 9 data. (d) Importance ranking of ICESat-2 feature variables. (e) Importance ranking of Landsat 9 feature variables corresponding to ICESat-2. (f) Landsat 9 feature variable importance ranking of ICESat-2 and Landsat 9 data.
Figure 5. Scatter plots of GEDI forest canopy height (RH98), extracted using six different GEDI algorithms, versus airborne lidar reference forest height (RH90). The dashed line is the baseline and the red line is the fit line; (a1–a6) denote the six algorithms of the GEDI L2A data.
Figure 6. Accuracy assessment of ICESat-2 ATL08 forest canopy height indicators against the reference forest canopy heights (dashed line is the baseline, red line is the fit line).
Figure 7. Scatter plots of ICESat-2 ATL08 forest canopy height (RH90) versus airborne lidar reference forest height (RH90) at different collection times (dashed line is the baseline, red line is the fit line).
Figure 8. Retrieval of canopy height with the random forest regression model: (a) GEDI L2A; (b) Landsat 9; and (c) the combined GEDI and Landsat 9 model (dashed line is the baseline, red line is the fit line).
Figure 9. Retrieval of canopy height with the random forest regression model: (a) ICESat-2/ATLAS; (b) Landsat 9; and (c) the combined ICESat-2 and Landsat 9 model (dashed line is the baseline, red line is the fit line).
Figure 10. Importance score of each variable in the canopy height models based on different space-borne lidar products.
Table 1. Specifications of the NEON airborne lidar instrument and its derived products [22].
NEON Airborne Lidar Instrument and Product Specifications

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Laser wavelength | 1064 nm (near IR) | Elevation accuracy | 5–35 cm |
| Laser power | 250 µJ | Derived products | DTM and CHM (in the format of 1 km by 1 km mosaic tiles) |
| Laser repetition rate | 33–167 kHz | Product resolution | Uniform grid (1 m × 1 m) |
| Footprint diameter | 0.25 m (at 1000 m flying height), 0.8 m in wide beam-divergence mode | Terrain parameters | Elevation and slope |
| Sampling density | 1–4 points per square meter | Vertical datum | GEOID12A |
| Horizontal accuracy | 5–15 cm | Canopy parameters | Canopy top height, relative height (RH), and canopy cover |
Table 2. GEDI parameter names and values [22,24,52].

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Track height | ~400 km | Footprint | 25 m |
| Coverage | 51.6° N~51.6° S | Geolocation error | 8 m |
| Repetition rate | 242 Hz | Along-track distance | 60 m |
| Pulse width | 15 ns | Across-track distance | 600 m |
| Wavelength | 1064 nm | Product tested in this study | L2A (elevation and canopy heights) |
Table 3. The parameter configurations for the six algorithms of GEDI L2A, where σ denotes the standard deviation of the background noise level [29,31].

| Algorithm Setting Group | Smoothing Width (Noise) | Smoothing Width (Signal) | Waveform Signal Start Threshold | Waveform Signal End Threshold |
|---|---|---|---|---|
| a1 | 6.5σ | 6.5σ | | |
| a2 | 6.5σ | 3.5σ | | |
| a3 | 6.5σ | 3.5σ | | |
| a4 | 6.5σ | 6.5σ | | |
| a5 | 6.5σ | 3.5σ | | |
| a6 | 6.5σ | 3.5σ | | |
Table 4. ICESat-2 data acquisition time.

| Data Type | Acquisition Time | Number of Documents |
|---|---|---|
| ICESat-2 ATL08 | 2022/01/01–2022/12/31 | 4 |
| ICESat-2 ATL03 | 2022/01/01–2022/12/31 | 4 |
Table 5. Landsat 9 Operational Land Imager-2 (OLI-2) parameters [61].

| Band No. | Band | Band Range/µm | Spatial Resolution/m |
|---|---|---|---|
| B1 | Coastal | 0.43~0.45 | 30 |
| B2 | Blue | 0.45~0.51 | 30 |
| B3 | Green | 0.53~0.59 | 30 |
| B4 | Red | 0.64~0.67 | 30 |
| B5 | NIR | 0.85~0.88 | 30 |
| B6 | SWIR1 | 1.57~1.65 | 30 |
| B7 | SWIR2 | 2.11~2.29 | 30 |
| B8 | Pan | 0.50~0.68 | 15 |
| B9 | Cirrus | 1.36~1.39 | 30 |
Table 6. Summary of characteristic variables calculated from GEDI L2A, ICESat-2, Landsat 9, and SRTM data [62,63,64,65].

| Type | Characteristic Variable | Description |
|---|---|---|
| GEDI L2A | RH25, RH50, RH60, RH75, RH85, RH90, RH98, and RH100 | Relative heights extracted from GEDI (25th, 50th, 60th, 75th, 85th, 90th, 98th, and 100th percentiles) |
| ICESat-2 | RH25, RH50, RH60, RH75, RH85, RH90, RH98, and RH100 | Relative heights extracted from ICESat-2 (25th, 50th, 60th, 75th, 85th, 90th, 98th, and 100th percentiles) |
| Landsat 9 | B1, B2, B3, B4, B5, B6, and B7 | Landsat 9 bands 1, 2, 3, 4, 5, 6, and 7 |
| | B24 | $B2 / B4$ |
| | B74 | $B7 / B4$ |
| | B76 | $B7 / B6$ |
| | B345 | $B3 \times B4 / B5$ |
| | EVI | $2.5\,(B5 - B4) / (B5 + 6\,B4 - 7.5\,B2 + 1)$ |
| | DVI | $B5 - B4$ |
| | SLAVI | $B5 / (B4 + B7)$ |
| | VI3 | $(B5 - B6) / (B5 + B6)$ |
| | PVI | $\sqrt{(0.355\,B5 - 0.149\,B4)^2 + (0.355\,B4 - 0.852\,B5)^2}$ |
| | NDVI | $(B5 - B4) / (B5 + B4)$ |
| | RDVI | $(B5 - B4) / \sqrt{B5 + B4}$ |
| | ND43 | $(B4 - B3) / (B4 + B3)$ |
| | ND67 | $(B6 - B7) / (B6 + B7)$ |
| | PC1, PC2, and PC3 | First to third bands of the principal component analysis |
| | TCB | Brightness band of the tasseled cap transformation |
| | TCG | Greenness band of the tasseled cap transformation |
| | TCW | Wetness band of the tasseled cap transformation |
| | MNF1, MNF2, MNF3, and MNF4 | First to fourth bands of the minimum noise fraction transformation |
| SRTM | elevation | Elevation extracted from the DEM |
| | slope | Slope extracted from the DEM |
| | aspect | Aspect extracted from the DEM |
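As a minimal sketch of how the band-ratio and vegetation indices of Table 6 can be computed for a single pixel, the Python function below takes the Landsat 9 band values and returns the indices. The function name `landsat9_indices`, the dictionary input, and the sample reflectance values are assumptions made for illustration; the formulas follow the commonly used definitions of these indices, which may differ in detail from the authors' implementation. B345 is omitted because its operator could not be confirmed.

```python
import math


def landsat9_indices(b):
    """Compute Table 6 style indices for one pixel.

    b: dict mapping 'B2'..'B7' to surface reflectance values (floats).
    The input layout is a hypothetical choice for this sketch.
    """
    b2, b3, b4, b5, b6, b7 = (b[k] for k in ("B2", "B3", "B4", "B5", "B6", "B7"))
    return {
        # Simple band ratios
        "B24": b2 / b4,
        "B74": b7 / b4,
        "B76": b7 / b6,
        # Vegetation indices (B5 = NIR, B4 = red, B2 = blue)
        "EVI": 2.5 * (b5 - b4) / (b5 + 6 * b4 - 7.5 * b2 + 1),
        "DVI": b5 - b4,
        "SLAVI": b5 / (b4 + b7),
        "VI3": (b5 - b6) / (b5 + b6),
        "PVI": math.sqrt((0.355 * b5 - 0.149 * b4) ** 2
                         + (0.355 * b4 - 0.852 * b5) ** 2),
        "NDVI": (b5 - b4) / (b5 + b4),
        "RDVI": (b5 - b4) / math.sqrt(b5 + b4),
        # Normalized differences between neighbouring bands
        "ND43": (b4 - b3) / (b4 + b3),
        "ND67": (b6 - b7) / (b6 + b7),
    }


# Illustrative (made-up) reflectances for a vegetated pixel
pixel = {"B2": 0.03, "B3": 0.06, "B4": 0.05, "B5": 0.40, "B6": 0.20, "B7": 0.10}
indices = landsat9_indices(pixel)
```

In practice the same expressions would be applied to whole raster bands rather than to a single pixel, with a guard against zero denominators over water or no-data areas.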
Table 7. The feature variables utilized in the final experimental modeling process [62,63,64,65].

| Model | Characteristic Variable | Number of Characteristic Variables |
|---|---|---|
| GEDI | rh98, rh100, rh90, rh85, rh25, and rh75 | 6 |
| Landsat 9 (corresponding to GEDI) | MNF2, B3, TCW, B74, B7, MNF3, B6, ND43, MNF4, B76, EVI, B1, B24, ND67, B2, B345, B4, pc3, and VI3 | 19 |
| GEDI and Landsat 9 | rh98, rh100, MNF2, rh85, rh90, B3, MNF3, B74, MNF4, rh25, TCW, B24, EVI, B7, B76, B4, ND43, rh75, B345, and rh60 | 20 |
| ICESat-2 | rh85, rh90, rh75, rh98, rh25, rh100, and rh60 | 7 |
| Landsat 9 (corresponding to ICESat-2) | B7, B3, MNF2, MNF4, MNF3, TCW, pc3, B74, ND43, B1, B6, B2, B76, B24, EVI, B4, pc2, ND67, VI3, B345, TCB, and SLAVI | 22 |
| ICESat-2 and Landsat 9 | rh85, rh90, rh75, B3, MNF2, B7, rh98, rh60, rh100, B76, MNF3, rh25, TCW, MNF4, B24, B74, EVI, ND43, B4, B1, ND67, pc3, and rh50 | 23 |
Table 8. Optimal parameters of the random forest model from different data sources.

| Data Group | n_estimators | max_depth | min_samples_split | min_samples_leaf | max_features |
|---|---|---|---|---|---|
| GEDI | 288 | 15 | 1 | 5 | 0.96 |
| Landsat 9 | 312 | 17 | 2 | 6 | 0.89 |
| GEDI and Landsat 9 | 300 | 15 | 1 | 6 | 0.97 |
| ICESat-2 | 280 | 12 | 1 | 6 | 0.92 |
| Landsat 9 | 310 | 16 | 2 | 7 | 0.89 |
| ICESat-2 and Landsat 9 | 200 | 14 | 1 | 5 | 0.90 |
Table 9. Accuracy of the forest canopy heights obtained with the default algorithm in GEDI L2A products, assessed against the reference airborne lidar forest canopy heights.

| Relative Height (RH) | R | MAE/m | RMSE/m | rRMSE | N |
|---|---|---|---|---|---|
| RH90 | 0.74 | 4.37 | 6.16 | 28.21% | 5398 |
| RH92 | 0.75 | 3.93 | 5.67 | 25.94% | 5398 |
| RH94 | 0.77 | 3.55 | 5.24 | 23.98% | 5398 |
| RH96 | 0.78 | 3.27 | 4.96 | 22.71% | 5398 |
| RH98 | 0.78 | 3.22 | 4.94 | 22.59% | 5398 |
| RH100 | 0.78 | 4.00 | 5.63 | 25.76% | 5398 |
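The accuracy measures reported in Table 9 (R, MAE, RMSE, rRMSE, and N) can be sketched in a few lines of Python. The function name `accuracy_metrics` and the sample heights are hypothetical, and the sketch assumes that R is the Pearson correlation coefficient and that rRMSE is the RMSE divided by the mean of the reference heights, expressed as a percentage; the paper's exact definitions may differ.

```python
import math


def accuracy_metrics(predicted, reference):
    """Return R, MAE, RMSE, rRMSE (%) and N for paired height samples.

    Assumes R is Pearson's correlation and rRMSE = 100 * RMSE / mean(reference).
    """
    n = len(reference)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 samples")

    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n

    # Pearson correlation coefficient
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    r_value = cov / math.sqrt(var_p * var_r)

    # Error statistics in the units of the input (metres here)
    mae = sum(abs(p - r) for p, r in zip(predicted, reference)) / n
    rmse = math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)
    rrmse = 100.0 * rmse / mean_r

    return {"R": r_value, "MAE": mae, "RMSE": rmse, "rRMSE": rrmse, "N": n}


# Made-up example: three space-borne estimates against airborne reference heights
metrics = accuracy_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

Under this assumed definition, rRMSE scales directly with RMSE when the reference data set is fixed, which matches the constant ratio between the RMSE and rRMSE columns of Table 9.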
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Zhu, W.; Li, Y.; Luan, K.; Qiu, Z.; He, N.; Zhu, X.; Zou, Z. Forest Canopy Height Retrieval and Analysis Using Random Forest Model with Multi-Source Remote Sensing Integration. Sustainability 2024, 16, 1735. https://doi.org/10.3390/su16051735