2.3.3. GEDI LiDAR Data

GEDI L2B LiDAR data acquired over the study region between 2019 and 2021 were obtained from the NASA Land Processes Distributed Active Archive Center (https://search.earthdata.nasa.gov/search, accessed on 21 October 2022). The GEDI instrument acquires structural information, such as canopy height metrics, vertical profiles, and surface topography, by analyzing the amount of energy returned by various tree components at different heights above the ground [38]. In this study, the foliage height diversity (FHD) and plant area index (PAI) were extracted from 154,371 GEDI L2B observations. The FHD index is a plant structural measure that describes the vertical heterogeneity of the foliage profile (Table 3) [39]. The PAI, which comprises various plant components (stems, branches, and leaves), is the one-sided area of plant material surface per unit ground surface area [39]. To account for changes in forest structure caused by phenological differences, we derived the two metrics separately for the growing season and the non-growing season. The sensitivity of a GEDI footprint, which depends on the signal-to-noise ratio of the waveform, indicates the maximum canopy cover that the laser can penetrate to detect the ground. Thus, we excluded footprints with sensitivity less than 0.9. After filtering out these invalid observations, 62,593 pairs of FHD and PAI were used for further processing.
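The filtering and seasonal split described above can be sketched as follows. This is a minimal illustration and not the processing chain used in the study: the record layout and field names are hypothetical rather than the GEDI L2B schema, the sample values are invented for demonstration, and the growing-season months (May to September) are an assumption, since the text does not state the season boundaries. Only the 0.9 sensitivity threshold comes from the text.

```python
from datetime import date

# Hypothetical footprint records; field names are illustrative, not the GEDI L2B schema.
footprints = [
    {"acquired": date(2020, 7, 14), "sensitivity": 0.96, "fhd": 2.8, "pai": 3.1},
    {"acquired": date(2020, 1, 9), "sensitivity": 0.93, "fhd": 2.1, "pai": 1.4},
    {"acquired": date(2021, 6, 2), "sensitivity": 0.85, "fhd": 2.5, "pai": 2.7},
    {"acquired": date(2019, 11, 20), "sensitivity": 0.91, "fhd": 1.9, "pai": 1.2},
]

MIN_SENSITIVITY = 0.9          # threshold stated in the text
GROWING_MONTHS = range(5, 10)  # ASSUMPTION: May to September; not given in the text


def is_valid(footprint):
    """Keep footprints whose sensitivity is at least the threshold."""
    return footprint["sensitivity"] >= MIN_SENSITIVITY


def season(footprint):
    """Label a footprint by the (assumed) season of its acquisition month."""
    if footprint["acquired"].month in GROWING_MONTHS:
        return "growing"
    return "non_growing"


def split_by_season(records):
    """Drop low-sensitivity footprints, then group the rest by season."""
    groups = {"growing": [], "non_growing": []}
    for footprint in records:
        if is_valid(footprint):
            groups[season(footprint)].append(footprint)
    return groups


groups = split_by_season(footprints)
for name, members in groups.items():
    print(name, [(fp["fhd"], fp["pai"]) for fp in members])
```

Filtering before grouping keeps each seasonal set restricted to footprints that pass the sensitivity check, so the FHD and PAI pairs carried forward are the same ones counted after quality screening.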


**Table 3.** Characteristics of data used in this study.

To obtain spatially continuous FHD and PAI, we used inverse distance weighting (IDW) interpolation to achieve wall-to-wall diversity mapping. IDW, as a global interpolation method, is usually used for sample datasets that are uniformly distributed and dense enough to reflect local differences [40]. Measured values closest to the predicted location have a greater effect on the predicted value than those farther away, which makes IDW interpolation sensitive to outliers and sampling configurations (i.e., clustered and isolated points) [41]. Thus, we randomly subsampled the densely clustered GEDI points until the points were uniformly distributed throughout the study area. Then, we used 80% of the GEDI points for interpolation and parameter optimization and the remaining 20% for validation, repeating the procedure until the correlation coefficient exceeded 0.8.
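The interpolation and hold-out validation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the weight is taken as the inverse of distance raised to a power parameter (the text does not say which parameters were optimized), the candidate powers are hypothetical, the sample surface is synthetic, and Pearson's r is assumed as the correlation coefficient.

```python
import math
import random


def idw_predict(known, query, power=2.0):
    """Inverse-distance-weighted estimate at `query` from (x, y, value) samples."""
    numerator = 0.0
    denominator = 0.0
    for x, y, value in known:
        distance = math.hypot(x - query[0], y - query[1])
        if distance == 0.0:
            return value  # the query coincides with a sample point
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    covariance = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return covariance / math.sqrt(var_a * var_b)


def holdout_r(samples, power=2.0, train_fraction=0.8, seed=0):
    """Interpolate from a random 80% subset and correlate with the held-out 20%."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]
    predicted = [idw_predict(train, (x, y), power) for x, y, _ in test]
    observed = [value for _, _, value in test]
    return pearson_r(observed, predicted)


# Synthetic, smoothly varying surface on a regular grid (illustrative data only).
samples = [(float(i), float(j), 0.1 * i + 0.05 * j)
           for i in range(20) for j in range(20)]

r = holdout_r(samples)
# Hypothetical parameter search: keep the power with the highest hold-out r.
best_power = max((1.0, 2.0, 3.0), key=lambda p: holdout_r(samples, power=p))
print(round(r, 3), best_power)
```

Fixing the random seed makes the 80/20 split reproducible, which matters when comparing candidate parameters against the same validation points.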

#### **3. Methods**

#### *3.1. Variable Importance Assessment*

Selecting the most important variables from high-dimensional datasets is beneficial in improving efficiency and reducing model overfitting. In this study, Boosted Regression Tree (BRT) and Mean Decrease Gini (MDG) algorithms were used to evaluate the importance of independent variables. MDG indicates the contribution of each variable to the homogeneity of the nodes and leaves in the resulting random forest, while BRT evaluates variable performance by iteratively fitting and combining multiple regression tree models [42]. Both algorithms are capable of ingesting multiple classes of predictor variables to model complex interactions without making assumptions about variable interactions and have been widely used in ecological and remote sensing research [43].
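The quantity underlying MDG can be illustrated with a single split. The sketch below is a simplified, assumed illustration rather than the procedure used in the study: it computes the decrease in Gini impurity produced by one threshold on one variable, whereas a random forest accumulates such decreases over every split that uses the variable and averages them over all trees. The labels, variable values, and thresholds are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a set of class labels (0 means perfectly homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    left = [label for v, label in zip(values, labels) if v <= threshold]
    right = [label for v, label in zip(values, labels) if v > threshold]
    n = len(labels)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - children


# Hypothetical class labels and two candidate predictor variables.
labels = ["high", "high", "low", "low"]
informative = [0.9, 0.8, 0.2, 0.1]    # separates the classes cleanly
uninformative = [0.5, 0.1, 0.6, 0.2]  # mixes the classes in both children

print(gini_decrease(informative, labels, threshold=0.5))
print(gini_decrease(uninformative, labels, threshold=0.4))
```

A variable that yields purer child nodes produces a larger decrease, which is why variables with higher accumulated decreases are ranked as more important.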
