#### *5.1. Prospects of GEDI LiDAR and Sentinel-2 Data on Forest Diversity*

In this study, we successfully estimated forest diversity in a mixed broadleaf-conifer forest using multi-temporal Sentinel-2 and GEDI LiDAR data. This suggests that LiDAR data and optical images, combined with a machine-learning approach, have promising potential for estimating forest species diversity over large areas, which could substantially improve the conservation and management of forest resources. GEDI uses the laser energy reflected within ~25 m footprints to determine canopy height, canopy cover, and the vertical distribution of plant material. This study is the first to apply GEDI-derived foliage height diversity (FHD) metrics to forest diversity estimation, and our results demonstrate the importance of FHD metrics for future diversity studies. In forest ecology, a high FHD value typically indicates a more complex forest structure (e.g., multiple canopy layers). Structural differences among tree species produce different directional gap probabilities, which underlie LiDAR-based estimation of forest diversity; this is supported by the direct correlations between the tree species diversity indices (*H*, λ, and *J*) and the GEDI-derived FHD and plant area index (PAI) metrics (Figure 6). GEDI LiDAR data are therefore expected to become one of the most important data sources for forest diversity estimation. Nonetheless, we argue that good performance is difficult to achieve with GEDI data alone. Our study showed that the combined remote sensing data sources explained tree species diversity better than either GEDI LiDAR data or Sentinel-2 images alone. We attribute the higher explanatory power of the combined data to their more complete use of vegetation properties: vegetation structure, biochemical properties, and phenological variability.
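As a rough illustration of what an FHD value measures, the sketch below computes foliage height diversity as the Shannon index of a vertical plant-area profile, the classic formulation FHD = −Σ pᵢ ln pᵢ, where pᵢ is the share of plant area in height bin *i*. The input array and the function name `foliage_height_diversity` are hypothetical; GEDI's L2B product derives its own FHD from vertical plant profiles, and its exact binning and normalization may differ from this sketch.

```python
import numpy as np


def foliage_height_diversity(profile):
    """Shannon diversity of a vertical plant-area profile (hypothetical helper).

    profile: 1-D sequence of plant area per height bin, lowest bin first.
    Assumed input format for illustration; not the GEDI L2B data layout.
    """
    profile = np.clip(np.asarray(profile, dtype=float), 0.0, None)  # no negative foliage
    total = profile.sum()
    if total <= 0:
        return 0.0  # empty profile: no vertical diversity
    p = profile / total          # share of plant area in each bin
    p = p[p > 0]                 # by convention 0 * ln(0) = 0
    h = -np.sum(p * np.log(p))
    return float(h) + 0.0        # "+ 0.0" turns a -0.0 result into 0.0
```

A single dense layer gives FHD = 0, while plant area spread evenly over four bins gives ln 4 ≈ 1.386, matching the reading that multiple canopy layers raise FHD.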

**Figure 6.** Coefficients of determination (*R*²) between measured diversity indices and GEDI LiDAR indices.

Differences in the physical and chemical characteristics of tree species produce unique spectral responses, which are the main basis for estimating forest diversity from optical imagery. Vegetation indices (NDVI, NDWI, EVI, and SAVI) were more strongly correlated with the forest diversity indices (*H*, λ, and *J*) than the individual band features, consistent with the results reported by Madonsela et al. [2]. Vegetation indices enhance the spectral signal of vegetation while suppressing reflectance from non-vegetation components [49]. Figure 7 supports this: in the fall season, the correlation coefficients between the predicted *H* index and the vegetation indices are significantly higher than those of the band features. Variability in vegetation indices reflects a range of vegetation properties, such as photosynthetic pigments, biomass, and structural carbohydrates [50], so it is unsurprising that vegetation indices are significantly related to the forest diversity indices (*H*, λ, and *J*).

Additionally, previous studies by Sothe et al. [51] and Grabska et al. [52] demonstrated the value of the Red-Edge, NIR, and SWIR bands for estimating plant diversity, and this study confirmed the importance of these bands using the BRT and MDG algorithms (see Figure 2). We attribute this to the rich spectral band configuration of Sentinel-2; for example, the NIR and SWIR bands are sensitive to water content, lignin, starch, and nitrogen [53]. In addition, the correlation coefficients differed markedly between the growing and non-growing seasons, especially for the spectral features. The spectral values and vegetation indices captured seasonal variations in canopy structure and biochemical characteristics among tree species, and these differences provide an important reference for estimating forest diversity in various forest environments.
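The four vegetation indices discussed above can be computed directly from Sentinel-2 surface reflectance. The sketch below uses the standard NDVI, EVI, and SAVI formulas and assumes the band mapping B2 (blue), B4 (red), B8 (NIR), and B11 (SWIR1), reflectance scaled to 0 to 1, and the NIR/SWIR form of NDWI (after Gao); the study's exact band choices and NDWI variant are not restated in this section, so treat these as assumptions. `vegetation_indices` and `pearson_r` are hypothetical helper names, the latter being the plain Pearson correlation of the kind Figure 7 reports.

```python
import numpy as np


def vegetation_indices(blue, red, nir, swir1, L=0.5):
    """NDVI, NDWI, EVI and SAVI from reflectance arrays (0-1 scale).

    Assumed Sentinel-2 bands: blue=B2, red=B4, nir=B8, swir1=B11.
    """
    blue, red, nir, swir1 = (np.asarray(b, dtype=float) for b in (blue, red, nir, swir1))
    ndvi = (nir - red) / (nir + red)
    # Gao-style NDWI (NIR vs SWIR), sensitive to canopy water content (assumed variant)
    ndwi = (nir - swir1) / (nir + swir1)
    # EVI with the standard coefficients G=2.5, C1=6, C2=7.5, L=1
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    # SAVI with soil-adjustment factor L (0.5 is the common default)
    savi = (1.0 + L) * (nir - red) / (nir + red + L)
    return {"NDVI": ndvi, "NDWI": ndwi, "EVI": evi, "SAVI": savi}


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])
```

With per-plot arrays (one entry per sample plot), `pearson_r(vi["NDVI"], h_pred)` would yield one value of a Figure 7-style chart; `h_pred` stands for a hypothetical array of predicted *H* values. Repeating this per season would expose the growing versus non-growing season gap described above.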

**Figure 7.** Correlation coefficients between the predicted *H* index and Sentinel-2-derived feature variables. \*\*: significant correlation (*p* < 0.01); \*: significant correlation (*p* < 0.05).
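The markers in Figure 7 follow the usual two-threshold convention. The caption does not say which test produced the *p*-values, so this sketch swaps in a permutation test for Pearson's *r* (shuffle one variable, count how often the shuffled |*r*| reaches the observed |*r*|) and then maps the *p*-value to `**` or `*`. The function names and the default of 999 permutations are illustrative assumptions.

```python
import numpy as np


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for Pearson's r (stand-in for a t-test)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    r_obs = abs(np.corrcoef(x, y)[0, 1])
    count = 0
    for _ in range(n_perm):
        # Shuffling y breaks any real association while keeping its distribution
        r_perm = abs(np.corrcoef(x, rng.permutation(y))[0, 1])
        if r_perm >= r_obs:
            count += 1
    # +1 in numerator and denominator counts the observed arrangement itself
    return (count + 1) / (n_perm + 1)


def significance_stars(p):
    """Figure 7 convention: ** for p < 0.01, * for p < 0.05, else no marker."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""
```

With 999 permutations the smallest attainable *p*-value is 0.001, which is enough to resolve the 0.01 threshold used in the figure.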

#### *5.2. Machine Learning Algorithms for Forest Diversity Mapping*

We used four types of machine-learning algorithms to estimate the forest diversity indices, with a separate variable selection for each of the three diversity indices. The RF and SVM models provided the highest estimation accuracy (highest *R*², lowest RMSE and MAE), outperforming the KNN and LR models. RF is an ensemble approach that combines many decision trees, which reduces overfitting, and it is one of the most frequently used algorithms in remote sensing tasks [54]. Similarly, SVMs are a high-performance method for nonlinear problems, which they handle through kernel functions such as the radial basis function [55]. The strong performance of RF and SVM models has been confirmed in other studies [56,57]. RF gave the best predictions for the λ and *J* indices, whereas SVM performed best for the *H* index. Kernel-based algorithms (e.g., SVM) are prone to overfitting when presented with extreme values that are not represented in the training sample [57]. In contrast, tree-based algorithms (e.g., RF) appear to be more resistant to overfitting, although they do not fit the data as closely as kernel-based algorithms [58].
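The model comparison above rests on three accuracy metrics. A minimal sketch, assuming *R*² is computed as 1 − SS_res/SS_tot on held-out data (this section does not state whether the squared correlation was used instead), could rank candidate models by highest *R*², then lowest RMSE and MAE. The function names, model labels, and example predictions are hypothetical and are not the study's results.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """R2 (1 - SS_res/SS_tot), RMSE and MAE for one model's predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
        "MAE": float(np.mean(np.abs(resid))),
    }


def rank_models(y_true, predictions):
    """Order model names by R2 (descending), then RMSE and MAE (ascending)."""
    scores = {name: regression_metrics(y_true, pred) for name, pred in predictions.items()}
    order = sorted(scores, key=lambda n: (-scores[n]["R2"], scores[n]["RMSE"], scores[n]["MAE"]))
    return order, scores


# Hypothetical held-out diversity values and predictions from two models
order, scores = rank_models(
    [1, 2, 3, 4],
    {"model_a": [1, 2, 3, 5], "model_b": [1.1, 2.0, 2.9, 4.0]},
)
```

Here `model_b` ranks first because its residuals are smaller. In practice each diversity index (*H*, λ, *J*) would be ranked separately, since the best model can differ by index, as it did between RF and SVM.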
