**1. Introduction**

Forests host diverse tree species assemblages that support key ecosystem services such as nutrient cycling, headwater conservation, and biomass production [1]. Forest diversity is changing in response to climate change, soil erosion, species introductions, and other pressures [2]. In addition, forest productivity increases with tree species richness, and higher tree species diversity provides more food options for wildlife. Thus, effective technology for mapping the distribution of forest diversity over large areas is urgently needed to assess the current state of forests and their carrying capacity for animal populations [3].

Forest diversity is typically assessed through botanical field surveys and summarized with species diversity metrics (i.e., richness and the Simpson and Pielou indices) [2,4]. Traditionally, these metrics are calculated by counting the number and species of trees in the field, which is an expensive and time-consuming process. Moreover, because of accuracy problems and the difficulty of distinguishing intertwined tree species, this strategy is hard to implement in large (e.g., hundreds of hectares) forest communities [5]. The challenges are greater still in natural forests with dense canopies. Remote sensing techniques have shown great potential
for large-scale estimation of forest diversity and have been used successfully to estimate the species diversity of subtropical and tropical forest ecosystems [6,7]. However, contemporary remote sensing-based approaches differ in the satellite data and machine learning models they deploy. Oldeland et al. [3] assessed the plant richness of herbaceous ecosystems using hyperspectral imagery. Nagendra et al. [7] used IKONOS and Landsat images to estimate forest species richness and diversity in central India. Stenzel et al. [8] used multi-seasonal, multi-spectral remote sensing data (RapidEye) to map ecological regions with high species richness. Almeida et al. [9] used hyperspectral images and airborne LiDAR data to assess the structure and diversity of restoration plantings. Clearly, rich spectral information plays an important role in estimating species richness. However, these remote sensing data are limited by area coverage, weather conditions, high costs, and acquisition time [10], making it challenging to develop detailed maps of forest diversity across large areas.

Currently, the most common remote sensing-based methods estimate forest diversity by extrapolating field-collected data. Leutner et al. [11] examined the relationship between remotely sensed and field data, and mapped α- and β-diversity in the Yucatan Peninsula using a regression kriging procedure. Hakkenberg et al. [12] predicted floristic diversity at different spatial scales using nonparametric models trained with spatially nested field plots and aerial LiDAR-hyperspectral data. Chrysafis et al. [13] developed a workflow to obtain tree diversity maps at the regional scale with machine learning algorithms, using multispectral and multi-seasonal Sentinel-2 images and geodiversity data. The key step in these methods is to extract features from the remote sensing data, namely spectral indices or LiDAR-based metrics that are highly relevant to forest diversity, and then to use these features as a set of mixed variables for regression analysis. Although these methods have achieved good prediction accuracy, it remains unclear which types of algorithms are more effective for estimating forest diversity.
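For concreteness, the plot-level diversity metrics named above (richness, Simpson, and Pielou), which serve as the response variables in such regressions, can be computed from field tallies as in the following minimal sketch. It assumes the Gini-Simpson form of the Simpson index (one minus the sum of squared proportions) and Pielou's evenness defined as Shannon entropy divided by the natural logarithm of richness. The text does not state which formulations are used, so both choices, along with the function name `diversity_metrics` and the example counts, are illustrative assumptions.

```python
import math


def diversity_metrics(counts):
    """Return (richness, Gini-Simpson index, Pielou evenness) for one plot.

    `counts` holds the number of stems tallied per species (hypothetical input).
    """
    present = [c for c in counts if c > 0]
    total = sum(present)
    richness = len(present)
    if total == 0:
        # Empty plot: no species recorded.
        return 0, 0.0, 0.0
    proportions = [c / total for c in present]
    # Assumed form: Gini-Simpson index, 1 - sum(p_i^2).
    simpson = 1.0 - sum(p * p for p in proportions)
    # Shannon entropy H' = -sum(p_i * ln p_i).
    shannon = -sum(p * math.log(p) for p in proportions)
    # Pielou's J = H' / ln(S); undefined for S = 1, reported as 0.0 here by convention.
    pielou = shannon / math.log(richness) if richness > 1 else 0.0
    return richness, simpson, pielou


# Hypothetical plot with four equally abundant species.
richness, simpson, pielou = diversity_metrics([10, 10, 10, 10])
```

A perfectly even plot gives a Pielou value of 1, while a plot dominated by one species gives low Simpson and Pielou values. This is why the three metrics are usually reported together rather than interchangeably.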

Sentinel-2 satellite data, with 10 m spatial resolution, offer large spatial coverage, short acquisition times, and rich spectral bands, providing unprecedented opportunities to estimate tree species diversity [14]. Their high temporal resolution captures phenological differences among plant communities, which can be used as metrics to calculate plant diversity [15]. Their detailed spectral information, particularly at red-edge wavelengths, is related to plant biochemical composition, canopy structure, and leaf morphology [16]. Moreover, because the data are freely available, they can be used to process large areas and complement field surveys at reduced cost [17]. Sentinel-2 imagery has performed well in tree species classification [15], vegetation phenology monitoring [18], and forest aboveground biomass estimation [19]. However, its application to estimating tree species diversity is still lacking, especially in temperate mixed forests. Additionally, since April 2019, the NASA Global Ecosystem Dynamics Investigation (GEDI), a spaceborne LiDAR sensor on the International Space Station, has acquired footprint data with an average diameter of 25 m [20]. GEDI is a full-waveform LiDAR designed to detect vegetation structure [21], and it provides an unprecedented sampling density, which could make it an ideal source of structural parameters for estimating forest diversity [22]. Potapov et al. [23] combined GEDI LiDAR and Landsat to produce a global tree height map at 30 m resolution. Liang et al. [24] quantified aboveground biomass dynamics of charcoal degradation in Mozambique using GEDI LiDAR and Landsat. These studies point to the potential of fusing GEDI with optical imagery such as Sentinel-2 to estimate forest diversity continuously across large extents.
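As an illustration of how multi-temporal spectral information can be turned into a phenological metric, the sketch below computes a normalized difference index per pixel for two dates and takes its seasonal amplitude. It is a minimal example under stated assumptions, not the feature set used in any of the cited studies: the reflectance values are hypothetical, and the helper names `normalized_difference` and `seasonal_amplitude` are introduced here only for illustration. With Sentinel-2, the near-infrared input would typically be band B8 and the second input band B4 (red, giving NDVI) or band B5 (red edge, giving a red-edge variant).

```python
def normalized_difference(nir, other):
    """Per-pixel (nir - other) / (nir + other); 0.0 where the sum is zero."""
    return [
        (n - o) / (n + o) if (n + o) != 0 else 0.0
        for n, o in zip(nir, other)
    ]


def seasonal_amplitude(index_by_date):
    """Per-pixel maximum minus minimum of an index across acquisition dates."""
    return [max(values) - min(values) for values in zip(*index_by_date)]


# Hypothetical surface reflectance for three pixels on two dates.
nir_summer, red_summer = [0.40, 0.50, 0.30], [0.04, 0.05, 0.10]
nir_autumn, red_autumn = [0.30, 0.25, 0.28], [0.10, 0.15, 0.12]

ndvi_summer = normalized_difference(nir_summer, red_summer)
ndvi_autumn = normalized_difference(nir_autumn, red_autumn)

# Larger amplitude suggests a stronger seasonal (phenological) signal.
amplitude = seasonal_amplitude([ndvi_summer, ndvi_autumn])
```

Pixels whose index changes strongly between seasons, for example deciduous canopies, yield larger amplitudes than evergreen ones. This is the kind of contrast a multi-temporal feature set can exploit.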

In this study, GEDI LiDAR data and multi-temporal Sentinel-2 images were integrated to estimate forest diversity at the pixel level within natural forests in northeast China. Specifically, this study aims to: (1) quantify the relationships between forest diversity and variables derived from Sentinel-2 and GEDI LiDAR, (2) identify an effective algorithm for high-precision mapping of forest diversity, and (3) map forest diversity using GEDI LiDAR data and Sentinel-2 images for forest ecological assessment.
