1. Introduction
Chlorophyll is an important biochemical indicator of plants. Chlorophyll content directly reflects the photosynthetic capacity, nitrogen fixation capacity and health status of vegetation, making it an important index for evaluating the growth and stand yield of a forest [
1]. Accurately estimating the forest’s chlorophyll content is helpful to further understand the forest’s ecosystem functions and the forest’s health status. Bamboo forests are very important in maintaining forest ecosystems’ balance, mitigating global warming and promoting carbon sequestration [
2]. Among bamboo forests,
Dendrocalamus giganteus (
D. giganteus) is one of the largest bamboo species in the world. Therefore, retrieving the chlorophyll content of
D. giganteus at the regional scale is key to understanding the growth status and ecosystem functions of
D. giganteus. Traditionally, a spectrophotometer is used to measure chlorophyll content in a laboratory. This method destroys plants and results in the loss of chlorophyll content during transportation [
3]. Moreover, the traditional method works at the leaf scale with point samples, which limits its use in forestry. By contrast, remote sensing (RS) technology is non-destructive, real-time and efficient and, when combined with ground survey data, can greatly improve the efficiency and precision of vegetation chlorophyll content retrieval at a regional scale. Numerous studies have shown that using RS technology combined with ground survey data to estimate vegetation chlorophyll content at a regional scale is both effective and feasible [
4,
5,
6].
At present, chlorophyll content inversion relies mostly on optical RS data (such as the Landsat series of the United States) [
7,
8,
9], while few studies have used the spaceborne LiDAR GEDI (Global Ecosystem Dynamics Investigation) for this purpose. The reflectance and spectral indices of optical sensors are sensitive to the horizontal distribution of chlorophyll content. Chlorophyll mainly absorbs red and blue-violet light, with absorption maxima in the 420~663 nm range, and strongly reflects green light [
10,
11]. However, Landsat 8 RS images are susceptible to weather conditions and saturation problems [
12,
13,
14], and previous studies have mainly focused on extracting spectral features from RS images [
15,
16], as well as on exploring the relationship between the vegetation index and the plant chlorophyll concentration, ignoring texture features, which are conducive to improving interpretation accuracy [
9]. Based on multi-spectral RS satellite Landsat 5 and Landsat 8 data, Yang Y et al. [
9] used random forest (RF) regression to build an inversion model relating texture features and spectral indices to the lake chlorophyll-a concentration. Their results show that texture features are significantly correlated with the lake chlorophyll-a concentration and suggest that RS image texture features have good potential for estimating biochemical indicators. Compared with ICESat-2 (Ice, Cloud and land Elevation Satellite-2), GEDI has the advantage of wide coverage. Compared with the optical RS satellites above, spaceborne LiDAR has the advantage that its emitted laser beam can partly penetrate the forest canopy, so the three-dimensional structure of the canopy and the underlying topography can be determined from the echo waveform data. Its weakness is that it is difficult to obtain the horizontal structure parameters of the forest and, because the footprint data are discrete, the forest information is incomplete. At present, GEDI data are mostly used in the inversion of forest canopy height, leaf area index, biomass, etc. Xu L et al. [
17] used GEDI spaceborne LiDAR data to estimate the biomass of oak forests in Shangri-La. Their results show that modis_treecover, rv and sensitivity are clearly correlated with oak tree biomass, and that the oak forest biomass estimation model established by random forest has the best accuracy. They also show that the indicators contained in GEDI L2B product data can give good model interpretation accuracy for forest structure parameters and vegetation biochemical parameters.
From the perspective of domestic and foreign research, optical RS data and spaceborne LiDAR data have rarely been combined to estimate the chlorophyll content of bamboo. Therefore, in this study, Landsat 8 data and spaceborne LiDAR GEDI data were used together. Based on the unique optical properties of chlorophyll, the statistical relationships between the chlorophyll content and the RS characteristic bands, and between the chlorophyll content and the GEDI L2B product indicators, were analyzed. The inversion model was established by RF to improve the inversion accuracy of the chlorophyll content.
Currently, the inversion methods of the chlorophyll content based on RS data mainly involve empirical models, physical models and coupling models. The empirical model method (including the parametric model and the non-parametric model) is convenient, fast and easy to operate, exhibiting high efficiency and ideal accuracy, but the optimization of feature combination needs further study [
11,
18]. The coupling model involves coupling the empirical model and the physical model, which can maximize the advantages of the statistical model, but the operation is time-consuming and inefficient [
19]. Machine learning algorithms, as a novel modeling approach, are not constrained by a fixed model framework. They have the capability to iteratively learn from feedback errors during the model correction process, enhancing the understanding of the intricate relationship between independent and dependent variables [
20]. Chlorophyll content estimation based on machine learning algorithms can be divided into the following two steps: (1) analysis of the relationships between the chlorophyll concentration and the characteristic variables; (2) calculation of the chlorophyll concentration using the resulting functional relationship [
21]. The machine learning algorithm usually shows a good chlorophyll concentration inversion effect because it can solve high-dimensional nonlinear problems [
22,
23]. For example, models such as RF [
24], neural network [
25] and the genetic algorithm-optimized simplified support vector machine (GA-SVM) have been proven to perform well in chlorophyll-a estimation [
26]. RF is one of the most popular machine learning methods, and its model performs better compared to other machine learning methods in forest biochemical parameter estimation [
24,
25,
26,
27]. Although there have been many studies on RS monitoring of chlorophyll concentration using machine learning algorithms [
26], a general method has not been proposed to achieve long-term monitoring of chlorophyll concentration, and the inversion model has poor universality [
27]. However, so far, the cooperative operation of RS satellite and spaceborne LiDAR, along with the use of the more mature RF algorithm to estimate the chlorophyll content of
D. giganteus, is rare in forestry applications. Real-time, fast and accurate inversion of the chlorophyll content has become one of the urgent problems for forestry researchers.
Therefore, this study takes Xinping County, Yunnan Province, where large, clumped D. giganteus is abundant, as the primary test area and uses machine learning to estimate the chlorophyll content and to evaluate the potential of multi-source RS data collaboration in chlorophyll content inversion. The specific objectives of this study are the following: (1) to derive a model for retrieving the chlorophyll content of a single D. giganteus plant; (2) to establish an optimal model for inverting the chlorophyll content of D. giganteus; (3) to create a distribution map of D. giganteus plants in the study area, utilizing the optimal attributes derived from multispectral Landsat 8 and LiDAR GEDI satellite data. The feasibility of estimating the chlorophyll content of D. giganteus through the collaborative use of multi-source RS data is evaluated, providing a reference for the inversion of the chlorophyll content of D. giganteus at medium and large regional scales.
4. Discussion
This study examined the collaborative use of multi-source RS data to estimate the chlorophyll content of D. giganteus. A power function model of the chlorophyll content of single D. giganteus plants was established, the chlorophyll content of each plot was then calculated with the single-plant model, and the chlorophyll content in the study area was estimated with the RF machine learning algorithm. This approach can effectively reduce costs, improve efficiency and estimation accuracy, and provide a reference for the long-term monitoring of vegetation chlorophyll content. Combining spaceborne LiDAR (such as GEDI, ICESat-2) and optical (such as Landsat, Sentinel) multi-source RS data offers researchers a novel approach to estimating vegetation chlorophyll content. The primary challenges encountered when estimating the chlorophyll content with the GEDI data and Landsat 8 OLI data were the discreteness of the GEDI footprint points and the resolution mismatch between GEDI and Landsat 8 images. This study therefore aimed to resolve these difficulties to improve the accuracy and precision of chlorophyll content inversion. The results show that there is great potential for multi-source RS data to collaboratively estimate the chlorophyll content at the county scale and provide a reference for application at medium and large regional scales.
4.1. The Potential of Multi-Source RS Data to Estimate Chlorophyll Content
A single RS data source can no longer meet the requirements of medium- and large-scale chlorophyll content retrieval. Optical RS is susceptible to light saturation effects [
18,
43]. In addition to the high cost of acquisition, high-resolution images (such as GF, QuickBird, IKONOS) are quite different from the space-borne LiDAR GEDI data in terms of resolution [
47]; that is, discrepancies in the pixel scale among image data and between the pixel scale and the plot scale can introduce uncertainties in the estimation of chlorophyll content, thereby impacting the accuracy of the estimation. Although the spaceborne LiDAR is less affected by the light saturation effect and can obtain the vertical structure information of the forest, its light-spot data are discrete and discontinuous in space [
47,
48]. In view of the above situation, this study used the satellite RS data Landsat 8 OLI and the spaceborne LiDAR data GEDI L2B to carry out the inversion of the chlorophyll content in the study area, which made up for the limitations of using a single data source to invert chlorophyll content, and also solved the problem of large differences between pixel scales and between pixel scales and sample scales.
In view of the discrete and discontinuous distribution of the spaceborne LiDAR GEDI L2B product data, this study used a mature geostatistical method, ordinary Kriging (OK) interpolation. To ensure the validity of the model fitting results, the effective light spots were divided in a ratio of 8:2 before OK interpolation; that is, 80% of the light-spot data were used as the training set, and the remaining 20% were used as the verification set for the OK interpolation results. Firstly, semi-variance function analysis was carried out on the training set data [
49], and the most common linear, spherical, exponential and Gaussian models were used to fit the semi-variance function. In order to ensure the accuracy of the model, the model with strong spatial autocorrelation (a nugget effect less than 25%), the highest coefficient of determination (
R2) and the smallest residual (
RSS) was selected as the optimal semi-variance function model [
50]. In this study, except for the linear model, the spherical, exponential and Gaussian models of pai, pgap_theta and pgap_theta_a3 all satisfied the optimal range of a nugget effect of less than 25%. According to the principle of maximum
R2 and minimum
RSS, the exponential model was selected as the optimal semi-variance function model for pai and pgap_theta. For pgap_theta_a3, the same factors were considered and, according to the principle of maximum
R2, the exponential model was likewise selected as the optimal semi-variance function model. Secondly, OK interpolation was performed in ArcGIS 10.8. Finally, the interpolation results were verified by cross-validation. According to the index evaluation principle of Bostan P A et al. [
39], the pai, pgap_theta and pgap_theta_a3 selected in this study all had values of
ME and
MSE close to 0. The values of
RMSE and
ASE were close to each other, and the
RMSSE value was close to 1. The
R2 value was 0.63~0.71. This result is consistent with the verification results of Bargaoui et al. [
51] and Qiao et al. [
52], both of which are used to study biomass. Therefore, this study not only solves the discreteness of GEDI light-spot data, but also confirms the feasibility of OK interpolation. From
Figure 9, it can be seen that the collaborative modeling effect of two RS data sources is better and more accurate than the model estimated by the variables of a single data source. The
R2 of the model increased from 0.81~0.89 to 0.94,
RMSE decreased from 0.09~0.12 g/m² to 0.08 g/m², and
P increased from 80.19%~82.45% to 83.32%, laying the foundation for accurately estimating the chlorophyll status of the forest area and further understanding the function of the forest ecosystem and the health status of the forest. In addition, this study also confirmed that GEDI L2B data can be used not only for the estimation and inversion of structural parameters such as canopy closure, biomass and carbon storage, but also for the estimation and inversion of biochemical parameters such as chlorophyll content. This provides a research case for the estimation and inversion of chlorophyll content at medium and large scales and provides a scientific basis for the health monitoring of global forest ecosystems.
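The selection rule and the cross-validation indices discussed above can be sketched in a few lines of Python. This is a minimal illustration, not the ArcGIS 10.8 workflow used in the study: the fitted parameters and validation values are made up, the dictionary keys are hypothetical, and the tie-breaking order (highest R2 first, then lowest RSS) is an assumption. The index formulas follow the forms commonly reported for Kriging cross-validation, where MSE denotes the mean standardized error.

```python
import math

def nugget_ratio(nugget, partial_sill):
    """Nugget effect C0 / (C0 + C), as a fraction of the total sill."""
    return nugget / (nugget + partial_sill)

def select_semivariogram(candidates, max_nugget_ratio=0.25):
    """Keep models with strong spatial autocorrelation (nugget ratio < 25%),
    then prefer the highest R2 and, on ties, the lowest RSS (assumed order)."""
    strong = [c for c in candidates
              if nugget_ratio(c["nugget"], c["partial_sill"]) < max_nugget_ratio]
    return max(strong, key=lambda c: (c["r2"], -c["rss"]))

def kriging_cv_stats(observed, predicted, std_errors):
    """Cross-validation indices from predictions and Kriging standard errors."""
    n = len(observed)
    errors = [p - o for p, o in zip(predicted, observed)]
    standardized = [e / s for e, s in zip(errors, std_errors)]
    return {
        "ME": sum(errors) / n,                                      # should be near 0
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MSE": sum(standardized) / n,                               # should be near 0
        "RMSSE": math.sqrt(sum(z * z for z in standardized) / n),   # should be near 1
        "ASE": math.sqrt(sum(s * s for s in std_errors) / n),       # should be near RMSE
    }

# Made-up fitted parameters for one GEDI variable (illustration only).
candidates = [
    {"model": "spherical", "nugget": 0.10, "partial_sill": 0.90, "r2": 0.80, "rss": 0.020},
    {"model": "exponential", "nugget": 0.05, "partial_sill": 0.95, "r2": 0.88, "rss": 0.012},
    {"model": "linear", "nugget": 0.60, "partial_sill": 0.40, "r2": 0.91, "rss": 0.010},
]
best = select_semivariogram(candidates)

# Made-up validation footprints: observed value, Kriging prediction, standard error.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
std_errors = [0.15, 0.15, 0.15, 0.15]
stats = kriging_cv_stats(observed, predicted, std_errors)
```

In this toy case the linear model has the highest R2 but is excluded by the nugget criterion, which mirrors the reasoning in the text: spatial autocorrelation is screened first, and goodness of fit is compared only among the remaining models.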
4.2. Analysis of the Influence of Parameter Selection on Model Accuracy
In this study, the parameter selection includes the independent variable selection of the chlorophyll content model of a single plant and the independent variable selection of the chlorophyll content estimation model at the regional scale. The selection of independent variables for the chlorophyll content model of a single plant of
D. giganteus is almost unexplored in previous studies, but scholars have performed similar research in the field of tree biomass. There is a significant correlation between the above-ground biomass of trees and their diameter at breast height (DBH). Tree height, by contrast, is usually measured with a large error in the field, so it is a less suitable modeling parameter [
50,
53]. Previous studies have also found that a regression model with DBH as the single variable can more accurately reflect the trend in the aboveground biomass of different bamboo varieties [
53,
54,
55]. Therefore, to avoid the error introduced by measuring the height of
D. giganteus and to improve the efficiency and feasibility of field measurement, this study used DBH as the single variable to establish a chlorophyll content model of
D. giganteus. Compared with the traditional destructive sampling and estimation of chlorophyll content, the model established by the allometric growth equation has better universality, providing a favorable reference value for estimating the chlorophyll content of large, clustered
D. giganteus and for forest health monitoring in the future.
Regarding the selection of independent variables for the estimation model of the chlorophyll content of
D. giganteus at the regional scale, a large number of previous studies have been limited to the study of single-band reflectance and band combination information in chlorophyll inversion [
7]. These studies have often ignored the texture feature information in RS images, which is conducive to improving interpretation accuracy. In addition, they have overlooked the application of GEDI L2B RS data, which contains rich feature information useful in chlorophyll content inversion. From the results of this study (
Figure 6), the correlation of texture features with chlorophyll content is stronger than that of the vegetation indices or single-band reflectance, which is consistent with the results of Yang Y et al. [
9]. The model established by GEDI L2B feature parameters is more accurate than the model established by Landsat 8 OLI feature parameters. As shown in
Figure 9c,f, the model
R2 established by GEDI L2B feature parameters is 0.89, while the model
R2 established by Landsat 8 OLI feature parameters is 0.81, which may reflect the light saturation effects of the optical RS data. At the same time, this study also confirms that GEDI L2B product data are not limited to the study of tree biomass and carbon storage. They can also be applied to chlorophyll content inversion at medium and large scales.
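The variable screening described above can be illustrated with a short correlation ranking. This is a sketch under stated assumptions: the Pearson coefficient is used here as one common choice (this section does not state which correlation measure the study applied), and the feature names and plot-level values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rank_features(features, target):
    """Rank candidate variables by the absolute value of their correlation."""
    scores = {name: pearson_r(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical plot-level values (illustration only, not data from the study).
chlorophyll = [0.30, 0.45, 0.60, 0.75, 0.90]
features = {
    "texture_mean": [1.0, 1.5, 2.0, 2.5, 3.0],      # a texture feature
    "ndvi": [0.62, 0.60, 0.70, 0.69, 0.75],         # a vegetation index
    "band_red": [0.08, 0.09, 0.07, 0.08, 0.07],     # a single-band reflectance
}
ranking = rank_features(features, chlorophyll)
```

Ranking by absolute value keeps strongly negative predictors, such as red reflectance in this toy example, in contention rather than discarding them for their sign.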
4.3. Model Selection in Uncertainty Evaluation of Chlorophyll Content Estimation Accuracy
This study involves selecting a basic model for estimating the chlorophyll content in individual
D. giganteus plants and a regional-scale model for estimating chlorophyll content. The basic model of chlorophyll content per plant of
D. giganteus is the allometric growth equation. In previous studies, the allometric growth equation was mostly used to estimate the biomass, net output productivity and biogeochemical cycle budget in forest ecosystems [
56]. A few scholars have established regression models of the relationship between the leaf area index and the DBH of single trees to predict changes in the productivity of
Robinia pseudoacacia forest [
57]. From
Figure 4, it can be seen that there is a significant allometric growth relationship between the independent variable DBH and the dependent variable chlorophyll content per plant. This indicates that the allometric growth equation can also serve as the basic model for estimating the chlorophyll content of individual plants.
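As an illustration of the allometric (power function) form, chlorophyll = a × DBH^b, the sketch below fits a and b by ordinary least squares on log-transformed data and then sums the single-plant predictions to a plot total. This is one common fitting approach and is an assumption here; the fitting procedure actually used in the study may differ (for example, nonlinear least squares). The DBH values and coefficients are synthetic.

```python
import math

def fit_power_law(dbh, chlorophyll):
    """Fit chlorophyll = a * DBH**b by least squares on log-transformed data."""
    log_x = [math.log(d) for d in dbh]
    log_y = [math.log(c) for c in chlorophyll]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    b = sxy / sxx                          # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)      # intercept in log-log space = log(a)
    return a, b

def predict(a, b, dbh):
    """Single-plant chlorophyll content predicted from DBH."""
    return a * dbh ** b

# Synthetic, noise-free data generated with a = 0.02 and b = 1.8 (illustration only).
dbh_values = [8.0, 10.0, 12.0, 15.0, 18.0]
chl_values = [0.02 * d ** 1.8 for d in dbh_values]

a, b = fit_power_law(dbh_values, chl_values)

# Plot-level content as the sum of the single-plant predictions in the plot.
plot_total = sum(predict(a, b, d) for d in dbh_values)
```

Fitting in log space is simple and stable, but it minimizes relative rather than absolute error, so back-transformed predictions can be slightly biased when the data are noisy.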
Regarding the selection of chlorophyll estimation models for regional scale
D. giganteus, this study considered whether the selected model matches the number of available samples. The representativeness of a model depends on the number of modeling samples: the more samples there are, the more representative the estimation model is and the lower its uncertainty. However, once the number of samples reaches a certain critical value, the accuracy of the estimation model no longer changes significantly. Therefore, in order to save manpower, material and financial resources while meeting the small sample principle (30) and the accuracy requirements of field investigation [
54], this study investigated 35 measured sample plots for RS modeling. According to previous research on the estimation of chlorophyll content, the accuracy of the chlorophyll content estimated by the mature RF algorithm is higher than that of other common parametric models (such as the partial least squares model or the multiple linear regression model) and non-parametric models (such as the SVM model or the K-nearest neighbor algorithm) [
24,
58]. The results of this study indicate that the RF model provides the most accurate estimation of chlorophyll content. Therefore, it was selected as an RS model to estimate the chlorophyll content of
D. giganteus in Xinping County. The chlorophyll content of
D. giganteus in the study area was 0.24~1.02 g/m². At present, there are few studies on the chlorophyll content of bamboo plants, especially on the chlorophyll content of
D. giganteus, so results for other plants are used for comparison here. Compared with the results of Jin et al. [
59] on the RS estimation of total chlorophyll content in wheat leaves, where
R2 was 0.868 and
RMSE was 0.384 g/m², the estimation accuracy of this study is higher. Compared with the results of Richardson et al. [
60] and Gitelson et al. [
61], which only studied the chlorophyll content of single leaves of higher plants, this study extrapolated the chlorophyll content of single plant leaves to the RS estimation of chlorophyll content in the study area, which provided an important reference value for the assessment of forest health and for the scientific management of forest resources. In addition, for the three levels of
D. giganteus chlorophyll content, from the highest to the lowest, the number of samples was 14, 14 and 7, respectively, and the proportion of graded pixels to the total pixels was 22.55%, 66.93% and 10.62%, respectively (
Table 8). These values indicate that the distribution of samples and pixels was relatively reasonable. At the same time, this reflects the representativeness of sampling and the rationality of the modeling results, so as to reduce the uncertainty and error transmission caused by sampling.
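The two accuracy indices used in the comparisons above can be computed as in the following sketch. The definitions are assumptions: R2 is taken as the coefficient of determination (1 minus the ratio of residual to total sum of squares), which is one common convention, and the index P is defined elsewhere in the paper and is not reproduced here. The observed and predicted values are made up.

```python
import math

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

# Made-up plot-level chlorophyll values (illustration only).
observed = [0.30, 0.50, 0.70, 0.90]
predicted = [0.35, 0.45, 0.75, 0.85]

model_r2 = r_squared(observed, predicted)
model_rmse = rmse(observed, predicted)
```

Because RMSE carries the units and value range of the target variable, comparing RMSE values across studies of different species is only indicative; R2 is the more comparable of the two.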
4.4. Limitations of Estimation of Chlorophyll Content in D. giganteus
In this study, the size of the sample plot was 30 m × 30 m, so the Landsat 8 data with a resolution of 30 m and the GEDI data with a footprint diameter of 25 m were selected. To ensure consistency, this study resampled the GEDI data to a spatial resolution of 30 m by Kriging interpolation [
8]. Although the GEDI L2B has chlorophyll-related leaf area index, canopy cover and waveform vegetation energy values, its spot footprint points are discontinuous, and the amount of data is large. In order to solve this problem, the poor quality spots need to be eliminated before Kriging interpolation [
62] and then interpolated into surface data to obtain the characteristic parameter information of the whole study area.
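The screening of poor-quality footprints, followed by the 8:2 split described in Section 4.1, might look like the sketch below. The field names l2b_quality_flag, degrade_flag and sensitivity exist in the GEDI L2B product, but the screening criteria shown (including the 0.9 sensitivity threshold) are assumptions for illustration and are not the criteria reported by the study. The footprint records are made up.

```python
import random

def filter_footprints(footprints, min_sensitivity=0.9):
    """Keep footprints that pass the (assumed) quality criteria."""
    return [
        f for f in footprints
        if f["l2b_quality_flag"] == 1          # L2B algorithm reported a usable result
        and f["degrade_flag"] == 0             # no degraded pointing or positioning
        and f["sensitivity"] >= min_sensitivity
    ]

def split_train_validation(footprints, train_fraction=0.8, seed=42):
    """Shuffle reproducibly, then split into training and validation sets."""
    shuffled = footprints[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Made-up footprint records (illustration only).
footprints = [
    {"id": i, "pai": 1.0 + 0.1 * i, "l2b_quality_flag": 1,
     "degrade_flag": 0, "sensitivity": 0.95}
    for i in range(10)
]
footprints.append({"id": 10, "pai": 0.2, "l2b_quality_flag": 0,
                   "degrade_flag": 0, "sensitivity": 0.97})
footprints.append({"id": 11, "pai": 0.4, "l2b_quality_flag": 1,
                   "degrade_flag": 0, "sensitivity": 0.85})

valid = filter_footprints(footprints)
train, validation = split_train_validation(valid)
```

Filtering before the split keeps rejected footprints out of both the semi-variance fitting and the validation of the interpolated surface.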
Furthermore, this study used only three common estimation models to estimate the chlorophyll content of D. giganteus. In future research, an appropriate parametric model and an optimized non-parametric model can be selected according to the number of samples, and parameter selection can also be taken into account. It should also be assessed whether the parameters of the two RS data sources interfere with each other, so that the combined model actually outperforms a model built on a single RS data source; the optimal model can then be selected as the inversion model for chlorophyll content. To meet the demand for high-precision, high-resolution and large-scale chlorophyll content inversion, high-resolution GF, IKONOS, QuickBird, Sentinel-2 and hyperspectral data can be combined with GEDI or ICESat-2 data and finally modeled and inverted at a unified resolution.