1. Introduction
As a basis of estimating forest carbon storage and evaluating the contribution to the forest carbon cycle, forest biomass plays an important role in forest ecosystems, and aboveground biomass (AGB) accounts for a large proportion of it [
1]. Therefore, the accurate quantification of forest AGB is of great significance to forest managers. Accurate biomass measurement requires felling individual trees, then drying and weighing them. However, this is not practically feasible for a large forest area. An alternative is to develop allometric biomass equations that use the diameter at breast height (DBH) and tree height (H) of representative sample trees as independent variables. Many studies have also shown that DBH performs adequately well as the predictor in tree biomass models [
2,
3]. Allometric equations are also used to estimate the biomasses of different tree components (henceforth tree-components). Different tree-components have different uses and functions. For example, bark can be used as a soil conditioner [
4,
5], and foliage plays an important role in protecting soil from water erosion and maintaining biodiversity [
6,
7]. Thus, quickly and easily estimating tree-component biomass is also an important part of forest management. However, destructively measuring tree variables over a large forest area is time-consuming and costly.
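As a minimal sketch of the allometric approach described above, the following fits the power-law form W = a · DBH^b by ordinary least squares on the log-transformed variables. The function names, the sample values, and the coefficients 0.1 and 2.4 are hypothetical and chosen only for illustration; a log-log fit is a common simplification and is not necessarily the estimation method used in this study.

```python
import math


def fit_allometric(dbh, biomass):
    """Fit W = a * DBH**b via least squares on ln(W) = ln(a) + b * ln(DBH)."""
    xs = [math.log(d) for d in dbh]
    ys = [math.log(w) for w in biomass]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope = allometric exponent
    a = math.exp(y_mean - b * x_mean)    # back-transformed intercept
    return a, b


def predict_biomass(a, b, dbh):
    """Predict biomass for one tree from its DBH."""
    return a * dbh ** b


# Hypothetical, noise-free sample trees generated from W = 0.1 * DBH**2.4
sample_dbh = [8.0, 12.0, 16.0, 20.0, 24.0, 28.0]
sample_biomass = [0.1 * d ** 2.4 for d in sample_dbh]

a_hat, b_hat = fit_allometric(sample_dbh, sample_biomass)
```

Because the synthetic data contain no noise, the fit recovers the generating coefficients. With real data, predictions back-transformed from a log-log fit are biased and usually need a correction factor.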
Light detection and ranging (LiDAR) technology has been applied to forestry research since the mid-1980s. Previous studies found that the LiDAR system can estimate tree height [
8,
9]. MacLean and Krabill [
10] found that tree canopy volume could be estimated using the laser radar reconstruction of the canopy profile. Subsequently, many studies demonstrated that the LiDAR system could accurately estimate forest parameters, such as basal area, stock volume, and biomass [
11,
12,
13,
14,
15]. Biomass estimation using LiDAR is primarily carried out at two levels: the plot level and the individual-tree level. The plot-level estimation is achieved by establishing mathematical relationships between LiDAR variables and the biomass, with an emphasis on model forms and parameter estimation methods [
16,
17,
18,
19]. The individual-tree level requires accurate segmentation of individual trees from LiDAR data in order to model and estimate the biomass of specific components or the entire tree [
14,
20]. As a result, the acquisition of inventory factors by remote sensing techniques has gradually matured [
20,
21,
22,
23,
24].
Nonlinear least squares (NLS) regression is a common method for estimating the parameters of allometric equations, such as tree growth equations [
20,
25,
26]. NLS relies on a number of assumptions, and violating any of them can lead to biased parameter estimates [
27,
28]. To deal with this problem, the system of equations needs to be estimated using the seemingly unrelated regression (SUR) method [
29,
30,
31,
32,
33,
34]. Previous methods of estimating tree biomass include: (1) modeling only the stem with a single equation, which yields a univariate or bivariate model; and (2) modeling all tree-components as a system, with a separate model for each. The former method is incomplete, and the latter has notable drawbacks, such as low accuracy, because different tree-components respond differently to the predictors. When tree-component biomass models are constructed independently, the correlation between the components is ignored [
35]. As a result, the component predictions are not guaranteed to sum to the predicted total. To solve this additivity problem, additive biomass equations can be developed [
34,
36,
37].
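One common way to write such an additive system, given here only as a sketch, is to let the aboveground equation be the sum of the component equations and to estimate all equations jointly. The power-law form in DBH and the symbols $\beta_{i0}$, $\beta_{i1}$, and $\varepsilon$ are illustrative assumptions, not necessarily the exact specification used in this study.

```latex
\begin{aligned}
W_{i} &= \beta_{i0}\,\mathrm{DBH}^{\beta_{i1}} + \varepsilon_{i},
  \qquad i = 1, \dots, k \quad \text{(tree-components)},\\
W_{\mathrm{AGB}} &= \sum_{i=1}^{k} \beta_{i0}\,\mathrm{DBH}^{\beta_{i1}} + \varepsilon_{\mathrm{AGB}}.
\end{aligned}
```

Because the same parameters appear in both the component equations and the total equation, the predicted components sum to the predicted aboveground biomass by construction. Joint estimation also allows the error terms $\varepsilon_{i}$ of one tree to be correlated across equations.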
Tree biomass estimation based on LiDAR data has the same problems as those pointed out above. The independent variables in a biomass equation should be free of measurement errors, which is a prerequisite for the use of the nonlinear seemingly unrelated regression (NSUR) simultaneous equations. However, independent variables such as DBH, H, and crown width that are derived from remote sensing products may contain substantial random and systematic errors [
38,
39]. A potential solution to these problems is to develop LiDAR-based biomass models using error-in-variable modeling, which ensures the additivity of tree-component biomass models while taking measurement errors into account [
25,
27,
39,
40,
41]. However, only a few existing biomass modeling studies have considered the additivity of biomass equations, and most have failed to recognize the inherent correlation between tree-component biomass and DBH.
Therefore, we developed error-in-variable biomass models for predicting forest aboveground biomass based on airborne LiDAR data, with the aim of ensuring both the compatibility of individual tree-component biomass with DBH and the additivity of components when estimating biomass. The main elements of this study were as follows: (i) the error-in-variable modeling approach was used to develop a system of compatible AGB and individual tree-component biomass models with airborne LiDAR data; and (ii) NSUR was applied as the parameter estimation method. The novelty of this study is that it considers measurement errors and ensures the additivity of tree-component biomass based on airborne LiDAR data. This method may be applied to large-scale biomass prediction (from individual tree to stand), and a model that considers the additivity of each component should estimate stand biomass more accurately.
4. Discussion
Forests are the mainstays of the terrestrial ecosystem [
61]. Precise quantification of forest biomass is necessary for the evaluation of ecosystem functions and productivity. Timely and effective methods of obtaining forestry data, including biomass, carbon storage, and other forest survey data, have always been a concern in forest science [
62]. In our study, we used error-in-variable modeling to develop an additive model system of tree protoxylem, branch, bark, and foliage biomass using airborne LiDAR data. The model system ensured not only the compatibility of DBH and the different component biomasses but also the additivity of the component biomasses. In the first step, we built the relationships between DBH and the different component biomasses under the assumption that the observations of the independent variables contained no measurement error, while the observations of the dependent variable did. Generally, if measurement error exists in the observed values of the independent variable, the model estimates may be largely biased. However, the error-in-variable model can be theoretically unbiased [
63], and the findings of Fu et al. [
37] also supported this.
In our study, only LiDAR-derived tree height (LH) was included in the biomass model system. As a single predictor, LH had the highest contribution to DBH, which was consistent with statistical modeling principles [
64]. From a remote sensing perspective, the canopy diameter would be more difficult to measure than LH and the crown projection area (CPA) because of complex crown shapes [
65]. Several studies found a strong relationship between canopy diameter and DBH [
66,
67]. For example, several researchers [
68,
69,
70] mentioned that canopy diameter could be measured based on the geometric method of the crown projection area. In our study, the effects of both canopy diameter and crown projection area on the LiDAR-DBH model were evaluated. Canopy diameter did not contribute significantly to model improvement, whereas the crown projection area did. However, the crown projection area had a significant collinearity problem, which is why we chose to exclude it. The main reasons were: (i) although the CPA could reflect crown effects, LH had a stronger relationship with DBH; and (ii) adding the CPA to the model would weaken its predictive ability, because this would increase model complexity and the predictors would be significantly correlated with each other. Considering all of this, only LH was selected as a predictor in our biomass model system. This resulted in a two-parameter model, which could enhance the robustness and stability of the models when the model system is embedded into a complete algorithm in the future.
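The collinearity argument above can be illustrated with a small sketch that computes the Pearson correlation between two candidate predictors and the corresponding variance inflation factor (VIF). With exactly two predictors, VIF reduces to 1 / (1 - r^2). The LH and CPA values below are hypothetical, and the source does not state which collinearity diagnostic was actually used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(xs, ys):
    """VIF of either predictor when the model contains exactly these two."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)


# Hypothetical LiDAR-derived tree heights (m) and crown projection areas (m^2)
lh = [6.0, 9.0, 12.0, 15.0, 18.0, 21.0]
cpa = [2.1, 4.0, 6.2, 7.9, 10.1, 12.0]

r_lh_cpa = pearson_r(lh, cpa)
vif_lh_cpa = vif_two_predictors(lh, cpa)
```

A VIF above roughly 10 is a widely used rule of thumb for problematic collinearity. In this made-up sample the two predictors are almost perfectly correlated, so keeping both would add little information while making the parameter estimates less stable.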
Previous studies showed that the measurement error model was more advantageous when DBH had measurement errors [
21,
25,
62]. In practice, the sources of random error vary widely and are not easy to avoid; for example, airborne LiDAR measurements are affected by weather, spatial, and other factors. In the process of single-tree segmentation, height estimation accuracy is one of the many factors [
71]. In our study, we identified the most suitable parameter estimation method for the measurement error model through a rigorous comparison. For NSUR, the one-step method had only a slight advantage over the two-step method (Table 6). The reason was that we let the stem = protoxylem + bark and the crown = branch + foliage, and there were still some errors between the stem and crown biomass models and the aboveground biomass model, which would lead to an increase in model error. However, the two-step method provided an idea for the LiDAR inversion of tree biomass when the allometric growth equations of the stem and the crown for obtaining different components of biomass were known. For TSEM, the one-step model system form had an advantage over the summation method (Table 6), because the summation method had one more equation than the proportion method, which increased the system complexity and subjected it to more constraints in the calculation process. The performances differed among components. This might be related to the amounts of biomass of the different components, as the model system was constrained to prefer components with smaller biomasses. Overall, the NSUR one-step method was more convenient and practical than TSEM: the R nlsystemfit (nonlinear equation system estimation) function [72] has a wider range of users, whereas TSEM needed to be computed in Forstat.
Generally, the fitting effect of the model should be identical within a reasonable range [
73]. However, for different components, the performance of each method may vary widely. This may be due to the fitting data, as this study included the age classes of Larix gmelinii var. principis-rupprechtii. For the whole tree, the order of the biomasses was protoxylem > branch > bark > foliage (Table 1). The biomasses of bark and foliage in the juvenile forest were small but not much different from those of the other age classes. However, the small biomass of protoxylem led to irregularity in the protoxylem data structure. Therefore, we suggest establishing different model systems by age group in practical applications to improve the accuracy of biomass estimation.
Heteroscedasticity is a problem that must be solved [
74], and the solution is either to linearize the model or to introduce a weight function. However, the former method might increase the complexity of the calculation. Thus, it is necessary to select an appropriate weight function. A weight function that is too large or too small might reduce heteroscedasticity poorly, possibly because it cannot properly regulate the weights of the residuals. In exploring and selecting the weight function, we found that the effect of heteroscedasticity reduction was largely related to the size of the data. This provides a scheme and a suggestion for other similar studies selecting a weight function to reduce heteroscedasticity.
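The effect of a weight function that is too weak or too strong can be sketched as follows. The example assumes a power-type weight, 1 / DBH^k, which is a common choice but is not confirmed by the source as the form used here, and it uses hypothetical residuals whose spread grows in proportion to DBH. For each candidate k, it compares the root-mean-square weighted residual of the larger trees with that of the smaller trees; a ratio near 1 suggests that the heteroscedasticity has been removed.

```python
import math


def rms(values):
    """Root mean square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def spread_ratio(dbh, residuals, k):
    """RMS of weighted residuals: larger-DBH half divided by smaller-DBH half.

    Each residual is divided by DBH**k (an assumed power-type weight function).
    """
    pairs = sorted(zip(dbh, residuals))           # order trees by DBH
    weighted = [e / d ** k for d, e in pairs]
    half = len(weighted) // 2
    return rms(weighted[half:]) / rms(weighted[:half])


# Hypothetical residuals with a spread proportional to DBH, alternating in sign
dbh = [8.0, 10.0, 12.0, 14.0, 20.0, 22.0, 24.0, 26.0]
residuals = [0.5 * d * (1 if i % 2 == 0 else -1) for i, d in enumerate(dbh)]

# k = 0: no weighting; k = 1: matches the error structure; k = 2: over-weighting
ratios = {k: spread_ratio(dbh, residuals, k) for k in (0.0, 1.0, 2.0)}
```

In this constructed case the unweighted residuals (k = 0) give a ratio above 1, k = 1 gives a ratio of 1, and k = 2 over-corrects and pushes the ratio below 1, which mirrors the observation that both too small and too large a weight function reduce heteroscedasticity poorly.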
We developed this biomass model system not only to ensure the compatibility of DBH and tree-components but also to ensure the additivity of the aboveground biomass and the biomasses of the different components. This could potentially expand the use of LiDAR data for estimating AGB from tree-components to the individual tree and then to the stand, without requiring field measurement data; NSUR-one-step appeared to be the best method. Nevertheless, the final fitting effect of each model system was not strong: the best R² was 0.61, for the foliage component. This might be mainly due to: (i) error propagation, in which the LH-DBH model had an error of 0.56 and the DBH-component models had an error of 0.97; and (ii) the uncertainty in the sources of the relationship between DBH and the biomasses of the different components [
75]. In our study, only the independent variables were assumed to be free of error, but uncertainties might arise from the LiDAR data themselves, the measurement error of variables, the use of LiDAR data to delineate tree crowns and estimate LH and DBH, and the estimation of model parameters. Overall, if the error sources were clearly identified, the biomass model system could be improved in a timely and effective manner, which will be our main task in the future.