1. Introduction
The timely and cost-effective acquisition of large-scale forest inventory data, including forest carbon, has long been a concern for the sustainable management and planning of forest ecosystems [1,2]. Tree diameter at breast height (DBH) and biomass are two of the most common measures of tree size in forest mensuration, and both are essential variables in forest growth and yield modeling. Although DBH can be measured easily and accurately on the ground, ground measurements of tree biomass are less accurate, more difficult, time-consuming, and costly. Consequently, DBH is usually measured for all trees in ground-based forest inventories and in experimental and permanent growth plots, whereas biomass is usually measured only on an affordable number of sample trees [3]. Correspondingly, a large number of aboveground, belowground, and total biomass–DBH equations have been developed for numerous species in many forest types over the last 20 years [2,3,4,5,6,7,8]. As a result, predicting tree biomass from DBH has become a well-established technique that is widely applied in forest inventory and in growth and yield prediction. However, collecting DBH data at a large scale can also be costly and time-consuming.
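Biomass–DBH equations of this kind are often written as the power law AGB = a·DBH^b. As a minimal sketch, not the form used in any particular cited study, the two parameters can be estimated by ordinary least squares on the log–log scale; the function name and the synthetic data below are hypothetical. Back-transforming a log-scale fit without a bias correction tends to underestimate mean biomass, which is one reason fitting on the original scale (e.g., by NLS) is often preferred.

```python
import math

def fit_power_law(dbh, agb):
    """Fit AGB = a * DBH**b by ordinary least squares on log-transformed data.

    Returns (a, b). Assumes all values are strictly positive.
    """
    xs = [math.log(d) for d in dbh]
    ys = [math.log(w) for w in agb]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx               # slope on the log scale is the exponent b
    a = math.exp(my - b * mx)   # back-transformed intercept (no bias correction)
    return a, b

# Hypothetical noise-free data generated from a = 0.1, b = 2.4.
dbh_cm = [10.0, 15.0, 20.0, 30.0, 40.0]
agb_kg = [0.1 * d ** 2.4 for d in dbh_cm]
a_hat, b_hat = fit_power_law(dbh_cm, agb_kg)
print(round(a_hat, 4), round(b_hat, 4))
```

With real, noisy data the log-scale residual variance would also be needed for a back-transformation correction.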
The affordability and capability of high-spatial-resolution imagery from airborne sensors and satellite systems have made it possible to accurately extract tree variables from the imagery over large areas, and then to relate tree biomass to tree variables that are both measurable on the ground and accurately estimable from the imagery [9,10,11,12,13,14]. Consequently, the means of obtaining DBH have increasingly shifted from conventional ground-based forest inventories to modeling and prediction based on remote sensing technologies [11,15,16,17].
Light detection and ranging (LiDAR) is one of the most promising remote sensing technologies for estimating various biophysical properties of forests and has been widely applied in forest measurement and inventory over the past two decades [14,18]. Approaches that automatically extract information on individual trees, rather than stands, from LiDAR data have drawn great attention in recent years [1,11,19]. LiDAR provides accurate estimates of terrain elevation and vegetation height, even on sloped terrain and in dense, complex forests [20]. LiDAR data are well suited to estimating tree DBH and biomass because point clouds from forest canopies can accurately depict the physical characteristics of the canopy surface and structure [21]. These characteristics, including tree height, crown projection area, and crown diameter and shape, are often significantly correlated with tree DBH and biomass, and have commonly been used to develop LiDAR-derived tree variable–DBH or –biomass models [1,11,15,19].
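To illustrate how tree height and crown projection area can be derived from a point cloud, the sketch below assumes that the points of one tree have already been segmented and height-normalized (height above ground, i.e., return elevation minus terrain elevation). Tree height is taken as the maximum normalized height, and crown projection area as the area of the 2-D convex hull of the points. Both choices are simplifying assumptions for illustration: operational workflows typically work from a canopy height model and a crown delineation algorithm, and a convex hull can overestimate the area of irregular crowns.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, normalized height), all in metres

def tree_height(points: List[Point]) -> float:
    """LiDAR-derived tree height: highest normalized return of the segmented tree."""
    return max(p[2] for p in points)

def _cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def crown_projection_area(points: List[Point]) -> float:
    """Area of the 2-D convex hull of the points (monotone chain + shoelace formula)."""
    xy = sorted({(p[0], p[1]) for p in points})
    if len(xy) < 3:
        return 0.0
    lower, upper = [], []
    for q in xy:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], q) <= 0:
            lower.pop()
        lower.append(q)
    for q in reversed(xy):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], q) <= 0:
            upper.pop()
        upper.append(q)
    hull = lower[:-1] + upper[:-1]  # counter-clockwise hull vertices
    m = len(hull)
    area2 = sum(hull[i][0] * hull[(i + 1) % m][1] - hull[(i + 1) % m][0] * hull[i][1]
                for i in range(m))
    return abs(area2) / 2.0

# Hypothetical crown: a 4 m x 4 m square footprint with an interior apex.
pts = [(0, 0, 8.0), (4, 0, 8.5), (4, 4, 9.0), (0, 4, 8.2), (2, 2, 15.3)]
print(tree_height(pts), crown_projection_area(pts))  # 15.3 16.0
```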
In practice, nonlinear least squares (NLS) regression has been used to estimate the parameters of the LiDAR-derived tree variable–DBH or –biomass models [1,11,22]. NLS is generally acknowledged as a standard regression technique for modeling the relationship between variables. Its major assumptions are: (i) the dependent variable is a random variable; (ii) the independent variables are fixed and observed without error; and (iii) the model errors are independently and identically normally distributed with zero mean and constant variance [23]. It is well known that violating assumption (ii) may bias the estimates of the regression coefficients and/or of their standard errors and, consequently, lead to misleading hypothesis tests [23,24]. Non-normally distributed model errors may also mislead hypothesis tests even when the NLS coefficients and standard errors are estimated without bias.
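The consequence of violating assumption (ii) is easiest to see in the simplest linear case. If the true regressor x is observed as w = x + u, with u independent of x, the least squares slope of y on w is, in large samples, shrunk toward zero by the reliability ratio λ = var(x)/(var(x) + var(u)). The simulation below is a minimal sketch with hypothetical parameter values. It uses a straight line rather than the nonlinear LiDAR models discussed here, for which the bias is harder to express in closed form.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate_attenuation(n=20000, beta=2.0, sd_x=1.0, sd_u=1.0, sd_e=0.5, seed=1):
    """Return (slope on true x, slope on error-prone w, theoretical attenuated slope)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sd_x) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, sd_e) for xi in x]   # response with its own error
    w = [xi + rng.gauss(0.0, sd_u) for xi in x]          # regressor observed with error
    lam = sd_x ** 2 / (sd_x ** 2 + sd_u ** 2)            # reliability ratio
    return ols_slope(x, y), ols_slope(w, y), beta * lam

true_fit, naive_fit, expected = simulate_attenuation()
print(f"slope on x: {true_fit:.3f}, slope on w: {naive_fit:.3f}, theory: {expected:.3f}")
```

With equal variances for x and u, the naive slope is roughly half the true slope of 2.0.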
Although tree biomass may be estimated from LiDAR-derived tree height or crown attributes (such as crown width) alone, the accuracy of such estimates is often unsatisfactory [25]. DBH estimated from LiDAR–DBH models, together with physical characteristics extracted from remote sensing data (e.g., LiDAR-derived tree height), has commonly been used to estimate tree biomass with LiDAR–biomass models (e.g., allometric biomass models) [15]. However, such applications ignore the fact that estimates of DBH and other tree variables from remote sensing data contain random and systematic errors arising from data acquisition, processing and analysis, tree detection, measurement of tree variables, model coefficients, etc. [1]. It is therefore problematic to estimate the coefficients of LiDAR–DBH and LiDAR–biomass models using NLS when both the response variable and the regressors are subject to measurement errors, because these errors affect not only the signs and values of the coefficients but also their significance tests [26,27]. Moreover, without DBH measurements, existing stem volume and taper equations and individual tree growth functions that use DBH as a predictor can no longer be used to predict tree and stand growth [11], yet such predictions are essential for updating forest databases [11]. In addition, intensively managed forests require data not only on tree and stand volume but also on different wood products, which calls for taper functions that use both DBH and height as predictors. All of these calculations and predictions require measurements or estimates of DBH. More importantly, tree biomass and DBH are often significantly correlated [2,3,8], so LiDAR–DBH and LiDAR–biomass models should be developed simultaneously. However, existing LiDAR–DBH and LiDAR–biomass models are fitted separately using NLS, which ignores the inherent correlation between tree biomass and DBH and does not guarantee compatibility between the estimated biomass and DBH.
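Whether separate fitting discards useful information can be checked by examining the correlation between the residuals of the two equations: joint estimation (e.g., seemingly unrelated regression) gains efficiency mainly when this cross-equation correlation is substantial and the equations do not share an identical set of regressors. The sketch below fits two hypothetical straight-line equations separately by least squares (DBH on tree height, AGB on crown projection area) and reports the correlation of their residuals. The linear forms, variable names, and simulated data, including an unobserved tree-level effect that induces the correlation, are assumptions for illustration and are not the models of this study.

```python
import math
import random

def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def residuals(xs, ys):
    """Residuals of a separately fitted least squares line."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def correlation(u, v):
    """Pearson correlation coefficient."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

# Hypothetical data: a shared, unobserved tree effect z induces correlated errors.
rng = random.Random(42)
n = 500
height = [rng.uniform(5, 30) for _ in range(n)]   # m
cpa = [rng.uniform(2, 40) for _ in range(n)]      # m^2
z = [rng.gauss(0, 1) for _ in range(n)]
dbh = [2 + 1.1 * h + 2.0 * zi + rng.gauss(0, 1) for h, zi in zip(height, z)]
agb = [10 + 6.0 * c + 15.0 * zi + rng.gauss(0, 5) for c, zi in zip(cpa, z)]

r = correlation(residuals(height, dbh), residuals(cpa, agb))
print(f"cross-equation residual correlation: {r:.2f}")
```

Under these assumed error variances the population residual correlation is about 0.85; separate NLS fits simply ignore it.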
A potential solution to these limitations is the application of error-in-variable models, which ensure compatibility between the LiDAR–DBH and LiDAR–biomass models and also account for the impacts of measurement errors [1,24,28,29,30]. Many studies on error-in-variable modeling have been reported in forestry. For example, Tang and Zhang [28] studied the unbiased estimation of regression coefficients using error-in-variable models. Tang et al. [29] investigated the capability of both two-stage error-in-variable models and two-stage least squares regression to deal with model integration. Tang and Wang [31] further provided a computer program for estimating the coefficients of error-in-variable models. Li and Tang [32] found that error-in-variable models performed better than a simulation-based extrapolation method and regression calibration in addressing measurement errors in both dependent and independent variables. Zhang et al. [1] suggested that error-in-variable models with maximum likelihood estimation were appropriate for predicting DBH and crown width from remotely sensed data when the independent variables contain measurement errors. Fu et al. [3] compared seemingly unrelated regression with error-in-variable models for developing a system of nonlinear additive biomass equations for Pinus massoniana Lamb. based on field measurements, and concluded that error-in-variable models had greater potential for developing a system of biomass equations whose predictors contain significant measurement errors. To our knowledge, although several studies have used error-in-variable models to address measurement errors in remote sensing applications [1,33], there have been no reports of their use to deal with measurement errors in the independent variables, particularly when both DBH and tree biomass are estimated simultaneously from LiDAR data. With traditional NLS, the compatibility of the estimated DBH and tree biomass is not guaranteed.
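The simplest member of the error-in-variable family is Deming regression for a straight line, which assumes that both y and x are observed with error and that the ratio δ = var(error in y)/var(error in x) is known. The sketch below is offered only to illustrate the idea. It is not the two-stage error-in-variable (TSEM) or NSUR estimator used in this study, and δ and the simulated data are assumptions. The slope estimate is slope = [s_yy − δ·s_xx + sqrt((s_yy − δ·s_xx)² + 4δ·s_xy²)] / (2·s_xy), computed from sample variances and the covariance.

```python
import math
import random

def deming_fit(xs, ys, delta=1.0):
    """Deming regression of ys on xs.

    delta is the assumed ratio var(error in y) / var(error in x).
    Returns (intercept, slope).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    d = syy - delta * sxx
    slope = (d + math.sqrt(d * d + 4 * delta * sxy * sxy)) / (2 * sxy)
    return my - slope * mx, slope

def ols_fit(xs, ys):
    """Ordinary least squares line (ignores error in xs); returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

# Hypothetical simulation: true line y = 1 + 2x, unit error variance in both x and y.
rng = random.Random(7)
x_true = [rng.gauss(10, 2) for _ in range(20000)]
x_obs = [x + rng.gauss(0, 1) for x in x_true]
y_obs = [1 + 2 * x + rng.gauss(0, 1) for x in x_true]

print("OLS:   ", ols_fit(x_obs, y_obs))
print("Deming:", deming_fit(x_obs, y_obs, delta=1.0))
```

Here OLS attenuates the slope toward about 1.6, while the Deming slope stays near the true value of 2, provided δ is specified correctly.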
The objectives of this study were: (i) to develop a system of compatible individual tree DBH and aboveground biomass (AGB) models using the error-in-variable modeling described above, based on airborne LiDAR data; and (ii) to evaluate this system against two NLS-based models in terms of prediction accuracy, using leave-one-out cross-validation. The novelty of this study lies in accounting for measurement errors in both the dependent and independent variables (AGB and DBH) through error-in-variable modeling when both the AGB and DBH models are developed from LiDAR data. The proposed method is thus expected to be applicable to estimating tree DBH and AGB over large areas and to extending the estimates of both DBH and AGB from individual trees to stands. Note that, to simplify the proposed system and make it more applicable, only DBH and AGB were treated as error-in variables (endogenous variables) [29], and all other variables in the model system were assumed to be error-out variables (exogenous variables) [29]. In addition, this study aimed to determine whether error-in-variable modeling improves DBH and AGB predictions compared with two traditional modeling methods.
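Leave-one-out cross-validation refits the model n times, each time withholding one tree and predicting it from a model fitted to the remaining n − 1 trees. The sketch below applies the procedure to a hypothetical straight-line model and summarizes the held-out errors with mean error (bias) and root mean square error. The model, the data, and the choice of these two statistics are illustrative assumptions, not necessarily the statistics reported in this study.

```python
import math

def fit_line(xs, ys):
    """Least squares line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

def loocv(xs, ys, fit=fit_line):
    """Return (mean error, RMSE) of leave-one-out predictions; error = observed - predicted."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]   # all trees except tree i
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit(train_x, train_y)
        errors.append(ys[i] - (a + b * xs[i]))
    n = len(errors)
    mean_error = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mean_error, rmse

# Hypothetical data: LiDAR-derived height (m) vs. field DBH (cm).
height = [8.0, 11.5, 14.0, 17.2, 20.1, 23.4, 26.0]
dbh = [9.8, 14.1, 17.5, 21.0, 25.3, 28.9, 32.6]
me, rmse = loocv(height, dbh)
print(f"LOOCV mean error = {me:.3f} cm, RMSE = {rmse:.3f} cm")
```

Comparing competing model structures then amounts to running the same loop with each structure's fitting function and comparing the resulting statistics.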
4. Discussion
Forests are considered the most productive terrestrial ecosystems on Earth, containing more than 45% of the global terrestrial carbon stock [63]. Quantifying forest biomass is necessary for land managers to make informed decisions about forest management and planning [22]. In the current study, a system of compatible individual tree DBH and AGB models was developed using an error-in-variable modeling approach based on airborne LiDAR data. The developed model system took into account both the measurement errors in the dependent (AGB) and independent (DBH) variables and the correlation between DBH and AGB. For comparison, two other widely used model structures (NLS&DD and NLS&NDD) were employed to estimate DBH and AGB. For both NLS&DD and NLS&NDD, the observed values of the independent variables were assumed to contain no measurement errors, whereas the observed values of the dependent variable did. In addition, neither model structure accounted for the correlation between DBH and AGB. Thus, if the measurement errors in the independent variables were not negligible, or if the correlation between DBH and AGB was highly significant, the DBH and AGB estimates from both NLS&DD and NLS&NDD might be biased. In contrast, the error-in-variable modeling approach can, in theory, yield unbiased estimates and is more general and flexible than NLS&DD and NLS&NDD [30]. These characteristics were supported by the findings of this study (Table 6).
The results of this study showed that both crown projection area and LiDAR-derived tree height could be used to infer individual tree DBH and AGB. This offers the potential to deploy remote sensing tools for large-scale assessments of aboveground biomass stocks across entire landscapes. From a remote sensing perspective, crown projection area and LiDAR-derived tree height are much easier to measure than crown diameter, because the extracted crown diameter depends on the direction of the measurement vector [19]. Numerous studies have found a strong relationship between crown diameter and DBH [64,65]. Under an established sampling regime, tree crown diameter (or canopy extent) is determined in a way that minimizes the effect of crown asymmetry [66]. Crown diameters could potentially be derived from remotely sensed crown projection areas by applying appropriate shape parameters, as discussed by Fleck et al. [66], Nelson [67], and Grote [68]. Crown diameter was also examined in the current study, and its contribution to improving the prediction accuracy of the LiDAR–DBH and LiDAR–AGB models was found to be not significant. This was mainly because (i) the crown projection area included in the proposed LiDAR–DBH model already reflected crown effects, and (ii) introducing crown diameter would complicate the model with very little gain in predictive ability. Thus, crown diameter was not selected as a predictor in the LiDAR–DBH and LiDAR–AGB models.
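As the simplest example of such a shape parameter, if a crown's horizontal projection is assumed to be circular, its equivalent crown diameter follows directly from the crown projection area (CPA): CD = 2·sqrt(CPA/π). The circular shape is an assumption made here for illustration; the cited studies discuss more general shape parameters, and real crowns are often asymmetric.

```python
import math

def equivalent_crown_diameter(cpa_m2: float) -> float:
    """Diameter (m) of a circle whose area equals the crown projection area (m^2).

    Assumes a circular crown projection; an empty crown (CPA = 0) gives 0.
    """
    if cpa_m2 < 0:
        raise ValueError("crown projection area must be non-negative")
    return 2.0 * math.sqrt(cpa_m2 / math.pi)

# A hypothetical crown with CPA = 20 m^2 has an equivalent diameter of about 5.05 m.
print(round(equivalent_crown_diameter(20.0), 2))
```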
In this study, the error-in-variable modeling results demonstrated its advantages over NLS in predicting AGB when DBH was subject to measurement errors for Picea crassifolia Kom. in northwestern China (Table 6). In practice, the DBH predictor in the LiDAR–AGB models was estimated from the LiDAR–DBH model, so random errors in the DBH estimates were unavoidable. In addition, measurement errors related to individual tree detection in remotely sensed imagery, arising from the LiDAR system, the point density, the spatial pattern, the delineation algorithm, and the geometric characteristics of crown shape, are also difficult to avoid. More importantly, regression models developed for a given dataset cannot be applied to predict the response variable for a dataset that differs significantly from the model development dataset [30]. Lindley [69] showed that if the prediction data come from the same population as the parameter estimation data, predictions from NLS models are generally unbiased even if the independent variable is subject to measurement errors. However, predictions could be biased if the parameter estimation data were taken from a population different from that of the prediction data [30].
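Lindley's point can be illustrated with a small linear simulation, a minimal sketch under assumed normal distributions and hypothetical parameter values. A line is fitted by least squares to y and an error-prone regressor w = x + u. Applied to new trees from the same population, its predictions are unbiased on average despite the attenuated slope; applied to trees whose true x has a different mean, the same fit yields biased predictions.

```python
import random

def fit_line(xs, ys):
    """Least squares line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

def sample(rng, n, mean_x, sd_x=2.0, sd_u=2.0, beta0=1.0, beta1=1.5, sd_e=1.0):
    """Draw (w, y): y depends on the true x, but only w = x + u is observed."""
    x = [rng.gauss(mean_x, sd_x) for _ in range(n)]
    w = [xi + rng.gauss(0.0, sd_u) for xi in x]
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sd_e) for xi in x]
    return w, y

def mean_prediction_error(a, b, w, y):
    """Average of observed minus predicted y."""
    return sum(yi - (a + b * wi) for wi, yi in zip(w, y)) / len(y)

rng = random.Random(3)
a, b = fit_line(*sample(rng, 20000, mean_x=20.0))                      # estimation data
bias_same = mean_prediction_error(a, b, *sample(rng, 20000, mean_x=20.0))
bias_shift = mean_prediction_error(a, b, *sample(rng, 20000, mean_x=30.0))
print(f"slope {b:.2f} (true 1.5); mean error, same population {bias_same:.2f}; "
      f"shifted population {bias_shift:.2f}")
```

With these assumed variances the reliability ratio is 0.5, so the slope is about 0.75 and the expected bias in the shifted population is 1.5 × (1 − 0.5) × 10 = 7.5.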
The model system (7) fitted by both TSEM and NSUR predicted DBH and AGB better than NLS&DD and NLS&NDD (Table 6), indicating that model system (7) was a more effective compatible model system for DBH and AGB than either NLS&DD or NLS&NDD. When DBH and AGB were estimated from the separately fitted LiDAR–DBH and LiDAR–AGB models using NLS&DD and NLS&NDD, the DBH prediction accuracies of the two model structures were similar to those of the traditional base model II.2 in Table 4, because the LiDAR–DBH models of both NLS&DD and NLS&NDD were identical to base model II.2. However, NLS&NDD was much more accurate than NLS&DD for AGB prediction (Table 6). For example, the value of
from NLS&NDD was 0.0581, which was 99.12% smaller than that from NLS&DD (
), and the value of
from NLS&NDD was 0.5370, which was 0.65% larger than that from NLS&DD (
). This was probably because DBH in model (20) was replaced by the corresponding LiDAR–DBH base model II.2, so that the measurement error of the estimated DBH was effectively kept out of the AGB estimation. We also attempted to develop a compatible model system for estimating individual tree DBH and AGB with base model II.2 and model (20) as the two submodels, fitted with both TSEM and NSUR, but NSUR failed to converge. More importantly, this model system could not account for the correlation between DBH and AGB.
The leave-one-out cross-validation results (Table 6) showed that the values of each statistic were very similar for NSUR and TSEM, although the prediction accuracy of model system (7) fitted by NSUR was slightly higher than that fitted by TSEM. Another main advantage of NSUR is that it can be readily implemented with both the R nlsystemfit (nonlinear equation system estimation) function [70] and the SAS/ETS MODEL procedure [71], whereas ForStat 2.1 software [54] is not readily available to non-Chinese modelers. Nevertheless, TSEM is recommended when convergence is a major issue in model applications.
Finally, this study aimed to develop a novel methodology, namely a system of compatible individual tree DBH and AGB error-in-variable models, in which airborne LiDAR data and reference tree biomass data were combined to derive the LiDAR–AGB and LiDAR–DBH models, and the measurement errors of both AGB and DBH were taken into account simultaneously. Introducing LiDAR data made it possible to extend error-in-variable estimation of DBH and AGB from individual trees to stands. The second objective of this study was to demonstrate potential improvements in AGB and DBH estimates from error-in-variable modeling with the NSUR and TSEM approaches, compared with the traditional modeling methods (NLS&DD and NLS&NDD). However, the improvements obtained did not differ significantly from zero at the 0.05 significance level. The main reason might be that the estimates of tree DBH and AGB were associated with uncertainties from various sources [72,73,74]. These uncertainties might arise from measurement errors in the tree variables, errors in the LiDAR data, errors in delineating tree crowns and estimating tree LH and DBH from the LiDAR data, estimation errors of the model parameters, etc. In this study, the measurement errors of both DBH and AGB were considered using the error-in-variable modeling method. Moreover, to reduce the estimation errors of the parameters of the empirical allometric models in Table 1, the parameters were estimated by nonlinear least squares from field measurements of DBH and H, rather than from their estimates, and were then used to estimate tree biomass. However, other sources of error were ignored in this study. This implies that validating the potential improvements in AGB and DBH estimates from error-in-variable modeling requires a systematic analysis of uncertainty propagation that identifies all error sources, models their propagation from inputs to outputs, and quantifies their impacts on the accuracy of AGB estimation [72]. However, such an uncertainty analysis is very complicated [72]; it was not conducted here because of limited space and should be carried out in a future study. Readers should therefore use caution when applying the conclusions of this study.
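As a minimal sketch of what such an analysis involves, Monte Carlo propagation perturbs each uncertain input with a draw from an assumed error distribution, recomputes AGB, and summarizes the spread of the outputs. The allometric form AGB = a·DBH^b·H^c, its parameter values, and the error standard deviations below are hypothetical placeholders, not the models in Table 1 or estimates from this study. Only input errors are propagated here; parameter and model errors would need additional sampling steps.

```python
import random
import statistics

def agb_model(dbh_cm, height_m, a=0.05, b=2.0, c=0.8):
    """Hypothetical allometric model AGB = a * DBH^b * H^c (kg); placeholder parameters."""
    return a * dbh_cm ** b * height_m ** c

def propagate(dbh_cm, height_m, sd_dbh, sd_h, n_draws=10000, seed=0):
    """Monte Carlo propagation of additive normal errors in DBH and H to AGB.

    Returns (mean, standard deviation) of the simulated AGB values.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        d = max(dbh_cm + rng.gauss(0.0, sd_dbh), 0.1)  # truncate to keep inputs positive
        h = max(height_m + rng.gauss(0.0, sd_h), 0.1)
        draws.append(agb_model(d, h))
    return statistics.fmean(draws), statistics.pstdev(draws)

# Hypothetical tree: DBH 25 cm (error SD 2 cm), LiDAR height 18 m (error SD 1 m).
mean_agb, sd_agb = propagate(dbh_cm=25.0, height_m=18.0, sd_dbh=2.0, sd_h=1.0)
print(f"AGB = {mean_agb:.1f} kg +/- {sd_agb:.1f} kg")
```

Requires Python 3.8 or later for `statistics.fmean`.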