Determining the exact position of a forest inventory plot (and hence the position of the sampled trees) is often hampered by poor Global Navigation Satellite System (GNSS) signal quality beneath the forest canopy. Inaccurate geo-references degrade the performance of models that aim to retrieve useful information from remote sensing data of high spatial resolution (e.g., species classification or timber volume estimation). This restriction is even more severe on the level of individual trees. The objective of this study was to develop a post-processing strategy to improve the positional accuracy of GNSS-measured sample-plot centers and to develop a method to automatically match trees within a terrestrial sample plot to aerial detected trees. We propose a new method which uses a random forest classifier to estimate the matching probability of each terrestrial-reference and aerial detected tree pair, which makes it possible to assess the reliability of the results. We investigated 133 sample plots of the Third German National Forest Inventory (BWI, 2011–2012) within the German federal state of Rhineland-Palatinate. For training and objective validation, synthetic forest stands have been modeled using the

Modeling and characterizing forest stands on small scales using high-resolution remote sensing data requires spatially explicit linking of inventory information and remote sensing data [

The effect of positional displacements between terrestrial reference data and ALS data has been investigated by Gobakken and Naesset [

Individual tree information has a specific added value for quantitative modeling compared to plot-based methods insofar as it includes features which address structural tree characteristics [^{2} are sufficient for tree detection [

Numerous research groups have analyzed the performance of different GNSS receivers considering various forest types and conditions [

Wing et al. [

In forest inventories (e.g., in the German National Forest Inventory, BWI), individual tree positions are typically determined by measuring their distance to the plot center (e.g., using measuring tape or an ultrasonic distance meter and laser meter), and the corresponding azimuth angle using a compass. Thus, the accuracy of a tree position depends on the experience and care of the field staff, the accuracy of the measuring tools, and the distance of the tree to the plot center. The relative horizontal accuracy-level of tree positions can be assumed to be in a range between 0.3 m and 1.0 m [

Tree heights are typically determined using distance and zenith angle measurements that result in similar error sources. Luoma et al. [

Typically, a manual co-registration is performed by identifying pairs of corresponding survey trees and aerial detected trees using canopy height models (CHMs) or high spatial resolution aerial imagery [

Olofsson et al. [

The method of Dorigo et al. [

Like Olofsson et al. [

Existing co-registration methods are especially challenged when the number of reference trees is low and the stand characteristics are complex for tree matching (high stand density, occurrence of similar tree patterns, even-aged stands, high proportion of deciduous trees). Heuristic methods have the disadvantages that the assumptions are defined based on the intuition of the expert, that the suitability of these heuristics is hard to evaluate, and that the heuristics might compete with each other. In addition, the algorithms provide no higher-order information (beyond quality flagging) about the reliability of the results.

Available automated algorithms for matching survey trees and aerial tree detections are very similar, and are all based on rules defined by an expert user. For each terrestrial reference tree, neighboring detections (candidates) are selected within a predefined [

Alternative methods which might be suitable for co-registration and tree matching can be found in the fields of image pair registration, object recognition, or point set registration. These disciplines deal with finding a transformation function between two 2D or 3D point sets.

In the case of non-distorted point sets, the usage of the random sample consensus (RANSAC) [

For both distorted and non-distorted point sets, algorithms which identify matching point pairs by iteratively applying affine transformations while minimizing some kind of an objective function are popular. Examples comprise the Softassign algorithm [

Existing methods for co-registration are challenged by complex and variable stands (unevenly or evenly aged, mixed stands, or purely coniferous/deciduous)—especially if the number of survey trees is low. The limited number of survey plots particularly impedes the parametrization of the algorithms. To overcome these limitations, a procedure is needed which uses synthetic (modeled) training data to achieve optimal parameters and to reduce the need for additional validation data.

Since mismatched surveyed and detected tree pairs reduce the performance of empirical models (e.g., individual tree-based models), a higher-order valuation of reliability (e.g., an a posteriori matching probability) is needed. This information could enhance the performance of these models by increasing the amount of usable reference data. This could be achieved by weighting the importance of a training dataset in relation to its reliability.

The objective of this study is to develop an automatic method for the co-registration of forest survey plots and ALS data which also provides information about matching trees and the reliability of a specific result. To provide an objective accuracy assessment, a process chain needs to be implemented that takes the inventory and tree detection characteristics into account.

In a further step, the method will be applied to data from the Third German National Forest Inventory (Bundeswaldinventur, BWI), which has been shown to be an unreliable reference data set at the individual tree level because of gross GNSS errors and a small number of trees per sample plot. This application example was chosen as a preparatory step for a subsequent study focusing on forest characterization on the individual tree level.

To develop and evaluate our co-registration method, we used inventory plots from the latest BWI (2011–2012) within the area of Hunsrück-Hochwald National Park located in Rhineland-Palatinate, Germany (

In the BWI, a tree is recorded as a sample tree according to the angle count sampling technique [

As a result of this sampling technique, each plot is characterized by an individual maximum radius (determined by the most distant tree) as well as the maximum limiting circle (the maximum distance at which the largest-diameter tree would still have been selected). With angle count sampling, the inclusion probability of a tree is proportional to its basal area, i.e., to the square of its diameter. Therefore, this technique favors the selection of large-diameter (usually tall) trees.
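The limiting circle can be illustrated with a short sketch. It uses the standard angle count relation between diameter at breast height (DBH) and limiting distance. The basal area factor of 4 m²/ha is an assumption that is not stated in the text above, although it reproduces the largest limiting circle radius of 28.1 m for the largest DBH of 112.5 cm in the plot statistics. The function names are hypothetical.

```python
import math


def limiting_radius_m(dbh_cm, basal_area_factor=4.0):
    """Largest distance (m) at which a tree of the given DBH is still counted.

    Angle count relation: R = DBH / (2 * sin(alpha / 2)), where
    sin(alpha / 2) = sqrt(BAF) / 100 for a basal area factor in m^2/ha.
    The default BAF of 4 is an assumption, not a value from the text.
    """
    dbh_m = dbh_cm / 100.0
    return dbh_m * 50.0 / math.sqrt(basal_area_factor)


def is_sample_tree(dbh_cm, distance_m, basal_area_factor=4.0):
    """True if a tree at the given distance from the plot center is selected."""
    return distance_m <= limiting_radius_m(dbh_cm, basal_area_factor)
```

Under this assumption a tree with a DBH of 40 cm is selected up to a distance of 10 m, whereas a tree with a DBH of 20 cm is selected only up to 5 m, which is why large trees dominate the sample.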

We focused on plots with at least two recorded trees of at least 4 m height, which resulted in 133 plots (65 plots surveyed 2011 and 76 surveyed 2012). These plots comprise 1015 trees in total, consisting of 43.3% Norway spruce (

The ALS data acquisition was accomplished from 24 March to 7 April 2015 using a ^{2}. The ALS datasets were provided by the state forest service of Rhineland-Palatinate in the form of pre-classified (ground vs. other classes) LAS-files. For each of the investigated BWI plots, we selected a circular subset with a radius of 38 m around the plot center.

Each tree

Following these definitions, each survey plot forms a data-set with a set of surveyed trees

A digital terrain model (DTM) is generated for each plot based on filtered ALS points. To obtain these filtered points, all points already classified as ground are selected. Then each ground point is investigated iteratively in ascending order of z-coordinate. For a given point, all points within a horizontal radius of 0.8 m are removed, except for the one with the minimum z-coordinate. This procedure results in a subset of ground points with a point spacing of 0.8 to 1.6 m. A Delaunay triangulation of this subset finally serves as the DTM.
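The thinning step described above can be sketched as a greedy filter. This is a minimal illustration, not the authors' implementation: the function name and the plain tuple representation are assumptions, the search is quadratic in the number of points, and the subsequent triangulation (for example with `scipy.spatial.Delaunay`) is left out.

```python
import math


def thin_ground_points(points, radius=0.8):
    """Keep only the lowest ground point within each horizontal neighborhood.

    points: iterable of (x, y, z) ground returns.
    Points are visited in ascending z order, so a point is kept only if no
    lower point that was already kept lies within `radius` of it.
    """
    kept = []
    for x, y, z in sorted(points, key=lambda p: p[2]):
        if all(math.hypot(x - kx, y - ky) > radius for kx, ky, _ in kept):
            kept.append((x, y, z))
    return kept
```

Because every retained point has removed all neighbors within 0.8 m, no two retained points are closer than 0.8 m to each other.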

Since the angle count sampling technique favors the selection of large-diameter (usually tall) trees, detecting particularly dominant trees is expedient for registering surveyed trees and detected trees. In the context of matching these surveyed trees to ALS detected trees, an identification of small or suppressed trees is not necessary. Thus, we decided to detect individual trees by identifying local maxima within the ALS point clouds: a point is assumed to correspond to the top of a tree if no point within a radius of 3 m has a greater z-value. The threshold of 3 m appeared appropriate given the observed stand densities and the ALS pulse density. Unreliable tree detections might reduce the performance of the proposed algorithm, but may help to develop a robust approach. The heights of the detected trees are estimated by height normalization using the DTMs. To avoid commission errors, detections with heights below 4 m or with a horizontal distance to the plot center above 33 m are omitted.
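A brute-force sketch of this local-maximum detection is given below. It assumes that the plot center lies at the origin and that the DTM height below each point has already been interpolated and is passed in as a fourth value; the function name and the tuple layout are hypothetical.

```python
import math


def detect_tree_tops(points, search_radius=3.0, min_height=4.0,
                     max_center_distance=33.0):
    """Return (x, y, height) for all local maxima that pass the filters.

    points: list of (x, y, z, z_ground); z_ground is the DTM height below
    the point (assumed to be precomputed). The plot center is at (0, 0).
    """
    tops = []
    for i, (x, y, z, z_ground) in enumerate(points):
        # A point is a tree top if no neighbor within the radius is higher.
        is_top = not any(
            j != i and oz > z and math.hypot(x - ox, y - oy) <= search_radius
            for j, (ox, oy, oz, _) in enumerate(points)
        )
        if not is_top:
            continue
        height = z - z_ground  # height normalization using the DTM
        if height >= min_height and math.hypot(x, y) <= max_center_distance:
            tops.append((x, y, height))
    return tops
```

The filters are applied after the maximum search, so that a tall point outside the 33 m circle can still suppress a lower neighbor inside it.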

Usually, the exact spatial location of a survey plot center is unknown; studies focusing on co-registration [

Olofsson et al. [

For each of the BWI sample plots within our study area, we simulated one hundred synthetic forest stands with a circular area of 0.7 ha using the

We derived synthetic inventory data in each simulated stand by applying the BWI sampling technique after randomly relocating (GNSS error) the plot center. Each tree is labeled with the ID taken from the synthetic stand. Based on literature research on GNSS accuracy (

Simulated ALS point clouds were generated to derive synthetic tree detections. Like Frazer et al. [

A tree crown is defined by the tree species-specific parameters

For each plot, we generated uniformly distributed xy-coordinates with the same pulse density as the original point cloud. For each point, the height above ground was calculated by applying Equation (

Since the use of crown shape functions might result in well-shaped crowns, which would allow an unambiguous co-registration, we simulated irregularities by adding residuals. After a coarse manual optimization (with the aim of realistic tree detection characteristics), we chose normally distributed horizontal residuals with a standard deviation of 1.5 m and gamma-distributed vertical residuals with a shape of 3 and a scale of 0.3 m.
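Drawing such residuals can be sketched with the Python standard library. The sketch assumes that the horizontal residual is drawn independently for the x and y axes, which the text does not specify, and it deliberately returns the residuals without applying them, since the sign convention of the vertical residual is not stated either. The function name is hypothetical.

```python
import random


def draw_residuals(rng, sd_xy=1.5, gamma_shape=3.0, gamma_scale=0.3):
    """Draw one (dx, dy, dv) residual triple in meters.

    dx, dy: normal with mean 0 and standard deviation sd_xy (per axis,
            which is an assumption).
    dv:     gamma with the given shape and scale, hence always positive
            with an expected value of shape * scale = 0.9 m.
    """
    dx = rng.gauss(0.0, sd_xy)
    dy = rng.gauss(0.0, sd_xy)
    dv = rng.gammavariate(gamma_shape, gamma_scale)
    return dx, dy, dv


rng = random.Random(42)  # fixed seed for reproducible simulations
residuals = [draw_residuals(rng) for _ in range(20000)]
```

Note that `random.gammavariate(alpha, beta)` takes the shape as `alpha` and the scale as `beta`.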

Finally, the tree detection method presented in

When using simulations for algorithm training and validation, it must be considered that characteristics such as the spatial distribution of the trees can have severe impacts on a successful registration and the transferability of the method. Therefore, a high similarity between the actual and simulated forest stands has to be guaranteed. Assuming full comparability between the synthetic and the original datasets, similar point statistics should be achieved. Consequently, we evaluated the number of trees, the number of detections per reference tree, and the mean nearest-neighbor distance

To assess the suitability of the simulated datasets, various statistical analyses of the previously mentioned variables have been performed. To decide whether the values of the original and simulated datasets generally differ, the two-sample Wilcoxon rank sum test (comparison of medians) and the F-test (comparison of variances) have been performed. To evaluate the correlation between the original and the simulated values on plot level, Pearson’s correlation coefficient has been calculated. To investigate the overall agreement on plot level, the RMSE has been calculated and a paired Wilcoxon signed-rank test has been applied for each plot. We have chosen the nonparametric Wilcoxon tests so as not to rely on normally distributed data.
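The two plot-level agreement measures can be written out in a few lines. This is a minimal sketch with hypothetical function names and made-up example values; the rank-based tests and the F-test are left to a statistics package.

```python
import math


def pearson_r(a, b):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rmse(a, b):
    """Root mean square error between paired values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Illustrative values only: trees per plot in original vs. simulated data.
original = [5, 7, 10, 12]
simulated = [5.5, 6.8, 10.4, 11.7]
r, error = pearson_r(original, simulated), rmse(original, simulated)
```

A high correlation combined with a large RMSE would indicate a systematic offset between original and simulated plots, which is why both measures are reported.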

We developed a tree-matching and co-registration method which consists of two major components.

Classification-based estimation of the matching probability for each potential tree pair.

Co-registration of the survey trees based on the estimated matching probabilities.

Since both components require linking surveyed trees and detected trees, we first define a generic point assignment process. We also define a method for purely distance-based tree assignment.

The generally defined point assignment process assigns unique pairs of two n-dimensional point sets

The elements of

To assign trees within a given distance and to combine the two-dimensional neighborhood criterion with the height criterion of e.g., [

Based on this

Following the idea of feature descriptors, the proposed tree matching method derives features which have the potential to classify each potential tree pair

Caused by GNSS errors, a shift between the surveyed trees

Further, we assume the surveyed tree

For a given sequence of matching tree pairs

In addition to these indicators for similar point patterns, additional features are derived which give information about uncertainties associated with a potential tree pair. Since the probability of identifying a correct tree pair by chance decreases with an increasing number of potential tree pair combinations, the number of trees

Based on the given features, a feature vector

The classifier needs to be trained using a series of representative datasets with known matching tree pairs. For each dataset (with

Since the tree matching method presented in

The proposed co-registration algorithm initially selects all candidate pairs

As such a tree pair sequence might include pairs with low probabilities, the pair with lowest probability is omitted iteratively until the probability

Finally, the point pair sequence or subsequence

We implemented the co-registration and point matching method using

As more tree pairs do not necessarily lead to better results, we set the parameter

For each of these datasets (with inventory trees

Finally, we also applied both approaches (with and without GNSS correction) to the original BWI datasets.

To assess the accuracy of the proposed algorithm, confusion matrices were derived using the synthetic validation datasets. Each tree pair

If the probabilities predicted by the algorithm (

To verify the expectations towards the features, we have analyzed the co-registration results of one randomly selected simulation round. To analyze the effect of a specific feature on the matching probabilities, we have subdivided the results of all plots into three groups. The groups have been defined by the intervals 0–0.33 (low probability), 0.33–0.67 (intermediate probability) and 0.67–1 (high probability). Based on these groups a boxplot has been generated for each feature, which allows conclusions about which values are associated with higher or lower matching probabilities.
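The grouping used for these boxplots can be sketched as follows. How values that fall exactly on 0.33 or 0.67 are assigned is an assumption (here they go to the upper group), and the function names are hypothetical.

```python
def probability_group(p):
    """Map a matching probability to its group label."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p < 0.33:
        return "low"
    if p < 0.67:
        return "intermediate"
    return "high"


def group_feature_values(probabilities, feature_values):
    """Collect the feature values of all tree pairs per probability group."""
    groups = {"low": [], "intermediate": [], "high": []}
    for p, value in zip(probabilities, feature_values):
        groups[probability_group(p)].append(value)
    return groups
```

Each of the three resulting lists then supplies the data for one box of a feature's boxplot.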

To investigate the effect of different forest characteristics on the co-registration performance, we have evaluated the results of all validation datasets. The effect of a selected variable (e.g., tree species) on the co-registration probability has been assessed by grouping the results using equidistant intervals of the variable. Based on these groups, a boxplot of the selected variable has been created and a statistical analysis has been performed to support the conclusions. To decide whether the values of a given variable differ between the groups (meaning at least one group differs significantly from another group), we have applied the Kruskal–Wallis rank sum test. In the case of just two groups, the Wilcoxon rank sum test has been used instead, which also gives information about the direction of the effect. In the case of ordered groups, we have also verified the significance of potential (linear) trends by performing a one-way ANOVA.
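For readers unfamiliar with the Kruskal–Wallis test, the following sketch computes its H statistic with the usual tie correction. It is an illustration with a hypothetical function name, not the code used in the study, and it does not compute a p-value: under the null hypothesis H is approximately chi-square distributed with one degree of freedom fewer than the number of groups, and a statistics package such as SciPy (`scipy.stats.kruskal`) provides the p-value directly.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    values = sorted(v for group in groups for v in group)
    n = len(values)
    # Assign the average rank to each distinct value (ties share a rank).
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2.0
        i = j
    rank_term = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
    tie_counts = Counter(values).values()
    correction = 1.0 - sum(t ** 3 - t for t in tie_counts) / float(n ** 3 - n)
    if correction == 0.0:
        raise ValueError("all values are identical")
    return h / correction
```

Three fully separated groups such as `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` yield H = 7.2, which exceeds the 5% critical value of the chi-square distribution with two degrees of freedom (about 5.99).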

Using the synthetic forest stand data, the individual tree detection method achieved an average commission error (number of unassigned detections per number of detections) of 11% and an average omission error (number of missed trees per number of reference trees) of 72%. The planar RMSE of the tree positions was about 0.75 ± 0.39 m, and the RMSE of the heights was about 1.21 ± 0.55 m. These accuracy descriptors are consistent with most of the methods benchmarked in [

The test results for the number of trees and the number of detections per reference tree indicate realistic stand densities. The increased average NNDs of the synthetic datasets indicate an overestimation of the tree distances. This effect might be caused by the fact that the original stands are more clustered (e.g., because of aisles or leaning trees), while the simulated trees are distributed more uniformly (with consideration of tree competition). This effect also leads to an incomplete coverage of the value range for this attribute.
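The mean nearest-neighbor distance (NND) discussed here can be computed with a simple brute-force search. This is a sketch with a hypothetical function name; it ignores edge effects at the plot boundary.

```python
import math


def mean_nnd(points):
    """Mean distance from each (x, y) point to its nearest other point."""
    if len(points) < 2:
        raise ValueError("at least two points are required")
    total = 0.0
    for i, (x, y) in enumerate(points):
        total += min(
            math.hypot(x - ox, y - oy)
            for j, (ox, oy) in enumerate(points)
            if j != i
        )
    return total / len(points)
```

Clustered stands contain many short nearest-neighbor distances and therefore yield a lower mean NND than uniformly spaced stands with the same number of trees.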

As lower NNDs reduce the separability of points, the synthetic inventory plots might be easier to co-register than the original ones. Nevertheless, since the variance within the synthetic datasets is greater than within the original datasets and the value ranges of the other attributes are completely covered, the simulated datasets can be assumed to be appropriate. Thus, by training the algorithm with the synthetic datasets, it should be applicable to the original datasets without losing accuracy.

The algorithm for individual tree matching (

The proposed co-registration method achieved an overall accuracy of 82.7% and a Cohen’s kappa of 0.70 using the synthetic validation datasets (

For tree matching, the observed probability was underestimated for values above 0.5, which resulted in a conservative assignment of tree pairs. This effect might be caused by involving the characteristics of almost all trees in the decision process. Since the expected one-to-one relationship was not achieved, the predicted probabilities should be handled with caution. To derive real probabilities, a correction of the predicted values (e.g., by using the given regression model) would be necessary.

Although the predicted co-registration probability depends on the predicted matching probability, an almost one-to-one relationship was achieved. These results indicate that the algorithm is able to correctly estimate the reliability of a specific co-registration result.

To analyze the meaningfulness of the rules identified by the classifier, we evaluated the feature importance and the effects of the features on the matching probability. In general, the rules identified by the classifier aligned with our previous expectations presented in

The high feature importance of the 3D displacement and the vertical displacement is consistent with our prior expectations.

Since the distances between the closest tree pairs indicate the agreement of the point patterns, these form the most important group of features.

The highest matching probabilities are achieved if a positive correlation between the survey tree heights and ALS tree heights occurs (

Since the number of linked trees is an indicator of matching point patterns, the probability of a successful co-registration increased with an increasing number of linked trees (

An increasing number of tree species (as an indicator of stand variability) results in a positive effect on the co-registration result. The ANOVA of the co-registration probability grouped by the number of species within a plot (

The ANOVA of the co-registration probability grouped by the height of the tallest tree (

The proposed method was able to successfully co-register about 69.5% of the synthetic inventory plots, while it linked 81.5% of the tree pairs correctly. By applying a pure distance-based tree assignment (ignoring GNSS errors), only 41.0% of the trees were linked correctly, and the criterion of at least 50% matching tree pairs was fulfilled for only 32.5% of the plots.

The original BWI plots showed similar results. By applying the pure distance-based tree assignment, 288 tree pairs were identified, but only 107 (37%) of these pairs were classified as matching. Only 29 of the plots (22%) fulfilled the criterion of at least 50% tree pairs classified as matching. By applying the proposed co-registration algorithm, 517 tree pairs were assigned. Of these tree pairs, 261 (50%) were classified as matching. The algorithm indicated that 80 of the 133 inventory plots (60%) were co-registered correctly, with 414 assigned tree pairs and 230 (72%) pairs classified as matching. Based on these results, the investigated original BWI plots were characterized by horizontal GNSS errors of up to 21 m, while 80% of the horizontal GNSS errors were in a range between 1.4 and 8.7 m.

The analysis of the effect of the time lag between the field survey and the ALS flight campaign indicates that the proposed co-registration method will be challenged if there are major changes between the two dates of observation. This is particularly the case if trees have grown, toppled, or been harvested within the time lag between field survey and ALS data collection.

Since we needed to model synthetic training data based on various assumptions, the transfer of the probability estimations to the original datasets should be done with caution. The procedure of simulating ALS point clouds using crown shape functions is justified by the observation that the most ambiguous cases for co-registration occur in dense forest stands, where individual trees are hard to detect because of overlapping crowns. Nonetheless, non-representative point clouds might affect the tree detection characteristics. Although the suitability of the simulated datasets has been tested, it cannot be ensured that the datasets are representative. This is particularly the case because the synthetic survey plots were characterized by increased NNDs compared to the original BWI data. However, the usage of simulated forest stands leads to more objective validation results compared to reference locations derived by visual interpretation.

Since the number of potential tree pairs and positional inaccuracies complicate the identification of matching tree pairs, the rudimentary tree detection method used in this study might have a negative effect on the algorithm performance. The ANOVA of the co-registration probability grouped by the number of detections per hectare (

Another limitation of the method is that no perfect one-to-one correlation between the predicted and observed probabilities could be achieved. Thus, the predicted matching probabilities provided by the algorithm are a higher order reliability estimation but do not necessarily correspond to actual probabilities. However, since the predicted co-registration probabilities rarely differ from the one-to-one correlation, they seem to represent actual probabilities.

The parametrization of the

We performed a co-registration of 133 BWI sampling plots to ALS-derived individual tree detections of a study area in Rhineland-Palatinate, Germany as a preparatory step for a forest characterization on the individual tree level. As erroneous tree pair assignments result in a reduction of model quality (e.g., biomass estimation or tree species classification), we searched for a method for co-registration and tree matching which also rates the reliability of a match.

Since existing methods seemed to be unsuitable for this task, we developed a hybrid co-registration and tree matching algorithm. We trained the machine-learning-based method using synthetic individual tree detections and inventory data, whose representativity has been tested empirically.

The method reached an overall accuracy of 89.7% for tree matching and an overall accuracy of 82.7% for co-registration using simulated datasets. The method has been applied to the study area, and was able to successfully relocate 60% of the BWI plots, which makes these usable for a further analysis on the individual tree level. As machine-learning methods have proven powerful at identifying patterns, a performance of the algorithm similar to that of a human expert can be expected. Since the interpreter might be biased towards the proposed solution, a manual post-processing is not expedient.

The derived feature importance depends on forest characteristics and the assumed GNSS error distribution. The feature importance and its effect on the matching probability have been shown to be consistent with prior expectations. Thus, the machine-learning approach seems to be an appropriate alternative to heuristic methods, with the additional advantages of an automated parametrization and a robust estimation of the reliability of a single co-registration and tree matching result.

We found that with five or more linked tree pairs, at least 90% of the co-registration results can be expected to be correct. If the number of terrestrial trees is limited, heuristic methods typically cannot comply with this requirement. We found that the highest probabilities for a correct co-registration were achieved in heterogeneous stands (mixed species, differing tree heights, presence of tall trees). Stands dominated by conifers achieved significantly better co-registration results than stands dominated by deciduous trees. These findings support the results of previous studies.

To achieve better results, the local-maxima-based tree detection method used in this study should be replaced by a more advanced method (e.g., by tree stem detection). Since more accurate tree positions result in a better agreement of the point patterns, the pre-trained algorithm can directly be applied to these more reliable positions without losing explanatory power. To transfer the method to forests with different characteristics or a different sampling design, only an adaptation of the (simulated) training datasets is required.

The probability estimations provided by the algorithm, as an objective indicator of the reliability of a specific result, offer a clear added value compared to existing methods. Since they can serve as a weighting factor for model training, for example, the proposed method is a relevant tool for gaining further knowledge in the field of forest characterization on small scales or even on the individual tree level.

The authors wish to thank the state forest service of Rhineland-Palatinate for providing the ALS data and the Thünen Institute for providing the BWI data. The study was embedded in the TriCSS-project (Trier Center for sustainable Systems) which was funded by the research initiative of Rhineland-Palatinate. The publication was funded by the Open Access Fund of Universität Trier and the German Research Foundation (DFG) within the Open Access Publishing funding programme.

Andreas Hill and Sebastian Lamprecht initiated the study and conceived the study design. Andreas Hill generated the synthetic forest stands. Sebastian Lamprecht derived the simulated ALS point clouds, developed and implemented the algorithm, analyzed the data and wrote the paper. Andreas Hill, Johannes Stoffels and Thomas Udelhoven cross-checked the analysis and the manuscript. Thomas Udelhoven led the research group.

The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.

The following abbreviations are used in this manuscript:

Study area and German National Forest Inventory (

Study design. For each inventory plot, 100 simulations are generated which serve for algorithm training and validation. Finally, the algorithm is applied to the original inventory plots. ALS: airborne laser scanning.


Correlation between predicted probability and observed probability for (

Effect of (

Feature importance derived by the random forest classifier.

Effect (

Effect of (


Effect of the number of detections per hectare on the matching probability using the simulated datasets. The numbers in brackets correspond to the number of plots. The whiskers extend to ten times the interquartile range. Outliers are marked by circles.

Basic statistics of the 133 investigated BWI plots. DBH: diameter at breast height; HDOP: horizontal dilution of precision.

Attribute | Minimum | 1st Quartile | Median | Mean | 3rd Quartile | Maximum |
---|---|---|---|---|---|---|
DBH (cm) | 7.0 | 23.8 | 36.2 | 36.73 | 47.9 | 112.5 |
Tree height (m) | 5.2 | 19.8 | 24.8 | 24.65 | 29.9 | 50.6 |
Stems per hectare | 48 | 202 | 412 | 742.4 | 839 | 6014 |
Number of recorded trees per plot | 2 | 5 | 7 | 7.6 | 10 | 16 |
Maximum radius (m) | 0.3 | 3.0 | 5.25 | 5.9 | 8.1 | 20.8 |
Limiting circle radius (m) | 2.6 | 8.9 | 12.6 | 12.4 | 15.7 | 28.1 |
HDOP | 0.8 | 1.1 | 1.2 | 1.32 | 1.5 | 3.2 |

Tree species-specific light crown model parameters according to Pretzsch [

Species | ||
---|---|---|
0.50 | 0.50 | |
0.40 | 0.33 | |
0.50 | 0.50 | |
0.56 | 0.50 | |
0.66 | 1.00 | |
0.64 | 0.50 | |
0.80 | 0.45 | |

Evaluation results of the synthetic training data

Attribute | Overall | On Plot Level | ||||
---|---|---|---|---|---|---|
Characteristic Values (Minimum, Mean, Maximum) | Two Sample Wilcoxon Test | F-Test | Pearson’s Correlation Coefficient | RMSE | Paired Wilcoxon Test | |
Amount of Surveyed Trees | 0.98 | 0.69 | ||||
Amount of Detected Trees | 0.64 | 15.00 | ||||
Amount of Detected Trees per Survey Tree | 0.83 | 3.30 | ||||
Mean NND for Surveyed Trees (m) | 0.81 | 1.90 | ||||
Mean NND for Detected Trees (m) | 0.58 | 1.36 | ||||

Confusion matrix for tree matching using synthetic validation data.

Predicted Class | Actual Class: Not Matching | Actual Class: Matching | Totals | User’s Accuracy |
---|---|---|---|---|
Not Matching | 12,747 | 1663 | 14,410 | 88.5% |
Matching | 683 | 7662 | 8345 | 91.8% |
Totals | 13,430 | 9325 | 22,755 | |
Producer’s Accuracy | 94.9% | 82.2% | | |
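The accuracy measures of the tree matching confusion matrix above can be recomputed from its four counts. The sketch below is illustrative and uses a hypothetical function name; it assumes that rows hold the predicted class and columns the actual class, which is the layout that reproduces the reported user's and producer's accuracies.

```python
def accuracy_report(matrix):
    """Return (overall, user's, producer's) accuracy of a confusion matrix.

    matrix[i][j]: number of cases predicted as class i whose actual class
    is j (rows = predicted, columns = actual; an assumption, see above).
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    overall = sum(matrix[i][i] for i in range(k)) / total
    users = [matrix[i][i] / sum(matrix[i]) for i in range(k)]
    producers = [
        matrix[i][i] / sum(matrix[r][i] for r in range(k)) for i in range(k)
    ]
    return overall, users, producers


# Counts of the tree matching matrix: classes "not matching", "matching".
overall, users, producers = accuracy_report([[12747, 1663], [683, 7662]])
```

With these counts the overall accuracy is 89.7%, the user's accuracies are 88.5% and 91.8%, and the producer's accuracies are 94.9% and 82.2%.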

Confusion matrix for co-registration using synthetic validation data.

Predicted Class | Actual Class: Co-Registration Failed | Actual Class: Co-Registration Successful | Totals | User’s Accuracy |
---|---|---|---|---|
Co-Registration Failed | 1731 | 595 | 2326 | 74.4% |
Co-Registration Successful | 890 | 5377 | 6267 | 85.8% |
Totals | 2621 | 5972 | 8593 | |
Producer’s Accuracy | 66.0% | 90.0% | | |