1. Introduction
Forests are the Earth’s largest terrestrial carbon store, holding more than three decades’ worth of global
emissions [
1] and consuming a quarter of new anthropogenic emissions [
2]. Pressingly, climatic trends are casting grave uncertainty on the long-term stability of this store. According to U.S. Forest Service aerial surveys, over 200 million trees have died in California since 2010, with 62 million dead in 2016 alone [
3]. The warming climate and the resulting longer, more severe drought cycles are the primary causes of this mass die-off. Large numbers of dead and dying trees dramatically increase the risk of wildfires; these counts do not include tree deaths caused by wildfires, which add hundreds of millions to the toll.
Forest management is a recognized, cost-effective approach to mitigating the effects of the climate crisis [
4]. Global carbon accounting, which is a crucial contribution to informed climate change policies, relies on large-scale forest surveys. Tree diameter at breast height (DBH) is a primary data point used in ecological monitoring and carbon accounting efforts; the conventional means of determining DBH relies on a human forester with a measuring tape. Accurate, automated methods of DBH estimation could drastically reduce the time and effort needed to perform surveys, opening doors for large-scale mapping efforts.
Three-dimensional (3D) reconstruction is the task of digitally representing real-world settings, typically in the form of a point cloud. Metric reconstruction is a subclass that preserves true scale in the recovered geometry and affords the ability to indirectly measure scene features (e.g., volume, length). Terrestrial laser scanning (TLS) is a well-studied approach for mapping forest inventories, offering the potential for rapid ecological assessment. However, these systems are expensive, costing 80,000–20,000 USD, and require a skilled operator. Survey-grade TLS systems can provide centimeter-level diameter estimation in forests with uniform tree structures and even terrain [
5]. The main technical issue faced by TLS methods is tree occlusions that require stitching many scans together from different spatial locations, a critical step in recovering complex geometries of forests. Research toward using mobile robot platforms in combination with Simultaneous Localization and Mapping (SLAM) algorithms addresses this problem using optimized pose-graphs to align thousands or millions of LiDAR scans taken along a robot’s trajectory. A 2017 paper [
6] using SLAM reports a best-case DBH estimation RMSE of 2.38 cm for well-represented trees. A 2024 study [
7] achieved 1.93 cm RMSE using a mixed Hough-RANSAC trunk modeling approach. However, these solutions require expensive 3D LiDAR and inertial measurement unit (IMU) hardware (10,000–25,000 USD).
Recent advances in the fields of computer vision and deep learning offer a new paradigm for generating 3D reconstructions. Neural Radiance Fields (NeRFs) [
8] are an emergent technology enabling the recovery of complex 3D geometry by training a neural network on conventional imagery. NeRFs are a remarkable advance over traditional photogrammetry, producing higher-quality, photorealistic 3D reconstructions from sparser input imagery and with accessible computational requirements. Since 2020, a community of developers has contributed hundreds of methodological improvements, rapidly improving NeRF performance and accessibility. The ability to export NeRFs as point clouds makes them a compelling alternative to expensive LiDAR-based mapping.
We present an evaluation of NeRF-based forest reconstruction for the task of DBH estimation of mixed-evergreen redwood forest located in Santa Cruz, California. This study compares NeRF reconstructions trained on conventional mobile phone imagery to LiDAR-inertial SLAM reconstructions sourced from a quadruped robot equipped with a custom multi-modal sensing platform. This paper also expands the viability and accuracy of TreeTool [
9], a Python toolkit for rendering DBH estimates from forest point clouds. Specifically, we propose a new set of features to support robust tree detection and accurate DBH estimation. Many studies have relied on cylinder-fitting to model trunk morphology. We observe a DBH underestimation trend with this method and propose an improved convex-hull modeling approach. In summary, the contributions of this work are:
Quantitative field study evaluating the performance of NeRF-based forest reconstructions compared to LiDAR-inertial SLAM with regard to DBH estimation accuracy.
Improved DBH estimation accuracy via a trunk modeling approach using convex-hull and density-based filtering methods.
Open-source modeling code and forest datasets, including SLAM and NeRF reconstructions of a mixed-evergreen Redwood forest, are freely available at
https://github.com/harelab-ucsc/RedwoodNeRF (accessed on 7 January 2025).
3. Design and Methods
3.1. Mobile Laser Scanning via LiDAR-Inertial SLAM
To perform SLAM-based reconstruction, we designed a robot based on the Unitree B1 quadruped platform. Terrain maneuverability was prioritized to cope with rough, uneven forest terrain and complex obstacles. A custom-built multi-modal sensor head is attached, which includes LiDAR, stereo vision, inertial, and GNSS+RTK sensing modes. For online processing, the robot is equipped with an external x86 mini computer (Commell LE-37R; Taiwan), which includes a 4.5 GHz Core i7-1270PE CPU, 64 GB RAM, and 2 TB storage. The LiDAR is an Ouster OS0-128 (Ouster, Inc.; San Francisco, CA, USA) with a 90° vertical field-of-view and 128 horizontal channels. The IMU is an IMX5-RUG3 (Inertial Sense, Inc.; Provo, UT, USA) capable of 1 kHz output and fused EKF attitude estimates.
The computer runs ROS2 Iron to record sensor data and to run LiDAR inertial odometry via smoothing and mapping (LIOSAM) [
14]. This software fuses LiDAR and IMU data together to create dense spatial reconstructions in real-time along with optimized pose estimations. LIOSAM tightly couples LiDAR and inertial data in a joint optimization using a factor-graph SLAM architecture. Through loop-closure factors, LIOSAM is able to achieve minimal drift in large exploration volumes [
14] (see
Figure 2).
3.2. NeRF Reconstruction Pipeline
3.2.1. Training Data
Metrically relevant camera poses are needed in order to measure world features from NeRF reconstructions. While COLMAP can perform high-quality vision-based reconstruction, it suffers from scale ambiguity. The typical solution is to use visual-inertial (VI) or LiDAR-inertial SLAM pose estimation, in which the metric information is derived from IMU or LiDAR range measurements. In this study, we use an iOS application called NeRFCapture [
15], which uses Apple ARKit to provide camera poses in real time. ARKit uses VI odometry with multi-sensor fusion, which makes it a good option for metric pose estimation.
3.2.2. Software Implementation
The base NeRF method discussed in
Section 2.2 has seen hundreds of proposed improvements over the years. Nerfacto [
16] is a method that draws from several other methods [
17,
18,
19,
20,
21] and has been shown to work well in a variety of in-the-wild settings. For this reason, we chose the Nerfacto method for this study.
Nerfacto improves on the base method in a few key dimensions, the first of which is pose refinement. Error in image poses results in cloudy artifacts and loss of sharpness in the reconstructed scene. The Nerfacto method uses the back-propagated loss gradients to optimize the poses for each training iteration. Another improvement is in the ray-sampling. Rays of light are modeled as conical frustums, and a piece-wise sampling step uniformly samples the rays up to a certain distance from the camera origin and then samples subsequent sections of the conical ray at step sizes that increase with each sample. This allows for high-detail sampling of close portions of the scene while efficiently sampling distant objects as well. The output is fed into a proposal sampler, which consolidates sample locations to sections of the scene that contribute most to the final 3D scene render. The output of these sampling stages is fed into the Nerfacto field, which incorporates appearance embedding, accounting for varying exposure among the training images.
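The piece-wise sampling idea can be illustrated with a short sketch. It assumes scene units in which uniform sampling covers distances up to 1 and that half of the samples are spent in that near region; the warp function and both function names are illustrative assumptions and are not taken from the nerfstudio API.

```python
def warp_to_distance(s: float) -> float:
    """Map s in [0, 1) to a ray distance: linear up to 1, inverse-linear beyond."""
    if not 0.0 <= s < 1.0:
        raise ValueError("s must lie in [0, 1)")
    if s < 0.5:
        return 2.0 * s            # uniform spacing out to distance 1 (assumed bound)
    return 1.0 / (2.0 - 2.0 * s)  # step size grows with each sample beyond 1


def sample_distances(num_samples: int) -> list[float]:
    """Ray distances at the midpoints of num_samples equal bins in s."""
    return [warp_to_distance((i + 0.5) / num_samples) for i in range(num_samples)]


if __name__ == "__main__":
    distances = sample_distances(8)
    steps = [b - a for a, b in zip(distances, distances[1:])]
    print("distances:", [round(d, 3) for d in distances])
    print("steps:    ", [round(s, 3) for s in steps])
```

With this warp, the near region receives evenly spaced samples, while a handful of samples with growing spacing still reach distant parts of the scene.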
We used the nerfstudio [
22] API, which makes training and exporting NeRF reconstructions extremely simple. Posed image data is copied to a remote desktop PC for training. This computer hosts a 3.8 GHz AMD 3960X CPU (AMD; Taiwan), 64 GB RAM, and 2 TB storage. The PC is also outfitted with two NVIDIA RTX-3070 (Nvidia; Taiwan) graphics cards, which aggregate to 16 GB of VRAM. The system runs Ubuntu 22.04 with CUDA-11.8 to interface with GPU hardware.
3.3. Tree Segmentation and Modeling
3.3.1. TreeTool Framework
To process forest reconstructions and estimate tree DBH, we use TreeTool [
9], a Python library built on Point Data Abstraction Library (PDAL) and Point Cloud Library Python (pclpy). TreeTool breaks the process down into three distinct steps covering filtering, detection, and modeling stages.
The goal of the filtering stage is to remove all non-trunk points, mainly the ground and foliage. The ground is segmented using an improved simple morphological filter (SMRF) proposed by Pingel et al. [
23]. This technique uses image-inpainting to accurately model complex, uneven terrain. Once the ground points are removed, TreeTool uses a surface-normal filter to remove foliage points. This is based on their observation that the surfaces created by trunk points have horizontal normals. TreeTool also filters non-trunk points by considering the curvature of surfaces. Curvature is interpreted as the percentage of information held by the eigenvalue associated with the normal vector. High-curvature points are discarded, which removes foliage.
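The curvature measure described above is commonly written as a surface-variation ratio. As a sketch, assume $\lambda_0 \le \lambda_1 \le \lambda_2$ are the eigenvalues of the covariance matrix of a point's local neighborhood, with $\lambda_0$ associated with the normal direction (this notation is introduced here for illustration and is not taken from TreeTool):

```latex
\sigma = \frac{\lambda_0}{\lambda_0 + \lambda_1 + \lambda_2}, \qquad 0 \le \sigma \le \tfrac{1}{3}
```

A locally flat patch, such as a section of trunk surface, gives $\sigma$ close to 0, whereas scattered foliage spreads variance across all three directions and pushes $\sigma$ toward $1/3$, so a threshold on $\sigma$ separates the two.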
The detection stage groups the filtered trunk points into individual stem sections. TreeTool uses Euclidean distance to perform nearest-neighbor clustering. Some points belonging to the same trunk are inevitably grouped into separate clusters due to occlusions and errors in the reconstruction. To cope with this, TreeTool groups same-trunk clusters together. We add an extra clustering step using density-based spatial clustering of applications with noise (DBSCAN) [
24], which addresses the case where points from different trunks are grouped together. This is especially prevalent for resprouting trees like redwoods, which commonly grow with conjoined trunk bases.
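Euclidean clustering of the kind used in the detection stage can be sketched in a few lines. This is a minimal brute-force illustration, not TreeTool's implementation: the function name, the tolerance value, and the synthetic points are assumptions, and a k-d tree would replace the quadratic neighbor search on real point clouds.

```python
from collections import deque
from math import dist


def euclidean_clusters(points, tolerance, min_size=1):
    """Group points that are chained together by gaps of at most `tolerance`.

    Returns a list of clusters, each a sorted list of point indices.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = min(unvisited)  # deterministic seed order
        unvisited.remove(seed)
        queue, members = deque([seed]), [seed]
        while queue:
            i = queue.popleft()
            # Brute-force neighbor search among points not yet assigned.
            near = [j for j in unvisited if dist(points[i], points[j]) <= tolerance]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        if len(members) >= min_size:
            clusters.append(sorted(members))
    return clusters


if __name__ == "__main__":
    # Two synthetic "stems" about 1 m apart; coordinates in meters (x, y, z).
    pts = [(0.00, 0.0, 1.0), (0.05, 0.0, 1.1), (0.10, 0.0, 1.2),
           (1.00, 0.0, 1.0), (1.04, 0.0, 1.1)]
    print(euclidean_clusters(pts, tolerance=0.15))
```

Because membership is decided by chained distances alone, conjoined trunk bases fall into a single cluster, which is the case the additional DBSCAN step is meant to address.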
The last stage involves modeling the segmented and filtered trees to estimate diameter and location. Tree clusters are vertically cropped such that the remaining clusters represent the trunks between 1.0 and 1.6 m above the modeled ground surface. DBH is estimated by taking the maximum diameter reported between cylinder and ellipse fitting methods. Random sample consensus (RANSAC) fits a cylinder to the cropped trunk cluster. An additional ellipse model is generated using least-squares on a 2D projection of the points.
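To illustrate least-squares fitting on a 2D projection, the sketch below fits a circle rather than an ellipse; the circle is a deliberately simpler stand-in chosen for brevity and is not the ellipse model used by TreeTool. The function names and the synthetic arc are assumptions. Points are given in meters as (x, y) pairs.

```python
from math import cos, pi, sin, sqrt


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_circle(xy):
    """Algebraic least-squares circle fit; returns (cx, cy, diameter)."""
    n = len(xy)
    if n < 3:
        raise ValueError("need at least three points")
    mx = sum(x for x, _ in xy) / n
    my = sum(y for _, y in xy) / n
    uv = [(x - mx, y - my) for x, y in xy]  # center the data for stability
    z = [u * u + v * v for u, v in uv]
    su, sv = sum(u for u, _ in uv), sum(v for _, v in uv)
    suu = sum(u * u for u, _ in uv)
    svv = sum(v * v for _, v in uv)
    suv = sum(u * v for u, v in uv)
    suz = sum(u * zi for (u, _), zi in zip(uv, z))
    svz = sum(v * zi for (_, v), zi in zip(uv, z))
    # Normal equations for u^2 + v^2 + D*u + E*v + F = 0.
    a = [[suu, suv, su], [suv, svv, sv], [su, sv, float(n)]]
    rhs = [-suz, -svz, -sum(z)]
    det = _det3(a)
    if abs(det) < 1e-15:
        raise ValueError("points are collinear or degenerate")
    sol = []
    for col in range(3):  # Cramer's rule: swap one column for the right-hand side
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = rhs[r]
        sol.append(_det3(m) / det)
    d_coef, e_coef, f_coef = sol
    cu, cv = -d_coef / 2.0, -e_coef / 2.0
    radius = sqrt(cu * cu + cv * cv - f_coef)
    return cu + mx, cv + my, 2.0 * radius


if __name__ == "__main__":
    # A 120-degree arc of a 0.5 m diameter trunk, mimicking a partial scan.
    arc = [(2.0 + 0.25 * cos(t), 3.0 + 0.25 * sin(t))
           for t in (k * (2 * pi / 3) / 39 for k in range(40))]
    print(fit_circle(arc))
```

The example shows why primitive fitting is attractive for partially observed trunks: a clean arc is enough to extrapolate the full diameter.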
3.3.2. Convex Hull Modeling Approach
The use of RANSAC for modeling trees as cylinders to estimate DBH is common in the literature [
6,
7]. An advantage of this approach is the ability to extrapolate DBH from partially represented tree trunks, a common occurrence since optimal scene coverage is often not possible in complex forest terrain. These papers consider forests with uniform, cylindrical tree structure and an absence of near-ground trunk foliage, making the cylinder approach a viable modeling method with impressive accuracy. A downside of this method is that, for well-represented trunks, a cylinder model is prone to underestimating the true trunk diameter. This is even more pronounced for tree species with deeply furrowed, irregular bark texture. Another limitation of this method is the inability to model irregular, bowed trunk shapes.
We propose a modeling approach that considers tree point clouds as stacks of convex-hull slices, as seen in
Figure 3. We relax the morphological assumption of cylinder-modeling methods, which opens the possibility to model highly irregular trunk shapes. The trunk is vertically partitioned into 20 cm thick slices. Each slice is extracted and rotated to be collinear with the z-axis. By manipulating each slice independently, our method accounts for skewed, contorted trunk structures commonly found in non-coniferous forests. A 2D xy projection of the points is used to fit a convex-hull around the surface of the cloud, which emulates manual DBH measurement via girth tape. To deal with noisy points, we introduce a layer of DBSCAN that removes low-density regions. DBH is estimated by considering the slice at 1.3 m above the ground.
We take the maximum value across LS, RANSAC, and convex-hull methods as the final diameter to account for partial trunk cases. DBSCAN parameters control the maximum distance (ε) between points to be considered part of the same neighborhood, and the minimum point count (minPts) within that neighborhood for it to be considered a dense region. We found an ε range between 1 and 3 cm to have good outlier rejection on 2D trunk clouds. The minPts parameter is dependent on the 2D surface density of the trunk cloud; we observed successful filtering in the range of 5–40 points.
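The per-slice computation can be sketched as follows. This is a simplified illustration under stated assumptions, not the released modeling code: the slice is assumed to be already aligned and projected to (x, y) tuples in meters, the density step keeps only points with enough neighbors within a radius (a simplification of full DBSCAN), and the diameter is taken as hull perimeter divided by π, which is our reading of how a girth tape converts circumference to diameter. All function names are hypothetical.

```python
from math import cos, dist, pi, sin


def density_filter(xy, eps, min_pts):
    """Keep points with at least min_pts neighbors (self included) within eps."""
    return [p for p in xy
            if sum(1 for q in xy if dist(p, q) <= eps) >= min_pts]


def convex_hull(xy):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(xy))
    if len(pts) < 3:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_diameter(xy, eps=0.02, min_pts=5):
    """Girth-tape style diameter of one slice: hull perimeter divided by pi."""
    hull = convex_hull(density_filter(xy, eps, min_pts))
    if len(hull) < 3:
        raise ValueError("too few dense points to form a hull")
    perimeter = sum(dist(hull[i], hull[(i + 1) % len(hull)])
                    for i in range(len(hull)))
    return perimeter / pi


if __name__ == "__main__":
    # Synthetic 0.6 m diameter trunk slice plus two stray noise points.
    ring = [(0.3 * cos(2 * pi * k / 360), 0.3 * sin(2 * pi * k / 360))
            for k in range(360)]
    noisy = ring + [(1.0, 1.0), (-0.8, 0.2)]
    print(round(hull_diameter(noisy), 3))
```

The default `eps` of 2 cm and `min_pts` of 5 fall inside the ranges reported above. Because a convex hull bridges concavities, it spans bark furrows in the same way a tape does, whereas stray points would inflate the hull if they were not removed first.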
3.4. Study Area and Data Collection
To validate the precision and accuracy of the proposed NeRF-derived convex-hull DBH method, we conducted an experiment in the Forest Ecology Research Plot (FERP) [
25], a globally recognized ForestGEO [
2] site in the Santa Cruz mountains along the central coast of California, USA. This plot spans 16 ha with over 51,000 recorded stem locations and DBH measurements. The forest census is repeated on a 5-year cycle.
The FERP is partitioned into 400 subplots of 20 × 20 m, denoted by Ex_Ny, where x and y are the distances in meters from the SW corner of the FERP (37.012416, −122.074833) to the SW corner of the subplot. This study considered sections of forest in subplots E340_N360 and E340_N380. The data collection effort was accomplished over two visits and spans two datasets. LiDAR and IMU data were recorded at 10 Hz and 500 Hz, respectively. NeRF training data was collected using an iPhone 14 camera (1920 × 1440) and NeRFCapture [
15]. Training via nerfstudio [
22] lasted 300K iterations and took 15 min for both datasets. As a reference technique, DBH was taken manually via girth tape by a trained research assistant. The dataset parameters, including field-work duration across methods, are listed in
Table 1.
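The subplot naming convention can be expressed as a small helper. This is an illustrative sketch: the function name is hypothetical, the plot is assumed to be a 400 m × 400 m square (consistent with 400 subplots of 20 × 20 m), and any zero padding of the labels is not specified in the text and is therefore omitted.

```python
def subplot_id(east_m: float, north_m: float,
               size_m: int = 20, plot_m: int = 400) -> str:
    """Label of the subplot containing a point given in meters from the SW corner."""
    if not (0 <= east_m < plot_m and 0 <= north_m < plot_m):
        raise ValueError("point lies outside the plot")
    # Snap each offset down to the SW corner of its subplot.
    east_corner = int(east_m // size_m) * size_m
    north_corner = int(north_m // size_m) * size_m
    return f"E{east_corner}_N{north_corner}"


if __name__ == "__main__":
    print(subplot_id(345.2, 372.9))
    print(subplot_id(340.0, 380.0))
```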
3.4.1. Dataset A
In the first dataset, the robot was teleoperated around a cluster of 11 coast redwood trees (
Sequoia sempervirens) to generate a SLAM reconstruction. The robot was also navigated through an opening between the trees to recover additional occluded trunk and ground geometry within the cluster. NeRF training imagery was collected by an untrained human, traversing the tree cluster in a similar fashion. The trees had
coverage in the training data since nearby terrain afforded easier maneuvering (
Figure 4).
3.4.2. Dataset B
The second dataset is larger by area, spanning the entire E340_N360 subplot which included more challenging terrain and foliage occlusions. This area consisted of 6 coast redwood and 3 Douglas-fir (
Pseudotsuga menziesii) trees. The robot’s obstacle-avoidance mode enabled maneuvering in complex terrain, but at a significantly reduced pace (
Table 1). Stems less than 8 cm in diameter were not considered in this study, as DBH estimation was unstable in this size range.
Figure 4.
Forest reconstructions produced by SLAM (bottom row) and NeRF (top row) methods of both datasets. Adjacent plots are data collection trajectories for each reconstruction. In dataset A, we illustrate the effectiveness of segmentation between the ground points (orange) and trees (violet). We use a z-axis color gradient to enhance the visualization of dataset B reconstructions, as this region included more complex ground-level vegetation. The figure also compares a zoomed-in section of a tree trunk. The NeRF reconstruction is approximately 4× denser compared to SLAM, and is of higher surface quality.
Table 1.
Study area parameters, quantity of recorded data, and a comparison of reconstruction density and fieldwork duration between NeRF and SLAM approaches. Duration includes field-validation of recorded data, not post processing.
| Dataset | Area (m²) | Tree Count | LiDAR Frames | Images | Fieldwork Duration: NeRF | Fieldwork Duration: SLAM | Fieldwork Duration: Manual | Point Count: NeRF | Point Count: SLAM |
|---|---|---|---|---|---|---|---|---|---|
| A | 140 | 11 | 2172 | 166 | 5 min | 30 min | 45 min | 2.69 M | 704 K |
| B | 400 | 9 | 7498 | 847 | 8 min | 40 min | 52 min | 26.87 M | 7.83 M |
4. Results
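We assess the accuracy of DBH estimation for each dataset independently and combined, using bias (which gives an idea of over/under-estimation trends), root mean squared error (RMSE), and standard deviation, as these are commonly referenced metrics in this domain. These are defined as:
$$\mathrm{bias} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{d}_i - d_i\right) \tag{1}$$
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{d}_i - d_i - \mathrm{bias}\right)^2} \tag{2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{d}_i - d_i\right)^2} \tag{3}$$
where $\hat{d}_i$ and $d_i$ are the estimated and reference diameters across
n estimations. We also provide relative RMSE, which is obtained by dividing the RMSE from (3) by the mean of the reference diameters.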
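These metrics can be computed with a few lines of Python. The sketch below assumes the error is taken as estimated minus reference diameter (so that a negative bias indicates underestimation) and that the standard deviation uses the population (1/n) form; the function name and the example diameters are hypothetical and are not measurements from this study.

```python
from math import sqrt


def dbh_metrics(estimated, reference):
    """Return bias, standard deviation of error, RMSE, and relative RMSE (%)."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(estimated)
    errors = [e - r for e, r in zip(estimated, reference)]
    bias = sum(errors) / n
    # Population form (divide by n); an assumption, see the lead-in.
    std = sqrt(sum((err - bias) ** 2 for err in errors) / n)
    rmse = sqrt(sum(err ** 2 for err in errors) / n)
    rel_rmse = 100.0 * rmse / (sum(reference) / n)
    return {"bias": bias, "std": std, "rmse": rmse, "rel_rmse_pct": rel_rmse}


if __name__ == "__main__":
    # Hypothetical diameters in cm, for illustration only.
    est = [48.0, 61.0, 35.0, 72.0]
    ref = [50.0, 60.0, 36.0, 74.0]
    for name, value in dbh_metrics(est, ref).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions the squared RMSE equals the squared bias plus the variance of the error, which is why a low RMSE implies both a small systematic offset and a small spread.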
The low bias values achieved with the NeRF convex-hull approach (−0.28 cm in Dataset A and −0.86 cm in Dataset B;
Table 2) indicate minimal systematic error, compared to the significant under-estimation trends observed in RANSAC/LS (−4.35 cm and −4.59 cm, respectively;
Table 2). Furthermore, the RMSE results, with a best-case value of 1.26 cm and an average of 1.68 cm across datasets, align well with the low bias values observed, demonstrating the reliability and consistency of our method.
The superior performance of the NeRF convex-hull method can be attributed to its ability to accurately model complex and irregular trunk geometries, which are prevalent in natural forests. In contrast, RANSAC’s reliance on predefined geometric primitives, such as cylinders and ellipses, leads to significant underfitting, particularly for trees with non-uniform diameters or irregular bark structures (
Figure 5).
Dataset B presented additional challenges due to complex terrain and darker light conditions, which could have impacted the quality of the NeRF reconstruction. Despite these challenges, the convex-hull method maintained robust performance, with only a marginal increase in RMSE from 1.26 cm in Dataset A to 2.09 cm in Dataset B.
The NeRF reconstructions were consistently 3–4× more dense than the SLAM reconstructions (
Table 1). The sparsity associated with LIOSAM is due to the nature of points being registered by laser pulse returns that have a resolution of 262 k points (128 × 2048) per LiDAR frame, a physical limitation of the hardware. NeRFs are capable of higher-density reconstruction since the geometry is rendered by sampling the learned color-ray space and filtering out low-density sections to only represent surfaces. This increased point density directly translates to finer surface representations and improved DBH estimation accuracy. This advantage is particularly pronounced in forests with intricate geometries, where the sparse point clouds generated by LIOSAM often fail to capture key structural details. By leveraging the ability of NeRFs to segment and filter high-density regions, our approach overcomes these limitations, resulting in more precise trunk modeling and DBH estimation.
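The density comparison follows directly from the point counts in Table 1, since both methods cover the same area in each dataset. The short calculation below reproduces the ratios and the per-frame LiDAR return count; the variable and function names are illustrative.

```python
# Point counts from Table 1 as (NeRF, SLAM) for each dataset.
point_counts = {"A": (2.69e6, 704e3), "B": (26.87e6, 7.83e6)}

# Returns per LiDAR frame: 128 channels x 2048 azimuth columns.
points_per_frame = 128 * 2048


def density_ratio(nerf_points: float, slam_points: float) -> float:
    """Ratio of NeRF to SLAM point counts over the same area."""
    return nerf_points / slam_points


if __name__ == "__main__":
    print("points per LiDAR frame:", points_per_frame)
    for name, (nerf, slam) in point_counts.items():
        print(f"dataset {name}: {density_ratio(nerf, slam):.2f}x denser")
```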
5. Discussion
In comparison to the existing body of literature on automated DBH estimation, several studies provide a useful starting point for benchmarking and evaluation. Liang et al. [
26] report an RMSE of 0.82 cm (4.21%) for automated DBH measurements compared to manual measurements obtained from TLS data. However, their analysis does not include a comparison of the automated estimates with ground truth stem measurements obtained using girth tape. Pierzchała et al. [
6] achieve an RMSE of 2.38 cm (9%) compared to manual ground truth measurements using a mobile robot and LiDAR-inertial SLAM. Freißmuth et al. [
7], using a similar MLS approach, report 1.93 cm RMSE. All of these papers relied on LiDAR-based reconstruction and applied some form of circle-fitting in their tree modeling approaches. Additionally, the trees examined in these studies were part of human-made forest stands or naturally occurring, spatially uniform forests characterized by consistent tree structure.
In contrast to these studies, we investigated the potential of NeRF forest reconstruction combined with hull-based trunk modeling for DBH estimation in challenging terrains, featuring irregular tree structures and densely clustered tree groves. A more rigorous comparison requires evaluating methods on the same datasets. However, the datasets used in the other studies have not been made publicly available. To address this limitation and facilitate future research, we openly source our datasets for benchmarking and comparison.
The forest environment consists of harsh lighting conditions that add challenges to the use of photometric methods such as NeRFs. The dark understory created by dense forest canopy requires appropriate exposure control; long exposure times can lead to blurry images that are unusable for reconstruction purposes. A potential solution is offered by RawNeRF [
27], which enables reconstruction in near-darkness environments.
The NeRF method’s impressive speed-up of fieldwork time comes with the challenge of reconstructing and modeling smaller stems and complex foliage. Without optimal camera coverage, this geometry is poorly represented by the NeRF. Additionally, the filtering methods used by TreeTool need to be developed to support smaller stems and foliage. One potential avenue for improved clustering and filtering performance is to use the color of points provided by NeRF representation to aid in complex branch and foliage segmentation.
We show that convex-hull modeling is an improvement over cylinder approaches for measuring tree diameters when well-represented tree clouds are available. In practice, the presence of ground-level vegetation introduces significant occlusions in the recovered geometry. A potential solution for partial clouds could be to interpolate the cross-section from the set of stacked convex-hulls at known spacing along its height. This same slice-based modeling could be used to extract additional science measurements beyond DBH.
Parameter selection is still a semi-manual process for TreeTool. Not all parameters can be derived from density (e.g., terrain morphology dictates ground and trunk segmentation parameters). Robust, automatic parameterization can enable real-time DBH estimation in various rugged settings; further work is needed to understand the relationship between the parameter selection process and the tree-species composition of the environment.
Large-scale adoption of the methods presented in this work will require further development to support exploration volumes on the order of 100–1000
. The NeRF scene reconstruction pipeline and corresponding computational resources described in
Section 3.2 are designed to train on relatively small datasets (100–1000 images). Block-NeRF [
28] provides a motivating solution to this issue by decomposing the scene into individually trained NeRFs with the ability to seamlessly align NeRFs together to reconstruct city-scale environments. The authors demonstrate the largest neural scene representation made to date, comprised of over 2.8 million training images reconstructing the neighborhoods of San Francisco, CA.
6. Conclusions
In summary, we present a field study exploring the benefits of NeRFs for DBH estimation in mixed-evergreen redwood forest. We consider an MLS comparison using LiDAR-inertial SLAM hosted on a quadruped robot. In addition, we propose a convex-hull DBH modeling technique that considerably outperformed common cylinder-fitting approaches by 3–4×. In a small-scale experiment, NeRF reconstructions made using mobile phone data outperformed SLAM in terms of DBH estimation accuracy (2.81% RMSE), at a 20× cost reduction and 5× less field-work time. In terms of relative DBH estimation accuracy, the proposed integration of NeRF with our enhanced convex-hull trunk modeling approach surpasses the performance reported in several published forest mapping studies [
6,
7,
13,
26].
While additional development is needed for autonomous ecological assessment in wilderness settings, this paper demonstrates the potential for rapid forest data collection using commodity mobile phone hardware. This drastic increase in accessibility has the potential to further community engagement and increase the volume of globally mapped forest terrain.