4.1. Field Data
The structural properties of the studied forest area are similar to those of managed woodlands in Austria. This is not surprising, as the study area incorporates forests that were managed until 2001. More than 85% of the trees found were European Spruce or Common Beech. On average, managed forests in Austria are also dominated by European Spruce (48%) and Common Beech (11%) [51]. At 11%, however, the share of Common Beech in the managed forests is markedly lower than in the study area [51].
The average live tree AGB derived from the in situ measurements amounts to 313.421 ± 44.507 m³ ha⁻¹. Considering the altitude of 700 m and the geomorphological structure of the study area, this estimate is in line with the national average for managed forests of 351 ± 3.3 m³ ha⁻¹ [51]. Duduman et al. [52] came to a similar result in a comparable study area in Romania, reporting 250 to 315 tons of tree biomass at 800 to 900 m above sea level. The differences in biomass can be explained by the differences in altitude and forest structure, as the altitude of the habitat strongly influences the amount of live tree biomass in alpine regions [52,53].
Based on the measured live tree biomass, the average carbon storage capacity per hectare amounts to 371.423 ± 51.106 t of CO₂. This value is in line with the methodology and the conclusions stated in Austria’s Inventory Report 2023 [43].
4.2. Land Cover Classification
Adding the DSM to the orthomosaic improved the overall performance of the classification by 6%. This is a bigger improvement than Schiefer et al. [35] documented in their study, in which they focused on the classification of individual tree species. Al-Najjar et al. [54] stated that the combination of RGB imagery and height information (e.g., DSM) improves the accuracy of vegetation classification by up to 1.8% compared to stratifications based on an orthomosaic.
Combining the orthomosaic and the NDVI values improved the overall accuracy by 2.3%. Compared to the performance of the composite consisting of the orthomosaic and the DSM, this is a minor enhancement. Daryaei et al. [55] concluded that high-resolution RGB imagery combined with canopy height information can detect woodlands with a high overall accuracy without Sentinel-2 datasets. This interpretation is consistent with the findings of this study, although the present study did not incorporate a full canopy height model (CHM).
Adding the NDVI values further enhanced the overall accuracy of the third composite (orthomosaic + DSM) by 0.8%. Daryaei et al. [55] managed to improve the classification accuracy of Sentinel-2 datasets by around 2% using UAV-based datasets. Although the combination of RGB and multispectral datasets did not significantly enhance the overall accuracy, Daryaei et al. [55] also concluded that combining high-resolution RGB imagery with vegetation indices based on multispectral satellite data is important for the robust classification of different vegetation classes and tree species.
Today, a wide variety of vegetation indices are in use [15]. For example, the Green Normalized Difference Vegetation Index (GNDVI), NDVI2, the Canopy Chlorophyll Content Index (CCCI), and the Green Leaf Index (GLI) are utilized to evaluate different vegetation parameters [15,23]. As the focus of this study was not to identify the most meaningful vegetation index, we used the NDVI because it is often described as robust and applicable to many different use cases and scenarios [16,20,23,56,57]. Nasiri et al. [23] showed in their study that NDVI values, partially derived from the Sentinel-2 datasets (Band 8), are the most important indicators for detecting forest canopy cover. Future work could evaluate the influence of the different vegetation indices on the results of the carbon estimation.
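For readers less familiar with the index, the NDVI calculation can be sketched as follows. This is a minimal illustration, not the processing chain used in this study: the function names are hypothetical, the use of Sentinel-2 Band 4 as the red band is an assumption (only Band 8 is named above), and the handling of pixels without signal is an arbitrary choice.

```python
def ndvi(nir: float, red: float) -> float:
    """NDVI for one pixel: (NIR - Red) / (NIR + Red)."""
    denominator = nir + red
    if denominator == 0:
        # Assumption: pixels without any signal are treated as neutral (0.0).
        return 0.0
    return (nir - red) / denominator


def ndvi_grid(nir_band, red_band):
    """Apply ndvi() cell by cell to two equally sized 2D lists of reflectances."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical reflectances: NIR from Sentinel-2 Band 8, red assumed from Band 4.
nir = [[0.50, 0.30], [0.45, 0.10]]
red = [[0.10, 0.20], [0.05, 0.10]]
for row in ndvi_grid(nir, red):
    print([round(value, 3) for value in row])
```

Values close to 1 indicate dense, photosynthetically active vegetation, while values near 0 or below are typical of bare ground, gravel, or deadwood.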
With an overall accuracy of 80.8% and a Kappa value of 0.743, the composite consisting of the orthomosaic, NDVI values, and DSM achieved the best result of all four datasets. The results are comparable to those of Heuschmidt et al. [58], who classified cork oak woodlands with an accuracy of 79.5% using images captured by a UAV fitted with an RGB sensor. Another study [59] also achieved 80% classification accuracy with multitemporal airborne RGB images. Zhou et al. [60] achieved a higher overall accuracy of 90% for the classification of different vegetation classes using RGB UAV imagery than the methodology described herein. Schiefer et al. [35] classified different tree species with an accuracy of 89% using comparable RGB datasets. Schiefer et al. [35] also noted that the classification accuracies of different studies cannot be directly compared, as various machine learning algorithms and approaches are applied. Nevertheless, most studies using RGB images achieved an accuracy of about 80 to 90% [35,58,59,60]. Thus, the proposed methodology delivered a result that lies within the expected range of accuracy.
Like other studies before, our research has shown that the combination of Sentinel-2 imagery and UAV-derived data is a viable option for detecting and monitoring forest parameters [38,55,61,62]. However, the classification generated in this study does not reach the widely accepted overall accuracy threshold of 85% and therefore cannot be recommended without further clarification [63,64,65]. As Foody [64] points out, this limit might be too harsh in some cases. In particular, distinguishing different types of vegetation can be very challenging without high-resolution multispectral imagery [66]. For example, Ayhan et al. [66] suggest that an overall classification accuracy of about 78% for a high-resolution RGB imagery segmentation should be viewed as sufficient.
The deficits in the overall accuracy of the classification are due to the low user’s and producer’s accuracy in the Forest and Grass classes. The Forest category achieved the lowest producer’s accuracy (P-Accuracy), while the Grass category achieved the lowest user’s accuracy (U-Accuracy) of all four categories. The low producer’s accuracy for the Forest class means that the classifier produces many false negatives, i.e., errors of omission (type 2) [67,68,69]. A low U-Accuracy for the Grass category indicates that the classification is not very reliable and produces numerous errors of commission (type 1) [68,69]. As Dash et al. [70] pointed out, the differences between U-Accuracy and P-Accuracy for a single class can be attributed to similarities in the spectral characteristics of the different categories. This effect seems to be amplified by the use of low-resolution Sentinel-2 images and high-resolution RGB images lacking additional spectral information.
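The accuracy measures discussed above can all be derived from a confusion matrix. The following sketch shows one common way to compute them. It assumes that rows hold the reference (ground truth) classes and columns hold the mapped classes; the function name and all counts are hypothetical and do not reproduce the confusion matrix of this study.

```python
def accuracy_metrics(matrix, labels):
    """Overall accuracy, Cohen's kappa, and per-class producer's/user's accuracy.

    Assumption: matrix[i][j] counts samples of reference class i mapped as class j.
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    reference_totals = [sum(matrix[i]) for i in range(n)]
    mapped_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    overall = correct / total
    # Agreement expected by chance, based on the marginal totals.
    expected = sum(r * m for r, m in zip(reference_totals, mapped_totals)) / total ** 2
    kappa = (overall - expected) / (1 - expected)

    # Producer's accuracy: 1 - omission error; user's accuracy: 1 - commission error.
    producers = {
        labels[i]: matrix[i][i] / reference_totals[i] if reference_totals[i] else float("nan")
        for i in range(n)
    }
    users = {
        labels[i]: matrix[i][i] / mapped_totals[i] if mapped_totals[i] else float("nan")
        for i in range(n)
    }
    return {"overall": overall, "kappa": kappa, "producers": producers, "users": users}


# Hypothetical counts for the four classes used in this study.
classes = ["Forest", "Grass", "Deadwood", "Gravel"]
counts = [
    [30, 15, 3, 2],
    [6, 40, 2, 2],
    [2, 1, 44, 3],
    [1, 2, 4, 43],
]
result = accuracy_metrics(counts, classes)
print(round(result["overall"], 3), round(result["kappa"], 3))
print({name: round(value, 3) for name, value in result["producers"].items()})
print({name: round(value, 3) for name, value in result["users"].items()})
```

In this layout, a low producer's accuracy for Forest shows up as large off-diagonal counts in the Forest row, whereas a low user's accuracy for Grass shows up as large off-diagonal counts in the Grass column.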
Figure 9 shows that during the classification process, the distinction between the two different types of vegetation is prone to error. For example, the single trees in plot 7 were categorized as grass. The detection of single trees also caused problems in other areas. The single tree standing north of plot 7 was not identified correctly. In plot 8, lighter colored trees were mistaken for grass, while dense and darker shaded vegetation was often wrongly identified as forest. Komárek et al. [71] stated that the robust classification of similar vegetation types requires high-resolution thermal or multispectral imagery. Furukawa et al. [21] came to the same conclusion, but as Oddi et al. [72] pointed out, acquiring high-resolution multispectral imagery would be costly and would make the methodology more complicated. Hence, an approach using high-resolution multispectral datasets is not suitable for the use case described herein.
The Deadwood and Gravel classes consistently achieved producer’s and user’s accuracies of more than 80% [73]. Plot 10 in Figure 9, however, shows that the distinction between Gravel and Deadwood is not error free, either. Some of the standing deadwood was mistaken for gravel. Zielewska-Büttner et al. [74] pointed out that distinguishing between bare ground and standing deadwood based on orthophotos and a DSM is difficult due to the similarities in their spectral characteristics. Using an uncertainty model (UM) for the classification of the classes Bare Ground, Live, Declining, and Dead, Zielewska-Büttner et al. [74] achieved higher scores for P- and U-Accuracy than this study.
4.3. Carbon Stock Estimation
The methodology described here is a highly generalized approach that focuses on minimizing the expected errors in land cover detection and AGB estimation. This generalization also means that the findings presented here may not be applicable to other forest types. The error metrics were chosen to describe how the errors are distributed when the average values for AGB and carbon storage capacity are applied to the broad category Forest. The statistical indicators (average error, standard deviation, median, skewness, and Pearson kurtosis) as well as the Q-Q plot aim to document the dispersion of the estimation errors across the heterogeneous sample plots [13]. Our findings show that the high variability of structural parameters in diverse forests results in a wide dispersion of the approximation error and requires the statistical evaluation described here to understand the overall estimation error.
With an overall error of about 1%, the carbon storage capacity of the field plots was estimated accurately. The average error per plot for the carbon stock estimation is very low (−0.830 t of CO₂), while the calculated standard error (SE) of ±4.648 t of CO₂ (5.9%) can be described as substantial. The carbon stock estimation error varies significantly from plot to plot, which can be attributed to the structural parameters, e.g., the sum of AGB and species composition, differing significantly across the study area. The findings of the in situ measurements also show that the studied forest area is very heterogeneous. The key figures median, skewness, and Pearson kurtosis indicate that the estimation errors do not follow a perfect Gaussian distribution. Figure 10 supports this assumption and implies that the error distribution only approximates a normal distribution and varies significantly with the structural parameters of the corresponding sample plot.
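The descriptive statistics named above, together with the points of a normal Q-Q plot, can be computed along the following lines. This is a sketch under stated assumptions: the function names and the per-plot errors are hypothetical, the moment-based skewness and Pearson kurtosis use population moments, and the plotting positions (i + 0.5) / n are one of several common conventions. The exact estimators used in this study may differ.

```python
import statistics


def error_summary(errors):
    """Mean error, sample standard deviation, median, skewness, and Pearson kurtosis."""
    n = len(errors)
    mean = statistics.fmean(errors)
    # Central moments of the error distribution (population form).
    m2 = sum((e - mean) ** 2 for e in errors) / n
    m3 = sum((e - mean) ** 3 for e in errors) / n
    m4 = sum((e - mean) ** 4 for e in errors) / n
    return {
        "mean_error": mean,
        "std_dev": statistics.stdev(errors),
        "median": statistics.median(errors),
        "skewness": m3 / m2 ** 1.5,        # 0 for a symmetric distribution
        "pearson_kurtosis": m4 / m2 ** 2,  # 3 for a normal distribution
    }


def qq_points(errors):
    """Pairs of (theoretical normal quantile, observed error) for a Q-Q plot."""
    n = len(errors)
    normal = statistics.NormalDist(statistics.fmean(errors), statistics.stdev(errors))
    ordered = sorted(errors)
    return [(normal.inv_cdf((i + 0.5) / n), value) for i, value in enumerate(ordered)]


# Hypothetical per-plot estimation errors in t of CO2 (not the study's data).
plot_errors = [2.1, 1.4, 3.0, 0.8, 2.6, -9.5, 1.9, -12.3, 2.2, 1.1]
print({key: round(value, 3) for key, value in error_summary(plot_errors).items()})
for theoretical, observed in qq_points(plot_errors):
    print(round(theoretical, 2), observed)
```

If the errors were normally distributed, the Q-Q points would fall close to the 1:1 line; a few strongly underestimated plots pull the lower tail away from that line and produce a negative skewness.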
For most plots, the carbon storage capacity was slightly overestimated. Only the carbon stock of the densely wooded plots 6 and 8 was significantly underestimated. This offsetting of over- and underestimation is the main reason for the low overall estimation error. The heterogeneity of the studied forest area makes the estimation of the carbon storage capacity difficult and requires profound knowledge of the existing live tree biomass. A comparable study also concluded that precise in situ measurements are critical for the initial calibration phase of the AGB models for accurately monitoring unmanaged and old growth forests [11].
Since the average carbon storage capacity is assigned to the broad vegetation class Forest per unit area (m²), the estimated average value plays a critical role in the estimation. If the in situ data sufficiently represent the structural complexity of the study area, the proposed methodology proves to be capable of achieving statistically significant outcomes. While the proposed method works well for the surveyed area, it requires meaningful validation data. More complex or detailed scenarios may require a different approach if fewer field and validation data are available. For example, Fernandes et al. [16] used multispectral images to categorize individual tree species to estimate live tree AGB and carbon stock of riparian woodlands. As Austria’s National Inventory Report [43] proposes, a distinction between coniferous and deciduous species allows for more robust carbon stock estimations. This may be necessary for fully automated approaches, as unmanaged forests are dynamic ecosystems that consist of a wide variety of habitats.
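The area-based assignment described above can be illustrated with a short sketch: the average carbon storage capacity per hectare is multiplied by the area classified as Forest. The function name, the tiny class grid, and the pixel size are hypothetical; only the average of 371.423 t of CO₂ per hectare is taken from the field data reported in Section 4.1.

```python
def forest_carbon_stock(class_grid, pixel_area_m2, co2_t_per_ha, forest_label="Forest"):
    """Estimate the carbon stock (t of CO2) of all pixels classified as forest.

    Assumption: every forest pixel receives the same average storage capacity.
    """
    forest_pixels = sum(row.count(forest_label) for row in class_grid)
    forest_area_ha = forest_pixels * pixel_area_m2 / 10_000  # 1 ha = 10,000 m2
    return forest_area_ha * co2_t_per_ha


# Hypothetical classification result with 10 m x 10 m pixels.
grid = [
    ["Forest", "Forest", "Grass"],
    ["Forest", "Deadwood", "Gravel"],
    ["Forest", "Forest", "Forest"],
]
stock = forest_carbon_stock(grid, pixel_area_m2=100.0, co2_t_per_ha=371.423)
print(f"{stock:.3f} t of CO2")
```

Because a single average is applied to every forest pixel, both misclassified pixels and an unrepresentative average translate directly into the estimate, which is why representative field data matter so much for this approach.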
4.4. Cost and Time Expenditure
The methodology described herein focuses on reducing expenses while producing meaningful results. However, the costs of regular ArcGIS Pro and Metashape licenses are substantial [75,76]. In this case, Metashape and ArcGIS Pro were used to ensure the compatibility of the proposed approach with the tools that were already in use for managing the wilderness area. Compared to the prices of LiDAR [77] and multispectral UAVs [78], the cost for this approach is significantly lower. Table 8 lists the expenses of the study.
To reduce expenditures for the necessary software products, free-of-charge open-source software, such as QGIS or OpenDroneMap, can be used for the methodology described herein [82,83]. Using open-source software would also improve the general availability of the proposed approach and potentially lead to an extensive and interoperable database regarding near-natural forests [84].
The cost of the hardware used for the evaluation and the number of working hours were also not included, as these values are highly dependent on the size of the study area and the applied field data collection protocol. Because the efficiency of the in situ measurements plays a key part in this methodology, a generalized approach is necessary. Winter et al. [34] propose a model to moderately harmonize European NFIs, which could be integrated into the methodology proposed here.