A New Forest Growing Stock Volume Estimation Model Based on AdaBoost and Random Forest Model

Wang, Xiaorui; Zhang, Chao; Qiang, Zhenping; Xu, Weiheng; Fan, Jinming

doi:10.3390/f15020260

by Xiaorui Wang ¹, Chao Zhang ²,*, Zhenping Qiang ¹, Weiheng Xu ¹ and Jinming Fan ²

¹ College of Big Data and Intelligent Engineering, Southwest Forestry University, Kunming 650224, China

² College of Forestry, Southwest Forestry University, Kunming 650224, China

* Author to whom correspondence should be addressed.

Forests 2024, 15(2), 260; https://doi.org/10.3390/f15020260

Submission received: 14 January 2024 / Revised: 25 January 2024 / Accepted: 27 January 2024 / Published: 29 January 2024

(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)

Abstract:

Forest growing stock volume is a crucial indicator for assessing forest resources. However, contemporary machine learning models used in estimating forest growing stock volume often exhibit fluctuating precision and are confined to specific tree species, lacking universality. This limitation impedes their capacity to provide comprehensive forest survey services. This study designed a novel model for predicting forest growing stock volume named RF-Adaboost. The model represented the inaugural application of the Adaboost algorithm in estimating forest growing stock volume. Additionally, the authors innovatively refined the Adaboost algorithm by integrating Random Forest as its weak learner. To substantiate the model’s effectiveness, the authors designed three data combination schemes at different scales and conducted regression estimation using the RF-Adaboost model, traditional Random Forest, and Adaboost models, respectively. The results indicated that the RF-Adaboost model consistently outperforms others across various data schemes. Furthermore, utilizing a combined data scheme of remote sensing and Continuous Forest Inventory, the RF-Adaboost model demonstrated optimal performance in estimating forest growing stock volume (R² = 0.81, RMSE = 7.08 m³/site, MAE = 3.36 m³, MAPE = 8%). Finally, the RF-Adaboost model exhibits greater universality, eliminating the need for strict differentiation between tree species. This research presented an efficient and cost-effective approach to estimate forest growing stock, addressing the challenges associated with conventional survey methods.

Keywords:

forest growing stock volume; multi-source data; RF-Adaboost model; model comparison

1. Introduction

Forest growing stock volume serves as a fundamental data source for estimating forest biomass and carbon sequestration [1,2]. It is a vital indicator for assessing forest quality and represents a key parameter reflecting the proficiency in forest management and resource management [3].

A significant method for assessing forest growing stock volume involves the Continuous Forest Inventory [4], and in China, this is the most reliable means of forest resource inventory. This process includes preliminary sampling based on a kilometer grid, considering the distribution of forest resources within each province and the topographical conditions. Clear precision requirements for sampling are established in advance [5,6,7]. Moreover, the sample plots undergo periodic reviews every 5 years, with fixed sample plots expected to maintain a reset rate of over 98% and fixed sample trees, a reset rate of over 95% [8]. It is evident that the traditional forest growing stock survey method is characterized by objectivity and precision. However, it does come with certain drawbacks, including lengthy survey periods, challenging fieldwork, and high investigation costs.

The rapid advancement of remote sensing technology has prompted a shift in forest growing stock surveys, moving away from traditional manual ground surveys towards remote-sensing-based estimation methods [9,10,11]. This approach involves the utilization of image-scanning equipment to capture remote sensing data within the study area [12]. Additionally, Geographic Information System (GIS) technology is employed to gather terrain-related factors [13], compute various feature parameters such as forest type information and topographic factors [14], and construct forest growing stock estimation models using both linear and nonlinear modeling techniques [15]. However, it is noteworthy that the precision of these models, based on the values of statistical indices, is not very high.

Since the 1980s, numerous researchers have dedicated substantial efforts to exploring the connection between optical remote sensing data and forest growing stock [11,16]. Through their investigations, it has been discerned that the reflectance values from the red (TM3) and near-infrared (TM4) bands of Landsat TM data can be combined to form a vegetation index [17], allowing for the estimation of community characteristics. Moreover, forest growing stock can be estimated by establishing a regression relationship between ground survey data and spectral values [18,19]. This relationship was subsequently extended to include the AVHRR1 and AVHRR2 bands of NOAA/AVHRR data [20]. As research progressed, it became evident that the reflectance values from SPOT and Landsat TM showed a negative correlation with forest wood volume, especially in the near-infrared bands [21]. Building upon this insight, some scholars delved into the extraction of texture features from Landsat TM images and conducted correlation analyses with forests of varying ages to derive the spectral change characteristics associated with different forest ages. Consequently, they were able to estimate forest growing stock more effectively [22]. However, this approach can only estimate the growing stock volume for a specific tree species.

Indeed, optical remote sensing encounters challenges in capturing forest vertical structural characteristics and is susceptible to cloud cover [23]. Relying on a single remote sensing data source often falls short in accurately estimating forest growing stock. Consequently, the approach of integrating multiple data sources for forest growing stock estimation has emerged as a solution that leverages their complementary strengths, leading to enhanced estimation precision [24]. In the early 21st century, scholars delved into remote sensing estimation models that made use of multi-source data [25]. By combining remote sensing data with terrain factors and applying a linear regression model, they achieved a significant enhancement in the precision of forest growing stock estimation [26]. Over the past decade, as remote sensing technology has continued to advance, high-resolution image data, such as ground laser scanning, hyperspectral imagery, and unmanned aerial vehicle (UAV) data, have gained widespread adoption and experimentation [27]. These high-resolution datasets provide more precise information on terrain, ground features, and land cover types. By harnessing multi-source high-resolution images in conjunction with machine learning models, scholars have been able to achieve even greater precision in forest growing stock estimation [28]. This approach indeed improves the precision of forest growing stock volume estimation, but obtaining hyperspectral and high-resolution imagery is challenging and unsuitable for large-scale geographic analysis [29].

Identifying more precise machine learning models is a critical focus for researchers seeking to improve the precision of forest growing stock volume estimation [30,31,32,33]. These models can commonly be categorized into two main groups: parametric models and non-parametric models [34]. Parametric models generally assume that the data follow a specific distribution characterized by a set of parameters, and these parameters form the basis of model construction. Parametric models can be built using both linear and nonlinear approaches [35]. They are typically simple and straightforward to explain, but there is a risk of underfitting due to their inherent assumptions. In contrast, non-parametric models are constructed by fitting the training data without imposing strict constraints on the form of the objective function [36]. Non-parametric models tend to provide a good fit to the data but may have complex and less interpretable structures [32]. Among the machine learning models, the random forest model stands out as one of the most widely used and highly accurate models in forest growing stock volume estimation [37].

In recent years, an improved version of the boosting algorithm called Adaboost (Adaptive Boosting) has emerged, employing forward stagewise additive modeling to construct an ensemble model [38]. During each iteration, AdaBoost superimposes a base classifier onto the model, focusing on the error between the model’s predictions and the actual label values. This incremental process aims to gradually reduce the model’s deviation from the true values. AdaBoost achieves this by optimizing the weights assigned to the samples. It increases the weights of samples misclassified by the previous base classifier and decreases the weights of correctly classified samples. The subsequent base classifier is then trained with these updated weights. In each iteration, a new weak classifier is added to the ensemble, and the final strong classifier is not determined until either a predetermined sufficiently low error rate is achieved or a specified maximum number of iterations is reached. The AdaBoost algorithm is known for significantly improving prediction precision, and its performance is particularly enhanced when the weak classifiers used within it have higher precision [39]. However, the Adaboost algorithm has not yet been applied to forest growing stock volume estimation.

Considering the superiority of the Adaboost algorithm and the role of multi-source data in forest growing stock volume estimation, we employed various data sources, including Landsat remote sensing data, a Digital Elevation Model (DEM), and Continuous Forest Inventory Data. From these datasets, we extracted vegetation indices, elevation, and selected survey factors as model features. We built an AdaBoost model with Random Forest as its weak learner (RF-Adaboost) for estimating growing stock volume in the study area. Additionally, we established standalone Random Forest and AdaBoost models to estimate the forest growing stock volume. Finally, we compared the three models based on different data schemes. We observed that the RF-Adaboost model consistently outperformed the others and demonstrated universality without the need for specific tree species differentiation.

2. Materials and Methods

2.1. Overview of the Research Area

Yunnan Province is located in the southwest of China. It spans approximately 21°8′~29°15′ N and 97°31′~106°11′ E, with a maximum east–west width of 864.9 km and a maximum north–south length of 990 km (shown in Figure 1). The total land area is approximately 394,100 km² (39.41 million ha). Yunnan Province features a mountainous plateau terrain and has a subtropical plateau monsoon climate. The forest growing stock volume has reached 2.067 billion m³, and the forest coverage rate has reached 65.04% [40].

2.2. Research Data

2.2.1. Landsat Data

In this study, the optical remote sensing data consisted of nine Landsat 8 scenes acquired between 1 January 2017 and 31 December 2017. These Landsat 8 data were sourced from Google Earth Engine (https://code.earthengine.google.com/, accessed on 5 September 2023 (LANDSAT/LC08/C01/T1)). It is worth noting that the dataset LANDSAT/LC08/C01/T1 has undergone meticulous atmospheric correction and orthophoto correction, ensuring its reliability and favorable radiometric properties for our analysis.

2.2.2. Ground Data

The ground survey data include both Digital Elevation Model (DEM) data and the National Continuous Forest Inventory Data for Yunnan in 2017. The DEM data, with a resolution of 30 m and utilizing the World Geodetic System 1984 (WGS84), were acquired from Google Earth Engine (https://code.earthengine.google.com/, accessed on 5 September 2023 (SRTM Digital Elevation Data Version 4)). This DEM dataset has undergone pre-processing steps, including coordinate system conversion, tessellation, and cropping for our analysis.

A total of 1282 sample sites were selected from the Continuous Forest Inventory Data of Yunnan Province in 2017. These sites were selected based on three criteria that align with the conditions of forest stands:

(1) Sample sites with forest land classification.
(2) Sample sites that meet the initiation criteria (average diameter at breast height greater than 5 cm).
(3) Sample sites from Landsat 8 images where vegetation indices can be accurately extracted.

Unlike other studies that focus on predicting a specific tree species, our study classifies these sites into seven categories based on the “tree species structure” attribute in the Continuous Forest Inventory including pure coniferous forests, relatively pure coniferous forests, mixed coniferous forests, pure broad-leaved forests, relatively pure broad-leaved forests, mixed broad-leaved forests, and coniferous-broad-leaved mixed forests. This categorization allowed our model to be more universal.

2.3. Feature Extraction

2.3.1. The Independent Variable Factors from Landsat 8 Image

Five original factors (band2, band3, band4, band5, band6) from Landsat 8 images were utilized to compute 12 derived vegetation indices (as shown in Table 1).
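Many vegetation indices derived from these bands share a normalized-difference form. The sketch below illustrates that form only; Table 1 is not reproduced here, so the two band pairings and the reflectance values are assumptions chosen for illustration (on Landsat 8, band3 is green, band4 is red, and band5 is near-infrared), and they are not necessarily among the 12 indices used in the study.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b) for two reflectance values; NaN when the sum is zero."""
    total = band_a + band_b
    if total == 0:
        return float("nan")
    return (band_a - band_b) / total


# Hypothetical surface reflectances for one sample site (not taken from the study).
pixel = {"band2": 0.04, "band3": 0.07, "band4": 0.05, "band5": 0.35, "band6": 0.18}

# NIR vs. red: the classic NDVI pairing.
ndvi = normalized_difference(pixel["band5"], pixel["band4"])
# Green vs. NIR: a pairing often used for water-related indices.
green_nir = normalized_difference(pixel["band3"], pixel["band5"])

print(f"NIR/red index:   {ndvi:.3f}")
print(f"green/NIR index: {green_nir:.3f}")
```

Dense green vegetation reflects strongly in the near-infrared and weakly in the red, so the first index moves towards 1 over closed canopy, which is why indices of this family are common predictors of stand attributes.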

2.3.2. The Independent Variable Factors from Ground Data

From the National Continuous Forest Inventory Data, three independent variables were included in the research: crown density (CD), average diameter at breast height (AD), and tree species structure (TS). Among these, tree species structure is the classification factor, and the data type used in the model is “Object”. Additionally, we extracted the independent variable of elevation (ELE) from DEM data.

2.3.3. Data Preprocessing

Combining the factors from Sections 2.3.1 and 2.3.2, there are a total of 12 Vegetation Index features and 4 characteristic variables (as shown in Table 2).

In ENVI 5.3, the 12 Vegetation Index values (as shown in Table 1) derived from Landsat 8 were computed using the band math function. Subsequently, in ArcGIS 10.4, DEM data, Vegetation Index values, and characteristic variables were extracted using the multi-values extraction to points function. Finally, all 1282 sample sites, including the 16 model features, were consolidated into a single database. The data were then randomly divided into a test set and a training set in a 2:8 ratio.
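A random 2:8 split of the sample sites can be sketched as follows. This is a minimal illustration using only the standard library; the seed and the rounding rule are assumptions, and the exact partition sizes used in the study are not stated in the text.

```python
import random


def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle site indices and return (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    n_test = round(n_samples * test_fraction)     # assumed rounding rule
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(1282)
print(len(train_idx), len(test_idx))
```

Fixing the seed keeps the partition identical across the three models, so differences in their metrics cannot be attributed to different test sets.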

2.3.4. Feature Schemes

Three data schemes, denoted as A, B, and C (as shown in Table 3), have been designed based on various data sources. These schemes include 12 Vegetation Index features, 3 ground survey factors, and DEM data.

2.4. Methods

2.4.1. Random Forest Model (RF)

Random Forest [53] is an Ensemble Learning algorithm of the bagging type, and it is based on the decision tree algorithm. The fundamental concept of the random forest model is as follows: First, k bootstrap samples are drawn with replacement from the original training set, each matching the original training set in size. Second, a decision tree model is fitted to each bootstrap sample, resulting in k individual predictions. Finally, the ultimate prediction is determined by aggregating the k outcomes: by majority vote for classification, or by averaging for regression, as in this study. The Random Forest model is renowned for its high precision and strong generalization performance. The “Random” component imparts the ability to combat overfitting, while the “Forest” aspect enhances precision.
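The bootstrap-then-average idea can be sketched with a deliberately simplified base learner. The code below is an illustration under stated assumptions, not the model used in the study: each "tree" is a one-split regression stump on a single feature, and the random feature selection that a real Random Forest performs at each split is omitted because there is only one feature. All names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump on one feature (a stand-in for a full tree)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue
        l_mean, r_mean = mean(left), mean(right)
        sse = sum((y - l_mean) ** 2 for y in left) + sum((y - r_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    if best is None:  # all x values identical: fall back to a constant prediction
        constant = mean(ys)
        return lambda x: constant
    _, threshold, l_mean, r_mean = best
    return lambda x: l_mean if x <= threshold else r_mean


def bagged_stumps(xs, ys, k=25, seed=0):
    """Train k stumps on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    learners = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        learners.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(h(x) for h in learners)


# Toy data: volume jumps from 10 to 50 once the predictor exceeds 5.
xs = list(range(1, 11))
ys = [10.0] * 5 + [50.0] * 5
model = bagged_stumps(xs, ys)
print(model(1), model(10))
```

Because every stump sees a slightly different resample, the averaged prediction is smoother than any single stump, which is the variance-reduction effect that bagging relies on.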

2.4.2. Adaptive Boosting Model (AdaBoost)

AdaBoost (Adaptive Boosting), classified under the Ensemble Methods’ Boosting category [38], is recognized for its remarkable detection rate and robustness against overfitting. Demonstrated as an effective and practical algorithm within the Boosting framework, AdaBoost operates on the principle of training weak classifiers through iterative adjustment of sample weights. Post-training, the weight of weak classifiers with low classification error rates is increased, while the weight of those with high error rates is reduced. The predicted sample values are then obtained through a weighted summation of these trained weak classifiers.

2.4.3. Adaptive Boosting Based on Random Forest Model (RF-Adaboost)

The Random Forest model is a bagging algorithm based on decision trees, but it can be prone to overfitting when dealing with noisy features. Additionally, when there are too many features, the efficiency and precision of the model can decrease [54].

On the other hand, the AdaBoost model is an Adaptive Boosting algorithm that combines different weak learners. It addresses different loss problems within these weak learners by assigning varying weight values to each of them, achieving stepwise optimization. The final prediction result is obtained by taking the weighted median of the predictions of the base models. The AdaBoost model has fewer parameters, takes into account the weight of each learner, and offers high precision [55].

This study leveraged the strengths of both the Random Forest algorithm and the AdaBoost model, introducing an AdaBoost model with Random Forest as a weak learner. This novel approach led to enhanced precision in predicting forest growing stock. The specific advantages include the following:

1. Updating Sample Weights:

Traditional boosting models tend to overlook the presence of noise within the samples. During the training process of weak learners, no corrections are made to the samples, and the noise within them impacts all the weak learners, ultimately leading to model overfitting. The Adaboost algorithm addresses this concern by altering the data distribution. It determines the weight of each sample based on the prediction precision of each sample after each training iteration and the overall precision of the previous predictions. In essence, the weight of a sample is inversely proportional to its precision. The newly weighted data are then passed on to the next layer of weak learners for training. Finally, the results of each weak learner from each iteration are amalgamated to create the ultimate decision model.

2. Updating Weak Learner Weights:

In contrast to traditional boosting models that do not assign varying weights to different weak learners, Adaboost introduces a crucial refinement. Conventional models merely engage in repetitive learning from the same samples, and the final prediction outcome hinges on the consensus of these weak learners. Consequently, the overall model performance is significantly influenced by the learning precision of these weak learners. However, Adaboost takes a different approach. After each training round, the Adaboost model evaluates the error rate of each weak learner. It then increases the weight allocated to weak learners with a lower error rate, thereby granting them a more influential role in determining the final prediction outcomes. Conversely, weak learners with a higher error rate have their weights reduced, diminishing their impact on the final predictions.

3. Calculating the Feature Importance:

Using the Random Forest model as the base learner $h_{t}(x)$ offers a dual advantage. It not only enables the assignment of distinct weights to weak learners but also facilitates the automated calculation of feature importance. This is achieved through metrics like the Gini index or the out-of-bag (OOB) error rate, which gauge the contribution of each feature to individual trees within the ensemble. Subsequently, the average contribution across all trees is computed, yielding the feature’s overall importance score.
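One way to assemble such a model is with scikit-learn, which the reference to “gridsearchCV” suggests was the toolkit used, although the text does not confirm the exact implementation. The sketch below is therefore an assumption: the data are synthetic, and the hyperparameters are placeholders rather than the values in Table 4. The base estimator is passed positionally because its keyword name differs between scikit-learn versions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor

# Synthetic stand-in data: four features, the first being the most informative.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 4))
y = 30.0 * X[:, 0] + 10.0 * X[:, 1] + rng.normal(0.0, 1.0, size=200)

# Placeholder hyperparameters (not the study's settings).
base = RandomForestRegressor(n_estimators=20, max_depth=4, random_state=0)

# AdaBoostRegressor implements AdaBoost.R2; loss="linear" corresponds to a
# relative absolute error of the form |y - h(x)| / max|y - h(x)|.
model = AdaBoostRegressor(base, n_estimators=10, loss="linear", random_state=0)
model.fit(X, y)

# Importances are averaged over the boosted forests, weighted by learner weight.
importances = model.feature_importances_
print(np.round(importances, 3))
```

Each boosting round here fits a whole forest, so training cost grows with the product of the two `n_estimators` values; keeping the inner forests small is a common way to keep that manageable.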

Suppose the training dataset is $D = \{(x_{1}, y_{1}), (x_{2}, y_{2}), \dots, (x_{N}, y_{N})\}$, where $x_{i} \in \mathbb{R}^{d}$ and $y_{i}$ is the true value of forest growing stock volume corresponding to the feature set $x_{i}$, sourced from the National Continuous Forest Inventory Data. The model calculation process unfolds as follows:

To begin, we initialize the weight distribution of the dataset: $D_{1} = (w_{11}, w_{12}, \dots, w_{1N})$, where $w_{1i} = \frac{1}{N}$, signifying that the weight value for all data is set to 1/N during the initial iteration. Here, N represents the total number of samples.

Next, we proceed by training the base learner $h_{t}(x)$ using the weighted dataset and then computing the regression error rate for the training dataset:

$$\epsilon_{t} = \sum_{i=1}^{N} w_{ti}\, \epsilon_{ti} \tag{1}$$

$$\epsilon_{ti} = \frac{|y_{i} - h_{t}(x_{i})|}{E_{t}} \tag{2}$$

where $w_{ti}$ is the data weight and $\epsilon_{ti}$ is the relative error of the sample; $E_{t}$ represents the maximum absolute error between the predictions $h_{t}(x_{i})$ and the true values $y_{i}$ in the dataset during the t-th iteration:

$$E_{t} = \max_{1 \le i \le N} |y_{i} - h_{t}(x_{i})| \tag{3}$$

Thirdly, the coefficient $\alpha_{t}$ of $h_{t}(x)$ is calculated as follows:

$$\alpha_{t} = \frac{\epsilon_{t}}{1 - \epsilon_{t}} \tag{4}$$

Fourthly, the weight of each sample in the dataset is updated as follows:

$$w_{t+1,i} = \frac{w_{ti}}{Z_{t}} \times \alpha_{t}^{1 - \epsilon_{ti}} \tag{5}$$

where $Z_{t}$ is a normalization factor:

$$Z_{t} = \sum_{i=1}^{N} w_{ti}\, \alpha_{t}^{1 - \epsilon_{ti}} \tag{6}$$

Finally, the median of the predictions from each base learner is used as the final result.
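The update rules in Equations (1)–(6) can be traced numerically for a single boosting round. The sketch below is illustrative only: the function name and the four-sample toy data are hypothetical, and the guard against an error rate of 0.5 or more is the usual stopping rule for this family of algorithms rather than something stated in the text.

```python
def boosting_round(weights, y_true, y_pred):
    """Apply Equations (1)-(6) once; return (epsilon_t, alpha_t, new_weights)."""
    abs_err = [abs(y - p) for y, p in zip(y_true, y_pred)]
    e_max = max(abs_err)                                    # Eq. (3)
    if e_max == 0:
        return 0.0, 0.0, list(weights)                      # perfect learner: nothing to reweight
    rel_err = [e / e_max for e in abs_err]                  # Eq. (2)
    eps = sum(w * r for w, r in zip(weights, rel_err))      # Eq. (1)
    if eps >= 0.5:
        raise ValueError("base learner too weak: weighted error >= 0.5")
    alpha = eps / (1.0 - eps)                               # Eq. (4)
    raw = [w * alpha ** (1.0 - r) for w, r in zip(weights, rel_err)]
    z = sum(raw)                                            # Eq. (6)
    return eps, alpha, [value / z for value in raw]         # Eq. (5)


# Toy example: four sites with equal initial weights 1/N.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [10.0, 22.0, 30.0, 32.0]
eps, alpha, new_w = boosting_round([0.25] * 4, y_true, y_pred)
print(round(eps, 4), round(alpha, 4), [round(w, 4) for w in new_w])
```

Since $\alpha_{t} < 1$ whenever $\epsilon_{t} < 0.5$, a sample with small relative error has its weight multiplied by a factor close to $\alpha_{t}$, while the worst-predicted sample keeps a factor of 1. After normalization, the poorly predicted sites therefore carry more weight in the next round.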

2.4.4. Model Parameters

In this study, the Random Forest model, Adaboost model, and RF-Adaboost model were trained. The specific parameters for each model can be found in Table 4.

2.5. Model Performance Indicators

We used the “GridSearchCV” method to identify the optimal model parameters. Performance metrics encompass the coefficient of determination (R-squared, R²), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), which are computed using Formulas (7)–(10).

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}} \tag{7}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}} \tag{8}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_{i} - \hat{y}_{i}| \tag{9}$$

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|\hat{y}_{i} - y_{i}|}{y_{i}} \times 100\% \tag{10}$$

where $y_{i}$ is the true value, $\hat{y}_{i}$ is the predicted value, $\bar{y}$ is the average of the true values, and $n$ is the number of samples.
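Formulas (7)–(10) translate directly into a few lines of code. The sketch below is a plain-Python illustration with hypothetical values; it assumes every true value is positive, which MAPE requires.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R2, RMSE, MAE and MAPE (%) following Formulas (7)-(10)."""
    n = len(y_true)
    y_bar = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n,
        "MAPE": 100.0 * sum(abs(p - y) / y for y, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical per-site volumes (m³/site), not taken from the study.
metrics = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
print({name: round(value, 3) for name, value in metrics.items()})
```

RMSE and MAE carry the units of the target, whereas R² and MAPE are unitless, which makes the latter two the natural choice when comparing against studies with different plot sizes.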

3. Results

3.1. Analysis for Data Scheme A

In data scheme A, 13 factors (shown in Table 3) serve as independent variables with forest growing stock volume per sample site as the dependent variable. The importance of these independent variables varies across different models, as illustrated in Figure 2. In the Random Forest model, ELE holds the highest weight at 0.25, followed by NDI at 0.17, and TVI at 0.10, with the remaining variables showing similar levels of importance. In the Adaboost model, ELE also holds the highest weight at 0.28, followed by NDI at 0.16, and TVI at 0.12, while the importance levels of other variables are relatively consistent. However, in the RF-Adaboost model, there is a noticeable shift in the distribution of variable importance. The top two features, ELE (0.23) and NDI (0.17), remain unchanged, but the importance of other features has shifted. NIRG’s importance now ranks third at 0.09, and the importance of NDWI (0.06) and RI (0.03) has significantly increased. Meanwhile, the importance of ELE and TVI (0.05) has decreased. From the perspective of the variance of feature importance, the feature importance variance in the Random Forest model is 0.004, in the Adaboost model is 0.005, and in the RF-Adaboost model, the feature importance variance is 0.003. It is evident that the RF-Adaboost model achieves a more balanced distribution of feature importance.

The performance indicators of the models for estimating forest growing stock volume based on Data Scheme A are presented in Table 5. The RF-Adaboost model proposed in this study exhibits superior performance indicators compared to the Random Forest model, with an increase in R-squared (R²) by 7.3% (0.03), a decrease in RMSE by 0.9% (0.07 m³/site), a decrease in MAE by 2.0% (0.1 m³), and a 0.45% reduction in MAPE. In comparison to the Adaboost model, the RF-Adaboost model demonstrates an improvement in R² by 10% (0.04), a reduction in RMSE by 0.5% (0.04 m³/site), a decrease in MAE by 1.6% (0.08 m³), and a 1.15% decrease in MAPE.

3.2. Analysis for Data Scheme B

In Data Scheme B, the 4 factors (shown in Table 3) are independent variables and the growing stock volume per sample site is the dependent variable. The importance of these independent variables varies across different models, as illustrated in Figure 3. It is evident that AD holds the highest significance (0.71 in the Random Forest model, 0.73 in the Adaboost model, and the lowest is 0.65 in RF-Adaboost). Following closely is CD, with values of 0.27 in the Random Forest model, 0.22 in the Adaboost model, and 0.23 in the RF-Adaboost model. The importance of feature ELE has significantly increased in the RF-Adaboost model, with a value of 0.08, surpassing the values in Random Forest (0.01) and Adaboost (0.03). Similarly, the importance of feature TS has also notably increased in the RF-Adaboost model, with a value of 0.02, surpassing the values in Random Forest (0.001) and Adaboost (0.009). From the perspective of the variance of feature importance, the feature importance variance in the Random Forest model is 0.11, in the Adaboost model is 0.11, and in the RF-Adaboost model is 0.08. It is evident that the RF-Adaboost model achieves a more balanced distribution of feature importance.
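The variance figures for Data Scheme B can be checked against the rounded importances quoted in the text. The sketch assumes the reported values are sample variances (denominator n − 1), which the text does not state explicitly; each list is ordered AD, CD, ELE, TS.

```python
from statistics import variance

# Feature importances as quoted in the text for Data Scheme B (AD, CD, ELE, TS).
importances_b = {
    "Random Forest": [0.71, 0.27, 0.01, 0.001],
    "Adaboost": [0.73, 0.22, 0.03, 0.009],
    "RF-Adaboost": [0.65, 0.23, 0.08, 0.02],
}

# statistics.variance uses the n - 1 denominator (sample variance).
spread = {name: round(variance(values), 2) for name, values in importances_b.items()}
print(spread)
```

Under that assumption the quoted importances give 0.11, 0.11, and 0.08, matching the reported variances, so a lower value here reflects importance being spread more evenly across the four features.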

The performance indicators for models estimating forest growing stock volume using Data Scheme B are presented in Table 6. The RF-Adaboost model proposed in this study exhibits superior performance indicators compared to the Random Forest model, with an increase in R-squared (R²) by 4.1% (0.03), a reduction in RMSE by 8.6% (0.49 m³/site), a decrease in MAE by 7.4% (0.24 m³), and a 0.53% decrease in MAPE. In comparison to the Adaboost model, RF-Adaboost shows an improvement of 5.5% (0.04) in R², a substantial RMSE reduction of 12.2% (0.72 m³/site), a notable MAE reduction of 10.1% (0.34 m³), and an MAPE reduction of 0.47%.

3.3. Analysis for Data Scheme C

In Data Scheme C, we employed 16 factors (as shown in Table 3) as independent variables, with forest growing stock volume per sample site as the dependent variable. The importance of these independent variables is illustrated in Figure 4. Notably, among the three models, AD contributes the most, followed by CD. However, it is noteworthy that, in the RF-Adaboost model, despite the uneven distribution of feature importance, there is a noticeable enhancement in feature importance balance. The RF-Adaboost model demonstrates improved balance with a variance of 0.019, outperforming the Random Forest (0.032) and Adaboost (0.030) models.

The performance indicators of the models used to estimate forest growing stock volume based on Data Scheme C are presented in Table 7. The RF-Adaboost model proposed in this paper exhibits better performance indicators compared to the Random Forest model. It shows an increase in R² of 3.8% (0.03), a decrease in RMSE of 7.7% (0.59 m³/site), a decrease in MAE of 9.4% (0.35 m³), and a decrease in MAPE of 0.86%. When compared to the Adaboost model, R² improves by 3.8% (0.03), RMSE decreases by 8.3% (0.64 m³/site), MAE decreases by 11.1% (0.42 m³), and MAPE decreases by 0.89%.

Figure 5 provides a comprehensive and intuitive comparison of the performance indicators among the Random Forest, Adaboost, and RF-Adaboost models. It clearly illustrates that RF-Adaboost is the superior model for estimating forest growing stock volume. Additionally, when considering the data sources, Data Scheme C stands out as the most effective scheme. The incorporation of multi-source data enhances model performance.

In greater detail, Figure 6 provides a line chart that illustrates the estimated values (calculated by the models) and the true values (derived from the National Continuous Forest Inventory data) for the first 20 samples. This chart encompasses the three data schemes (A, B, and C) and the three models (Random Forest, Adaboost, and RF-Adaboost).

Experimental results demonstrate that the comprehensive performance metrics of the RF-Adaboost model introduced in this study significantly outperform those of the conventional Random Forest and Adaboost models. Moreover, the RF-Adaboost model exhibits outstanding performance and stability across various datasets. Furthermore, in terms of data sources, the model’s performance indicators based on multi-source data fusion surpass those of the single-source model.

4. Discussion

The primary objective of this study was to propose a universal model for estimating forest growing stock volume, a crucial indicator for evaluating forest quality. The establishment of a predictive model based on continuous inventory data of national forest resources and remote sensing data holds particular significance for Yunnan, where the forest cover is extensive, the terrain is complex, and forestry survey tasks are demanding. A growing stock volume estimation model based on partial continuous inventory data and remote sensing data can therefore greatly enhance survey efficiency and reduce the risks associated with forestry investigations.

Overall, the findings from our research yield several noteworthy conclusions that deepen our understanding of forest growing stock volume estimation: (1) Our investigation unequivocally demonstrates the exceptional performance of the RF-Adaboost model in estimating forest growing stock volume. Regardless of the data scheme, the advantages of this model are particularly evident when compared to other machine learning models. This adaptability underscores its robustness and reliability, making it a valuable tool for forest resource assessment in varying geographical and environmental contexts. (2) Our research highlights the significant positive impact of incorporating multi-source features on model performance. By amalgamating data from various sources, we not only enhance the predictive precision of the model but also improve its robustness against variations in input data. (3) Compared to other machine learning models, our approach excels in achieving a more balanced consideration of the importance attributed to various features. It delves deeper into understanding the impact of each feature on the accurate estimation of forest growing stock. Our objective is to optimize the utilization of different features, thereby minimizing the model’s dependence on any single feature. (4) RF-Adaboost does not require strict differentiation between different tree species, only distinguishing tree species structures, making it more versatile and universal.

The RF-Adaboost model proposed in this study consistently outperforms traditional Random Forest and Adaboost models across various data schemes, demonstrating superior performance in terms of multiple evaluation metrics. This improvement can be attributed to the inherent limitations of traditional Random Forest models, which rely on a regression algorithm using decision trees with equal weights for each tree. This uniform weighting makes the model vulnerable to the influence of outliers and reduces its universality. Specifically, the traditional Random Forest model excels in estimating the growing stock volume of a specific tree species in a small local area but may falter when applied to different regions for estimating various tree species. To address these limitations, our proposed RF-Adaboost model integrates the Adaboost algorithm, allowing for the assignment of different weights to weak learners and data based on the iteration. This adaptive weighting strategy mitigates the negative impact of outliers, enhancing the model’s precision and overall performance.

In terms of data schemes, combining ground survey data with remote sensing data yields higher precision in predicting forest growing stock volume than using data from a single source. The model’s precision is lowest when only Landsat remote sensing data are used. We attribute this to the large temporal and spatial scale of the remote sensing data, along with issues such as cloud cover, which prevent the imagery from accurately reflecting the vegetation cover of the monitoring area at the time of the survey.

As for features, although there are always some features with extremely high importance and others with very low importance in all models, the RF-Adaboost model excels in balancing the treatment of features compared to other models. It significantly reduces the importance of some features while increasing the importance of others, thereby reducing the model’s reliance on certain features and enhancing the overall balance of features, making the model more stable.

For the purpose of facilitating comparison with other studies, we utilized the coefficient of determination (R²) and the mean absolute percentage error (MAPE%) as comparison metrics. These ratio-based indicators enable meaningful horizontal comparisons across different studies. We selected the most relevant and recent studies for comparison with the content of our research (as shown in Table 8).
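The reason ratio-based indicators travel better between studies can be shown with a short sketch. It uses the standard textbook definitions of R², RMSE, MAE, and MAPE, which we assume match those given in Section 2.5; the function name `regression_metrics` and the sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Standard definitions of R2, RMSE, MAE and MAPE (in percent)."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        # MAPE is undefined for plots whose true volume is zero.
        "MAPE": 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical plot volumes and predictions.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 36.0]
m = regression_metrics(y_true, y_pred)

# The same data expressed in a unit 1000 times smaller.
m_scaled = regression_metrics([1000 * t for t in y_true],
                              [1000 * p for p in y_pred])
```

Rescaling the volumes leaves R² and MAPE unchanged but multiplies RMSE and MAE by the same factor, which is why only the first two are used for the cross-study comparison.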

Our RF-Adaboost model, when applied to Data Scheme C, improves on several previous studies in both R² and MAPE. It attains a higher R² than Mauya (2019) [12], Ruyi Zhou (2018) [33], and Huajian Huang (2022) [56], which underscores its predictive capability. Furthermore, its MAPE is substantially lower than that of Ruyi Zhou and slightly lower than that of Huajian Huang. Mauya and Jingjing Zhou did not report MAPE values, precluding direct comparisons.

Nonetheless, our study’s R² is marginally lower than the R² of 0.82 reported by Jingjing Zhou (2020) [31]. We applied the Random Forest method used in Jingjing Zhou’s study to vegetation indices calculated from Landsat data. Without tree species distinction, the resulting R² is only 0.43, significantly lower than the 0.82 reported in Jingjing Zhou’s study. This divergence may be attributed to three main factors. (1) Remote sensing image resolution: The SPOT6 satellite imagery used in Jingjing Zhou’s study has a much higher resolution than the Landsat imagery used in our study. (2) Study scale and complexity: Jingjing Zhou’s (2020) [31] study focused on Taizi Mountain in Jingshan County, China, which covers a smaller geographic area and exhibits less variability in topographic and climatic conditions, so the range and complexity of these variables were more limited. In contrast, our study covers the entire Yunnan Province, a broader and more diverse geographic region with a multitude of complex factors influencing forest growing stock volume. This diversity may contribute to lower predictive precision. (3) Tree species structure: Jingjing Zhou’s (2020) [31] research primarily centered on Pinus massoniana plantations. In contrast, our study categorizes tree species into five major classes: coniferous pure forest, broad-leaved pure forest, coniferous relatively pure forest, broad-leaved relatively pure forest, and mixed needle and broad forest. This added complexity in tree species structure may have introduced greater variability and reduced predictive precision.

Yangyang Zhou (2023) [57], whose study bears the closest resemblance to ours, also employed remote sensing data and forest inventory data for estimating forest stock volume. They highlighted the optimal performance of the Random Forest model in forest stock volume estimation, achieving an R² of 0.776 when the remote sensing data source was Landsat 8, consistent with our findings (as shown in Table 6). However, our research, with improvements to the Adaboost model, attains an R² of 0.81, demonstrating that the RF-Adaboost model is more effective than the Random Forest model. Notably, when Yangyang Zhou incorporated Sentinel-2 data, the R² reached 0.831, which suggests a promising direction for our subsequent research: combining the RF-Adaboost model with higher-resolution remote sensing data for stock volume estimation could lead to even better results.

During the research process, we also identified some issues that warrant further exploration. (1) Although we identified the best-performing model to be the one based on Data Scheme C and the RF-Adaboost model, it is noteworthy that all models exhibited some degree of variability in their predictions, with precision showing slight fluctuations. This variability raises intriguing questions, with one possible explanation being the presence of outliers in the dataset. However, further investigation is needed to conclusively ascertain the cause and nature of this model variability. (2) The RF-Adaboost model boasts a more extensive feature set compared to both the Random Forest and Adaboost models. Nevertheless, we did not undertake the crucial task of feature selection from within the same data source. Consequently, the RF-Adaboost model’s operational efficiency is diminished in comparison to its counterparts. Future research endeavors should explore feature selection methodologies to streamline the model’s feature set, enhancing efficiency without compromising predictive precision.

5. Conclusions

In this study, we used Landsat 8 remote sensing data, DEM data, and the 2017 Continuous Forest Inventory Data of Yunnan to estimate the forest growing stock volume across 1282 sample plots. Three distinct machine learning algorithms were employed for this purpose. The research findings can be summarized as follows: (1) The RF-Adaboost model developed in this study consistently outperformed the Random Forest and Adaboost models across the three data schemes, demonstrating superior estimation capabilities. (2) The precision of forest growing stock volume estimation improves significantly when multi-source data are fused, as opposed to relying solely on single-source data. (3) The RF-Adaboost model demonstrates greater universality by classifying all “tree species structures” in the model, rather than specifically estimating individual tree species. Consequently, the model displayed enhanced generalization abilities. These findings emphasize the importance of integrating data from multiple sources and indicate the strong performance of the RF-Adaboost model, especially regarding its ability to generalize and accurately estimate forest growing stock volume.

In practical applications, the model can act as a substitute for specific ground surveys, especially in challenging or hard-to-reach areas. It presents a more cost-effective and efficient method for estimating forest growing stock volume, offering valuable alternatives in particular scenarios. Additionally, although the current model is primarily designed for static growing stock volume estimation, its versatility allows for dynamic estimation as well. This adaptability positions it to play a more substantial role in refining traditional forest management plans.

Author Contributions

Conceptualization, C.Z.; methodology, X.W.; software, J.F.; validation, W.X.; formal analysis, Z.Q.; writing—original draft preparation, X.W.; funding acquisition, C.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (32160405, 32360387); Joint Special Project for Agriculture of Yunnan Province, China (202301BD070001-238, 202301BD070001-008, 202101BD070001-066); Department of Education Scientific Research Fund of Yunnan Province, China (2023J0698).

Data Availability Statement

The datasets analyzed during the current study are available from the Institute of Forestry Survey and Planning, but restrictions apply to the availability of these data, which were obtained from the second author, and so are not publicly available.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Debeljak, M.; Poljanec, A.; Ženko, B. Modelling forest growing stock from inventory data: A data mining approach. Ecol. Indic. 2014, 41, 30–39.
2. Wang, S.H.; Zhang, M.Z.; Zhao, P.A.; Chen, J.X. Modelling the spatial distribution of forest carbon stocks with artificial neural network based on TM images and forest inventory data. Acta Ecol. Sin. 2011, 31, 998–1008.
3. Hong, W.; Wu, C.Z.; He, D.J. A study on the model of forest resources management based on the artificial neural network. J. Nat. Resour. 1998, 13, 69–72.
4. Tomppo, E.; Gschwantner, T.; Lawrence, M.; McRoberts, R.E.; Gabler, K.; Schadauer, K.; Vidal, C.; Lanz, A.; Ståhl, G.; Cienciala, E. National forest inventories. In Pathways for Common Reporting; European Science Foundation: Vienna, Austria, 2010; pp. 541–553.
5. Chen, Z. A Brief Discussion on the Ninth National Forest Resources Inventory in Yunnan Province. Guizhou For. Sci. Technol. 2018, 46, 61–64. (In Chinese)
6. Zeng, W.S.; Xia, R. Discussion on Statistical Methods for Annual Data Compilation of National Forest Resources Inventory. For. Resour. Manag. 2021, 2, 29–35. (In Chinese)
7. Zeng, W.S.; Huang, G.S.; Dang, Y.F.; Zhi, C.G. Exploration of Sampling Design and Estimation Methods for National Forest Resources Macro Monitoring. For. Resour. Manag. 2016, 3, 1–6. (In Chinese)
8. Margolis, H.A.; Nelson, R.F.; Montesano, P.M.; Beaudoin, A.; Sun, G.; Andersen, H.E.; Wulder, M.A. Combining satellite Lidar, airborne Lidar, and ground plots to estimate the amount and distribution of aboveground biomass in the boreal forest of North America. Can. J. For. Res. 2015, 45, 838–855.
9. Lu, D.; Chen, Q.; Wang, G.; Liu, L.; Li, G.; Moran, E. A survey of remote sensing-based aboveground biomass estimation methods in forest ecosystems. Int. J. Digit. Earth 2014, 9, 63–105.
10. Wang, K.; Lv, J.; Li, C. Estimation of forest volume based on multi-scale remote sensing image texture features. J. Cent. South Univ. For. Technol. 2017, 37, 6. (In Chinese)
11. Sellers, P.J. Canopy reflectance, photosynthesis, and transpiration. Int. J. Remote Sens. 1985, 6, 1335–1372.
12. Mauya, E.W.; Koskinen, J.; Tegel, K.; Hämäläinen, J.; Kauranne, T.; Käyhkö, N. Modelling and Predicting the Growing Stock Volume in Small-Scale Plantation Forests of Tanzania Using Multi-Sensor Image Synergy. Forests 2019, 10, 279.
13. Maselli, F.; Bottai, L.; Chirici, G.; Corona, P.; Marchetti, M.; Travaglini, D. Estimation of forest attributes by integration of field sampling and remotely sensed data under Mediterranean environments. Ital. J. For. Mt. Environ. 2003, 58, 251–263.
14. Tanaka, S.; Takahashi, T.; Nishizono, T.; Kitahara, F.; Saito, H.; Iehara, T.; Kodani, E.; Awaya, Y. Stand Volume Estimation Using the k-NN Technique Combined with Forest Inventory Data, Satellite Image Data and Additional Feature Variables. Remote Sens. 2015, 7, 378–394.
15. McRoberts, R.E.; Gobakken, T.; Naesset, E. Post-stratified estimation of forest area and growing stock volume using lidar-based stratifications. Remote Sens. Environ. 2012, 125, 157–166.
16. Fedrigo, M.; Meir, P.; Sheil, D.; Van Heist, M.; Woodhouse, I.H.; Mitchard, E.T. Fusing radar and optical remote sensing for biomass prediction in mountainous tropical forests. In Proceedings of the IGARSS 2013—2013 IEEE International Geoscience and Remote Sensing Symposium, Melbourne, VIC, Australia, 21–26 July 2013.
17. Curran, P.J. Multispectral Remote Sensing for the Estimation of Green Leaf Area Index. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 1983, 309, 257–270.
18. dos Reis, A.A.; Carvalho, M.C.; de Mello, J.M.; Gomide, L.R.; Ferraz Filho, A.C.; Acerbi Junior, F.W. Spatial prediction of basal area and volume in Eucalyptus stands using Landsat TM data: An assessment of prediction methods. New Zealand J. For. Sci. 2018, 48, 1.
19. Zharko, V.O.; Bartalev, S.A.; Sidorenkov, V.M. Forest growing stock volume estimation using optical remote sensing over snow-covered ground: A case study for Sentinel-2 data and the Russian Southern Taiga region. Remote Sens. Lett. 2020, 11, 677–686.
20. Kressler, F.P.; Steinnocher, K.T. Detecting land cover changes from NOAA-AVHRR data by using spectral mixture analysis. Int. J. Appl. Earth Obs. Geoinf. 1999, 1, 21–26.
21. Trotter, C.M.; Dymond, J.R.; Goulding, C.J. Estimation of timber volume in a coniferous plantation forest using Landsat TM. Int. J. Remote Sens. 1997, 18, 2209–2223.
22. Lu, D.; Batistella, M. Exploring TM image texture and its relationships with biomass estimation in Rondônia, Brazilian Amazon. Acta Amaz. 2005, 35, 249–257.
23. Jing, R.; Duan, F.; Lu, F.; Zhang, M.; Zhao, W. Cloud removal for optical remote sensing imagery using the SPA-CycleGAN network. J. Appl. Remote Sens. 2022, 16, 034520.
24. Saarela, S.; Grafström, A.; Ståhl, G.; Kangas, A.; Holopainen, M.; Tuominen, S.; Nordkvist, K.; Hyyppä, J. Model-assisted estimation of growing stock volume using different combinations of LiDAR and Landsat data as auxiliary information. Remote Sens. Environ. 2015, 158, 431–440.
25. Crammer, K.; Kearns, M.; Wortman, J. Learning from Multiple Sources. J. Mach. Learn. Res. 2008, 9, 1757–1774.
26. Duncanson, L.I.; Niemann, K.O.; Wulder, M.A. Integration of GLAS and Landsat TM data for aboveground biomass estimation. Can. J. Remote Sens. 2010, 36, 129–141.
27. Puliti, S.; Saarela, S.; Gobakken, T.; Ståhl, G.; Næsset, E. Combining UAV and Sentinel-2 auxiliary data for forest growing stock volume estimation through hierarchical model-based inference. Remote Sens. Environ. 2018, 204, 485–497.
28. Sánchez-Ruiz, S.; Chiesi, M.; Maselli, F.; Gilabert, M.A. Mapping growing stock at 1-km spatial resolution for Spanish forest areas from ground forest inventory data and GLAS canopy height. In Proceedings of the Earth Resources and Environmental Remote Sensing/GIS Applications VII, Edinburgh, UK, 27–29 September 2016; SPIE: Bellingham, WA, USA, 2016; Volume 10005, pp. 412–419.
29. Sankey, T.; Donager, J.; McVay, J.; Sankey, J.B. UAV lidar and hyperspectral fusion for forest monitoring in the southwestern USA. Remote Sens. Environ. 2017, 195, 30–43.
30. Vafaei, S.; Soosani, J.; Adeli, K.; Fadaei, H.; Naghavi, H.; Pham, T.D.; Tien Bui, D. Improving Accuracy Estimation of Forest Aboveground Biomass Based on Incorporation of ALOS-2 PALSAR-2 and Sentinel-2A Imagery and Machine Learning: A Case Study of the Hyrcanian Forest Area (Iran). Remote Sens. 2018, 10, 172.
31. Zhou, J.; Zhou, Z.; Zhao, Q.; Han, Z.; Wang, P.; Xu, J.; Dian, Y. Evaluation of different algorithms for estimating the growing stock volume of Pinus massoniana plantations using spectral and spatial information from a SPOT6 image. Forests 2020, 11, 540.
32. Esteban, J.; McRoberts, R.E.; Fernández-Landa, A.; Tomé, J.L.; Næsset, E. Estimating Forest Volume and Biomass and Their Changes Using Random Forests and Remotely Sensed Data. Remote Sens. 2019, 11, 1944.
33. Zhou, R.; Wu, D.; Fang, L.; Xu, A.; Lou, X. A Levenberg–Marquardt Backpropagation Neural Network for Predicting Forest Growing Stock Based on the Least-Squares Equation Fitting Parameters. Forests 2018, 9, 757.
34. Chirici, G.; Barbati, A.; Corona, P.; Marchetti, M.; Travaglini, D.; Maselli, F.; Bertini, R. Non-parametric and parametric methods using satellite images for estimating growing stock volume in alpine and Mediterranean forest ecosystems. Remote Sens. Environ. 2008, 112, 2686–2700.
35. Jawad, M.; Rafique, A.; Khosa, I.; Ghous, I.; Akhtar, J.; Ali, S.M. Improving disturbance storm time index prediction using linear and nonlinear parametric models: A comprehensive analysis. IEEE Trans. Plasma Sci. 2018, 47, 1429–1444.
36. Fygenson, M. Modeling and predicting extrapolated probabilities with outlooks. Stat. Sin. 2008, 18, 9–90.
37. Chen, W.; Xie, X.; Wang, J.; Pradhan, B.; Hong, H.; Bui, D.T.; Duan, Z.; Ma, J. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. Catena 2017, 151, 147–160.
38. Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
39. Freund, Y. An Adaptive Version of the Boost by Majority Algorithm. Mach. Learn. 2001, 43, 293–318.
40. Central People’s Government of the People’s Republic of China. Yunnan Forest Coverage Reaches 65.04%. Available online: https://www.gov.cn/xinwen/2021-02/03/content_5584655.htm (accessed on 3 February 2021).
41. Rouse, J.W.; Haas, R.H.; Schell, J.A.; Deering, D.W. Monitoring Vegetation Systems in the Great Plains with Erts. NASA Spec. Publ. 1974, 351, 309.
42. McFeeters, S.K. The use of the Normalized Difference Water Index (NDWI) in the delineation of open water features. Int. J. Remote Sens. 1996, 17, 1425–1432.
43. Hunt, E.R.; Rock, B.N. Detection of changes in leaf water content using near- and middle-infrared reflectances. Remote Sens. Environ. 1989, 30, 43–54.
44. Jiang, Z.; Huete, A.R.; Didan, K.; Miura, T. Development of a two-band enhanced vegetation index without a blue band. Remote Sens. Environ. 2008, 112, 3833–3845.
45. Zhang, X.Y.; Li, J.F. The derivation of a reflectance model for the estimation of leaf area index using perpendicular vegetation index. Remote Sens. Technol. Appl. 1995, 10, 6.
46. Rondeaux, G.; Steven, M.; Baret, F. Optimization of soil-adjusted vegetation indices. Remote Sens. Environ. 1996, 55, 95–107.
47. Huete, A.; Didan, K.; Miura, T.; Rodriguez, E.P.; Gao, X.; Ferreira, L.G. Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sens. Environ. 2002, 83, 195–213.
48. Deering, D.W.; Harlan, J.C.; Rouse, J.W., Jr.; Haas, R.H. Effective Use of Landsat for Range Monitoring and Management—An Example on a Regional Scale. Plenary Meeting; 1977. Available online: https://ntrs.nasa.gov/citations/19770062913 (accessed on 20 September 2023).
49. Clevers, J. Application of a weighted infrared-red vegetation index for estimating leaf area index by correcting for soil moisture. Remote Sens. Environ. 1989, 29, 25–37.
50. Xu, H.Q. Fast information extraction of urban built-up land based on the analysis of spectral signature and normalized difference index. Geogr. Res. 2005, 24, 311–320. (In Chinese)
51. Huete, A.R. A soil-adjusted vegetation index (SAVI). Remote Sens. Environ. 1988, 25, 295–309.
52. McDade, I.C.; Llewellyn, E.J.; Greer, R.G.H.; Murtagh, D.P. ETON 3: Altitude profiles of the nightglow continuum at green and near-infrared wavelengths. Planet. Space Sci. 1986, 34, 801–810.
53. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
54. Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22.
55. Hastie, T.; Rosset, S.; Zhu, J.; Zhu, H. Multi-class adaboost. Stat. Its Interface 2009, 2, 349–360.
56. Huang, H.J.; Wu, D.S.; Fang, L.M.; Zheng, X. Comparison of Multiple Machine Learning Models for Estimating the Forest Growing Stock in Large-Scale Forests Using Multi-Source Data. Forests 2022, 13, 1471.
57. Zhou, Y.; Feng, Z. Estimation of Forest Stock Volume Using Sentinel-2 MSI, Landsat 8 OLI Imagery and Forest Inventory Data. Forests 2023, 14, 1345.

Figure 1. Administrative map of the study area.

Figure 2. Importance of the independent variables based on Data Scheme A.

Figure 3. Importance of the independent variables based on Data Scheme B.

Figure 4. Importance of the independent variables based on Data Scheme C.

Figure 5. Comparison of performance indicators generated by the models Random Forest, Adaboost, and RF-Adaboost based on data schemes A, B, and C; A—Data Scheme A; B—Data Scheme B; C—Data Scheme C.

Figure 6. Line relation plot between the true values and predicted values of forest growing stock volume for Random Forest, Adaboost, and RF-Adaboost: (a) based on Data Scheme A; (b) based on Data Scheme B; (c) based on Data Scheme C.

Table 1. Vegetation Index formula.

No.	Vegetation Index	Formula	Reference
1	Normalized Difference Vegetation Index (NDVI)	$\mathrm{NDVI} = \frac{\mathrm{NIR} - R}{\mathrm{NIR} + R}$	[41]
2	Normalized Difference Water Index (NDWI)	$\mathrm{NDWI} = \frac{G - \mathrm{NIR}}{G + \mathrm{NIR}}$	[42]
3	Ratio Vegetation Index (RVI)	$\mathrm{RVI} = \frac{R}{\mathrm{NIR}}$	[43]
4	Difference Vegetation Index (DVI)	$\mathrm{DVI} = 2.4 \times (\mathrm{NIR} - R)$	[44]
5	Perpendicular Vegetation Index (PVI)	$\mathrm{PVI} = \frac{\mathrm{NIR} - a \times R - b}{1 + a^{2}}$	[45]
6	Renormalized Difference Vegetation Index (RDVI)	$\mathrm{RDVI} = \sqrt{\mathrm{NDVI} \times \mathrm{DVI}}$	[46]
7	Enhanced Vegetation Index (EVI)	$\mathrm{EVI} = \frac{2.5 \times (\mathrm{NIR} - R)}{\mathrm{NIR} + 6 \times R - 7.5 \times B + 1}$	[47]
8	Transformed Vegetation Index (TVI)	$\mathrm{TVI} = \sqrt{\mathrm{NDVI} + 0.5}$	[48]
9	Red Vegetation Index (RI)	$\mathrm{RI} = \frac{R - G}{R + G}$	[49]
10	Normalized Difference Index (NDI)	$\mathrm{NDI} = \frac{\mathrm{NIR} - \mathrm{SIR}}{\mathrm{NIR} + \mathrm{SIR}}$	[50]
11	Soil-Adjusted Vegetation Index (SAVI)	$\mathrm{SAVI} = \frac{(\mathrm{NIR} - R) \times (1 + L)}{R + \mathrm{NIR} + L}$	[51]
12	Near Infrared divided by Green (NIRG)	$\mathrm{NIRG} = \frac{\mathrm{NIR}}{G}$	[52]

Note: a = 10.489; b = 6.604; L = 0.5; R, red; G, green; B, blue; NIR, near-infrared; SIR, shortwave infrared.
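The formulas in Table 1 can be evaluated per pixel or per plot as in the sketch below. It transcribes the table as printed, including the constants from the note; the function name `vegetation_indices` and the sample reflectance values are hypothetical, and the square roots assume vegetated pixels where NDVI is greater than −0.5 and NIR exceeds R.

```python
import math


def vegetation_indices(R, G, B, NIR, SIR, a=10.489, b=6.604, L=0.5):
    """Compute the 12 indices of Table 1 from band reflectances."""
    ndvi = (NIR - R) / (NIR + R)
    dvi = 2.4 * (NIR - R)
    return {
        "NDVI": ndvi,
        "NDWI": (G - NIR) / (G + NIR),
        "RVI": R / NIR,
        "DVI": dvi,
        "PVI": (NIR - a * R - b) / (1 + a ** 2),
        "RDVI": math.sqrt(ndvi * dvi),
        "EVI": 2.5 * (NIR - R) / (NIR + 6 * R - 7.5 * B + 1),
        "TVI": math.sqrt(ndvi + 0.5),
        "RI": (R - G) / (R + G),
        "NDI": (NIR - SIR) / (NIR + SIR),
        "SAVI": (NIR - R) * (1 + L) / (R + NIR + L),
        "NIRG": NIR / G,
    }


# Hypothetical surface reflectances for a single vegetated pixel.
idx = vegetation_indices(R=0.05, G=0.08, B=0.04, NIR=0.40, SIR=0.20)
```

For this sample pixel the NDVI is 0.35/0.45, or about 0.78, a value typical of dense vegetation.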

Table 2. Characteristic factors.

No.	Factor Name	Explanation	Source of Data
1–12	Refer to Table 1		Landsat 8
13	CD	Canopy density	Continuous Forest Inventory Data
14	AD	Average diameter at breast height	Continuous Forest Inventory Data
15	TS	Tree species structure	Continuous Forest Inventory Data
16	ELE	Elevation	DEM

Table 3. Data schemes.

Data Scheme	Features	Data Source
A	NDVI, NDWI, RI, DVI, PVI, RDVI, EVI, TVI, RVI, NDI, SAVI, NIRG, ELE	Landsat 8, DEM
B	CD, AD, TS, ELE	Continuous Forest Inventory Data, DEM
C	Scheme A and Scheme B	Landsat 8, Continuous Forest Inventory Data, DEM
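The relationship between the three data schemes can be written down as plain feature lists. This is a minimal sketch: the feature abbreviations come from Tables 2 and 3, while the variable names, the helper `select_features`, and the sample record are hypothetical.

```python
# Twelve Landsat 8 vegetation indices (Table 1), in the order of Table 3.
LANDSAT_INDICES = ["NDVI", "NDWI", "RI", "DVI", "PVI", "RDVI",
                   "EVI", "TVI", "RVI", "NDI", "SAVI", "NIRG"]

SCHEME_A = LANDSAT_INDICES + ["ELE"]   # Landsat 8 + DEM
SCHEME_B = ["CD", "AD", "TS", "ELE"]   # Continuous Forest Inventory + DEM

# Scheme C is the union of A and B; dict.fromkeys keeps the order and
# drops the duplicated elevation feature.
SCHEME_C = list(dict.fromkeys(SCHEME_A + SCHEME_B))


def select_features(record, scheme):
    """Return the feature vector of one sample plot for a given scheme."""
    return [record[name] for name in scheme]


# Hypothetical plot record with placeholder values for every feature.
plot = {name: 0.0 for name in SCHEME_C}
x_b = select_features(plot, SCHEME_B)
```

Scheme C therefore contains all 16 characteristic factors listed in Table 2, with elevation counted once.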

Table 4. Model parameters.

Parameters	Random Forest	Adaboost	RF-Adaboost
estimators	150	100	100
max_depth	5	N/A	N/A
learning_rate	N/A	0.06	0.06
base_estimator	N/A	Decision tree	Random forest
loss	N/A	Linear	Linear

Table 5. Performance indicators based on Data Scheme A.

	Random Forest	Adaboost	RF-Adaboost
R²	0.41	0.40	0.44
RMSE (m³/site)	8.20	8.17	8.13
MAE (m³)	4.92	4.90	4.82
MAPE (%)	20.43	21.13	19.98

Table 6. Performance indicators based on Data Scheme B.

	Random Forest	Adaboost	RF-Adaboost
R²	0.74	0.73	0.77
RMSE (m³/site)	5.68	5.91	5.19
MAE (m³)	3.25	3.35	3.01
MAPE (%)	10.90	10.84	10.37

Table 7. Performance indicators based on Data Scheme C.

	Random Forest	Adaboost	RF-Adaboost
R²	0.78	0.78	0.81
RMSE (m³/site)	7.67	7.72	7.08
MAE (m³)	3.71	3.78	3.36
MAPE (%)	8.83	8.86	7.97

Table 8. Comparison with other methods.

	Mauya [12]	Ruyi Zhou [33]	Huajian Huang [56]	Jingjing Zhou [31]	Yangyang Zhou [57]	Our Study
R²	0.63	0.65	0.78	0.82	0.78	0.81
MAPE (%)	N/A	32.89	16.20	N/A	N/A	15.00

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, X.; Zhang, C.; Qiang, Z.; Xu, W.; Fan, J. A New Forest Growing Stock Volume Estimation Model Based on AdaBoost and Random Forest Model. Forests 2024, 15, 260. https://doi.org/10.3390/f15020260
