• SVR

SVM is a class of generalized linear algorithms that performs supervised classification, where the decision boundary is the maximum-margin hyperplane solved from the training samples. SVR is an adaptation of SVM for regression problems and can handle nonlinear problems through the kernel method. A linear kernel and a penalty factor of 1 were applied for SVR in this study.
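In scikit-learn terms, this SVR configuration can be sketched as follows; the feature matrix and AGB vector here are synthetic stand-ins, not the study's data:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Synthetic stand-ins for the selected features (X) and plot AGB (y)
rng = np.random.default_rng(0)
X = rng.normal(size=(195, 9))          # 195 plots, 9 features
y = X @ rng.normal(size=9) + rng.normal(scale=0.1, size=195)

# Linear kernel and penalty factor C = 1, as stated in the text
model = make_pipeline(StandardScaler(), SVR(kernel="linear", C=1.0))
model.fit(X, y)
pred = model.predict(X)
print(pred.shape)  # (195,)
```

Standardizing the inputs before SVR is our addition; it is common practice because SVR is sensitive to feature scales.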

• KNN

The KNN method is a multivariate nonparametric algorithm that uses a set of predictors (Xs) to match each target pixel to a number (K) of the most similar (nearest-neighbor) reference pixels for which the values of the response variable (Y) are known. The number of nearest neighbors was set to 5, and uniform weights were utilized in this study.
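A minimal sketch of this configuration with scikit-learn (synthetic stand-in data):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic predictors for 195 plots and a simple response
rng = np.random.default_rng(0)
X = rng.normal(size=(195, 9))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=195)

# K = 5 nearest neighbors with uniform weights, as in the study
knn = KNeighborsRegressor(n_neighbors=5, weights="uniform")
knn.fit(X, y)
print(knn.predict(X[:3]).shape)  # (3,)
```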

• CNN

CNN, first developed in 1995 for the classification of handwritten images [91], is one of the most representative deep learning algorithms. A CNN interprets spatial data by scanning it with a series of trainable moving windows and, owing to its hierarchical structure, is capable of representation learning in a translation-invariant manner. In this study, the CNN model had a simple structure with an input layer, a hidden layer, and an output layer, and was trained for 1000 epochs with a batch size of 30.
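The exact layer configuration is not given beyond input/hidden/output, but the core operation, scanning the input with a trainable moving window, can be illustrated with a minimal 1D convolution in NumPy (the kernel values below are arbitrary, chosen only for illustration):

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide a trainable window (kernel) across the input and
    take a dot product at each position ('valid' padding)."""
    k = len(kernel)
    return np.array([signal[i:i + k] @ kernel
                     for i in range(len(signal) - k + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # a toy 1D input
w = np.array([0.5, 0.0, -0.5])            # one 3-tap trainable window
out = conv1d_valid(x, w)
print(out)  # [-1. -1. -1.]
```

Because the same window is applied at every position, a pattern produces the same response wherever it occurs, which is the translation-invariance property mentioned above.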

#### 2.3.4. Ensemble Learning Algorithms

Stacked generalization (SG), a layered ensemble learning algorithm [92], was applied in this study. The SG algorithm here was designed with two layers: base models and meta models. The input of the base models is the original training set, and the output of the base models is used as the training set for the meta model [93]. The meta model can be a single model or an ensemble model [93,94], like RF. To obtain better SG performance, the base models should be as accurate and as diverse as possible. Thus, the four best-performing machine learning algorithms described in Section 2.3.3, identified by leave-one-out cross-validation, were selected as the base models; each of them also served in turn as the meta model, resulting in four SG algorithms. The flowchart of the SG algorithm in this study is presented in Figure 3.

**Figure 3.** Flowchart of the stacked generalization (SG) algorithm in this study. Note: The number of base models (N) was set to four, and 195 iterations were run within each model because of the leave-one-out cross-validation of the 195 sample plots.
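A minimal sketch of this two-layer design using scikit-learn's `StackingRegressor`, with RF, SVR, and KNN as base models and RF as the meta model (data are synthetic; the study also used CNN as a base model, omitted here for brevity):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=100)

# Layer 1: base models trained on the original data; their
# cross-validated predictions form the meta model's training set.
base = [("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("svr", SVR(kernel="linear", C=1.0)),
        ("knn", KNeighborsRegressor(n_neighbors=5))]

# Layer 2: a meta model (here RF) learns from the base predictions
sg = StackingRegressor(estimators=base,
                       final_estimator=RandomForestRegressor(random_state=0),
                       cv=5)
sg.fit(X, y)
print(sg.predict(X[:3]).shape)  # (3,)
```

Note that `StackingRegressor` uses K-fold cross-validation internally (here `cv=5`) to generate the meta training set, whereas the study used leave-one-out; the layering is the same.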

#### 2.3.5. Model Evaluation

This study adopted a leave-one-out cross-validation method to evaluate model accuracy. Since 195 sample plots were used, the training and testing data comprised 194 plots and 1 plot, respectively, and 195 iterations were run for each model. Due to the problems of the coefficient of determination (*R*<sup>2</sup>) for nonlinear models [95], we avoided applying *R*<sup>2</sup> to the machine learning models relating the selected features to AGB. However, the *R*<sup>2</sup> between actual and predicted AGB can be used as an indicator, since their relationship can be described by a simple linear model. Therefore, six indices were applied for model evaluation: the *R*<sup>2</sup> of actual and predicted AGB, root mean squared error (RMSE), relative root mean squared error (rRMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and precision measure (PM). The equations are as follows:

$$R^2 = \frac{\sum_{i=1}^{n} \left(\hat{y}_i - \overline{y}\right)^2}{\sum_{i=1}^{n} \left(y_i - \overline{y}\right)^2} \tag{5}$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{6}$$

$$rRMSE = \frac{\sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}}{\overline{y}} \tag{7}$$

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{8}$$

$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| y_i - \hat{y}_i \right|}{y_i} \times 100\% \tag{9}$$

$$PM = \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \overline{y} \right)^2} \tag{10}$$

where *n* represents the number of observation samples, *y<sub>i</sub>* represents the actual AGB of the *i*th plot, ŷ*<sub>i</sub>* represents the predicted AGB of the *i*th plot, and ȳ represents the mean of the actual AGB. All model fitting and evaluation procedures in this study were implemented in Python 3.7 (https://www.python.org/downloads/ (accessed on 1 September 2021)), TensorFlow 2.2 (https://tensorflow.google.cn/ (accessed on 1 September 2021)), and scikit-learn (https://scikit-learn.org/stable/ (accessed on 1 September 2021)).
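Equations (5)–(10) can be implemented directly; the sketch below uses our own variable names:

```python
import numpy as np

def evaluate(y, y_hat):
    """Six evaluation indices from Equations (5)-(10)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    y_bar = y.mean()
    return {
        "R2":    np.sum((y_hat - y_bar) ** 2) / np.sum((y - y_bar) ** 2),
        "RMSE":  np.sqrt(np.mean((y - y_hat) ** 2)),
        "rRMSE": np.sqrt(np.mean((y - y_hat) ** 2)) / y_bar,
        "MAE":   np.mean(np.abs(y - y_hat)),
        "MAPE":  np.mean(np.abs(y - y_hat) / y) * 100,
        "PM":    np.sum((y - y_hat) ** 2) / np.sum((y - y_bar) ** 2),
    }

# Toy actual/predicted AGB values (Mg/ha), not the study's data
m = evaluate([100, 120, 140], [102, 118, 141])
print(round(m["RMSE"], 3))  # 1.732
```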

#### **3. Results**

#### *3.1. Feature Selection*

Due to the large number of extracted features (199 in total), a two-step feature selection was implemented in this study: preliminary selection using the Pearson correlation coefficient, and further selection based on the variable importance measure of random forest. Finally, nine ALS features were selected and sorted from highest to lowest variable importance as follows: elev\_mean, int\_AII\_5th, elev\_cv, density\_7th, int\_max, int\_AII\_40th, int\_per\_60th, int\_per\_80th, and int\_AII\_50th; nine features extracted from Landsat 8 were selected and ranked in descending order of variable importance: MVI5, B1, B76, B65, B53, Entr\_B5, B2, ND563, and MVI7. The selected features and their descriptions are listed in Table 3.


**Table 3.** Feature Selection of ALS and Landsat 8 imagery.
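The two-step selection above can be sketched as follows (synthetic data; the correlation threshold of 0.2 is our assumption, as the text does not state one):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(195, 199))   # 199 candidate features (synthetic)
y = 3.0 * X[:, 0] + X[:, 1] + rng.normal(size=195)

# Step 1: preliminary screening by Pearson correlation with AGB
# (threshold 0.2 is a placeholder, not taken from the paper)
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
keep = np.where(np.abs(r) > 0.2)[0]

# Step 2: rank the survivors by random forest variable importance
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X[:, keep], y)
ranked = keep[np.argsort(rf.feature_importances_)[::-1]]
print(int(ranked[0]))  # 0, the dominant synthetic feature ranks first
```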

To identify the best-performing ALS feature, simple linear regressions were established to model the relationship between AGB and each ALS feature. The univariate models showed that elevation mean outperformed the other ALS features, with a higher *R*<sup>2</sup> and lower *RMSE*, *rRMSE*, *MAE*, *MAPE*, and *PM* (Table 4). Thus, elevation mean was selected as the best-performing ALS feature for generating *COLI*1 and *COLI*2 using Equations (3) and (4).


**Table 4.** Accuracy assessment of the univariate models with AGB and each ALS feature.
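The univariate screening can be sketched as follows (synthetic data; the column labels merely echo the ALS feature names):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
features = {"elev_mean": rng.normal(size=195),
            "int_max":   rng.normal(size=195)}
# Toy AGB driven by the first feature only
agb = 2.5 * features["elev_mean"] + rng.normal(scale=0.5, size=195)

# Fit AGB ~ feature for each candidate and compare R2 / RMSE
scores = {}
for name, x in features.items():
    lm = LinearRegression().fit(x.reshape(-1, 1), agb)
    pred = lm.predict(x.reshape(-1, 1))
    scores[name] = (r2_score(agb, pred),
                    mean_squared_error(agb, pred) ** 0.5)

best = max(scores, key=lambda k: scores[k][0])
print(best)  # elev_mean
```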

#### *3.2. Performance of Classic Machine Learning Algorithms*

#### 3.2.1. Experiment I

The goal of feature experiment I was to explore the effects of features from different data sources (optical imagery, ALS, and combined data) on AGB estimation based on seven classic machine learning algorithms, including ELM, BP, RegT, RF, SVR, KNN, and CNN. MLR was implemented as a baseline for model comparison. Table 5 shows the performance of the eight models using the three sets of features designed in Experiment I.

**Table 5.** Accuracy assessment of classic machine learning algorithms with three sets of features designed in experiment I.


<sup>1</sup> MLR—multiple linear regression; ELM—extreme learning machine; BP—back propagation; RegT—regression tree; RF—random forest; SVR—support vector regression; KNN—k-nearest neighbor regression; CNN—convolutional neural network.

In general, the optimal ALS features (Feature 1) performed significantly better than the optimal Landsat 8 features (Feature 2) for AGB estimation, regardless of the algorithm, while the combination of the optimal ALS and Landsat 8 features (Feature 1 + 2) performed differently across algorithms. For each data source, the accuracy of CNN was markedly higher than that of the other algorithms, especially when applying both ALS and Landsat 8 features (*R*<sup>2</sup> = 0.97, *RMSE* = 12.6, *rRMSE* = 0.08, *MAE* = 6.43, *MAPE* = 4.02, *PM* = 0.13). However, it is worth mentioning that the accuracies of the other algorithms (except CNN) based on the two data sources (Feature 1 + 2) were not significantly improved compared with those based on the optimal ALS features alone (Feature 1), which suggests that the accuracy of AGB estimation depends not only on the data sources but also on the algorithms. Some algorithms (such as RF and SVR) provided very similar accuracy with both optimal ALS and Landsat 8 features as with only optimal ALS features, making the inclusion of optical imagery of little benefit. Thus, ALS data are of particular significance to AGB estimation.

#### 3.2.2. Experiment II

After determining the best-performing ALS feature (i.e., elevation mean), we designed feature experiment II to investigate how to efficiently combine this single feature with the optimal Landsat 8 features for AGB estimation. Is it better to utilize novel features extracted from elevation mean and the optimal Landsat 8 features (i.e., *COLI*1 and *COLI*2) or to directly combine all the features? The similar feature sizes in experiment II (i.e., 9 or 10) avoided unfair comparisons due to large differences in feature number. Table 6 presents the accuracy assessment of the classic machine learning algorithms with the three sets of features designed in experiment II. The results showed that the addition of elevation mean significantly improved the accuracy of AGB estimation compared to using the optical features only (Feature 2), regardless of how it was added. All models except CNN had very similar performances for the three feature combinations in experiment II. CNN still showed great advantages, as in experiment I, especially when simply combining the optimal Landsat 8 features and elevation mean (Feature 2 + 3), with an accuracy of *R*<sup>2</sup> = 0.88, *RMSE* = 24.48, *rRMSE* = 0.16, *MAE* = 10.19, *MAPE* = 7.23, and *PM* = 0.24, followed by the case of all *COLI*2 (Feature 5) and then the case of all *COLI*1 (Feature 4). Thus, it seems unnecessary to generate the new features (i.e., *COLI*1 or *COLI*2) when CNN is applied for AGB estimation of NSFs based on the optimal Landsat 8 features and the best-performing ALS feature.


**Table 6.** Accuracy assessment of classic machine learning algorithms with three sets of features designed in experiment II.

#### 3.2.3. Experiment III

To investigate the effect of combining the optimal ALS and Landsat 8 features with the two types of novel features (*COLI*1 or *COLI*2) using classic machine learning algorithms, experiment III was implemented (Table 7). Compared to the result of applying the optimal ALS and Landsat 8 features (Feature 1 + 2) in Table 5, the addition of the novel features, whether *COLI*1 or *COLI*2, slightly improved the accuracies of most models, such as MLR, BP, RegT, RF, and KNN. In addition, the accuracies of all models except RF using the optimal ALS and Landsat 8 features plus all *COLI*2 (Feature 1 + 2 + 5) were slightly higher than those using the optimal ALS and Landsat 8 features plus all *COLI*1 (Feature 1 + 2 + 4), indicating that *COLI*2 was more efficient than *COLI*1 for AGB estimation of NSFs. CNN remained far superior to the other algorithms and reached the highest accuracy (*R*<sup>2</sup> = 0.99, *RMSE* = 6.85, *rRMSE* = 0.04, *MAE* = 2.95, *MAPE* = 1.02, *PM* = 0.03) when the optimal ALS and Landsat 8 features and all *COLI*2 (Feature 1 + 2 + 5) were applied.

**Table 7.** Accuracy assessment of classic machine learning algorithms with two sets of features designed in experiment III.


*3.3. Performance of Ensemble Learning Algorithms*

3.3.1. Experiment I

To explore the performance of the ensemble learning algorithms in estimating AGB based on different feature combinations, experiments I, II, and III were also implemented using the designed SG algorithms. According to the results of the classic machine learning algorithms (Tables 5–7), the four best-performing models, that is, RF, SVR, KNN, and CNN, were selected as base models for the SG algorithm. The predictions of the base models were applied as the input of the meta model of the SG algorithms, which was also RF, SVR, KNN, or CNN. Thus, there were four SG algorithms corresponding to the four meta models: SG(RF), SG(SVR), SG(KNN), and SG(CNN). Table 8 presents the accuracy assessment of the ensemble learning algorithms with the three sets of features designed in experiment I. Compared to the results of the base models (Table 5), the SG algorithms greatly improved the accuracy of AGB estimation using the optimal Landsat 8 features (Feature 2) and the combined optimal features (Feature 1 + 2). However, for the case of the optimal ALS features (Feature 1), the SG algorithms had slightly lower accuracies than the base models, except CNN. In general, CNN still performed best as a meta model of the SG algorithm, followed by the SG algorithm with the SVR meta model, and finally those with the RF and KNN meta models. Although CNN was still an outstanding meta model in all cases, the drastic improvements in accuracy brought by SG(SVR), SG(RF), and SG(KNN) over their corresponding base models are worth noting, especially for Feature 2 and Feature 1 + 2. For example, the *R*<sup>2</sup> of SG(SVR), SG(RF), and SG(KNN) increased by approximately 30–40% and 60–70% for Feature 2 and Feature 1 + 2, respectively, whereas the *R*<sup>2</sup> of SG(CNN) increased by only 49% and 0%, respectively. The other indices (*RMSE*, *rRMSE*, *MAE*, *MAPE*, and *PM*) showed similar trends, but in the opposite direction. Thus, the SG algorithms offered more room for improvement for relatively weak learners (such as SVR, RF, and KNN) than for strong deep learning learners (such as CNN).

**Table 8.** Accuracy assessment of ensemble learning algorithms with three sets of features designed in experiment I.


#### 3.3.2. Experiment II

Feature experiment II was also implemented to investigate how to integrate elevation mean and the optimal Landsat 8 features for AGB estimation based on the ensemble learning algorithms (Table 9). The SG algorithms greatly improved the accuracies for all cases except SG(CNN) for Feature 5 and Feature 2 + 3, compared to the accuracies of the corresponding base models (Table 6). When the SG algorithms were utilized, the trend that the simple combination of the optimal Landsat 8 features and elevation mean (Feature 2 + 3) performed best, followed by all *COLI*2 (Feature 5) and finally all *COLI*1 (Feature 4), was much more obvious than with the classic machine learning algorithms (Table 6 vs. Table 9). The advantage of applying the deep learning algorithm CNN as the meta model decreased with the dramatic increase in the accuracies of the other three algorithms (i.e., RF, SVR, and KNN), especially for Feature 5 and Feature 2 + 3. In other words, when the feature set of all *COLI*2 or the set of the optimal Landsat 8 features and elevation mean was applied for AGB estimation, SG(RF), SG(SVR), and SG(KNN) had accuracies comparable to SG(CNN).

**Table 9.** Accuracy assessment of ensemble learning algorithms with three sets of features designed in experiment II.


#### 3.3.3. Experiment III

The effect of combining the optimal ALS and Landsat 8 features with the two types of novel features (*COLI*1 or *COLI*2) on AGB estimation using the ensemble algorithms was investigated in experiment III (Table 10). Unlike with the classic machine learning algorithms, the addition of *COLI*1 to the ensemble algorithms did not improve the accuracy of AGB estimation compared to the result of applying the optimal ALS and Landsat 8 features (Feature 1 + 2) in Table 8. SG(SVR) and SG(KNN) with the addition of *COLI*1 even had *R*<sup>2</sup> values about 10–20% lower than SG(SVR) and SG(KNN) with only Feature 1 + 2 (Table 8). However, the addition of *COLI*2 slightly increased the accuracies of most models except SG(KNN), even though the SG algorithms with Feature 1 + 2 had already performed well (Table 8). In general, the SG algorithms with the optimal ALS and Landsat 8 features plus all *COLI*2 (Feature 1 + 2 + 5) had more stable accuracies than those with the optimal ALS and Landsat 8 features plus all *COLI*1 (Feature 1 + 2 + 4), regardless of the meta model, indicating that *COLI*2 was more efficient than *COLI*1 for AGB estimation of NSFs. The SG model with the CNN meta model still achieved the highest accuracy (*R*<sup>2</sup> = 0.99, *RMSE* = 2.02, *rRMSE* = 0.01, *MAE* = 0.87, *MAPE* = 0.73, *PM* = 0.02) when the optimal ALS and Landsat 8 features and all *COLI*2 (Feature 1 + 2 + 5) were applied.

**Table 10.** Accuracy assessment of ensemble learning algorithms with two sets of features designed in experiment III.


In addition, the ensemble algorithms greatly improved the accuracies over the corresponding base models with the same features (Table 10 vs. Table 7). For example, when the combination of the optimal ALS and Landsat 8 features and all *COLI*1 (Feature 1 + 2 + 4) was utilized, the *R*<sup>2</sup> of SG(RF) increased by more than 60% compared with that of the RF model, while the *RMSE*, *rRMSE*, *MAE*, *MAPE*, and *PM* of SG(RF) decreased by 75%, 73%, 71%, 76%, and 81%, respectively. Although the CNN base model had already achieved high accuracy, especially with the combination of the optimal ALS and Landsat 8 features and all *COLI*2 (Feature 1 + 2 + 5 in Table 7), SG(CNN) still decreased *RMSE*, *rRMSE*, and *MAE* by about 70% and *MAPE* and *PM* by about 30%.

#### *3.4. Wall-to-Wall AGB Predictions*

Based on the above results and algorithm efficiency, CNN and the feature set of the optimal ALS and Landsat 8 features and all *COLI*2 (Feature 1 + 2 + 5) were selected for a wall-to-wall AGB prediction of the entire Maorshan Experimental Forest Farm of NEFU (Figure 4). The predicted AGB varied from 0 to 491.04 Mg/ha, with a mean of 59.9 Mg/ha and a standard deviation of 48.69 Mg/ha. Areas with zero or low AGB were located along rivers, roads, and residential regions, whereas areas with high AGB values were located in the central part (e.g., the Zhonglin, Yuejin, Beiling, Donglin, and Xinken working districts) of Maorshan (Figure 4a). However, a mixed pattern of high and low AGB values was evident over most of the study area, as shown by the enlarged area in the Zhonglin working district (Figure 4b).

**Figure 4.** (**a**) The wall-to-wall AGB prediction of the entire study area estimated by the CNN model with optimal ALS features, optimal Landsat 8 features, and all *COLI*2 (Feature 1 + 2 + 5); (**b**) Spatial distribution of AGB for a partial area in Zhonglin working district.

Figure 5 shows the relationship between actual and estimated AGB (Mg/ha) of the 195 plots using the CNN algorithm based on different feature sets. For experiment I, it was better to apply ALS than Landsat 8 to predict AGB if only one data source had to be used, which indicates that vertical forest structure is more vital than spectral information for AGB estimation of NSFs. The synergy of optical imagery and ALS markedly increased the accuracy over a single data source (Figure 5c vs. Figure 5a or Figure 5b), since it effectively alleviated the underestimation of high AGB values. Even when only one ALS feature (i.e., elevation mean) was added to the Landsat 8 features (experiment II), the improvement was obvious and significant. However, it was unnecessary to generate novel features such as *COLI*1 or *COLI*2 from the optimal Landsat 8 features and elevation mean: the performance of directly combining them was much better than that of the new features (Figure 5f vs. Figure 5d or Figure 5e), though worse than that of all optimal ALS and Landsat 8 features (Figure 5f vs. Figure 5c) due to the smaller number of features (i.e., 10 vs. 18). The effectiveness of *COLI*1 was very limited because Feature 1 + 2 provided a result comparable to Feature 1 + 2 + 4 (Figure 5c vs. Figure 5g). Combining all optimal ALS, Landsat 8, and *COLI*2 features was the most efficient approach, especially for estimating high AGB values (Figure 5h).

**Figure 5.** The relationship of actual and estimated AGB (Mg/ha) of 195 plots using the CNN algorithm based on (**a**) Feature 1: optimal ALS features; (**b**) Feature 2: optimal Landsat 8 features; (**c**) Feature 1 + 2: optimal ALS and Landsat 8 features; (**d**) Feature 4: all *COLI*1; (**e**) Feature 5: all *COLI*2; (**f**) Feature 2 + 3: optimal Landsat 8 features and the best-performing ALS feature; (**g**) Feature 1 + 2 + 4: optimal ALS features, optimal Landsat 8 features, and all *COLI*1; (**h**) Feature 1 + 2 + 5: optimal ALS features, optimal Landsat 8 features, and all *COLI*2. Note: The red and black lines represent the fitted regression lines and the 45° line, respectively.

#### **4. Discussion**

#### *4.1. AGB Estimation Using Different Features*

The differences in features reflect the characteristics of the different data sources. In this study, we extracted a variety of features and investigated the effects of different synergistic approaches to features derived from ALS and Landsat 8 OLI imagery on AGB estimation of NSFs in northeastern China. For ALS data, besides elevation features, density-related (e.g., density\_metrics7) and intensity-related (e.g., int\_AII\_5th, int\_max, int\_AII\_40th, int\_per\_60th, int\_per\_80th, and int\_AII\_50th) metrics also showed great potential for AGB estimation; for Landsat 8 imagery, band combinations and texture were more efficient than vegetation indices, especially MVI5 (i.e., the band combination of bands 5, 4, and 2). Unfortunately, some traditional vegetation indices commonly applied in previous studies [48], for example, the normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), atmospherically resistant vegetation index (ARVI), and soil-adjusted vegetation index (SAVI), were excluded due to their low correlations with AGB. Only one vegetation index (i.e., ND563) was selected. This might be because the study area is a natural secondary forest with high canopy density, which can easily cause saturation (insensitivity to AGB) of the traditional vegetation indices, as also confirmed in [96,97]. The low accuracies (e.g., *R*<sup>2</sup> < 0.3) of AGB estimation using the optimal Landsat 8 features (Feature 2), regardless of the algorithm, also indicate the difficulty of AGB estimation for NSFs. Owing to the vegetation characteristics, the near-infrared and shortwave infrared bands (i.e., bands 5, 6, and 7) were more related to AGB than the other bands.

Similar to previous studies [48,52,98], it was beneficial to combine ALS data and optical imagery, even when only one significant ALS-derived feature (such as elevation mean) was combined. The synergistic method of extracting novel features (i.e., *COLI*1 and *COLI*2) from the optimal Landsat 8 features and the best-performing ALS feature (i.e., elevation mean) yielded higher AGB estimation accuracy than either optical-only or ALS-only features when the same model was implemented. Experiments II and III showed that *COLI*2 had more advantages than *COLI*1 in AGB estimation of NSFs, which differs from [48] because of the different forest types (NSFs of northeastern China vs. mixed forests of southern China). However, it was surprising to find that the novel extracted features (*COLI*1 and *COLI*2) were not efficient in improving accuracy compared to the simple combination of the untransformed features (optimal Landsat 8 features + BLV), which indicates the great convenience and effectiveness of just adding the best-performing ALS feature (i.e., elevation mean) to the original set of Landsat 8 features for AGB estimation of NSFs. The number of features was also a vital factor influencing AGB accuracy. To ensure a fair comparison of the synergistic approaches, we kept the number of features as consistent as possible within each experiment. In general, the accuracy of AGB estimation rises with the number of involved features under otherwise identical conditions (e.g., the same algorithm). Thus, it was not surprising that, from a feature-size perspective, the combinations with 27 features (i.e., Feature 1 + 2 + 4 or Feature 1 + 2 + 5) in experiment III provided the best performance in this study.

#### *4.2. AGB Estimation Using Machine Learning Algorithms*

The effects of classic machine learning and ensemble learning algorithms on AGB estimation using different features were explored in this study. The RF algorithm, one of the most commonly used algorithms in forestry, provided only modest accuracy here, since it frequently overfits the data, often yielding poorer predictions [33]. CNN, a deep learning algorithm first developed in 1995 for the classification of handwritten images [91], showed clear advantages over the other classic algorithms (e.g., ELM, BP, RF, KNN, and SVR). As a representative of deep learning, a branch of machine learning, a large and deep CNN (consisting of many layered convolutions) was further developed in 2012 and achieved a winning top-5 test error rate of 15.3% in the ImageNet ILSVRC-2012 competition [99]. In recent years, the CNN model has been increasingly applied in forestry, for example, for the prediction of forest inventory parameters and the identification of tree species [100,101]. CNN interprets spatial data by scanning it with a series of trainable moving windows through sufficiently complex artificial neural networks and in essence does not require human-derived feature selection [100]. However, to ensure a fair comparison of the models, we kept the feature selection procedure consistent for all of them. This means that the CNN model was applied to two-dimensional data of AGB and a set of human-derived features instead of a three-dimensional image. Although the CNN model thereby lost its advantage of automatically extracting and selecting features, it was still sensitive to changes in features and significantly superior to the other models (e.g., ELM, BP, RF, KNN, and SVR).

The SG algorithms, a kind of ensemble learning, applied heterogeneous ensemble methods with different base models and greatly improved the AGB estimation accuracy in this study. RF, KNN, SVR, and CNN were selected as base models since SG algorithms can take advantage of good and stable predictions from the base models.

The good predictions of the CNN base model made the accuracy of the SG algorithms high and stable regardless of the meta model, which indicates that SG has a stronger generalization ability than the base models. In other words, SG algorithms benefit weaker learners (e.g., RF, KNN, and SVR) more than strong learners (e.g., CNN).

However, although the SG algorithm is superior to its corresponding base model, we still recommend employing the CNN model for AGB estimation in practice due to its comparable accuracy and good efficiency. Table 11 summarizes the efficiency (i.e., runtime) of all the algorithms with the combination of the optimal ALS and Landsat 8 features and all *COLI*2 (Feature 1 + 2 + 5) for AGB estimation of the 195 plots on a computer with an AMD RX3700x CPU, 16 GB of RAM, and a GTX 960 4 GB GPU. The runtime of the ensemble algorithms (i.e., SG(RF), SG(KNN), SG(SVR), and SG(CNN)) increased dramatically compared with their corresponding base models (i.e., RF, KNN, SVR, and CNN). For example, the efficiency of SG(CNN) was only half that of the CNN model, and the other SG algorithms (i.e., SG(RF), SG(KNN), and SG(SVR)) increased the runtime of their corresponding algorithms (i.e., RF, KNN, and SVR) even more. Among the classic machine learning algorithms, the CNN model had the longest runtime but yielded the highest accuracy (see Tables 5–7) due to its most complex structure. Thus, to balance workload and accuracy, the wall-to-wall AGB prediction map was generated using the CNN model with the combination of the optimal ALS and Landsat 8 features and all *COLI*2 (Feature 1 + 2 + 5).

**Table 11.** The runtime of all algorithms with the combination of the optimal ALS and Landsat 8 features, and all COLI2 (Feature 1 + 2 + 5).


#### *4.3. Comparison of Estimated Forest AGB and Current Publications*

From the AGB accuracy perspective, the highest accuracy (*R*<sup>2</sup> = 0.99, *RMSE* = 2.02, *rRMSE* = 0.01, *MAE* = 0.87, *MAPE* = 0.73, *PM* = 0.02) was yielded by the SG(CNN) algorithm with the combination of the optimal ALS and Landsat 8 features and all *COLI*2 (Feature 1 + 2 + 5), which was better than other similar AGB studies applying both LiDAR and optical imagery (e.g., [48,61,69,98,102]). Besides the features and algorithms, the high accuracy of this study also benefited from its being a local study with a relatively small area; accuracy tends to decrease at national and global scales. For example, Su et al. [69] reported an *R*<sup>2</sup> of 0.75 and an *RMSE* of 42.39 Mg/ha for the AGB estimation of China based on ICESat GLAS laser altimetry data, MODIS, and forest inventory data. Yang et al. [103] produced a global forest AGB map with an *R*<sup>2</sup> of 0.90 and an *RMSE* of 35.87 Mg/ha using a gradient boosted regression tree algorithm based on multiple data sources (e.g., LiDAR-derived forest AGB datasets, field measurements, and high-level products from optical satellite imagery).

Further, we examined the predicted AGB values of the wall-to-wall map of the entire Maorshan and compared the distributions of AGB values from the wall-to-wall prediction map and the 195 sample plots (Figure 6). Although the spatial distribution of AGB values in the wall-to-wall prediction map seemed reasonable (Figure 4), there was still a large difference between the two distributions, especially for the ranges of 0–50 Mg/ha and >200 Mg/ha (Figure 6), indicating underestimation of high AGB values and overestimation of low AGB values. This suggests that the data saturation in Landsat imagery was not fully eliminated in this study of natural secondary forests. For Heilongjiang Province, the average forest AGB density estimated by [69,104] was 81 Mg/ha and 85 Mg/ha, respectively (using a ratio of 50% for the conversion between forest AGB and AGB carbon stock); for the whole of northeastern China, the average forest AGB density estimated by [57,105] was 83.50 Mg/ha and 89.30 Mg/ha, respectively. All these values are significantly higher than the average AGB of 59.9 Mg/ha in this study. The first reason could be the different study areas: both Heilongjiang Province and northeastern China are much larger than Maorshan Experimental Forest Farm and include areas with high AGB values, such as the Daxing'an Mountains, Xiaoxing'an Mountains, and Changbai Mountains, which results in higher average AGB values. The second reason could be that data saturation substantially lowered the average AGB estimate, although the range of predicted AGB (0–491.04 Mg/ha) is reasonable. Thus, how to eliminate data saturation and quantitatively determine the saturation level for NSFs still needs further investigation.

**Figure 6.** The distributions of AGB values of wall-to-wall prediction map (blue bars with one slash) and 195 sample plots (orange bars with double slashes).

#### *4.4. Limitations and Recommendations*

Retrieving AGB with high accuracy from remotely sensed data is not an easy task. Every procedure and factor can greatly influence the accuracy, including the data sources, feature extraction and selection, estimation models, and model evaluation. Although high AGB estimation accuracies were yielded by the CNN and SG(CNN) models based on the combination of the optimal ALS and Landsat 8 features and all *COLI*2 (Feature 1 + 2 + 5), there were still limitations in this study. First, we only tested the features (*COLI*1, *COLI*2) proposed by [48] and compared them with the direct combination of the original features from which they were generated for the AGB estimation of NSFs. It is possible that a more effective approach than the *COLI*s exists for combining ALS and Landsat 8 imagery for NSFs. Thus, it is still valuable to propose novel features or explore other synergistic approaches based on multiple data sources for various forest types.

The second limitation is that the underestimation of high AGB values and the overestimation of low AGB values were not eliminated from the wall-to-wall prediction map, although the CNN model had good efficiency and high accuracy according to the model evaluation results. Data saturation might be responsible for this phenomenon and lead to a much lower average AGB estimate for the entire study area than the values reported in similar studies [57,69,104,105]. The high risk of overfitting inherent in data-driven models could be another reason for the large discrepancy between the model evaluation results and the final wall-to-wall prediction. Thus, the development of models with good generalizability for biomass estimation and the interpretation of the physical meaning of models are strongly recommended for further research [17].

In addition, the model evaluation procedure based on leave-one-out cross-validation may be another cause of the high accuracy of the CNN model on the reference data. Leave-one-out cross-validation is a special case of K-fold cross-validation in which the number of folds equals the number of records in the data set [106]. Since the evaluated model is fitted once for each record, using all other records as the training set and the selected record as a single-item test set, it can tend to yield higher accuracy through overfitting than, for example, ten-fold cross-validation, which uses only 90% of the records to train the model. However, the quantitative effects of different cross-validation procedures on AGB estimation still need to be investigated. There can be a large difference between the accuracy obtained by the model evaluation procedure on reference data and that of the wall-to-wall prediction. Thus, besides the traditional model evaluation procedure, we strongly suggest assessing the spatial distribution of AGB estimates based on the wall-to-wall prediction map and comparing the histogram of AGB estimates against existing data.
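The difference between the two procedures can be made concrete with a short sketch (illustrative only, not the authors' pipeline): for the 195 plots in this study, leave-one-out cross-validation fits the model 195 times on 194 plots each, whereas ten-fold cross-validation fits it only 10 times on roughly 90% of the plots.

```python
def loo_splits(n):
    """Leave-one-out: each record is the single-item test set exactly once."""
    for i in range(n):
        train = [j for j in range(n) if j != i]
        yield train, [i]

def kfold_splits(n, k):
    """K-fold: each contiguous fold of ~n/k records is the test set once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [j for j in range(n) if j < start or j >= start + size]
        yield train, test
        start += size

n_plots = 195  # number of sample plots in this study
loo = list(loo_splits(n_plots))         # 195 fits, 194 training plots each
tenfold = list(kfold_splits(n_plots, 10))  # 10 fits, 175-176 training plots each
```

Because each leave-one-out training set is nearly the full data set, fold-to-fold models are almost identical, which is consistent with the higher apparent accuracy discussed above.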

The AGB estimation in this study was based on an area-based approach (ABA), which develops models relating AGB to features derived from remotely sensed data at the plot level and applies the models over the whole study area [17]. The fixed plots of the continuous forest resources inventory obtained in 2016 had an area of 20 m × 30 m with a geolocation error of 5 m, while the pixel size of Landsat 8 was 30 m × 30 m. Thus, the geolocation mismatch between the remotely sensed data (i.e., Landsat 8 imagery) and the field measurements is another source of uncertainty in AGB estimation [107]. Fortunately, the large number of plots (i.e., 195) in this study could greatly reduce the geolocation errors according to [107]: the geolocation errors stabilize below 5 m with 20 measurement points and below 3 m with 50 measurement points. Another drawback of this study is the lack of an assessment of biomass uncertainty under the ABA; it is difficult for AGB estimation using the ABA to characterize biomass uncertainties at different spatial scales [108]. In recent years, with the development of automatic individual tree crown delineation algorithms in precision forestry (e.g., [109,110]), AGB estimation based on the individual-tree-based approach (ITA) has received more and more attention, because field data are needed only for a sample of trees instead of a sample of plots or stands [17]. In addition, the ITA allows AGB estimation at the tree and plot levels, and the propagation of errors in an up-scaling framework [108]. Thus, it is appealing and worthwhile in subsequent research to estimate AGB based on the ITA for large-scale forests and to quantify its uncertainty from the tree level to the plot level and then to the stand level in an up-scaling framework.

#### **5. Conclusions**

Accurate quantification of AGB plays a vital role in forest carbon sequestration in the context of climate change. In this study, we investigated the effects of different synergistic approaches of features and of ensemble learning algorithms on the AGB estimation of natural secondary forests of northeastern China based on ALS and Landsat 8 OLI imagery. Combining active and passive data is conducive to improving the accuracy of AGB estimation. Unlike the previous study implemented in southeastern China [48], we found that *COLI*2 features are more effective than *COLI*1 features for the AGB estimation of NSFs. Sometimes, it might be more convenient and efficient to adopt a simple combination of the untransformed features (e.g., the optimal Landsat 8 features + BLV) than the novel features (i.e., *COLI*1 or *COLI*2), especially for the NSFs of northeastern China. The CNN model was much superior to multiple linear regression and the other classic machine learning algorithms (i.e., ELM, BP, RegT, RF, SVR, KNN) regardless of the feature set, and reached the highest accuracies (*R*<sup>2</sup> = 0.99, *RMSE* = 6.85, *rRMSE* = 0.04, *MAE* = 2.95, *MAPE* = 1.02, *PM* = 0.03) when the optimal ALS and Landsat 8 features and all *COLI*2 (Feature 1 + 2 + 5) were applied. The ensemble learning algorithms (SG(RF), SG(SVR), SG(KNN), SG(CNN)), which took advantage of the good and stable predictions of the base models (i.e., RF, SVR, KNN, CNN), greatly improved the accuracy of AGB estimation and had stronger generalization ability than their corresponding base models. Ensemble learning is exceedingly adept at turning weak learners into strong learners, especially under a heterogeneous ensemble strategy. The SG model with the CNN meta-model performed best (*R*<sup>2</sup> = 0.99, *RMSE* = 2.02, *rRMSE* = 0.01, *MAE* = 0.87, *MAPE* = 0.73, *PM* = 0.02) with the feature combination of the optimal ALS and Landsat 8 features and all *COLI*2 (Feature 1 + 2 + 5). However, considering both efficiency (i.e., runtime) and accuracy, the wall-to-wall AGB prediction map of Maoershan was generated using the CNN model and Feature 1 + 2 + 5 instead of the SG(CNN) model. The average and standard deviation of the estimated AGB of the Maoershan Experimental Forest Farm in 2015 were 59.9 Mg/ha and 48.69 Mg/ha, respectively, ranging from 0 to 491.04 Mg/ha. The lower average value compared with similar studies for northeastern China may be due to the different study areas, data saturation, overfitting of the algorithms, and leave-one-out cross-validation. Estimating data saturation, developing advanced algorithms, understanding the effects of different cross-validation procedures, and quantifying the sources of error remain fundamental and significant for AGB estimation at all levels.
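The accuracy statistics reported in this study can be reproduced from paired observed and predicted AGB values; the following is a minimal sketch of the standard definitions (the study-specific *PM* statistic is not defined in this section and is therefore omitted):

```python
import math

def evaluate(y_true, y_pred):
    """Common regression metrics used to report AGB model accuracy.

    y_true, y_pred: observed and predicted AGB values (Mg/ha).
    """
    n = len(y_true)
    mean_y = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    return {
        "R2": 1 - ss_res / ss_tot,          # coefficient of determination
        "RMSE": rmse,                        # Mg/ha
        "rRMSE": rmse / mean_y,              # RMSE relative to mean observed AGB
        "MAE": sum(abs(r) for r in residuals) / n,
        "MAPE": sum(abs(r) / t for r, t in zip(residuals, y_true)) / n,
    }
```

For example, `evaluate([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])` gives *R*<sup>2</sup> = 0.99 and MAPE = 0.05.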

**Author Contributions:** Conceptualization, W.F. and Z.Z.; methodology, C.D.; software, Y.M.; validation, C.D., Y.M., and Z.Z.; formal analysis, C.D. and H.-I.J.; data curation and preprocessing, W.F. and H.-I.J.; writing—original draft preparation, C.D.; writing—review and editing, Z.Z.; visualization, Y.M.; supervision, Z.Z.; project administration, Z.Z.; funding acquisition, Z.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China, "Multi-scale forest aboveground biomass estimation and its spatial uncertainty analysis based on individual tree detection techniques", 32071677; and "The Fundamental Research Funds for the Central Universities", 2572019CP15, 2572020BA05.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**


**Table A1.** The 101 features extracted from ALS data in this study.


<sup>1</sup> *Nveg*: number of vegetation points; *N*: the total number of returns; *N'*: the number of ground points whose elevation is lower than the 2 m height threshold for separating ground and tree points; *A*: average scanning angle; *k*: extinction coefficient, which is closely related to the leaf inclination angle distribution of the canopy. <sup>2</sup> *n* is the number of points in a pixel; *Zi*: the elevation of point *i* within a pixel; *Z*, *Z*min, *Z*max, and *Z*std are the average, minimum, maximum, and standard deviation of the elevation of all points within a pixel, respectively; AIH75% and AIH25% represent the 75% and 25% AIH statistical layers, respectively. <sup>3</sup> *Ii*: the intensity of point *i* within a pixel; *I*, *I*min, *I*max, and *I*std are the average, minimum, maximum, and standard deviation of the intensity of all points within a pixel, respectively; Int75% and Int25% are the 75% and 25% intensity statistical layers, respectively.

**Table A2.** The 98 spectral features extracted from Landsat 8 OLI imagery in this study.



<sup>1</sup> The index *i* represents the band index (1–7). <sup>2</sup> GLCM: gray-level co-occurrence matrix. <sup>3</sup> *L* = 2·s·(B5 − B4)·(B5 − s·B4)/(B5 + B4), where s is the slope of the soil line from a plot of red versus near-infrared brightness values.

#### **References**

