**Estimating Forest Aboveground Carbon Storage in Hang-Jia-Hu Using Landsat TM**/**OLI Data and Random Forest Model**

**Meng Zhang 1,2,3, Huaqiang Du 1,2,3,\*, Guomo Zhou 1,2,3, Xuejian Li 1,2,3, Fangjie Mao 1,2,3, Luofan Dong 1,2,3, Junlong Zheng 1,2,3, Hua Liu 1,2,3, Zihao Huang 1,2,3 and Shaobai He 1,2,3**


Received: 27 September 2019; Accepted: 7 November 2019; Published: 9 November 2019

**Abstract:** Dynamic monitoring of carbon storage in forests resources is important for tracking ecosystem functionalities and climate change impacts. In this study, we used multi-year Landsat data combined with a Random Forest (RF) algorithm to estimate the forest aboveground carbon (AGC) in a forest area in China (Hang-Jia-Hu) and analyzed its spatiotemporal changes during the past two decades. Maximum likelihood classification was applied to make land-use maps. Remote sensing variables, such as the spectral band, vegetation indices, and derived texture features, were extracted from 20 Landsat TM and OLI images over five different years (2000, 2004, 2010, 2015, and 2018). These variables were subsequently selected according to their importance and subsequently used in the RF algorithm to build an estimation model of forest AGC. The results showed the following: (1) Verification of classification results showed maximum likelihood can extract land information effectively. Our land cover classification yielded overall accuracies between 86.86% and 89.47%. (2) Additionally, our RF models showed good performance in predicting forest AGC, with R2 from 0.65 to 0.73 in the training and testing phase and a RMSE range between 3.18 and 6.66 Mg/ha. RMSEr in the testing phase ranged from 20.27 to 22.27 with a low model error. (3) The estimation results indicated that forest AGC in the past two decades increased with density at 10.14 Mg/ha, 21.63 Mg/ha, 26.39 Mg/ha, 29.25 Mg/ha, and 44.59 Mg/ha in 2000, 2004, 2010, 2015, and 2018. The total forest AGC storage had a growth rate of 285%. (4) Our study showed that, although forest area decreased in the study area during the time period under study, the total forest AGC increased due to an increment in forest AGC density. However, such an effect is overridden in the vicinity of cities by intense urbanization and the loss of forest covers. Our study demonstrated that the combined use of remote sensing data and machine learning techniques can improve our ability to track the forest changes in support of regional natural resource management practices.

**Keywords:** Landsat dataset; forest AGC estimation; random forest; spatiotemporal evolution

#### **1. Introduction**

Forests comprise a major part of the terrestrial ecosystems, occupying about 30% of the world's land area, and they are the main contributor to carbon (C) emissions and removal [1–3]. They store more than 80% of forest aboveground carbon (AGC) in terrestrial ecosystems, more than 70% of global soil organic C [4–6] and more than double the amount of C in the atmosphere [7]. As an important part of terrestrial ecosystems, forest ecosystems are a huge global carbon pool and they will likely play a long-term and sustained role in mitigating the impacts of global warming [8–11]. Forest AGC is not only an important indicator reflecting the basic characteristics of forest ecosystems, but also a basis for evaluating forest structural functions and production potentials [12–14]. Accurate quantitative evaluation of forest AGC storages and their spatiotemporal patterns is critical for understanding the mechanisms that control the global terrestrial C cycle [15–20].

The methods of estimating in forest AGC generally include field survey, model simulation, and remote sensing inversion [21–24]. The traditional field survey method has high precision [17], but it is often limited by manpower and material resources, yielding short durations of observation. Additionally, because the field survey only covers limited locations, when used alone, it cannot estimate AGC over large areas. The ecological process models take the biological characteristics and growth mechanisms of vegetation into account, resulting in a higher accuracy of estimation [25]. However, it needs various parameters of vegetation to simulate forest AGC. If the input data are insufficient or missing, it will have a great impact on the prediction results. Remote sensing technology has been commonly used in monitoring forest AGC over broad areas due to its wide coverage of observation, timeliness, and repetitive data availability [26,27]. As spectral characteristics of land cover show great differences [28–32], it is one of the important links in current research to accurately quantify various indicators of forest resources [33,34]. Moreover, studies have found that remote sensing data and its derived bands have good practicability for simulating forest AGC [35–37]; this is especially the case when combined with machine learning algorithms that allow for large scale automated analysis of high dimensional data from satellites [38]. The machine learning approach can derive rich information from remote sensing data as the input data, and continuously optimize the algorithm's performance via empirical learning to make the results more feasible and credible [39–41]. Remotely sensed datasets, combined with machine learning algorithms for intelligent estimation of forest AGC, support more efficient and precise observation and management of forest resources [42–44].

Among the numerous machine learning techniques, random forest (RF) has recently emerged as popular due to its ability to select and rank a large number of predictor variables [45,46] and its reliance on an ensemble of decision trees as a strategy to improve model robustness. RF is unexcelled in accuracy among the current algorithms and it can run efficiently on large data bases. It can not only handle thousands of input variables without variable deletion, but can also give estimates of what variables are important in the model. Khatami et al. [47] classified images used a surveillance classification algorithm based on remote sensing data to classify images, and the results showed that the RF algorithm is superior to the traditional decision tree algorithm. Bargiel et al. [48] included phenology factors in the classification model and revealed that RF based on the phenology can be better to identify the crop types. Chen et al. [49] used climatic factors to simulate carbon dioxide flux and the results showed that the RF model had R2 values of 0.96 and 0.85 at the training and testing phases.

In this study, we focused on Hang-Jia-Hu, a region with rapid economic development in the northwestern part of Zhejiang, China. This region has a forest area of nearly 6.06 million hectares and LUCC caused by urban expansion due to socio-economic factors has great impact on forest resources. Therefore, the timely and efficient estimation of forest AGC in Hang-Jia-Hu has significant importance to the rational allocation of forest resources. Dynamic monitoring of forest AGC in Hang-Jia-Hu was carried out based on the Landsat time series remote sensing data combined with the RF algorithm. The importance of variables in the models and spatiotemporal evolution of forest AGC in the past two decades were analyzed.

#### **2. Materials and Methods**

#### *2.1. Study Area*

Hang-Jia-Hu is located in the northwestern part of Zhejiang Province, China, ranging from 118◦50 15 E to 121◦19 6 E and from 29◦42 52 N to 31◦11 53 N (Figure 1). Its climate is subtropical monsoon and the annual average temperature is 15–18 ◦C, with an average annual precipitation of about 1100 mm. The 18.1 million hectares study area administratively covered the entire Huzhou City and Jiaxing City and the northeastern part of Hangzhou City. Most forest in this place is distributed in the southwest of the study area with the main forest types being broad-leafed forest (BLF), coniferous forest (CNF), and bamboo forest (BMF).

#### *2.2. Datasets and Processing*

#### 2.2.1. Processing Landsat Times Series Products

30-m multispectral data of Landsat5 TM (2000, 2004, 2010) and Landsat8 OLI (2015, 2018) were downloaded from the United States Geological Survey (USGS). We selected cloud-free images from the years with ground observations for four scenes that cover our study area (Table 1). Few scenes image contain a high amount of cloud, but the region covering the study area is cloudless. Additionally, due to the poor image quality from 2014, we used the images in 2015 to correspond with field data in 2014.

Satellite remote sensed data are easily influenced by water vapor, aerosol, bidirectional reflection, and data transmission, which will result in serious fluctuations of time series data and influence the desired effect in data analysis [50,51]. Therefore, this study applied the Fast line-of-Sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) [52–54] to eliminate such atmospheric interference in each

image. A digital elevation model (DEM) [55,56] was used to make terrain corrections, as terrain factors may affect the brightness values of original imagery.


**Table 1.** Acquisition date and cloud coverage (C) (%) of the Landsat datasets.

In addition to the original image bands, we calculated vegetation indices and texture variables to use as input data to the forest AGC model. The vegetation indices [57–59] included in this study included: NDVI (Normalized Difference Vegetation Index), SAVI (Soil Adjusted Vegetation Index), SR (Simple Ratio Index), DVI (Difference Vegetation Index), and EVI (Enhanced Vegetation Index). Additionally, the texture variables based on the gray-level co-occurrence matrix (GLCM) [60,61] included Mean, Variance, Homogeneity, Contrast, Dissimilarity, Entropy, Angular second moment, and Correlation [27,58,59,62–64] with different windows (3 × 3, 5 × 5, 7 × 7, 9 × 9, and 11 × 11) [27,65,66]. There were 251 total variables derived with the details that are shown in Table 2.


**Table 2.** Information of remote sensing variables.

#### 2.2.2. Processing Observed Data

Forest AGC plots and classification verification plots in 2000, 2004, 2010, and 2014 are all derived from the data of National Forest Inventory (NFI) [20,27] in Zhejiang province. The investigation method of NFI was systematic sampling, which is usually evenly placed at the intersection of kilometer grids of 1:50,000 topographic maps. Each plot size is 28.5 m × 28.5 m. Forest AGC data that were used in 2018 were derived from a field survey in July, 2019. Each plot of 30 m × 30 m was placed to cover a homogenous area of bamboo, inside which the number of trees and average diameter at breast height (DBH) were measured. After calculating the bamboo biomass of each plot by using the growth model, bamboo AGC was calculated by using the conversion coefficients between biomass and carbon stocks. Classification verification data in 2018 were manually and evenly selected from the result of unsupervised classified data and visual interpretation.

Land use in this paper was classified into six types: urban, water, cultivated land (CTL), BLF, CNF, and BMF based on the data of the NFI and field survey [26]. Maximum likelihood [68,69] was applied to make the land use classification map. The classification training samples in this paper are uniformly and evenly selected by the visual interpretation based on the land use spectral reflectance characteristics. The training samples of each land use type were consistent (400 pixels) and they could cover the study area. The total number of classification verification samples in 2000, 2004, 2010, 2014, and 2018 were 171, 175, 191, 156, and 240, respectively, and the numbers of forest verification samples were 120, 116, 137, 106, and 160. Table 3 details forest AGC plots used to construct the AGC estimation model from 2000 to 2018.


**Table 3.** Summary of the forest AGC plots of different years.

#### *2.3. Random Forest and AGC Estimation*

Breiman first made formal definition of the RF in 2001, which is a bagging of uncorrelated CART trees learned with randomized node optimization [70–72]. First, the algorithm generates N bootstrap samples for the training dataset. Second, a regression tree model is built for each bootstrap sample. Finally, the predicted results are obtained by averaging the predictions from all individual regressions [20]. RF has three important parameters: (1) the number of random regression trees (Ntree); (2) number of variables to be randomly sampled at each node in a tree (Mtry), used to search for the variable that best partitions samples in the training data set and the default number is 1/3 of input variables [73]; and, (3) the minimum number of terminal nodes (Nodesize) where the default value is 5 in regression analysis [8,49].

Forest AGC samples of different years were divided into two groups to optimize the RF model in this study (Table 3), with one group of data accounting for 70% samples represented training samples of the RF model and the remaining 30% represented test samples to test the model accuracy. Training data were also used to evaluate the importance of input variables to forest AGC and select input variables for constructing the RF model. Forest in this paper includes three types and each of them has corresponding forest AGC plots in 2000, 2004, and 2010. In 2014 and 2018, the AGC plots only have the forest type of BMF. Therefore, we decided to use data in 2000, 2004, and 2010 to construct individual AGC estimation models by using the forest non-stratification method [65] and applied the corresponding model to simulate forest AGC in 2000, 2004 and 2010. Subsequently, these three models are applied to predict forest AGC in 2015 and 2018 and the BMF AGC plots in 2014 and 2018 are used for verification. The estimation result in 2015 and 2018 with the highest accuracy in the verification phase will be chosen as the final simulation result. In this paper, we used "RandomForest" package in the R statistical software [74,75] to construct the forest AGC model. During the process of applying the model established, we put the entire image data into the model and run it, and use Python to read the model result into image format, and then perform spatial analysis.

#### *2.4. Accuracy Assessment*

The determination coefficient (*R*2), root mean square error (*RMSE*), and relative RMSE (*RMSEr*) [27,65] were used to evaluate the accuracy of the RF model. A higher accuracy is indicated by the higher values of *R*<sup>2</sup> and lower values of *RMSE* and *RMSEr*. *R*2, *RMSE*, and *RMSEr* were calculated, as follows:

$$\mathcal{R}^2 = 1 - \sum\_{i=1}^n \underbrace{\left(p\_i - o\_i\right)^2 / \sum\_{i=1}^n \left(o\_i - \overline{o\_i}\right)^2}\_{} \tag{1}$$

$$RMSE = \sqrt{\frac{1}{n} \sum\_{i=1}^{n} \left(p\_i - o\_i\right)^2} \tag{2}$$

$$RMSE = \frac{RMSE}{\overline{y}} \times 100\tag{3}$$

In Equations (1)–(3), *R*2, *RMSE*, and *RMSEr* are the indexes to measure model accuracy; *pi* and *oi* are the value of forest AGC samples in the predicted and observed phase in the year of 2000, 2004, and 2010; *oi* is the average value of observed plots; *y* is the average value of forest AGC in testing samples; and, *i* is the number of samples.

#### **3. Results**

#### *3.1. Land Use Classification*

Figure 2 presents the classification results using maximum likelihood methods.

**Figure 2.** Land use maps from 2000 to 2018.

Table 4 shows the overall accuracy of classification by using confusion matrix. It illustrates that the overall classification accuracies in different years were above 86.86%, and the highest was 89.47% with a kappa coefficient above 0.84 and a high of 0.87. The classification accuracies of the BLF, BMF, and CNF were between 86.67% and 89.62%. Classification verification results show a high level of precision. Based on this high precision classification result, we extracted BLF, BMF and CNF separately and use it to mask the forest AGC estimation results.


**Table 4.** Overall accuracy and kappa coefficient of land use classification.

#### *3.2. RF Model Construction*

#### 3.2.1. Parameters Optimization of RF

Training data were used to input into RF to traverse all of the variables values and finally obtain the optimal parameters. Figure 3 shows the results. Figure 3a presents Mtry, which is used to determine the minimum amounts of variables in each tree to construct RF. As can be seen from Figure 3a, when Mtry is a certain value, the model error is the smallest, which is the minimum Mtry value that is required. Ntree in Figure 3b shows that model error tends to be stable when the Mtry is big enough. Additionally, we used a 10-fold cross-validation to observe the effect of the number of variables on the model error. Results in Figure 3c represent the variation trend of average model errors in this process. It shows that the variables in the model are not as good as possible. When the model traverses all possibilities, the model error will tend to be stable until a certain amount, when the value of the model error reaches a minimum. Table 5 lists specific settings for different parameter values of different models.

**Figure 3.** (**a**) Influence of *Mtry to* model error (Mg/ha), (**b**) Influence of *Ntree to* model error (Mg/ha), (**c**) Influence of variables number to model error (Mg/ha).

**Table 5.** Optimized parameters of random forest (RF) models in different years.


#### 3.2.2. Variable Importance and Autocorrelation

The RF model in different years was ran 100 times each to observe changes in variables importance. We listed the first 20 variables with importance scores of different models in Figure 4. Among all of the variables used in the three models, most of the variables are texture features, while the original band and vegetation index account for a small proportion. Upon viewing of variable selection in different models, we found that 80% of the variables are the mean of the texture features in 2000 model. It reveals that the texture mean at different windows of the original band is important for estimating forest AGC. In the 2004 model, the sixth band of the original band occupies 45% of the overall variables, followed by the third band at 40%. Additionally, the texture window of 11\*11 accounts for 65% of the total variables. It illustrated the importance of the texture features of the third and sixth band at 11 × 11 windows to the estimation. In the 2010 model, the mean of the texture features occupies 55% of the total and the sixth band has a selection probability of 35%, which exceeds other variables. As for the importance scores, in the model of 2000, the influence of Tb2w7ME on forest AGC estimation is the largest at an average of 14.83, followed by Tb2w5ME at 14.73. For 2004, Tb311HO has the largest influence of 8.66, followed by Tb3w9DI with score at 8.61. Tb6w7VA has the largest effect on forest carbon prediction at a score of 13.67 and is followed by Tb6w11ME at 12.01.

**Figure 4.** Importance of the first 20 variables measured in %InMSE (the percentage increase in the mean squared error) from 100 runs of the RF. Note: Tbiwjxx, a texture image developed using the texture measure xx (xx can be such texture measures as ME, VA, HO, CON, DI, EN, SE, COR) on spectral band i (bandi) with a window size of j × j pixel (wj).

We calculated the frequency of the top 100 variables in the different RF models to further understand the proportion of variables in the models. In the 2000 RF model, texture information accounted for 92%, of which the fourth band has the most texture information, while the highest window size is w9, and ME accounts for 28.3% of all texture information. Texture information in the 2004 model occupies 98% of the total variables. The number of textures in the third band is up to 24.5%. W7 and ME have a major advantage in terms of window size and texture value. In 2010, 93% of the texture information in the first band accounted for 27.6%, and w7 is also the window with the most selection. The ME value has the same status as the previous two models in selection to establish forest AGC models.

RF, which is widely used to estimate forest parameters while using remote sensing data, can effectively solve the multicollinearity problems of complex variables in traditional statistical regression models [20,76]. Figure 5 shows the collinear relationship between the first 20 variables. A weak correlation indicates that the RF model could solve the collinearity between variables and the overfitting problem to ensure the accuracy of the forest AGC estimation. We selected the optimal number of variables and the optimal variables to build models for different years based on the optimized parameters.

*Forests* **2019**, *10*, 1004

**Figure 5.** Autocorrelation of the first 20 variables in different models.

#### *3.3. Estimation and Evaluation of Forest AGC*

We extracted the AGC estimation for the forest covers based on the classification results of the maximum likelihood method and simulation results of the RF model established above (Figure 6).

**Figure 6.** Spatial distribution of forest AGC from 2000 to 2018.

The model performance is evaluated with scatterplots of predicted AGC against the observed data (Figure 7). In the training phase (A) using data from 2000, 2004, and 2010, the models yielded R2 that ranged from 0.69 to 0.73 (*p* < 0.01) and RMSE from 3.18 Mg/ha to 4.84Mg/ha. In the testing phase (B) of the model in 2000, 2004, and 2010, the *R*<sup>2</sup> value was in the range of 0.67–0.73. RMSE and RMSEr were in the range of 5.58–6.66 Mg/ha and 20.41–23.65. Forest AGC simulation in 2015 and 2018 was done by using the models in 2004 and 2010, respectively. The models yielded an *R*<sup>2</sup> of 0.57 and 0.32. RMSE were 5.59 and 5.76 Mg/ha and RMSEr were 23.65 and 20.77 for 2015 and 2018, respectively.

**Figure 7.** Accuracy evaluation of RF model at training phase (**A**) in 2000, 2004, 2010 and testing phase (**B**) in 2000, 2004, 2010, 2015, and 2018.

#### *3.4. Spatiotemporal Evolution of Forest AGC*

The total forest area in the study region decreased in the past two decades according to the land use classifications in different years. However, forest AGC has gradually increased from 2000 to 2018 in many areas, especially in the west of Hangzhou and southwest of Huzhou. We calculated statistics of forest area, forest AGC, and total AGC storage to understand the change and relationship between forest area, forest AGC, and the total forest AGC storage in Hang-Jia-Hu during the past two dacades (Figure 8). The total forest area only increased from 2000 to 2004 and then gradually decreased afterwards. The total amount of forest area was reduced by 77,713.92 ha with the largest declines occurring following 2004 and 2010. The forest AGC density increased 3.4 times, from 10.14 Mg/ha in 2000 to 44.59 Mg/ha in 2018, and the growth rate was the highest between 2015 and 2018. The total forest AGC storage did not decline due to the reduction of forest area, but it followed the same trend as forest AGC density. It increased from 641.38 Mg C in 2000 to 2472.51 Mg C in 2018, with a growth rate of 285%. This reflects that forest AGC density is more influential on to the total forest AGC storage when compared to the forest area.

**Figure 8.** Forest area, forest AGC and total forest AGC storage from 2000 to 2018.

Figure 9 shows the interannual variability of forest AGC and total forest AGC storage of 15 sub-regions in Linan (LN), Yuhang (YH), Jiaxing (JX), Jiashan (Js), Anji (AJ), Fuyang (FY), Pinghu (PH), Deqing (DQ), Hangzhou (HangZ), Tongxiang (TX), Hanning (HN), Haiyan (HY), Huzhou (HuZ), Xioashan (XS), and Changxing (CX). Forest AGC did not show obvious variations across different regions (Figure 9a). Forest AGC was the lowest for all subregions in 2000 and the highest in 2018, with the remaining years fluctuating between 22.57 Mg/ha and 28.43 Mg/ha. The total forest AGC varied greatly across subregions mainly due to the variation in forest area. LN, AJ, and FY stood out with over 2 <sup>×</sup> 106 ha of AGC, while other subregions had less than 1 <sup>×</sup> 106 ha.

**Figure 9.** (**a**) Forest AGC of different regions in different years, (**b**) Total forest AGC storage of different regions in different years.

Figure 10 reveals the spatial distribution of forest AGC change from 2000 to 2018. Overall, forest AGC in most of the region is increasing, and the increase is mainly distributed in the western part of the study area. Furthermore, from the above data analysis, forest AGC is increasing during the past two decades. However, some areas showed a decrease in forest AGC, the reduction phenomenon. Such decline in forest AGC mainly occurred near the center of Huzhou and towards the south of Hangzhou in Fuyang District. The magnitude of decline was mainly between −10 Mg/ha and −20 Mg/ha. The reduction of forest AGC in these areas was mounted to 240,660.36 ha.

**Figure 10.** Spatial distribution change map of forest AGC.

**Figure 11.** (**a**) Study area; (**b**) and (**d**) urban expansion in the central city of Hangzhou and Jiaxing; and, (**c**) and (**e**) forest AGC reduction due to urban expansion in central city of Hangzhou and Jiaxing.

Moreover, urban forests have received increasing attention in recent years due to their large C sequestration capability and the positive impact on the environment, especially in areas with rapid urban expansion. According to our land use classification, the urban area in the Hang-Jia-Hu region has expanded into suburbs of the cities over the past 20 years, especially around the metropolitan areas, such as Hangzhou and Jiaxing. Extensive decreases of forest AGC were particularly evident in Huzhou City, but not clearly distinguishable in other cities. Therefore, to reveal more details regarding the urban forest AGC change, we used 15-m pan-sharpened Landsat images to analyze the changes of land use in Hangzhou City and Jiaxing City (Figure 11). We found that the urban growth took place in the periphery of Jiaxing and mainly towards the northwest of Hangzhou. In Hangzhou, forest AGC decreased along with intense urban growth in the northwest plains of the City. The decline of forest area due to urbanization reached 3157 ha, which corresponded to an AGC storage loss of 25,309 Mg C. In Jiaxing, forest AGC reduced in all surrounding areas, where the pattern is consistent with the direction of the urban growth. The forest coverage in Jiaxing was reduced by 3078 ha and the forest AGC by 20,701 Mg C as a result of urban growth. It is clearly seen that, although the urban expansion mainly replaced the cultivated lands, it also significantly affected forests.

#### **4. Discussion**

In our modeling exercise using RF, we found that land surface texture information is the most important variables of forest AGC. We calculated the correlation between the variables of the corresponding optimal number of variables and forest AGC in different models, and presented a scatter plot of the first eight variables and forest AGC in three models of the training phase to further understand the specific correlation between variables and forest AGC (Figure 12). The results revealed that *R*<sup>2</sup> between the input variables in RF and forest AGC was in the range of 0.13 and 0.32 in the model of 2000, 2004, and 2010. It further reflected that the variables that are involved in RF are closely related with forest AGC, which means that texture information is one of the most essential factors in the process of simulating forest AGC. Furthermore, it also reflected that remote sensing technology, in combination with machine learning algorithms, are an effective means to monitor forest resources in a precise and timely manner at large spatial scales.

However, there are still defects and deficiencies. Firstly, the temporal mismatch of remote sensing and field survey data due to the lack of cloud free Landsat scenes corresponding to the exact times of ground observation can lead to estimation error in the results. Furthermore, different types of forest surfaces have similar spectral signatures, which could have resulted in misclassification [77]. The verification accuracy of the classification results indicates that the BMF is the main reason for the precision declining. Secondly, in the verification phase of simulation results, accuracy in 2015 and 2018 were all below the accuracy in the other three models by a degree. Parameter settings for different modeling due to different raw data vary from year to year. Additionally, different parameter settings will result in the model being only applicable to the simulation of forest AGC in the input data year, but not in other years. Finally, RF also has certain flaws in simulation due to the great requirements for input data. Spatiotemporal distribution of the simulation results show that forest AGC in different periods is concentrated in a certain range (fluctuating around the average of training data), which means that RF has a poor simulation effect on the data at both ends of a dataset, which is consistent with the research result of Gao et al. [65]. High simulation accuracy seems to show that RF is best suited to simulating AGC at certain range, and making the simulation results have the same data distribution as the input data is a critical step. Studies have shown that a combination of multi-source remote sensed data can improve the accuracy of estimation in forest biomass or forest AGC [78]. Furthermore, forest AGC can be simulated with other types of data, such as meteorological data, topography, and soil data [79]. The accuracy of forest AGC simulation can be further improved when considering multivariate remote sensing data and other various impact factors.

**Figure 12.** (**a**) Correlation between the first 8 variables and forest AGC in 2000 model. (**b**) Correlation between the first 8 variables and forest AGC in 2004 model. (**c**) Correlation between the first 8 variables and forest AGC in 2010 model.

#### **5. Conclusions**

Based on the remotely sensed and ground survey data, we used the optimized RF models to simulate the distribution patterns and temporal changes of forest AGC in the Hang-Jia-Hu region. Classification by using a maximum likelihood method can present well the land use cover. Additionally, our models estimated forest AGC for the study area in different years with relatively high accuracies. The standardized residuals of the testing samples in different models (Figure 13) are all in the range of −2 to 2, which further illustrates that the optimized RF model has good stability and reliability in predicting forest AGC. With the estimation results, we found that, during the study period from 2000 to 2018, although the total forest area has decreased, the total AGC storage has significantly increased as a result of the increased AGC density. However, forest AGC has decreased near the cities due to the urban expansion in spite of the increase of forest AGC over the entire region.

**Figure 13.** Standardized residual of the testing phase in different models.

**Author Contributions:** Conceptualization, H.D.; Data curation, X.L., L.D., J.Z., H.L., Z.H. and S.H.; Methodology, M.Z.; Supervision, G.Z.; Validation, F.M.; Writing—original draft, M.Z.; Writing—review & editing, H.D.

**Funding:** The authors gratefully acknowledge the support of the National Natural Science Foundation (No. U1809208, 31670644), the State Key Laboratory of Subtropical Silviculture (No. ZY20180201), the Joint Research fund of Department of Forestry of Zhejiang Province, the Chinese Academy of Forestry (2017SY04), the Zhejiang Provincial Collaborative Innovation Center for Bamboo Resources and High-efficiency Utilization (No. S2017011), and the National Natural Science Foundation (No.31901310).

**Acknowledgments:** The authors gratefully acknowledge the supports of various foundations. The authors are grateful to the Editor and anonymous reviewers whose comments have contributed to improving the quality.

**Conflicts of Interest:** The authors declare that they have no competing interests.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*
