*Article* **Comparative Analysis of Seasonal Landsat 8 Images for Forest Aboveground Biomass Estimation in a Subtropical Forest**

#### **Chao Li 1, Mingyang Li 1,\*, Jie Liu 2, Yingchang Li <sup>1</sup> and Qianshi Dai <sup>3</sup>**


Received: 10 December 2019; Accepted: 25 December 2019; Published: 31 December 2019

**Abstract:** To effectively further research the regional carbon sink, it is important to estimate forest aboveground biomass (AGB). Based on optical images, the AGB can be estimated and mapped on a regional scale. The Landsat 8 Operational Land Imager (OLI) has, therefore, been widely used for regional scale AGB estimation; however, most studies have been based solely on peak season images without performance comparison of other seasons; this may ultimately affect the accuracy of AGB estimation. To explore the effects of utilizing various seasonal images for AGB estimation, we analyzed seasonal images collected using Landsat 8 OLI for a subtropical forest in northern Hunan, China. We then performed stepwise regression to estimate AGB of different forest types (coniferous forest, broadleaf forest, mixed forest and total vegetation). The model performances using seasonal images of different forest types were then compared. The results showed that textural information played an important role in AGB estimation of each forest type. Stratification based on forest types resulted in better AGB estimation model performances than those of total vegetation. The most accurate AGB estimations were achieved using the autumn (October) image, and the least accurate AGB estimations were achieved using the peak season (August) image. In addition, the uncertainties associated with the peak season image were largest in terms of AGB values <25 Mg/ha and >75 Mg/ha, and the quality of the AGB map depicting the peak season was poorer than the maps depicting other seasons. This study suggests that the acquisition time of forest images can affect AGB estimations in subtropical forest. Therefore, future research should consider and incorporate seasonal time-series images to improve AGB estimation.

**Keywords:** aboveground biomass; Landsat 8 OLI; seasonal images; stepwise regression; map quality; subtropical forest

#### **1. Introduction**

As an important characteristic of forest ecosystems, forest aboveground biomass (AGB) provides a basis for ecosystem and forestry research; AGB estimation further provides data critical to estimating the forest carbon sink [1,2]. In recent years, accurate and rapid AGB estimation has, therefore, become crucial for forest ecosystem and global climate change research.

Traditionally, high precision AGB field measurement methodologies have involved extensive field surveys [3]. However, these methods are time-consuming, laborious and destructive; in addition, they cannot be used to analyze the spatial distribution and dynamic change of AGB on a large scale [4]. Today, remote sensing-based methodologies are more commonly used to estimate AGB as they rapidly

provide near real-time, dynamic and regional scale data, and the characteristics of the obtained images are strongly correlated with AGB [5]. Remote sensing data can be divided into two categories: Passive remote sensing (optical sensors, thermal and microwave) and active remote sensing (radar and light detection and ranging (LiDAR)) [5–7]. Optical sensors such as Landsat, Systeme Probatoire d'Observation de la Terre (SPOT), the moderate-resolution imaging spectroradiometer (MODIS), QuickBird and the Advanced Very High-Resolution Radiometer (AVHRR) have been widely used for AGB estimation because of their coverage, repetitive observation and cost-effectiveness [6,8]. Of these sensors, Landsat images are the most commonly used for remote sensing-based AGB estimations because the sensors provide a long-term data record and have medium spatial resolution, wide spatial coverage and high spectral sensitivity [9]. In many countries, the spatial resolution obtained using Landsat is similar to the size of sample plots in national forest inventories; therefore, using Landsat to estimate AGB can reduce spatial errors associated with matching pixels to sample plots [10].

The information derived from Landsat images significantly correlates with AGB because these images provide valuable information regarding the forest canopy [11]. In fact, previous studies have shown that individual spectral bands, vegetation indices, transformed images (using principal component analysis (PCA)) and textural images are strongly correlated with AGB and can, therefore, be used to effectively estimate AGB [12–15]. Furthermore, many statistical models can be used in developing remote sensing-based AGB models. These models can be divided into two categories: (i) Parametric models (linear, nonlinear, etc.) [16–18] and (ii) nonparametric models (random forest, RF; artificial neural networks, ANN; support vector machines, SVM; etc.) [14,19–21]. Multiple linear regression models, however, are most frequently used in AGB research.

Optical sensors mainly provide information about the forest canopy [11]. The canopy structure of subtropical forests significantly varies between seasons, and even between months [6,22,23]. These variations can cause differences in remote sensing data [24]. Therefore, AGB estimation can vary widely when time-series images are used to model AGB in the same study area [25]. Previous studies have used a single Landsat image (taken during the peak growing season or at a time close to when the ground survey of national forest inventory plots took place) to estimate AGB [21,26–28]. These images, however, do not always accurately reflect forest characteristics. For example, dense canopy cover during the peak growing season often results in extremely saturated images [25,29,30], which ultimately affects AGB estimation accuracy. Some studies have, therefore, utilized time-series of Landsat images to estimate AGB, e.g., Zhu and Liu [25], Safari et al. [31] and Powell et al. [32]. These studies, however, focused on particular forest type or a regional forest. Therefore, there exists a knowledge gap regarding whether time-series Landsat images affect the accuracy of AGB estimations in different forest types and whether the estimations differ among forest types.

Given this gap in knowledge, this study explores the use of seasonal Landsat 8 Operational Land Imager (OLI) images in estimating AGB in a subtropical forest in northern Hunan, China, using stepwise regression. The main objectives of this study were to: (1) Explore the potential variables of seasonal time-series data for different forest types when estimating AGB; (2) investigate the potential of seasonal time-series data in improving the accuracy of AGB estimations in different forest types; and (3) investigate the uncertainties associated with using seasonal time-series data to estimate AGB.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The study area is located in Hunan Province, central China (path/row: 124/40), and comprises an inclined transition zone from the hills of central Hunan to Dongting Plain. The climate is a typical subtropical monsoon humid climate [33] with an average annual temperature and annual precipitation of 16.5 ◦C and 1200–1700 mm, respectively. The study area is further characterized by four distinct seasons: Spring (March to May), summer (June to August), autumn (September to November) and winter (December to February). Chinese fir (*Cunninghamia lanceolate* (Lamb.) Hook.) and Chinese red pine (*Pinus massoniana*) plantations, evergreen broadleaf, deciduous and mixed forests dominate this area with scattered bamboo and shrub lands [34]. A total of 303 forest plots were inventoried in 2014 by the National Forest Continuous Inventory (NFCI) in China (Figure 1).

**Figure 1.** Study area (red box) in Hunan Province, China (**a**); the spatial distribution of sample plots (**b**); and the distribution of forest types (**c**).

#### *2.2. Calculation of Plot-Level AGB*

A total of 303 sample plots were used in this research including, 125 CFF (coniferous forest) plots, 138 BLF (broadleaf forest) plots, and 40 MXF (mixed forest) plots (Table 1). The area of the sample plots is 0.067 ha, and the plots were systematically allocated based on a 4 × 8 km grid (NFCI). The AGB values of the study plots were calculated according to tree species or species groups described in a previous study [35]. Statistical information regarding the sample plot data based on different forest types is summarized in Table 1. The AGB values of all sample plots ranged from 5.01 Mg/ha to 151.06 Mg/ha with an average AGB of 48.27 Mg/ha. The mixed forest had the highest mean (±standard deviation) AGB (52.69 ± 29.45 Mg/ha).



#### *2.3. Remote Sensing Data and Information Extraction*

To explore the effectiveness of utilizing seasonal images to estimate AGB, we acquired four cloud-free Landsat 8 OLI images which covered different forest states within the study area from spring to winter during 2013 and 2014 (Table 2). These four Landsat 8 OLI images were Landsat surface reflectance data downloaded from the United States Geological Survey (USGS) website (https://earthexplorer.usgs.gov/). Landsat 8 OLI surface reflectance data are generated using the land surface reflectance code (LaSRC), which utilizes the coastal aerosol band to perform aerosol inversion tests, uses auxiliary climate data from MODIS, and a unique radiative transfer model [36].


**Table 2.** Landsat 8 Operational Land Imager (OLI) imagery acquired for this study.

To estimate forest AGB in the study area, we calculated and extracted 165 spectral variables: Six original bands, 12 vegetation indices, the first three bands from principal component analysis, and 144 texture variables using a gray-level co-occurrence matrix (Table 3).


**Table 3.** Summary of the predictor variables.

#### *2.4. Vegetation Classification Data*

The European Space Agency (ESA) Climate Change Initiative (CCI) was developed to address climate change at a global level [48]. As part of this initiative, the ESA has derived and consolidated global CCI land cover (CCI-LC) information including annual landcover maps from 1992 to 2015. For the present study, we obtained the CCI-LC data of the study area from MERIS and SPOT satellite images at 300 m spatial resolution [49]. Further, the 2014 CCI-LC map of the study area was downloaded from the ESA website (http://maps.elie.ucl.ac.be/CCI/viewer/index.php) to obtain forest stratifications (coniferous forest (CFF), broadleaf forest (BLF), and mixed forest (MXF)) for AGB estimation.

#### *2.5. AGB Estimation Model*

Pearson product-moment correlation coefficient was used to analyze the relationships between plot AGB and spectral variables, and the spectral variables which had significant correlations with AGB were used as independent variables. Stepwise regression analysis is a frequently used approach in AGB research to determine and select the spectral variables which best contribute to forest AGB. Stepwise regression ultimately results in a regression model containing the variables which best explain the dependent variable (AGB in this study). During the stepwise regression, multicollinearity, which creates highly sensitive parameter estimators with inflated variances and leads to improper model selection, was analyzed between each pair of selected spectral variables using the variance inflation factor (VIF). In this study, if the VIF of a spectral variable exceeded ten, this spectral variable was considered seriously collinear with other variables [50,51].

The stepwise regression model developed in this study assumed that a linear relationship exists between independent (spectral variables) and dependent variables (AGB of different forest types). The model is defined in Equation (1) and describes the relationship between AGB and spectral variables:

$$y = a + b\_1\mathbf{x}\_1 + b\_2\mathbf{x}\_2 + \dots + b\_n\mathbf{x}\_n + \varepsilon\_\prime \tag{1}$$

where *y* is AGB, *a* is the constant term, *x*1, ... , *xn* represent the independent variables, *b*1, ... , *bn* represent the parameters of the independent variables, and ε is the error.

To analyze the accuracy of the AGB models derived from the seasonal images for different forest types, the following workflow was used (Figure 2).

**Figure 2.** The workflow for aboveground biomass (AGB) models derived from different scenarios.

#### *2.6. Model Comparison and Evaluation*

Model performance was evaluated using '10-fold' cross validation [52], and predicted AGB values were compared to observed AGB values using three accuracy indicators: Coefficient of determination (R2), root mean square error (RMSE and RMSE %) and bias. Accuracy indicator Equations (2)–(5) are as follows:

$$R^2 = 1 - \sum\_{i=1}^{n} (y\_i - \hat{y}\_i)^2 / \sum\_{i=1}^{n} \left( y\_i - \overline{y}\_i \right)^2 \tag{2}$$

$$RMSE = \sqrt{\sum\_{i=1}^{n} (y\_i - \hat{y}\_i)^2 / n},\tag{3}$$

$$RMSE \%= \frac{RMSE}{\overline{y}} \times 100,\tag{4}$$

$$B \text{ias} = (y\_i - \hat{y}\_i) / \overline{y}\_\prime \tag{5}$$

where *yi* is the observed AGB value, *y*ˆ*<sup>i</sup>* is the predicted AGB value based on models, *y* is the arithmetic mean of all observed AGB values, and *n* is the sample number. In general, a higher R2 value and lower RMSE and RMSE% values indicate a greater accuracy of the model.

We generated ten predicted forest AGB maps using the results of 10-fold cross validation, and the average of these AGB maps was taken as the final spatial distribution of AGB. In addition, the standard deviation (Stdev) of spatial AGB predictions was calculated to analyze the uncertainty of each pixel [53,54]. Larger Stdev values indicate higher estimation uncertainty and smaller Stdev values indicate lower estimation uncertainty.

#### **3. Results**

*3.1. Comparison of AGB Estimates Using Seasonal Images of Total Vegetation*

The variables of AGB models for total vegetation using seasonal images were selected using stepwise regression according to the correlation between AGB, the dependent variable and spectral variables. We found that four variables were included in the AGB models for January, April and October, whereas six variables were included in the AGB model for August (Table 4). The selected variables of these models indicated that the textural images of Landsat 8 OLI played an important role in forest AGB estimation of total vegetation regardless of the season.

**Table 4.** The selected variables for AGB estimation models for different months based on the total vegetation.


Note: bi\_M, original band i; NDWI\_Apr, normalized difference water index of April; SR\_Oct, simple ratio of october; bi\_XYjM, textural image developed from spectral band i with a window size of jxj pixels using texture entropy (EN), angular second moment (SEM), variance (VA), correlation (COR), contrast (CON) or homogeneity (HO).

Based on 10-fold cross validation, the results of model fitting are shown in Table 5. We found that the use of seasonal Landsat 8 imagery resulted in different AGB estimates. For total vegetation, the stepwise regression model of the October image showed the highest R2 value (0.39) and the lowest RMSE (21.67 Mg/ha; 44.1% of the mean) and bias (−0.19 Mg/ha) values. The model based on the peak season (August) image showed the lowest R<sup>2</sup> value (0.27), followed by the January and April models. Overall, the results demonstrated that the acquisition time of Landsat 8 imagery significantly influenced AGB estimation, and that the peak season (August) image showed inferior performance compared to that of the other AGB estimation models.

**Table 5.** Summary of the accuracy assessment results for the seasonal models based on the total vegetation.


Note: R2, coefficient of determination; RMSE, root mean squared error; RMSE%, relative root mean squared error.

The relationship between the predicted AGB and observed total vegetation AGB for different seasons using stepwise regression model is shown as scatterplots in Figure 3a1–d1. Each month, we detected overestimations when the plot AGB value was lower than 30 Mg/ha, and underestimations when the plot AGB value was higher than approximately 90 Mg/ha. The August model showed the largest bias (Figure 3c2). The bias calculated for each prediction model showed a skewed distribution (Figure 3a2–d2), but when the bias was less than −25 Mg/ha or greater than 25 Mg/ha, bias frequencies of the October model were smaller than those of the other three months.

**Figure 3.** Model performances were evaluated using 10-fold cross validation and predicted AGB values were compared to observed AGB values using accuracy indicators. Scatterplots depict the relationship between predicted and observed AGB estimation values in each month (**a1**–**d1**). Histograms depict model biases (**a2**–**d2**).

The above analysis was based on the overall performance of different stepwise regression models generated for each month, but it cannot provide detailed information regarding the effect of different forest types on estimation of total vegetation AGB. Table 6 summarizes the RMSE and RMSE% results for different forest types. For CFF and BLF, the RMSE and RMSE% were lowest when the October image was used for AGB estimation. For MXF, the RMSE and RMSE% were lowest when the April image was used for AGB estimation. While the October model resulted in lower R<sup>2</sup> and RMSE values than the April model, the April model performed better in MXF AGB estimation than the October model.


**Table 6.** Summary of RMSE (Mg/ha) and RMSE% results from different seasonal images under non-stratified conditions.

Note: CFF, coniferous forest; BLF, broadleaf forest; MXF, mixed forest; RMSE, root mean squared error; RMSE%, relative root mean squared error.

#### *3.2. Comparison of AGB Estimates Using Seasonal Images of Di*ff*erent Forest Types*

The independent variables selected by the AGB models using seasonal images of the three forest types are summarized in Table 7. The selected variables varied among each forest type in different months. However, in general, texture measures were involved in all AGB models, indicating that when considering different forest types and months, textural information significantly contributed to improving the AGB predictions in this study.


**Table 7.** The selected variables for AGB estimation models in different months based on different forest types.

Note: CFF, coniferous forest; BLF, broadleaf forest; MXF, mixed forest; bi\_M, original band I of month M; DVI\_Jan, difference vegetation index of January; SR\_M, simple ratio of month M; PCA3\_Otc, band 3 of principal component analysis in October; bi\_XYjM, textural image developed from spectral band i with a window size of jxj pixels of month M using texture entropy (EN), angular second moment (SEM), variance (VA), correlation (COR), contrast (CON), mean (ME), dissimilarity (DI) or homogeneity (HO).

We further compared the AGB models derived using seasonal images of three forest types (Table 8). For the different forest types, we found that regardless of month, MXF model performances were better than those of CFF and BLF. The performances of the CFF and BLF models did not significantly differ. Regarding all model performances, R<sup>2</sup> value differences ranged from 0.13 for the BLF models to 0.2 for the MXF models, RMSE (RMSE%) value differences ranged from 1.85 Mg/ha (3.03%) for the BLF models to 3.92 Mg/ha (7.44%) for the MXF models. Overall, the model obtained using data from the October image had the least bias, whereas the August model performed had the largest bias; the January and April models were intermediate. The R2 values were all less than 0.55 and the RMSE% were all larger than 35%; these results indicated that though the performances of the models for different forest types were better than those of total vegetation, nearly half of the AGB variation cannot be explained. When compared to the previously constructed total vegetation models (Table 5 vs. Table 8), the models based on different forest types resulted in larger R<sup>2</sup> and lower RMSE (RMSE%) values and performed better overall, indicating that consideration of forest type can improve AGB estimation.


**Table 8.** Summary of the accuracy assessment results for the seasonal models based on different forest types.

Note: CFF, coniferous forest; BLF, broadleaf forest; MXF, mixed forest; R2, coefficient of determination; RMSE, root mean squared error; RMSE%, relative root mean squared error.

We further compared the AGB models derived using seasonal images of three forest types (Table 8). For the different forest types, we found that regardless of month, MXF model performances were better than those of CFF and BLF. The performances of the CFF and BLF models did not significantly differ. Regarding all model performances, R<sup>2</sup> value differences ranged from 0.13 for the BLF models to 0.2 for the MXF models, RMSE (RMSE%) value differences ranged from 1.85 Mg/ha (3.03%) for the BLF models to 3.92 Mg/ha (7.44%) for the MXF models. Overall, the model obtained using data from the October image had the least bias, whereas the August model performed had the largest bias; the January and April models were intermediate. The R2 values were all less than 0.55 and the RMSE% were all larger than 35%; these results indicated that though the performances of the models for different forest types were better than those of total vegetation, nearly half of the AGB variation cannot be explained. When compared to the previously constructed total vegetation models (Table 5 vs. Table 8), the models based on different forest types resulted in larger R<sup>2</sup> and lower RMSE (RMSE%) values and performed better overall, indicating that consideration of forest type can improve AGB estimation.

The relationship between the predicted and observed AGB values of the three forest types in different seasons using stepwise regression models is shown as scatterplots in Figure 4. Overestimations occurred in plots with AGB values lower than approximately 30 Mg/ha for CFF, BLF, and MXF, whereas underestimations occurred in plots with AGB values higher than approximately 100 Mg/ha for each forest type. The scatter plot constructed using the October data better fit the line y = x, whereas the scatter plot constructed using the August data was more discrete with serious over- and underestimation issues (Figure 4). January and April prediction model biases showed skewed distributions (Figure 4), the model bias of October represented a normal distribution, and the model bias of August was discrete. Further, the October bias values mostly ranged from −15 Mg/ha to 15 Mg/ha, and there were lower proportions of bias values <−25 Mg/ha or >25 Mg/ha.

**Figure 4.** Model performances were evaluated using 10-fold cross validation and predicted AGB values were compared to observed AGB values using accuracy indicators. Scatterplots depict the relationship between predicted and observed AGB estimation values in each month for each forest type (**top**). Histograms depict model biases in each month for each forest type (**bottom**). CFF, coniferous forest; BLF, broadleaf forest; MXF, mixed forest.

#### *3.3. AGB Distribution Maps and Map Quality*

In addition to model diagnostics, we predicted AGB distribution maps and AGB standard deviation (Stdev) maps within the study area. Using seasonal images, we constructed AGB spatial distribution maps based on total vegetation and different forest types (Figure 5). AGB distribution patterns in different months varied, supporting our previous results (Sections 3.1 and 3.2) which suggested that seasonal model performances differed. Further, AGB distribution patterns for total vegetation were narrow (within the range of 25 Mg/ha to 75 Mg/ha; Figure 5a), whereas AGB distribution patterns for different forest types were discrete (within the range of 0 Mg/ha to 100 Mg/ha; Figure 5b). These results indicate that models constructed based on forest types can achieve relatively low (<25 Mg/ha) and high (>75 Mg/ha) AGB values and thus alleviate over- and underestimation. In addition, October distribution maps were more heterogeneous than those of the other months, further indicating that October model performances were superior.

**Figure 5.** Spatial distribution of forest aboveground biomass (AGB) using seasonal images under different scenarios: (**a**) The total vegetation; (**b**) forest types including coniferous forest, broadleaf forest and mixed forest.

Stdev maps of each scenario are shown in Figure 6. For both total vegetation and different forest types, the model uncertainties for October were smaller compared with those for January, April and August, indicating that the October AGB maps were more accurate. Moreover, model uncertainties for August were larger compared with those for the other three months, indicating that the August AGB maps were the least accurate. The Stdev values of different AGB ranges for different forest types were further calculated and analyzed (Figure 7). When mapping the AGB of both the total vegetation and the different forest types, the Stdev values were greater when the AGB values were <25 Mg/ha or >75 Mg/ha. In this case, the Stdev of the August models were the largest, followed by the January, April and October models. This result indicated that AGB maps exhibiting these AGB values (<25 Mg/ha or >75 Mg/ha) showed the largest uncertainty when utilizing the August image. In addition, when attained AGB values were >75 Mg/ha, all Stdev values for each scenario were larger than three, further indicating a large amount of uncertainty associated with these particular AGB values.

**Figure 6.** Standard deviation (Stdev) maps of AGB values using seasonal images under different scenarios: (**a**) The total vegetation; (**b**) forest types including coniferous forest, broadleaf forest and mixed forest.

**Figure 7.** The standard deviation (Stdev) of AGB within different AGB value ranges (TV, total vegetation; CFF, coniferous forest; BLF, broadleaf forest; MXF, mixed forest).

#### **4. Discussion**

Forests are complex ecosystems containing variable species composition and structure; therefore, the image information (especially textural information) of these ecosystems also varies considerably [55,56]. Previous studies utilizing Landsat images to estimate AGB were unable to determine which spectral variables were best able to predict AGB [6,57]. In this study, the selected spectral variables used for AGB models of different months and different forest types varied. Nonetheless, we found that for all forest types, textural images played an important role in AGB estimation, in accordance with previous research [58]. The selected variables belonged to various

original bands (bands 2 to 7), indicating that all original bands can be used to estimate AGB in this study. These results differed from earlier research in which the shortwave infrared (SWIR) bands (e.g., Landsat TM spectral bands 5 and 7) were more important in AGB estimation than other bands [59–61]. In addition, in previous research utilizing Landsat imagery, spectral information (e.g., vegetation index, original band) was often selected to estimate the AGB of coniferous forest given that the structure of coniferous forest was simple and the importance of spectral information over textural information. On the other hand, textural information has often been used in the study of broadleaf forest and mixed forest given that those forests often have multiple canopy layers and more complex structures. In our study area, because of the low level of forest management, the forest structure was complex; therefore, in this study, for each forest type, textural information was mostly used to estimate AGB, regardless of which seasonal image was utilized.

In this study, stepwise regression was used to estimate AGB of different forest types based on Landsat 8 OLI seasonal images. We found that in our study area, the best month for AGB estimation was October given that the R<sup>2</sup> values of different forest types were higher than 0.39. Overall, this result indicates that the October image can explain more than 39% of the information regarding the estimated AGB for each forest type. The less accuracy month for AGB estimation was August given that the R2 value for total vegetation was only 0.27. Stepwise regression is a widely used methodology of fitting regression models based on the correlation between dependent and independent variables. During this procedure, the significance of an introduced variable is tested, and the variable that is of least significant is discarded [62]. While selection of variables depends upon the degree of linear correlation, selection of variables with low correlation is possible; this can ultimately affect the accuracy of the model. The forest characteristics of different forest types were heterogeneous. Different forest types were different in spectral characteristics caused by the heterogeneity of the stand structures and species compositions. The correlations among the spectral variables and AGB of different forest types were also different. In this case, the performances of models for different forest types were significantly different. In our study, among all forest types analyzed, we found that the MXF models achieved the best results for AGB estimation. This indicates that the image information was most strongly correlated with MXF compared with other forest types, and therefore, the image can better reflect the condition of the mixed forest. However, when the forest types were considered in AGB estimation, model accuracy was further affected by the number of plots [59]. In this study, there were 135 CFF plots and 128 BLF plots, whereas there only 40 MXF plots. Therefore, MXF models may have been more accurate given the far fewer number of plots analyzed compared with the models for the other forest types.

In this study, Landsat 8 OLI seasonal images were used to estimate AGB. The four seasonal images utilized were associated with four seasons of the study area (January (winter), April (spring), August (summer) and October (autumn)). The results showed that utilization of the peak season (August) image resulted in inadequate AGB estimation compared with the other seasons, in accordance with results reported by Zhu and Liu (2014) [25]. These researchers further found that the normalized difference vegetation index (NDVI)-based AGB estimates of the forest senescing period were better than those of the peak season in a temperate forest of southeastern Ohio, USA [25]. Furthermore, in accordance with our results, previous researchers detected over- and underestimations when utilizing Landsat 8 OLI imagery to estimate AGB in a subtropical forest in western Hunan, China [58]. In our study, these uncertainties were common among all seasonal images analyzed. The observed underestimations within the higher range of AGB values may have been a consequence of image saturation issues affecting model performance [56,63]. Regarding AGB values within the lower range, model performance was likely affected by mixed pixels, thus resulting in AGB overestimation [64]. While uncertainties were detected among all time-series images, underestimation associated with the peak season (August) within the high AGB range (>75 Mg/ha) was more serious than that associated with the other seasons. Taken together, these results suggest that image saturation more strongly influenced AGB estimation results for August than it did for the other seasons, further indicating that

the uncertainties were less in the other seasons. In addition, the overestimation associated with the peak season was greater than that associated with the other seasons.

#### **5. Conclusions**

In this study, seasonal Landsat 8 OLI imagery was utilized to estimate forest AGB in a subtropical forest in northern Hunan Province, China. Study plots were classified according to forest types (CFF, BLF, MXF and total vegetation) and stepwise regression was used to select appropriate variables and thus effectively model AGB based on the seasonal images. Subsequently, models of the different scenarios (different forest types in different seasons) were compared. Given the variables selected during stepwise regression, we concluded that seasonal image textural information was most significantly correlated with AGB, and that the study area is made up of forests with complex structures. The method of AGB estimation based on forest type is very useful for improving the accuracy of AGB estimation because the model performances for the different forest types (CFF, BLF and MXF) are better than those for the total vegetation, regardless of season. The time-series images, which reflect various seasons, can affect AGB estimations, with the autumn image (October) potentially yielding the most accurate AGB estimations and the peak season (August) image being of poorer quality in a subtropical forest. We also explored the accuracies of seasonal images in remote sensing-based AGB estimation. We hope to provide new insight into using Landsat images to improve the accuracy of biomass estimation.

Future research will focus on the mechanism underlying the cause of these differences when utilizing seasonal Landsat 8 OLI images in AGB estimation of different forest types.

**Author Contributions:** Conceptualization, C.L. and M.L.; Data Curation, C.L., Y.L. and Q.D.; Formal Analysis, C.L., Y.L. and J.L.; Funding Acquisition, M.L.; Methodology, C.L. and M.L.; Project Administration, M.L.; Resources, M.L. and Q.D.; Software, C.L. and Y.L.; Supervision, M.L.; Validation, C.L., Y.L. and J.L.; Visualization, C.L. and J.L.; Writing—Original Draft, C.L.; Writing—Review & Editing, C.L., M.L., J.L., Y.L. and Q.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by National Natural Science Foundation (NO. 31770679), and Top−notch Academic Programs Project of Jiangsu Higher Education Institutions, China (TAPP, PPZY2015A062).

**Acknowledgments:** The authors are grateful to the ESA (http://maps.elie.ucl.ac.be/CCI/viewer/index.php) and USGS (http://glovis.usgs.gov/) for providing the Land Cover map and Landsat 8 OLI images.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Estimating Urban Vegetation Biomass from Sentinel-2A Image Data**

#### **Long Li 1,2, Xisheng Zhou 1,3,\*, Longqian Chen 1, Longgao Chen 4, Yu Zhang <sup>4</sup> and Yunqiang Liu <sup>1</sup>**


Received: 18 December 2019; Accepted: 19 January 2020; Published: 21 January 2020

**Abstract:** Urban vegetation biomass is a key indicator of the carbon storage and sequestration capacity and ecological effect of an urban ecosystem. Rapid and effective monitoring and measurement of urban vegetation biomass provide not only an understanding of urban carbon circulation and energy flow but also a basis for assessing the ecological function of urban forest and ecology. In this study, field observations and Sentinel-2A image data were used to construct models for estimating urban vegetation biomass in the case study of the east Chinese city of Xuzhou. Results show that (1) Sentinel-2A data can be used for urban vegetation biomass estimation; (2) compared with the Boruta based multiple linear regression models, the stepwise regression models—also multiple linear regression models—achieve better estimations (RMSE = 7.99 t/hm<sup>2</sup> for low vegetation, 45.66 t/hm2 for broadleaved forest, and 6.89 t/hm2 for coniferous forest); (3) the models for specific vegetation types are superior to the models for all-type vegetation; and (4) vegetation biomass is generally lowest in September and highest in January and December. Our study demonstrates the potential of the free Sentinel-2A images for urban ecosystem studies and provides useful insights on urban vegetation biomass estimation with such satellite remote sensing data.

**Keywords:** urban vegetation; biomass estimation; Sentinel-2A; stepwise regression; Xuzhou

#### **1. Introduction**

According to the World Urbanized Prospects, urban residents are expected to compose 68% of the global population by 2050 [1], and this would bring increasingly intensive urban heat island (UHI) effects, environmental degradation, and ecological damage. As an important carrier of urban ecosystems, urban vegetation—which refers to all naturally growing and human-planted vegetation within an urban area [2,3]—brings considerable ecological, economic, and social benefits [4]. These include improving urban microclimates, mitigating UHI effects, increasing surface runoffs, maintaining the urban carbon–oxygen balance, and equally importantly, enhancing the quality of urban life by providing spaces for relaxation and recreation [5–8]. As such, the focus of urban eco-environmental studies has been long on urban vegetation, particularly the biomass of urban vegetation [9]. Urban vegetation biomass is an effective indicator of the capacity of carbon storage and sequestration, and ecological effect of an urban ecosystem [10,11]; it is, therefore, important to estimate urban vegetation biomass in urban eco-environmental management.

Traditional biomass measurement is simply to remove and weigh all the biomass occurring in quadrats, which is a labor-intensive and time-consuming practice [12,13]. This method does not allow quick monitoring and, more importantly, to some extent, might be destructive to the phenomenon being investigated. Remote sensing, however, provides an alternative to biomass measurement largely because it makes objective and mostly non-destructive observations of vegetated areas at various spatial and temporal resolutions. While vegetation biomass cannot be directly derived from remote sensing image data, remote sensing based estimation requires the use of sample plots to acquire field measurements for allometric growth equations based modeling and image interpretation for estimation (e.g., [14]). Vegetation biomass estimation with remote sensing has been summarized and reviewed in previous studies [15–17]. While optical sensor, radar, and lidar data can be used for biomass estimation separately or jointly [18–22], multispectral data is the most frequently used data type [15]. Although it has been widely recognized for its advantages, remote sensing has been mostly used to measure the biomass of individual vegetation types in natural forest [23,24], grassland [25–27], wetlands [28,29], and deserts [30] but rarely the biomass of urban vegetation [14,31].

Sentinel satellites are an Earth observation satellite constellation developed by the European Space Agency (ESA) as part of the Copernicus Program. Sentinel-2 is a wide-swath, high-resolution, multispectral imaging mission with two twin satellites (Sentinel-2A and Sentinel-2B), supporting land and climate-change monitoring [32]. Sentinel-2A was launched in June 2015 and has offered free image data at the ESA's website as of December 2015. The Sentinel-2 MSI (multispectral imager) samples 13 different spectral bands ranging from the visible to shortwave infrared of electromagnetic spectrum, four bands at 10 m, six bands at 20 m, and three bands at 60 m spatial resolution [32]. It has now been used for a variety of forestry applications such as fire damage monitoring [33,34], forest storage estimation [35,36], and canopy cover calculation [37]. While some researchers have combined Sentinel-2A with radar data for biomass estimation [24], using such free optical sensor data alone has not been assessed. Testing the capability of Sentinel-2A data to estimate urban vegetation biomass would be interesting as Sentinel-2A data is being increasingly important for land monitoring, particularly for forestry.

In this study, we therefore focus on the modeling of urban vegetation biomass estimation from Sentinel-2A image data. Quadrat biomass was calculated using the allometric biomass equations with field measurements, and then vegetation biomass models were constructed with remote sensing derived variables. Specific objectives are testing the capability of Sentinel-2A data to estimate urban vegetation biomass and examining whether vegetation type-specific modeling can improve estimation accuracy.

#### **2. Study Area**

Bordering the provinces of Shandong, Henan, and Anhui, Xuzhou (33◦43'~34◦58' N, 116◦22'~118◦40' E) (Figure 1) is a national key railway hub located in the northwestern part of Jiangsu province, east China [38]. It has a monsoon-influenced humid subtropical climate with an annual mean daily temperature of 14.5 ◦C and an annual total precipitation of 832 mm [39]. As a typical forested city, Xuzhou has received multiple titles and awards such as the National Forest City in 2012, the National Ecological Gardening City in 2015, and particularly the UN-Habitat Scroll of Honor Award in 2018 [40], which is attributed largely to the implementation of several greening and ecological restoration programs in recent decades. Although the importance of urban vegetation to cities is generally acknowledged here, no research has been conducted to estimate and assess the urban vegetation biomass for Xuzhou.

The area within the third ring road of Xuzhou (indicated by the red line in Figure 1a) was selected for this research, covering a geographical area of ~108.51 km2. The area within the third ring road is traditionally considered as the urbanized part of Xuzhou and home to the majority of Xuzhou's urban residents. Its urban green areas have expanded remarkably in recent years and would be an ideal area for this research. The study area is flat in the central area with thick soil and hilly in the north, east, and south parts with thin humus-poor soil. The soil type is leached cinnamon soil, weak alkaline with pH ranging from 7.63 to 8.07 [41].

**Figure 1.** The location of the study area: (**a**) the border of the study area (i.e., the third ring road of Xuzhou) and the sites for field investigations (yellow for low vegetation, green for broadleaved forest, and purple for coniferous forest); (**b**) Xuzhou in east China.

According to our fieldwork, most of the trees in the study area are coniferous, consisting largely of arborvitae trees (*Platycladus orientalis*). These evergreen trees were mainly planted during the 1950s and 1960s with 700–3000 trees per hectare [41]. They are usually 5–12 m high (avg. 8.36 m) with diameters at breast height (DBH) ranging from 5 to 15 cm (avg. 12.47 cm) [41]. Broadleaved forest is dominated by poplar (*Populus euramevicana*), black locust (*Robinia pseudoacacia*), and paper mulberry (*Broussonetia papyrifera*) trees. While the poplar trees are usually large (avg. DBH = 21.40 cm) and high (avg. height = 20 m) and concentrated along rivers and roads, the black locust and paper mulberry trees are scattered in parks and small hills. Shrubs are mostly found in parks, including colorful and decorative species such as *Buxus megistophylla*, and *Berberis thunbergii*. Grassland is relatively small in urban Xuzhou, usually in parks and residential/institutional properties. Typical grass includes *Setaria viridis*, *Ophiopogon bodinieri*, *Iris tectorum*, and *Allium macrostemon*.

#### **3. Materials and Methods**

#### *3.1. Remote Sensing Data*

In this study, we used Sentinel-2A image data—freely obtained from ESA's website—for urban vegetation biomass estimation. These L1C-level data, which have already been radiometrically calibrated, were acquired in six different months of 2017 (Table 1). The image quality is generally good with a mean cloudiness of less than 10%. Although the January and May images were more cloud-contaminated, the study area remains cloud-free in the images—the images are therefore still usable. For data preprocessing, they were first atmospherically corrected and then re-sampled to 10-m, both using SNAP (SentiNel Application Platform), an image processing package developed by ESA for processing Sentinel data [42]. Lastly, the study area was extracted from the image data in ENVI 5.1 software for further processing.


**Table 1.** Remote sensing image data used for urban vegetation estimation.

#### *3.2. Urban Vegetation Classification*

Based on our preliminary field investigations, we decided to classify the vegetation of the study area into three coarse categories, namely low vegetation (mostly shrubs and grass), broadleaved forest (mostly poplar, black locust, and paper mulberry), and coniferous forest (mostly arborvitae trees). While many areas are characterized by a single vegetation type, there are some areas with mixed vegetation, which justifies the use of linear spectral mixture analysis (LSMA) [38,43]—where the spectrum of a pixel is considered a linear combination of spectra of pure endmembers within the pixel weighted by their fractional abundance. To this end, a wide variety of features, such as spectral features (spectral reflectance and spectral indices), textural features (calculated by the gray level co-occurrence matrix), and vegetation abundances (the abundances of coniferous forest, broad-leaved forest, and low vegetation, obtained by LSMA) were derived from the Sentinel-2A image data and combined with topographical features (DEM—digital elevation model, and slope and aspect derived from DEM) to classify urban vegetation classification using the support vector machine (SVM) method. SVM is a machine learning algorithm used for image classification [44,45] and can achieve high accuracy. We compared SVM with other classifiers, namely random forest (RF), artificial neural network (ANN), and quick unbiased efficient statistical tree (QUEST), and found that the SVM produced the best result when vegetation abundances were added for classification. For a detailed description of the classification procedure, please refer to our previous research [2]. The produced classification map helps to identify the dominant vegetation type of each pixel so the biomass of each vegetated pixel can be estimated with the models constructed later.

#### *3.3. Candidate Variables for Modeling*

A total of 116 variables (features) on spectral reflectance, vegetation indices, topographical features, and vegetation abundances were selected as candidate variables (features) for biomass estimation. They are given in Table 2 (see Table A1 for their description and calculation formulas).



**Note**: VRE1–VRE3 represent the spectral reflectance in the three red-edge bands of Sentinel-2A image data and N\_NIR represents the narrow near-infrared band. Low, BLF, and CLF represent the abundances of low vegetation, broadleaved forest, and coniferous forest. The description and formulas for the vegetation indices are detailed in Table A1. Mean (\*), Var (\*), Homo (\*), Cont (\*), Diss (\*), Entr (\*), Sec\_M (\*), and Cor (\*) refer to the eight textural features obtained by the gray level co-occurrence matrix using the 10 original image bands, namely mean, variance, homogeneity, contrast, difference, entropy, second moment, and correlation.

#### *3.4. Field Measurements*

Biomass sampling is necessary for vegetation biomass modeling. Usually, quadrat biomass is the sum of the dry weight of every single plant in the quadrat [12,13]. Despite high accuracy, this method requires the vegetation being investigated to be cut. As such, it is applicable to primeval forest or experimental plots but not desirable for urban green land. As a frequently used indirect biomass estimation method [46], the allometric biomass equations, where the quantitative relationships between the biomass and the growth variables of a plant are established [11], however, provide an alternative biomass sampling approach in an urban context. As they are reliable for determining tree biomass, a growing number of biomass equations have been proposed for various vegetation species across the world [47–54]. In this study, the allometric biomass equations were considered for calculating the biomass of each quadrat.

From extensive literature, the allometric biomass equations for various types of trees and shrubs in Xuzhou were summarized (Tables A2 and A3). For grass, a different estimation approach was adopted in this study: the average unit grassland biomass of Xuzhou is the spatially weighted biomass of Jiangsu, Anhui, Henan, and Shandong provinces [55] since Xuzhou is located at the junction of these four provinces (Table 3). Through the calculation, the average unit biomass of Xuzhou's grassland is 61.89 g/m2.


**Table 3.** The calculation of the average unit (aboveground) biomass of grassland of Xuzhou [55].

The growth variables of plants required in the allometric biomass equations were measured in the field investigations conducted from October to December 2017. The general investigation procedure is as follows: (1) a total of 192 urban vegetation quadrats were randomly pre-selected over the false-color Sentinel-2A imagery of the study area and their central coordinates were retrieved; (2) 10 m × 10 m quadrats were determined (matching the spatial resolution of Sentinel-2A imagery) by navigation in the field with hand-held GPS (Global Positioning System) devices to these coordinates; (3) the growth variables of each single plant (shrubs and trees only) in each quadrat were recorded and the biomass of each single plant using the plant-specific allometric biomass equations was calculated; and (4) the biomass of the all the plants in a quadrat were summed to obtain the total biomass of that quadrat and this was repeated for each quadrat.

Note that our records varied with vegetation type. Within each quadrat, we documented the name, tree height (from the base to the crown), and DBH (diameter at breast height, i.e., ~1.3 m) for trees, the name, basal diameter, height, and crown width for shrubs, and the name, height, and coverage area for grass. Different measuring tools were used in accordance with the plants to be investigated and the parameters to be recorded. The DBHs and basal diameters were measured by a 2-m tape measure with a minimum scale of 1 mm while shrub heights were measured by a 5-m tape measure with a minimum scale of 1 mm. For tree heights, we used a telescopic height measuring rod with a maximum range of 20 m and a minimal scale of 1 mm. Photos illustrating the fieldwork are shown in Figure 2.

**Figure 2.** Photos taken in the field illustrating the measurements.

Although 192 vegetation quadrats were initially selected, only 140 quadrats of them (shown in Figure 1) were visited and investigated in practice—because some of the pre-selected quadrats were not accessible for various reasons (e.g., physical barriers and refusal to access). Among the 140 quadrats were 35 dominated by coniferous forest, 73 by broadleaved forest, and 32 by low vegetation. The results of quadrat biomass calculated mainly by using the allometric biomass equations are detailed in Table A4.

#### *3.5. Modeling*

#### 3.5.1. Correlation Analysis

Prior to modeling, the relationship between the candidate variables (Table 2) and the vegetation biomass was examined through correlation analysis. The biomass of the quadrats dominated by low vegetation, broadleaved forest, and coniferous forest is hereinafter referred to as low vegetation biomass, broadleaved forest biomass, and coniferous forest biomass, respectively. The correlation coefficients were computed with and without vegetation types discriminated.

#### 3.5.2. Stepwise Regression Modeling

Stepwise regression (SR) is essentially a multiple linear regression method, but it is different from the general multiple linear regression in the selection of variables. In a stepwise regression analysis, the most significant or least significant variable is added to or removed with iteration from the multiple linear regression model based on its statistical significance [56,57]. At each iteration of adding or removing a potential independent variable, resultant models are assessed by means of the *p*-value of an *F*-statistic (*p*-value < 0.05 for statistical significance) [56,57]. Stepwise regression has proved effective in selecting variables for modeling and has been widely used in different fields [58,59], including forest biomass estimation [60]. As such, it was considered more suitable for constructing the urban vegetation biomass estimation models in this study.

As it is likely that collinearity exists in the predictive variables, the variance inflation factor (*VIF*) [57,61] is used to examine it in this study:

$$VIF = 1/\left(1 - R\_i^2\right) \tag{1}$$

where *Ri* is the correlation coefficient between the *i*th predictive variable and the remaining predictive variables. There is no multicollinearity if VIF ranges between 0 and 10. If VIF ≥ 10, high multicollinearity exists between variables and some of them should be removed from the model [62].

#### 3.5.3. Boruta Based Multiple Linear Regression Modeling

In addition to the SR modeling, the general multiple linear regression (MLR) is also considered in this study for comparative analysis. It is too complicated to include all the 116 candidate variables (Table 2) in the MLR modeling as it would decrease accuracy, cause overfitting, and slow computation. It is advisable to reduce the dimensionality of data when there are a large number of variables [63]. To this end, a group of important variables is then selected, which is done in this study by using the Boruta algorithm. Boruta is a feature selection wrapper built around the random forest classification algorithm and helps to determine important variables [64,65]. A detailed description of this feature selection technique can be found in [65,66]. The Boruta algorithm can be performed in the statistical software of *R*, where important variables are confirmed for modeling and unimportant one are rejected, and some artificial variables called shadow variables are generated from the original variables [65]).

Despite the capability to locate important variables, the Boruta algorithm does not consider the collinearity among these variables. Like the SR modeling, closely correlated variables are removed if VIF ≥ 10. The final MLR biomass estimation models are finally determined until the VIF of each remaining variable is less than 10.

#### 3.5.4. Accuracy Assessment

While 70% of the calculated quadrat biomass were used for modeling, the remaining 30% were reserved for assessing the models using two measures, namely the coefficient of determination (*Ryz*2) and the root-mean-square-error (*RMSEyz*):

$$\left| R\_{yz}^2 = \sum\_{i=1}^n \binom{n}{i} B\_{\text{model},i} - \overline{B} \right|^2 / \sum\_{i=1}^n \left( B\_{\text{calculate},i} - \overline{B} \right)^2 \tag{2}$$

$$RMSE\_{yz} = \sqrt{\frac{1}{n} \sum\_{i=1}^{n} \left(B\_{calulated,i} - B\_{model,i}\right)^2} \tag{3}$$

where *Bmeasured*,*<sup>i</sup>* is the calculated quadrat biomass, *Bmodeled*,*<sup>i</sup>* is the modeled quadrat biomass, *B* is the average of calculated biomass of all quadrats, and *n* is the number of quadrats.

#### *3.6. Seasonal Variation of Urban Vegetation Biomass*

After the accuracy assessment, the superior models can be determined and used for exploring the seasonal vegetation biomass variation of the study area. With the variables required by the determined models derived from the Sentinel-2A image data (Table 1), the biomass of low vegetation, broadleaved forest, and coniferous forest can be estimated for January, March, May, July, September, and December of 2017, respectively. The total urban vegetation biomass of the study area is then calculated by summing the estimated type-specific biomass. The change rate (*CR*) is defined by the following equation:

$$CR = \frac{Bio\_{max} - Bo\_{min}}{Bio\_{min}} \tag{4}$$

where *Biomax* and *Biomax* are the maximum and minimum biomass of the year 2017.

#### **4. Results and Analysis**

#### *4.1. Urban Vegetation Classification*

By the SVM classifier, the urban vegetation of the study area was classified into three types, namely low vegetation, broadleaved forest, and coniferous forest (Figure 3) in the 24-July-2107 image; the overall accuracy of this classification was 89.86% with a Kappa coefficient of 0.83. While the central part of the study area had limited vegetation, vegetated areas were mostly covered by low vegetation, followed by coniferous forest.

**Figure 3.** Urban vegetation classification by support vector machine.

#### *4.2. Correlations between Candidate Variables and Urban Vegetation Biomass*

#### 4.2.1. For Low Vegetation

There were 14 candidate variables significantly correlated with low vegetation biomass (Table 4). Eight spectral reflectance variables had negative correlations with all-vegetation biomass, coefficients ranging from −0.364 to −0.553. It was negatively associated with low vegetation abundance and positively with coniferous forest abundance. Low vegetation biomass is generally lower than the biomass of broadleaved and coniferous forests, and more low vegetation in the quadrat means lower quadrat biomass. The correlation of low vegetation biomass with topographic features was not significant because low vegetation is usually scattered in the study area. Low vegetation biomass was negatively correlated with two vegetation indices and two textural features.


**Table 4.** Variables significantly correlated with low vegetation biomass.

#### 4.2.2. For Broadleaved Forest

A total of 54 variables were significantly correlated with broadleaved forest biomass (Table 5). Four spectral reflectance variables were negatively correlated with broadleaved forest biomass. Regarding vegetation abundance variables, only low vegetation abundance was negatively correlated with broadleaved forest biomass, but the coefficient was low. As for topographic features, broadleaved forest grows in relatively flat areas (e.g., parks and residential land) and low-elevated hills in the study area and, therefore, no significant correlation exists between topography and broadleaved forest biomass. The biomass was also correlated with seven vegetation indices, higher correlation coefficients with *DVI* and *SR4*. Textural features had close, mostly positive, correlations with broadleaved forest biomass, although the highest correlation (−0.72), with *Cor (VRE2)*, was negative.

**Table 5.** Variables significantly correlated with broadleaved forest biomass.


#### 4.2.3. For Coniferous Forest

Among the 116 candidate variables, 16 were significantly correlated with coniferous forest biomass (Table 6). Seven spectral reflectance variables were all negatively correlated with coniferous forest biomass, with correlation coefficients mostly higher than 0.5. Not surprisingly, only coniferous forest abundance (*CLF*) was highly positively correlated with coniferous forest biomass. *DEM* was the only topographic feature significantly correlated with coniferous forest biomass, and the negative correlation is probably linked to the fact that coniferous forest grows in hills and its biomass decreases with elevation. Coniferous forest biomass was highly significantly correlated with several vegetation indices but, interestingly, no correlation was found with textural features. The *Var* (variance), *Cont* (contrast), *Diss* (difference), *Entr* (entropy) values were all zero while *Mean* (mean), *Homo* (homogeneity), *Sec\_M* (second moment), and *Cor* (correlation) values were all one—coniferous forest is densely distributed in the study area, thus no clear textural characteristics.


**Table 6.** Variables significantly correlated with coniferous forest biomass.

#### 4.2.4. For All-Type Vegetation

Results show that 39 variables were significantly correlated with all-type vegetation biomass (Table 7). In total, ten spectral reflectance variables had negative correlations with all-type vegetation biomass, coefficients ranging from −0.308 (Red) to −0.496 (Green). It was negatively associated with low vegetation abundance but positively with broadleaved and coniferous forest abundances. Low vegetation has lower biomass than coniferous and broadleaved forest and, in a given area (e.g., a pixel size), the all-type vegetation biomass would be lower if low vegetation abundance is larger than the other two vegetation abundances. While it had no significant correlation with topographic features, all-type vegetation biomass was correlated with half of the vegetation indices. The highest positive correlation coefficient was found with SR4 (0.390) while the highest negative with DVI (−0.396) (Table A1). In addition, only 14 (17.50% of the total) textural features were significantly correlated with all-type vegetation biomass and coefficients were generally low.

**Table 7.** Variables significantly correlated with all-type vegetation biomass.


#### *4.3. Urban Vegetation Biomass Estimation Models*

#### 4.3.1. Stepwise Regression Models

The results of performing SR for constructing vegetation biomass estimation models are presented in Table A5. All the (adjusted) coefficients of determination (Rnh<sup>2</sup> and adj-Rnh2) were higher than 0.70, and the fitting was generally good. The variables in the models were less than those (highly) significantly correlated with vegetation biomass (Tables 4–7). The type-specific and all-vegetation biomass estimation models are given below.

The SR biomass estimation model for low vegetation:

$$\begin{aligned} B &= 10 \times [-171.896 - 49.335 \times Low + 76.406 \times CLF + 316.404 \times gNDVI - 13.710 \\ &\times SR2 - 0.365 \times Cor(VRE2) + 1.087 \times DEM \end{aligned} \tag{5}$$

The SR biomass estimation model for broadleaved forest:

$$\begin{array}{rcl} B = 10 \times [660.327 & -16.739 \times \text{Corr}(VRE2) - 3601.606 \times \text{Green} \\ &+ 9.944 \times \text{Corr}(SWIR1) - 695.210 \times \text{OSAVI} \\ &-196.861 \times \text{Var}(VRE2) + 98.126 \times \text{Cont}(SWIR1) \end{array} \tag{6}$$

The SR biomass estimation model for coniferous forest:

$$B = 10 \times \begin{bmatrix} 183.909 & -473.034 \times \text{SWIR1} - 0.016 \times \text{SR3} - 0.232 \times \text{DEM} + 0.299 \times \text{GI} \\ & + 14.747 \times \text{Corr}(V \text{RE2}) \end{bmatrix} \tag{7}$$

The SR biomass estimation model for all-type vegetation:

$$\begin{array}{l} B = 10 \times \left[ 213.811 - 4566.311 \times \text{Gren} - 5.370 \times \text{Corr}(VRE2) + 2655.001 \times \text{Rcd} \right. \\ \qquad + 237.815 \times \text{Cont}(SWIR2) - 108.805 \times \text{Cont}(VRE1) \\ \qquad + 0.366 \times \text{Corr}(N\_{-}\text{IIR}) - 273.149 \times \text{Var}(SWIR1) - 395.915 \times \text{Var}(Blue) \\ \qquad + 157.094 \times \text{Var}(VRE1) - 49.701 \times \text{Cont}(Red) + 163.695 \times \text{Entr}(Green) \\ \qquad - 203.368 \times \text{Sc}\_{-}M(VRE2) \right] \end{array} \tag{8}$$

#### 4.3.2. Multiple Linear Regression Models

The results of performing the Boruta algorithm in the statistical software of *R* are shown in Figure 4. Important variables were labeled as Confirmed in blue, unimportant ones as Rejected in red, and shadow ones as Shadow in grey.

Using the same biomass data as the SR modeling, the MLR biomass estimation models for low vegetation, broadleaved forest, coniferous forest, and all-type vegetation were built with the important variables identified through the Boruta algorithm and the use of *VIF*.

The MLR biomass estimation model for low vegetation biomass:

$$q = 10 \times [110.92 - 77.401 \times \text{L}w - 199.972 \times \text{SR} \cdot \text{6} + 70.94 \times \text{CLF}] \tag{9}$$

The MLR biomass estimation model for broadleaved forest:

$$B = 10 \times [409.043 - \quad 12.234 \times \text{Cor}(VRE2) - 2222.677 \times \text{Green}]$$

$$\begin{aligned} &-696.378 \times \text{NIR} - 124.43 \times \text{Var}(\text{N}\_{\text{NIR}}) \\ &+27.297 \times \text{Cont}(\text{VRE2})] \end{aligned} \tag{10}$$

The MLR biomass estimation model for coniferous forest:

$$BB = 10 \times [170.234 - 301.27 \times VER2 - 0.712 \times Slope] \tag{11}$$

The MLR biomass estimation model for all-type vegetation:

$$\begin{array}{ll} B = 10 \times [156.94 & -4011.984 \times Green + 37.17 \times Cont(VRE2) \\ & + 2201.306 \times Rad + 4.449 \times SR4] \end{array} \tag{12}$$

**Figure 4.** Importance of candidate variables: (**a**) low vegetation; (**b**) broadleaved forest; (**c**) coniferous forest; and (**d**) all-type vegetation. Important variables are labeled as Confirmed in blue, unimportant ones as Rejected in red, and shadow ones as Shadow in grey.

#### 4.3.3. Accuracy Assessment

Figure 5 illustrates the results of assessing the SR biomass estimation models for low vegetation, broadleaved forest, coniferous forest, and all-type vegetation. It shows that *R*yz<sup>2</sup> values of the models for specific vegetation types (viz. the models for low vegetation, broadleaved forest, and coniferous forest) were all higher than 0.7. The coniferous model had the highest *R*yz<sup>2</sup> (0.786) and the lowest *RMSE*yz (6.89 t/hm2). The all-type model had a larger *RMSE* than the type-specific models.

**Figure 5.** Accuracy assessment of the SR biomass estimation models: (**a**) low vegetation; (**b**) broadleaved forest; (**c**) coniferous forest; and (**d**) all-type vegetation.

Similarly, the remaining 30% of field observation data are used to assess the accuracy of the *MLR* biomass estimation models. After this, the two types of models are compared in terms of accuracy measured by the coefficient of determination (*R*yz2) and root-mean-square-error (*RMSE*yz) (Table 8).


**Table 8.** Comparing the accuracies of the SR and MLR biomass estimation models (unit for*RMSE*: t/hm2).

#### *4.4. Seasonal Variation*

As the SR models produced better estimates, they were used to calculate the biomass of each urban vegetation type in January, March, May, July, September, and December of 2017. The type-specific vegetation biomass and total vegetation biomass are shown in Figure 6.

**Figure 6.** Type-specific biomass and the total vegetation biomass in the selected months of 2017: (**a**) low vegetation; (**b**) broadleaved forest; (**c**) coniferous forest; and (**d**) all vegetation.

Overall, vegetation biomass increased over time and decreased after peaking in autumn. The highest biomass of low vegetation was in September (28,423 t) and lowest in January and December (~15,000 t) with a maximal change rate of 87.60%. Despite an increase of 27,150 t biomass from January to September, the change rate of broadleaved forest was 58.93%, much lower than low vegetation (Figure 7). The biomass change rate of coniferous forest (25.58%) was the lowest in the three vegetation types. The total vegetation biomass change was 67,524 t with a change rate of 40.39%.

**Figure 7.** The seasonal change rate of vegetation biomass of the study area.

#### **5. Discussion**

Correlation analysis is useful to identify what variables are related to the dependent variable [59]. While the biomass of low vegetation and broadleaved forest is correlated mostly with spectral reflectance, broadleaved biomass is correlated mostly with textural features. Although there might be close correlations among some of the candidate variables (e.g., NDVI and RVI in the category of vegetation indices), we here did not provide a full correlation matrix for this because the number of variables was so large and would take substantial space of the publication. In addition, the use of stepwise regression and variance inflation factor can avoid the models with correlated variables [57].

Our modeling results show that for both individual vegetation types and all-type vegetation, the SR models have higher coefficients of determination and lower root-mean-square-errors than the MLR models. This clearly suggests that the SR modeling outperforms the MLR modeling in the estimation of urban vegetation biomass. The superiority of SR modeling is also noted in the study of Xu et al., where degraded grassland biomass was estimated using machine learning methods from terrestrial laser scanning data [27]. By comparing SR, random forest, and artificial neural network, they claimed that SR produced the highest accuracy (*R*<sup>2</sup> = 0.84, *RMSE* = 48.89g/m2). However, it might be controversial to conclude that SR is best for vegetation biomass modeling as some researchers favor machine learning algorithms. For example, Lu et al. report that RF (*R*<sup>2</sup> = 0.78, *RMSE* = 1.34 t/ha) performs better than SR (*R*<sup>2</sup> = 0.75, *RMSE* = 1.46 t/ha) in wheat biomass estimation with unmanned aerial vehicle data [67]. We here do not attempt to compare the results of our models with those of others because the data for modeling and the contexts (various vegetation types in an urban area vs. a single type of vegetation in (semi-) environments) were different.

Although some researchers estimated vegetation biomass from remote sensing without discriminating types [29], our study revealed that vegetation biomass should be modeled for specific vegetation types for higher modeling accuracy. This is often done for different contexts by other researchers, e.g., Gao et al. who discriminated broadleaved, coniferous, mixed, and bamboo forest in China's Zhejiang province [68], and González-Jaramillo et al. who divided vegetation of the San Francisco watershed (south Ecuador) into tropical mountain forest, subpáramo, and pastures [23]. In fact, the finding of correlation analysis that variables significantly correlated with vegetation biomass varies largely with vegetation type implies that type-specific biomass estimations models should be constructed. Similarly, non-species-specific allometric growth models yielded larger errors than species-specific ones [69]. Urban vegetation cannot be regarded as a single vegetation type as it varies largely in biophysical characteristics and thus biomass. Such variations, which might be minimized in plantations, should be considered for urban green areas. As such, it is important to discriminate urban

vegetation types through image classification before modeling urban vegetation biomass from remote sensing image data.

Regarding the seasonal variation of vegetation biomass, coniferous forest has much lower biomass loss than low vegetation and broadleaved forest, which is because coniferous forest consists mainly of evergreen arborvitae trees that do not lose their leaves through the year. This suggests that more coniferous trees should be planted if the biomass loss of low vegetation and broadleaved forest needs to be compensated. In this multi-season analysis, the same type-specific estimation models were used for estimating vegetation biomass from remote sensing data imaged in different months. For a plant species in an area, there is only one allometric growth equation, which is often built with measurements acquired, e.g., when plants are luxuriant with maximal biomass in a year. The biomass estimation models constructed with quadrat biomass calculated using these equations should best reflect that time. If these models are used for other dates, estimation biomass would be less accurate (e.g., due to less leaves in winter). Remote sensing variables derived from remote sensing images can however characterize the vegetative status of the plants and compensate the impact.

In addition, there are some other limitations that might undermine the results. Firstly, the allometric biomass equations for a variety of plant species with high reported accuracies were borrowed from previous studies, but we were not able to individually verify these equations as this work is out of the scope of the present study. Secondly, tree biomass could be, to some extent, underestimated from remote sensing image data. While it is likely that under large coniferous and broadleaved and coniferous trees grow some low vegetation like grass and bushes, this cannot be recognized in pixels, notwithstanding the application of linear spectral mixture analysis. Despite these limitations, our study proves the capability of free optical sensor data like Sentinel-2A to estimate urban vegetation biomass. It would be interesting if urban vegetation biomass could be regularly monitored; however, this seems currently challenging as Sentinel-2A data now remains scarce and does not allow a retrospective assessment.

#### **6. Conclusions**

This study demonstrates how Sentinel-2A image data can be used for vegetation biomass in an urban context. The main findings and conclusions of this study are as follows:


Urban green areas are a key component of urban eco-environment and make a vital contribution to improving the quality of life and moderating climate. In general, trees have a stronger carbon sequestration capability and produce more biomass than low vegetation. More coniferous trees can maintain less biomass loss in winter. However, tree species should be diversified to reduce ecological vulnerability and guarantee a more robust urban ecosystem and more sustainable urban development.

**Author Contributions:** Conceptualization, X.Z. and L.L.; methodology, X.Z., L.L., and L.C. (Longqian Chen); software, X.Z., Y.L., and Y.Z.; validation, Y.L. and Y.Z.; formal analysis, L.L., and X.Z; investigation, X.Z., Y.L., and L.L.; resources, L.C (Longqian Chen). and L.C. (Longgao Chen); data curation, X.Z. and L.L.; writing—original draft preparation, L.L. and X.Z.; writing—review and editing, L.L., L.C. (Longqian Chen), and L.C. (Longgao Chen); visualization, Y.Z. and Y.L.; supervision, L.C. (Longqian Chen); project administration, L.L. and L.C. (Longqian Chen); funding acquisition, L.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by "the Fundamental Research Funds for the Central Universities" (Grant No.: 2018QNB06).

**Acknowledgments:** The authors would like to thank the United States Geological Survey (USGS) for freely providing satellite remote sensing image data required in this study, Guanghui Hong for his assistance in the fieldwork, and Jinyu Zang, Ruiyang Liu, Zhiqiang Wang, and Ziqi Yu for their help in preparing an early version of this manuscript. Comments and suggestions from three anonymous reviewers are greatly appreciated for improving the study.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**


**Table A1.** Formulas used for calculating spectral indices [70].

Note: VRE1–VRE3 represent the three red-edge bands; N\_NIR represents the narrow near-infrared bands.


**Table A2.** Allometric biomass equations for trees, used for calculating quadrat biomass.


**Table A2.** *Cont*.

Note: *D* is DBH (diameter at breast height); *H* is tree height; *WS*, *WB*, *WL*, refer to the biomass of stem, branch, and leaves; and *WT* and *WR* to the total aboveground biomass and root biomass.




**Table A3.** *Cont*.

Note: *D* is basal diameter; *H* is height; *C* is crown width (which is the average of south-north crown diameter *C1* and east-west crown diameter *C2*; *C* = *(C1* + *C2)*/*2*); *AC* is the area of crown (*AC* = π × *C1* × *C2*); VC is the volume of crown (*VC* = *AC* × *H*); *WS*, *WB*, *WL*, refer to the biomass of stem, branch, and leaves; and *WT* and *WR* to the total biomass and root biomass.

**Table A4.** The calculated biomass for each quadrat. As only 140 of the 192 pre-selected quadrats were visited and investigated, the quadrat ID ranges from 1 to 192.



**Table A4.** *Cont*.

**Table A5.** The results of the *SR* modeling.


#### **References**

1. United Nations. *World Urbanization Prospects: The 2080 Revision*; United Nations: New York, NY, USA, 2018.

2. Zhou, X.; Li, L.; Chen, L.; Liu, Y.; Cui, Y.; Zhang, Y.; Zhang, T. Discriminating urban forest types from Sentinel-2A image data through linear spectral mixture analysis: A case study of Xuzhou, East China. *Forests* **2019**, *10*, 478. [CrossRef]


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
