1. Introduction
The expansion dynamics of urban forests describe their construction process and indicate the age of trees relative to their construction time, which is important for assessing the benefits and potential problems of vegetation over time. These dynamics are therefore closely related to the effectiveness of urban green space policies [1,2], urban risk assessment [3], carbon sequestration [4,5], and the intensity of ecosystem services [6,7]. Unlike natural forests, urban forests undergo frequent renewal and exhibit high heterogeneity [8]. Thus, mapping continuous expansion dynamics is essential for the smart management and precise optimization of urban forests.
As urban forests expand with urban development [2], optimizing existing urban forests has become a crucial issue [1,9,10]. Over the past 40 years of urbanization in China, urban forests have been constructed and managed extensively rather than intensively [1,11], so detailed archives and quality records are lacking for most urban forest vegetation. This gap has hindered improvements in urban forest quality. In recent years, as the pace of urbanization in China has slowed, a series of refined urban governance policies, such as urban health examinations and urban renewal, have created a need for the detailed evaluation and management of urban forests. Therefore, to fill past data gaps, efficient technical methods are needed to retrospectively assess urban forest dynamics in the post-urbanization stage.
In the past fifteen years, vegetation change analysis algorithms based on time series spectral trends have developed rapidly owing to the availability of Landsat time series data [12], including CCDC (Continuous Change Detection and Classification) [13], BFAST (Breaks For Additive Season and Trend) [14], and LandTrendr (Landsat-Based Detection of Trends in Disturbance and Recovery) [15]. Among these algorithms, LandTrendr is widely used for its balance of efficiency and accuracy. By segmenting single-spectral time series and filtering the resulting segments, LandTrendr identifies abrupt and persistent changes in vegetation development [15]. However, most LandTrendr applications identify only the most significant changes in natural forests [15,16,17,18] through extreme-value filtering. Because not all critical surface changes exhibit the largest spectral variations, this practice limits LandTrendr's potential to detect the various dynamic semantic features in time series. Recently, studies integrating LandTrendr with machine learning, especially random forest, have gradually increased; because machine learning classifiers can integrate high-dimensional features during filtering, they extend the utility and accuracy of LandTrendr. Using multispectral features and random forest models, these studies have improved the accuracy of LandTrendr in identifying vegetation disturbances [16,17] and land use changes [19,20,21,22]. However, these methods still rely on extracting the extreme segment for judgment and do not further explore the quantitative relationships among segments. Such relationships are particularly relevant to the change patterns of urbanization, in which transitions from non-vegetated to vegetated areas follow specific patterns [10]. Integrating higher-dimensional trend features through machine learning therefore has the potential to identify the processes underlying urban forest dynamics.
Given the lack of studies applying LandTrendr to urban forests, this study tests the application of LandTrendr in urban forests and aims to improve its accuracy in identifying urban forest expansion through machine learning and inter-segment features. The study has two main objectives:
- (1) To test the optimal accuracy of baseline methods that use extremum-based filtering in identifying urban forest expansion dynamics.
- (2) To improve the filtering method by applying random forest supervised classification to single-band/index images, then to use the trend features of land use transitions as additional variables to enhance filtering accuracy, and to test the performance of the improved method in detecting urban forest expansion dynamics.
Finally, using the built-up area of Beijing as an example, we applied the best-performing improved method to create an interannual urban forest expansion dynamic map of this area and explored its potential for supporting urban forest management and optimization decisions.
2. Materials and Methods
This study was divided into four steps (Figure 1):
- (1) Segmentation Preparation: We extracted and composited Landsat images for LandTrendr and used LandTrendr's segmentation to obtain parameters for filtering.
- (2) Baseline Method Testing: We tested the accuracy of extremum-based baseline methods in identifying urban forest expansion. The two baseline methods, both based on the greatest magnitude of change, were single-band/index filtering with thresholds [15] and multispectral secondary classification [16].
- (3) Improved Method Testing: We tested the accuracy of improved methods that apply random forest supervised classification to basic segment features and trend features.
- (4) Mapping of Urban Forest Expansion: We mapped the dynamics of urban forest expansion using the method with the best accuracy.
We evaluated combinations of different maximum numbers of segments and different bands/indices so that each method was assessed at its best performance.
2.1. Study Area
Beijing is located in the western part of the North China Plain (Figure 2a) and has a temperate monsoon climate with distinct seasonal vegetation features. The altitude of the plain area within the municipal boundary ranges from 20 to 60 m. The study area (Figure 2b) comprises Beijing's first greenbelt area ("the 1st Greenbelt") and the Central District. It is delineated by the outer boundary of the 1st Greenbelt and lies entirely within the plain area. The total area is 645 km², covering Beijing's main built-up area and the primary scope of urban forest construction.
This study focuses on the process of urban forest construction and expansion in the study area from 1994 to 2022. The Central District was the main region of early urban construction in Beijing. To limit ring-shaped expansion outward from the Central District, the 1st Greenbelt and the 2nd Greenbelt were planned to guide the city toward a decentralized cluster development pattern. Since 1994, Beijing has issued a series of master plans and administrative regulations to implement urban forest construction in the 1st Greenbelt [1,23], with plans to complete the 1st Greenbelt by 2035 [24]. Overall, the area bounded by the outer edge of the 1st Greenbelt has experienced the most intensive and earliest urban forest construction in Beijing to date. As the post-urbanization stage approaches, the existing stock of urban forests within this area will urgently need optimization.
2.2. Landsat Stack and Composite Images
Landsat Collection 2 Tier 1 Level 2 (C02/T1_L2) images were obtained and processed on the Google Earth Engine (GEE) platform, as this image category is considered suitable for time series studies [25]. To fully capture trends at the temporal boundaries, the image stack covered a period (1992–2023) that is a superset of the target time range. Annual images were extracted from 1 June to 30 September, Beijing's vegetation growing season, which fully reflects vegetation characteristics. A total of 731 available images from the Landsat TM, ETM+, and OLI sensors were used, including three visible bands, one near-infrared band, and two shortwave infrared bands. Clouds, snow, and shadows were removed based on the 'QA_PIXEL' band [25]. The bands of the images were composited into an annual representative image using the medoid composite method [26]. The medoid image is composed of the actual pixel observations closest to the median of all annual images, ensuring that each pixel represents typical conditions.
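The masking and compositing steps can be illustrated offline with NumPy. The sketch below is not the GEE implementation used in the study; it assumes the annual images have already been exported as arrays. It reads the Landsat Collection 2 QA_PIXEL bits for fill (bit 0), dilated cloud (bit 1), cloud (bit 3), cloud shadow (bit 4), and snow (bit 5), and then picks, for each pixel, the valid observation whose band vector is closest (squared Euclidean distance) to the per-band median. The function names are hypothetical.

```python
import warnings

import numpy as np

# Landsat Collection 2 QA_PIXEL bit positions used for masking.
FILL_BIT, DILATED_CLOUD_BIT, CLOUD_BIT, CLOUD_SHADOW_BIT, SNOW_BIT = 0, 1, 3, 4, 5


def clear_mask(qa_pixel: np.ndarray) -> np.ndarray:
    """True where none of the fill, dilated-cloud, cloud, shadow, or snow bits is set."""
    flags = 0
    for bit in (FILL_BIT, DILATED_CLOUD_BIT, CLOUD_BIT, CLOUD_SHADOW_BIT, SNOW_BIT):
        flags |= 1 << bit
    return (qa_pixel.astype(np.int64) & flags) == 0


def medoid_composite(stack: np.ndarray, valid: np.ndarray) -> np.ndarray:
    """Medoid composite of one year's images.

    stack: (n_images, n_bands, rows, cols) surface reflectance.
    valid: (n_images, rows, cols) boolean clear-sky mask.
    For each pixel, returns the band vector of the valid observation with the
    smallest squared distance to the per-band median; NaN if none is valid.
    """
    stack = stack.astype(float)
    data = np.where(valid[:, None], stack, np.nan)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)  # all-NaN pixels
        median = np.nanmedian(data, axis=0)                       # (bands, rows, cols)
    dist = np.nansum((data - median) ** 2, axis=1)                # (images, rows, cols)
    dist = np.where(valid, dist, np.inf)                          # masked obs never win
    best = np.argmin(dist, axis=0)                                # medoid image index
    out = np.take_along_axis(stack, best[None, None], axis=0)[0]
    out[:, ~valid.any(axis=0)] = np.nan
    return out
```

Because the medoid is an actual observation rather than a per-band statistic, the composite keeps physically consistent band combinations, which is the property the text relies on.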
2.3. Verified Points
The points with actual change times used for training and validation were collected from Google Earth™ historical images of the study area and its surroundings. Based on the availability of historical images, the time range of the verified points was 2001 to 2022. Three vegetation types (evergreen vegetation, deciduous vegetation, and grass) were selected through stratified random sampling. For each sample point, all available historical images in the time series were visually inspected to determine when the urban forest present in 2022 was constructed. Because tree planting sometimes occurs in winter, change times after 30 September were assigned to the following year. Before 2001, there were fewer large-scale urban forest construction areas in Beijing, so no-change samples were fewer than changed samples. A total of 648 points were extracted (Figure 2), including 170 no-change samples and 478 changed samples. For each point, the precise coordinates, whether an urban forest was planted, and the year of change were recorded.
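The year-assignment rule for late-season planting is simple enough to state as a function. This is a minimal sketch; the function name is hypothetical, and it encodes only the 30 September cut-off described above, which matches the end of the June to September compositing window.

```python
from datetime import date


def change_year(observed: date) -> int:
    """Assign an observed construction date to a growing-season year.

    Dates after 30 September fall outside the June-September compositing
    window, so the change first appears in the next year's composite.
    """
    if (observed.month, observed.day) > (9, 30):
        return observed.year + 1
    return observed.year
```

For example, a planting observed on 15 December 2010 is recorded as a 2011 change, while one observed on 1 March 2011 also maps to 2011.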
To establish a stable training–evaluation process for all methods, k-fold (k = 4) cross-validation was used. All collected samples were divided into four training–test sets by stratified sampling of unchanged and changed samples, so that each fold contained approximately the same numbers of unchanged and changed samples.
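A stratified four-fold split with the reported sample sizes can be sketched with scikit-learn's `StratifiedKFold`. The point IDs stand in for real features, and the `shuffle`/`random_state` settings are illustrative assumptions rather than values from the study.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Labels with the reported sample sizes: 170 no-change (0) and 478 changed (1).
labels = np.array([0] * 170 + [1] * 478)
point_ids = np.arange(labels.size).reshape(-1, 1)  # stand-in for point features

# shuffle/random_state are illustrative choices, not values from the study.
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
folds = list(skf.split(point_ids, labels))

for k, (train_idx, test_idx) in enumerate(folds):
    n_changed = int(labels[test_idx].sum())
    print(f"fold {k}: {test_idx.size} test points, "
          f"{n_changed} changed, {test_idx.size - n_changed} no-change")
```

With 648 points, each test fold holds 162 points, of which 119 or 120 are changed and the rest unchanged.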
2.4. LandTrendr Segmentation
Segmentation is the core step of LandTrendr [27]. It groups spectral observations from Landsat annual time series stacks into a series of segments representing surface processes by removing spikes, identifying potential vertices, fitting trajectories, and simplifying the model [15]. To identify the optimal bands/indices for describing the urban land cover change process and to test the optimal performance, four single bands and four composite indices (Table 1) were used: NBR, TCW, and NDVI, which are commonly used in LandTrendr [15]; the high signal-to-noise ratio bands NIR, SWIR1, and SWIR2 [16]; and NDMI [16,21] and GREEN [20], which perform well in urban contexts. Composite images were processed using the LandTrendr segmentation function on the GEE platform [18]. Among all the preset parameters of LandTrendr, the maximum number of segments (max_segments) is one of the most decisive [15,21]. Although the spectral characteristics of land use change in simple urbanization processes can be summarized with three segments [10,28], multiple land use change processes occurred across the entire urbanized part of the study area. Therefore, values from 4 to 17 were tested for max_segments. Since urban forest construction is often completed within 1–2 years, the recovery threshold was set to 1 to accommodate construction events lasting one year [15]. In total, 112 combinations of max_segments (14 values) and bands/indices (8) were used in all filtering method tests.
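The inputs and the parameter grid can be enumerated explicitly. In the sketch below, the normalized-difference formulas are the standard definitions of NDVI, NBR, and NDMI; TCW is omitted because it needs sensor-specific tasseled cap coefficients. The dictionary keys `maxSegments` and `recoveryThreshold` mirror argument names of `ee.Algorithms.TemporalSegmentation.LandTrendr`, and the assumption is that all other arguments stay at their defaults. The function names are hypothetical.

```python
import itertools

import numpy as np


def normalized_difference(a, b):
    """(a - b) / (a + b), the form shared by NDVI, NBR, and NDMI."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return (a - b) / (a + b)


def spectral_inputs(green, red, nir, swir1, swir2):
    """Single bands and normalized-difference indices used as LandTrendr inputs.

    TCW is omitted here: it needs sensor-specific tasseled cap coefficients.
    """
    return {
        "NDVI": normalized_difference(nir, red),
        "NBR": normalized_difference(nir, swir2),
        "NDMI": normalized_difference(nir, swir1),
        "GREEN": np.asarray(green, dtype=float),
        "NIR": np.asarray(nir, dtype=float),
        "SWIR1": np.asarray(swir1, dtype=float),
        "SWIR2": np.asarray(swir2, dtype=float),
    }


BANDS_INDICES = ["NBR", "TCW", "NDVI", "NIR", "SWIR1", "SWIR2", "NDMI", "GREEN"]
MAX_SEGMENTS = range(4, 18)  # 4-17 inclusive


def segmentation_runs():
    """One segmentation run per (band/index, max_segments) pair.

    Only maxSegments and recoveryThreshold come from the text; other
    LandTrendr arguments are assumed to keep their default values.
    """
    return [
        {"input": name, "maxSegments": m, "recoveryThreshold": 1.0}
        for name, m in itertools.product(BANDS_INDICES, MAX_SEGMENTS)
    ]
```

Eight inputs times fourteen max_segments values gives the 112 runs reported above.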
2.5. Baseline Method Testing
Both baseline methods used the general extremum-based filtering workflow built on the GEE platform [18]. This program selects the most significant disturbance events from the LandTrendr-fitted segments based on the greatest change. Because urban forest construction represents a gain in vegetation, we selected the segment with the greatest magnitude of gain within the time frame. We obtained the initial value (Preval), the year of detection (YOD), the rate of change (Rates), the magnitude of change (Mag), the duration (Dur), and the disturbance signal-to-noise ratio (DSNR) as the six feature parameters of these segments for further filtering.
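For a single pixel, greatest-gain selection reduces to comparing consecutive LandTrendr vertices. The sketch below is a simplified offline illustration, not the GEE program itself, under stated assumptions: the index increases with vegetation (as NDVI does), YOD is the start-vertex year plus one (the convention we assume the LT-GEE change map follows), and DSNR is magnitude divided by the model RMSE. The function name is hypothetical.

```python
import numpy as np


def greatest_gain_features(vertex_years, vertex_values, rmse):
    """Six change-map features of the segment with the largest vegetation gain.

    vertex_years, vertex_values: LandTrendr vertices of one pixel (fitted values).
    Assumes an index that increases with vegetation, so a gain is a positive
    difference between consecutive vertices. Returns None if no segment gains.
    """
    years = np.asarray(vertex_years, dtype=float)
    values = np.asarray(vertex_values, dtype=float)
    mag = np.diff(values)          # change over each segment
    dur = np.diff(years)           # length of each segment in years
    k = int(np.argmax(mag))
    if mag[k] <= 0:
        return None
    return {
        "preval": values[k],       # fitted value at the start vertex
        "yod": int(years[k]) + 1,  # start vertex + 1 (assumed LT-GEE convention)
        "rate": mag[k] / dur[k],
        "mag": mag[k],
        "dur": dur[k],
        "dsnr": mag[k] / rmse,     # assumed: magnitude relative to model RMSE
    }
```

With vertices at 1992, 2000, 2005, and 2023 and fitted values 0.2, 0.25, 0.7, and 0.75, the 2000 to 2005 segment is selected (Mag 0.45, YOD 2001), even if a later, smaller gain is the actual planting event; this is the failure mode the improved method targets.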
2.5.1. Single-Band/Index Filtering with Threshold
Because not every greatest gain represents urban forest construction, all segments selected by the greatest magnitude of change were further filtered by thresholds on their basic features. Using the verified points, we applied the Kolmogorov–Smirnov (K-S) test to identify features that differed significantly between the changed and no-change groups. For each such feature, we then established a threshold grid centered on the average of the two group means, spanning 0.5 to 1.5 times this center value in five intervals. The optimal threshold combination was identified through grid search. The greatest magnitude-of-change segments were then filtered a second time using these thresholds, and the YOD of the retained segments was used as the change time.
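The feature screening and threshold search can be sketched with SciPy's two-sample K-S test and an exhaustive grid. Several details are assumptions not fixed by the text: a significance level of 0.05, five candidate factors (0.5, 0.75, 1.0, 1.25, 1.5) applied to the center value, thresholds acting as lower bounds (a segment is kept if every feature is at least its threshold), and change-detection accuracy as the search objective. Function names are hypothetical.

```python
import itertools

import numpy as np
from scipy.stats import ks_2samp


def significant_features(changed, unchanged, alpha=0.05):
    """Features whose changed/no-change distributions differ (two-sample K-S test)."""
    return [f for f in changed if ks_2samp(changed[f], unchanged[f]).pvalue < alpha]


def threshold_grid(changed_vals, unchanged_vals, factors=np.linspace(0.5, 1.5, 5)):
    """Candidate thresholds: the average of the two group means times each factor."""
    center = (np.mean(changed_vals) + np.mean(unchanged_vals)) / 2
    return center * factors


def grid_search(features, is_changed, names, grids):
    """Try every threshold combination; keep a point if all features >= thresholds.

    Returns (best accuracy, {feature: threshold}).
    """
    best = (-1.0, None)
    for combo in itertools.product(*(grids[n] for n in names)):
        keep = np.all([features[n] >= t for n, t in zip(names, combo)], axis=0)
        acc = float(np.mean(keep == is_changed))
        if acc > best[0]:
            best = (acc, dict(zip(names, combo)))
    return best
```

An exhaustive search is affordable here because only the few K-S-significant features enter the grid, so the number of combinations stays at five to the power of the feature count.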
2.5.2. Multispectral Secondary Classification
Multispectral secondary classification is another method used to reduce the errors of greatest magnitude-of-change filtering [16]. The features of all single-band/index images with the same max_segments were combined, and random forest supervised classification was used to determine whether a pixel contains an urban forest construction event. The change time of each identified construction event was then derived as the mode of the YODs across all images [19].
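A compact version of this two-stage procedure is sketched below with scikit-learn. It assumes the greatest-gain features of all eight bands/indices have already been concatenated into one row per pixel. The classifier settings (200 trees, fixed seed) and the tie rule (earliest year wins) are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def yod_mode(yods):
    """Most frequent YOD per pixel across bands/indices (ties -> earliest year)."""
    result = []
    for row in np.asarray(yods):
        values, counts = np.unique(row, return_counts=True)  # values are sorted
        result.append(values[np.argmax(counts)])
    return np.array(result)


def secondary_classification(train_features, train_labels, test_features, test_yods, seed=0):
    """Classify pixels as changed/unchanged, then date changed pixels by YOD mode.

    train_features/test_features: (n_pixels, n_indices * n_features) arrays that
    concatenate the greatest-gain features of every band/index.
    test_yods: (n_pixels, n_indices) YODs of the same pixels.
    Returns the change year per test pixel, with 0 meaning no change.
    """
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(train_features, train_labels)
    changed = clf.predict(test_features).astype(bool)
    return np.where(changed, yod_mode(test_yods), 0)
```

Note that the timing still comes from each input's greatest-gain segment; the classifier only decides whether a change exists.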
2.6. Improved Method Testing
2.6.1. Single-Band/Index Supervised Classification
In the context of rapid urbanization, multiple land cover changes may occur before urban forests are planted, and spectral values often fluctuate frequently and intensely during this process. Therefore, extremum-based filtering may mistakenly identify other land use changes as urban forest changes (Figure 3). In addition, because of temporary vegetation during land preparation and sparse planting at the start of construction, the spectral signal at the time of urban forest establishment may be weak, which further complicates filtering.
To reduce the potential errors caused by extremum filtering, the improved method used supervised classification to learn the common characteristics of segments representing urban forest construction events. After segmentation, all segments in the time series and their descriptive features were retained. These features included magnitude, rate, start value, end value, start year, end year, duration, and DSNR (Figure 1). The images containing all segments were processed on the GEE platform [18] and then exported for offline processing. The improved filtering methods were subsequently implemented in Python 3.9.
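Retaining every segment rather than only the extreme one turns each pixel's vertex list into a small table. The pandas sketch below builds that table from the vertex years and fitted values of one pixel. As in the baseline sketch, DSNR is assumed to be magnitude divided by the model RMSE, and the function name is hypothetical.

```python
import numpy as np
import pandas as pd


def segment_table(vertex_years, vertex_values, rmse):
    """One row per LandTrendr segment with the eight basic features.

    vertex_years, vertex_values: vertices of one pixel (fitted values).
    rmse: fitting RMSE of the pixel's model (DSNR assumed = magnitude / RMSE).
    """
    years = np.asarray(vertex_years, dtype=float)
    values = np.asarray(vertex_values, dtype=float)
    start_year, end_year = years[:-1], years[1:]
    start_value, end_value = values[:-1], values[1:]
    duration = end_year - start_year
    magnitude = end_value - start_value
    return pd.DataFrame({
        "segment": np.arange(1, len(years)),   # 1-based segment index
        "start_year": start_year.astype(int),
        "end_year": end_year.astype(int),
        "start_value": start_value,
        "end_value": end_value,
        "duration": duration,
        "magnitude": magnitude,
        "rate": magnitude / duration,
        "dsnr": magnitude / rmse,
    })
```

Stacking these tables for all verified points (with a point identifier column) yields the segment-level dataset that the classifier is trained on.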
To construct a training–test set for supervised classification, segments representing true urban forest construction events were identified by matching the true change times of the verified points with the end years of the segments. Segments identified as construction events and all other segments were assigned to two subsets: changed and no change (Figure 4).
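The labeling rule can be expressed as a join between the segment table and the verified change years. In this sketch, exact year matching is the default; the `tolerance` parameter is a hypothetical knob added for illustration, and the function name is likewise hypothetical.

```python
import pandas as pd


def label_segments(segments, change_years, tolerance=0):
    """Mark segments whose end year matches a point's verified change year.

    segments: DataFrame with 'point_id' and 'end_year' columns (one row per segment).
    change_years: dict point_id -> verified change year, for changed points only.
    tolerance: allowed |end_year - change year| in years (hypothetical; 0 = exact).
    Returns 1 for 'changed' segments and 0 otherwise, including every segment
    of no-change points.
    """
    truth = segments["point_id"].map(change_years)   # NaN for no-change points
    matched = (segments["end_year"] - truth).abs() <= tolerance
    return matched.astype(int)
```

Since a pixel typically has several segments but at most one construction event, this labeling produces far more no-change than changed segments, which motivates the class weighting described next.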
The random forest classifier uses an ensemble of decision trees to make reliable predictions [33] and is particularly advantageous for high-dimensional data, as it can automatically select and rank the most relevant features [34]. Random forest was trained on all training folds with a hyperparameter grid containing n_estimators (100, 200, 500), max_depth (None, 10, 20, 30), min_samples_split (2, 5, 10), and min_samples_leaf (1, 2, 4). Because no-change segments vastly outnumber changed segments, balanced class weights (class_weight = 'balanced') were applied to address this imbalance in the training samples. Next, the sample points at which urban forest construction events occurred were filtered from all samples using the change mask obtained from the multispectral secondary classification method. The hyperparameter combination with the best average result over all test folds was used as the final result to characterize the performance of the improved method.
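The hyperparameter search maps directly onto scikit-learn's `GridSearchCV`, which ranks combinations by their mean score over the folds. The grid below reproduces the values in the text (108 combinations). Using ROC AUC as the selection score is an assumption. The default four-fold splitter is also illustrative: to keep all segments of one verified point in the same fold, one would pass the point-level folds as a list of `(train_idx, test_idx)` pairs through `cv`.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Hyperparameter values as listed in the text (3 * 4 * 3 * 3 = 108 combinations).
PARAM_GRID = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}


def build_search(cv=None, random_state=0):
    """Grid search over a class-balanced random forest.

    cv: any scikit-learn splitter or list of (train_idx, test_idx) pairs; the
    default stratified 4-fold splitter is an illustrative assumption.
    Scoring with ROC AUC is also an assumption.
    """
    if cv is None:
        cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=random_state)
    rf = RandomForestClassifier(class_weight="balanced", random_state=random_state)
    return GridSearchCV(rf, PARAM_GRID, scoring="roc_auc", cv=cv, n_jobs=-1)
```

After `search.fit(X, y)`, `search.best_params_` holds the combination with the highest mean test score across folds, matching the selection rule described above.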
2.6.2. Single-Band/Index Supervised Classification with Trend Features
The specific spectral change patterns of urban forests help identify the exact time of land cover change (Figure 5). In terms of the amplitude of spectral fluctuations, surface spectra typically maintain a stable trend after construction, reflecting vegetation growth after planting. Conversely, urban planning and development before urban forest construction can cause fluctuations in spectral values. In terms of values, the transition from non-vegetated to vegetated areas before and after urban forest construction produces a marked contrast in spectral values.
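Therefore, extracting the mean values of the fitted spectral segments, which represent urban land use patterns, before and after each segment helps determine whether a transition from non-vegetated to vegetated areas has occurred across that segment (Equations (1) and (2)).

$$\overline{X}^{\mathrm{pre}}_{i} = \frac{1}{i-1}\sum_{j=1}^{i-1} X_{j} \tag{1}$$

$$\overline{X}^{\mathrm{post}}_{i} = \frac{1}{n-i}\sum_{j=i+1}^{n} X_{j} \tag{2}$$

where $X$ is the basic feature to be quantified, which can be the rate, magnitude, duration, start value, or end value; $i$ is the index of the target segment; $n$ is the total number of segments; $\overline{X}^{\mathrm{pre}}_{i}$ is the average of $X$ over all segments before segment $i$; and $\overline{X}^{\mathrm{post}}_{i}$ is the average of $X$ over all segments after segment $i$.

By extracting the standard deviation to capture fluctuations in land use patterns before and after each segment, it is possible to identify whether the process represented by the segment leads to a stable spectral state indicative of stable vegetation growth (Equations (3) and (4)).

$$\sigma^{\mathrm{pre}}_{i} = \sqrt{\frac{1}{i-1}\sum_{j=1}^{i-1}\left(X_{j}-\overline{X}^{\mathrm{pre}}_{i}\right)^{2}} \tag{3}$$

$$\sigma^{\mathrm{post}}_{i} = \sqrt{\frac{1}{n-i}\sum_{j=i+1}^{n}\left(X_{j}-\overline{X}^{\mathrm{post}}_{i}\right)^{2}} \tag{4}$$

where $\sigma^{\mathrm{pre}}_{i}$ is the standard deviation of $X$ for all segments before segment $i$, and $\sigma^{\mathrm{post}}_{i}$ is the standard deviation of $X$ for all segments after segment $i$. When segment $i$ is the second/penultimate segment, the features of segment $1$/$n$ are used for virtual extension. When segment $i$ is the last/first segment, its $\sigma^{\mathrm{post}}_{i}$/$\sigma^{\mathrm{pre}}_{i}$ is set to 0.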
In addition to the trend features above and all basic features, we defined categorical variables indicating whether each feature value was the overall maximum or minimum in the time series, as well as its rank within the time series.
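Equations (1)–(4) and the rank/extreme flags can be computed per pixel with NumPy. This sketch uses the population standard deviation (divisor $i-1$ or $n-i$, as in Equations (3) and (4)), sets the pre/post means as well as the standard deviations to 0 at the first/last segment (the zero mean is an assumption), and does not implement the "virtual extension" for the second/penultimate segment. The function names are hypothetical.

```python
import numpy as np


def trend_features(x):
    """Pre/post mean and standard deviation of one basic feature (Eqs. (1)-(4)).

    x: 1-D array with one value per segment (e.g., magnitude), in time order.
    Returns an (n, 4) array: pre_mean, post_mean, pre_std, post_std.
    First segment: pre_* = 0; last segment: post_* = 0. The virtual
    extension for the second/penultimate segment is not implemented.
    """
    x = np.asarray(x, dtype=float)
    out = np.zeros((x.size, 4))
    for i in range(x.size):
        before, after = x[:i], x[i + 1:]
        if before.size:
            out[i, 0], out[i, 2] = before.mean(), before.std()  # population std
        if after.size:
            out[i, 1], out[i, 3] = after.mean(), after.std()
    return out


def rank_flags(x):
    """Rank of each segment's value (1 = largest) and overall max/min flags."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(-x, kind="stable")
    rank = np.empty(x.size, dtype=int)
    rank[order] = np.arange(1, x.size + 1)
    return rank, x == x.max(), x == x.min()
```

For magnitudes `[1, 2, 3, 4]`, the second segment gets a pre-mean of 1 and a post-mean of 3.5, with post-standard deviation 0.5; a large jump between the two means, combined with a small post-standard deviation, is the non-vegetated to stable-vegetation signature described above.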
The improved method with trend features used exactly the same steps for training set construction and model training as the improved method with basic features. A total of 88 variables (Table A1) were screened by feature importance tests at two time points, and 38 feature variables were retained for the final filtering test (Table 2).
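The screening procedure at two time points is not detailed here. The sketch below shows only one generic form of importance-based screening, which is an assumption rather than the authors' procedure: impurity importances from a class-balanced scikit-learn random forest are ranked, and the top k = 38 variables are kept. The function name is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def top_k_features(X, y, names, k=38, random_state=0):
    """Names of the k variables with the highest impurity-based importance.

    A generic screening sketch; the settings (200 trees, balanced weights)
    are illustrative assumptions.
    """
    rf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                random_state=random_state)
    rf.fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1][:k]
    return [names[i] for i in order]
```

Impurity importances can favor continuous, high-cardinality variables over the categorical rank/extreme flags; permutation importance is a common cross-check when that bias matters.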
2.7. Evaluation of the Filtering Performance
The filtering performance of all methods was comprehensively evaluated in two dimensions: whether the change could be identified and whether the change time could be accurately extracted.
Within each fold, because of the imbalance between no-change and changed samples, the ability to correctly predict urban forest change was evaluated using the area under the ROC curve (AUC) [35]. An AUC of 1 indicates that the algorithm completely separates no-change and changed samples.
For the identification of change time, we used accuracy (Equation (5)) rather than a continuous-variable metric. For the reasons given in Section 2.3, a predicted change time within ±1 year of the true change time was considered accurate.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

where TP represents samples for which the difference between the extracted and true change times is at most 1 year, and TN represents samples correctly identified as unchanged. FP represents samples identified as changed but actually unchanged, and FN represents samples identified as unchanged but actually changed.
The overall performance of each method was summarized by the mean and standard deviation across all folds to reduce potential test errors caused by sample selection.
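Both evaluation dimensions and the cross-fold summary can be sketched as follows. Change years use 0 for "no change". Changed samples whose predicted year misses by more than one year are counted as errors (grouped with FN here), so the denominator equals the number of samples; this treatment and the sample standard deviation (`ddof=1`) are assumptions. The function names are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def timing_accuracy(true_year, pred_year, tolerance=1):
    """Equation (5) with a +/-1-year tolerance; 0 encodes 'no change'.

    TP: both changed and |pred - true| <= tolerance; TN: both unchanged.
    Every other sample (FP, FN, or a timing error beyond the tolerance)
    counts as an error.
    """
    true_year = np.asarray(true_year)
    pred_year = np.asarray(pred_year)
    true_changed, pred_changed = true_year > 0, pred_year > 0
    tp = true_changed & pred_changed & (np.abs(pred_year - true_year) <= tolerance)
    tn = ~true_changed & ~pred_changed
    return (tp.sum() + tn.sum()) / true_year.size


def fold_metrics(true_changed, change_scores, true_year, pred_year):
    """AUC of change scores plus change-time accuracy for one fold."""
    return {"auc": roc_auc_score(true_changed, change_scores),
            "accuracy": timing_accuracy(true_year, pred_year)}


def summarize(fold_scores):
    """Mean and sample standard deviation of a metric across folds."""
    s = np.asarray(fold_scores, dtype=float)
    return s.mean(), s.std(ddof=1)
```

AUC is computed from continuous scores (for example, predicted class probabilities), so it reflects separability independently of any decision threshold, whereas the accuracy depends on the final hard predictions and dates.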
2.8. Urban Forest Mask and Expansion Dynamic Mapping
The change times of urban forests identified by the best-performing method were used to create an expansion dynamic map of urban forests. An urban forest mask was used to restrict the identified areas to the urban forest extent, thereby reducing interference from other land use types. We defined all vegetation within the study area as urban forest, including evergreen vegetation, deciduous vegetation, and grass. During rapid urbanization, vegetation is sometimes temporarily removed and later re-established. To avoid confusing this situation with unchanged vegetation, a static urban forest mask was extracted for 2022 only. Pixel-level land cover classification of the 2022 Landsat medoid composite image was performed using random forest supervised classification [36], categorizing land cover into grass, evergreen trees, deciduous trees, water bodies, and impervious surfaces, with an overall accuracy of 0.85 (Table A2). Eight-neighborhood mode filtering was used to remove salt-and-pepper noise caused by isolated change pixels in the mapping results.
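Eight-neighborhood mode filtering of a change-year map can be sketched with SciPy's `generic_filter`. The 3 × 3 window (the pixel plus its eight neighbors), nearest-edge padding, and the tie rule (smallest value wins, which favors 0 = no change) are assumptions; the function name is hypothetical.

```python
import numpy as np
from scipy.ndimage import generic_filter


def _window_mode(window):
    """Most frequent value in a flattened window (ties -> smallest value)."""
    values, counts = np.unique(window, return_counts=True)
    return values[np.argmax(counts)]


def mode_filter_8(change_year_map):
    """Replace each pixel with the mode of its 3x3 (eight-neighbor) window."""
    return generic_filter(change_year_map, _window_mode, size=3, mode="nearest")
```

An isolated change pixel surrounded by unchanged pixels is reset to the background value, while the interior of a contiguous planting patch keeps its change year; patch corners may be trimmed, which is the usual trade-off of majority filters.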
5. Conclusions
Our study integrates the trend features of urban surface dynamics into the identification of urban forest expansion using random forest classification, establishing a filtering method based on single spectral features. By identifying the spectral quantity and fluctuation characteristics before and after urban forest construction, the accuracy of LandTrendr in detecting urban forest expansion was significantly improved. Compared to baseline methods, the accuracy of the improved method increased by over 25%. Validating the reliability of this method in different land use change scenarios and integrating the workflow with GEE’s cloud computing process are the primary directions for future development.
Accurate urban forest expansion dynamics enable detailed post-implementation assessments of urban forest initiatives, promote quantitative research on process mechanisms, and significantly advance the scientific evaluation and renewal strategies of urban forests.