*Article* **Response of Spatio-Temporal Differentiation Characteristics of Habitat Quality to Land Surface Temperature in a Fast Urbanized City**

**Yongge Hu 1,†, Enkai Xu 1,†, Gunwoo Kim 2,\*, Chang Liu <sup>1</sup> and Guohang Tian 1,\***


**Abstract:** The degradation and loss of global urban habitat and biodiversity have been extensively studied as a global issue. Urban heat islands caused by abnormal land surface temperature (LST) have been shown to be the main reason for this problem. With the accelerated urbanization process and the increasing possibility of abnormal temperatures in Zhengzhou, China, more and more creatures cannot adapt and survive in urban habitats, including humans; therefore, Zhengzhou was selected as the study area. The purpose of this study is to explore the response of urban habitat quality to LST, which provides a basis for the scientific protection of urban habitat and biodiversity in Zhengzhou from the perspective of alleviating heat island effect. We used the InVEST-Habitat Quality model to calculate the urban habitat quality, combined with GIS spatial statistics and bivariate spatial autocorrelation analysis, to explore the response of habitat quality to LST. The results show the following: (1) From 2000 to 2015, the mean value of urban habitat quality gradually decreased from 0.361 to 0.304, showing a downward trend as a whole. (2) There was an obvious gradient effect between habitat quality and LST. Habitat quality's high values were distributed in the central and northern built-up area and low values were distributed in the high-altitude western forest habitat and northern water habitat. However, the distribution of LST gradient values were opposite to the habitat quality to a great extent. (3) There were four agglomeration types between LST and habitat quality at specific spatial locations: the high-high type was scattered mainly in the western part of the study area and in the northern region; the high-low type was mainly distributed in the densely populated and actively constructed central areas; the low-low type was mainly distributed in the urban-rural intersections and small and medium-sized rural settlements; and the low-high type was mainly distributed in the western mountainous hills and the northern waters.

**Keywords:** urban biodiversity; urban habitat quality; InVEST model; land surface temperature; Moran's *I*; Zhengzhou

#### **1. Introduction**

Urban biodiversity is the driving force of sustainable urban development, and it is of great significance for maintaining a balanced, sustainable human settlement environment [1]. Biodiversity loss, habitat degradation, and the urban heat island effect are generally considered to be intertwined urban problems [2]. Urban heat islands have led to continuous changes in urban species abundance, distribution patterns, and energy flow and material cycling; further, some organisms have been forced to migrate or have even become extinct [3–5].

**Citation:** Hu, Y.; Xu, E.; Kim, G.; Liu, C.; Tian, G. Response of Spatio-Temporal Differentiation Characteristics of Habitat Quality to Land Surface Temperature in a Fast Urbanized City. *Forests* **2021**, *12*, 1668. https://doi.org/10.3390/f12121668

Academic Editor: Elisabetta Salvatori

Received: 1 November 2021 Accepted: 26 November 2021 Published: 30 November 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Land surface temperature (LST) is an important indicator that reflects the intensity of urban heat islands, and characterizes the energy balance and climate change on the earth's surface [6,7]. It has a significant impact on the migration of surface matter and energy conversion. The use of thermal infrared remote sensing can obtain a wide range of LST information compared with traditional measurement methods. It is fast and convenient and has a large measurement range, continuous information, and so on, which can be used to measure the characteristics of regional climate change [8,9]. Although the land area of physically urbanized cities only accounts for 3% of the total land surface area of the earth, the ecological footprint of cities is usually hundreds of times their area [10]. The rise in LST caused by urban activities directly impacts the time and energy that a species must invest in maintaining an optimal body temperature [11]. World Cities Report 2020 from UN-Habitat's pointed out that the world will be further urbanized in the next ten years. The proportion of the global urban population will increase from the current 56.2% to 60.4% in 2030, which means urban construction activities will be further expanded and the urban LST differentiation may be more prominent.

Habitat quality refers to the ability of a certain space to provide resources and ensure survival and reproduction for species, which can be regarded as an important characterization and reflection of regional biodiversity [12]. The "Global Biodiversity Outlook 5" report (https://www.cbd.int/gbo/, accessed on 11 August 2021) reveals that the vertebrate population has fallen by more than two-thirds on average since 1970, especially in built-up areas outside nature reserves, where the trend of biodiversity loss has not been effectively curbed.

Improving the habitat quality on which organisms depend for survival is a basic way to protect biodiversity [13–15]. Generally, habitat quality decreases with the intensity of human land use [16,17]. The higher the habitat quality, the higher the level of biodiversity and the higher the service capacity of the ecosystem. In the 1980s, based on the experimental data of field surveys focusing on the geographical differentiation characteristics of the habitat conditions of species in a specific area, the evaluation results were generally accurate [18,19]. However, the operation is complex, the cycle is long, and it is not easy to promote applications and carry out comparative studies.

After the 21st century, as the integration of RS (remote sensing), GIS (geography information systems) and GPS (global positioning systems) becomes stronger, it is possible to quantitatively, visually, and finely analyze and assess the habitat changes in the spatiotemporal scale of large- and medium-scale. At this stage, the research and application of ecological models such as SolVES (Social Values for Ecosystem Services) [20], ARIES (artificial intelligence for ecosystem services) [21], and InVEST (Integrated Valuation of Ecosystem Services and Tradeoffs) have become the most important in the field of urban ecology. Among many research hotspots of ecosystem service assessment, the InVEST model is the most widely used [22,23]. It was jointly developed by Stanford University, The Nature Conservancy (TNC), and the World Wide Fund for Nature (WWF) to evaluate and weigh ecosystem services. Habitat Quality module is based on the principle of habitat stress to quickly assess the quality of regional habitats. The required data can replace complex methods, such as species surveys, because they are easy to obtain and have a wide range of applications.

At present, research on the characteristics of the spatio-temporal differentiation of habitat quality mainly focuses on grassland [24], wetland [25], watershed [26,27], and forest [28] habitats, among others; research on the driving factors of habitat quality mostly focuses on land use change [29,30], landscape pattern [31,32], topography [33], urban expansion [34], road network [35,36], and species invasion [37]. However, there is a lack of research on the response mechanism of urban habitat quality to LST, especially in the open and complex mega-system metropolis. The purpose of this study is to explore the spatio-temporal mechanism of urban habitat for LST, and adjust the LST to improve the quality of urban habitat. Zhengzhou is a typical agglomeration city under the background of urbanization and an important comprehensive hub in central China. The city is currently facing stricter resource and environmental restrictions, which have a negative impact on the urban habitat and biodiversity. In view of this, Zhengzhou was selected as the research object. Based on remote sensing and land use data from 2000 to 2015, we assessed the spatio-temporal differentiation characteristics of Zhengzhou's habitat quality using the InVEST-Habitat Quality model. GIS spatial statistics and GeoDa bivariate spatial correlation were used to analyze the response of habitat quality to LST on the urban area. The results were expected to provide data support for biodiversity conservation and land use planning in megacities.

#### **2. Study Area**

Zhengzhou (34◦16 –34◦58 N, 112◦42 –114◦14 E), the capital of Henan Province, is located in the central hinterland of China in the eastern section of the Qinling Mountains, the transitional zone between the second-level and the third-level landform steps in China, with a total area of 7576 km<sup>2</sup> and a population of more than 12 million (Figure 1). The terrain is high in the southwest and low in the northeast, descending in a stepped manner. From west to east, it gradually transits from middle and low mountains to tectonic denudation hills, loess hills, slope plains, and alluvial plains, forming a relatively complete geomorphic sequence.

**Figure 1.** Geographic location (**a**), administrative district (**b**), and topography (**c**) of Zhengzhou region.

The city's main habitats include cultivated land, woodland, grassland, water, construction land, and unused land. Woodland is mainly distributed in the western and southwestern mountains, including Mang Mountain, Song Mountain, Fu Xi Mountain, and Big Bear Mountain, which constitute an ecological barrier in the west of Zhengzhou; the northern Yellow River is the main water area that flows through Zhengzhou, and its banks are located in the middle of the three major migration channels of migratory birds in China. It is an important migratory habitat and wintering place for migratory birds. It is one of the key areas of biodiversity distribution in China and has important ecological value. Cultivated land and grassland are mainly distributed in the southern and eastern plains, and construction land is mainly distributed in the central urban area of Zhengzhou and the built-up areas of cities under its jurisdiction.

#### **3. Data Sources and Methods**

#### *3.1. Data Used*

The workflow steps in our analysis are shown in Figure 2. The principle of the InVEST-Habitat Quality model is to combine land-use and land-cover change (LUCC) and threat factors to generate a habitat quality map to reflect the status of regional biodiversity. The interpretation data of land use and cover in 2000, 2005, 2010, and 2015 required to calculate habitat quality are all sourced from the Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences (RESDC) [38,39]. The grid resolution is 30 m × 30 m. The Landsat series of remote sensing images required to retrieve LST are provided by the United States Geological Survey (https://earthexplorer.usgs.gov/, accessed on 6 January 2021). To maintain consistency of the data source, images of the same month are selected to minimize the effects of seasonal variation in solar radiation. The dates of the selected images are 8 June 2000 (Landsat 7), 22 June 2005 (Landsat 7), 20 June 2010 (Landsat 7), and 10 June 2015 (Landsat 8). To reduce the difference in topography, illumination, and atmosphere of different time phase images, radiation correction and atmospheric correction were carried out on the four phase images, and the data processing was carried out in ArcGIS 10.2 and ENVI 5.3.

**Figure 2.** Workflow steps in our analysis.

#### *3.2. Habitat Quality Calculation*

The InVEST-Habitat Quality runs using raster data, where each cell in the raster is assigned an LUCC class. The user defines which LUCC types can provide habitat for the conservation objective. Generally, a relative habitat suitability value can be assigned to an LULC type ranging from 0 to 1. The higher the value, the higher the suitability for the survival and reproduction of individual organisms or populations.

In addition to a map of LUCC and data that relates LUCC to habitat suitability, the model also requires data on habitat threat density and its effects on habitat quality. Each threat source needs to be mapped on a raster grid and all mapped threats should be measured in the same scale and metric. It is worth emphasizing that the impact of threats on habitat in a grid cell is dominated by two main factors. One is the threat sources selected in specific research and their relative impact, the other is the distance between habitat and the threat source and the impact of the threat across space.

Based on the basic situation of the study area, the InVEST User's Guide and previous research results [32,40–43], we defined paddy field, dry land, built-up land, rural residential areas, and other construction land (quarries, mines, traffic land, etc.) as threat sources, and assigned values to the sensitivity of each habitat to threat factors (Tables 1 and 2). The calculation is as follows:

$$D\_{xj} = \Sigma\_{r=1}^{\mathbb{R}} \Sigma\_{y=1}^{\gamma\_r} \left(\frac{\omega\_r}{\sum\_{r=1}^{\mathbb{R}} \omega\_r}\right) r\_y i\_{rxy} \beta\_x S\_{jr} \tag{1}$$


**Table 1.** Habitat types and sensitivity of habitat types to each threat.

**Table 2.** Threats' weight and the maximum distance of influence.


In Equation (1), *Dxj* represents the degree of habitat degradation; *R* represents the total threat sources and *r* represents one of the threat sources; *y* represents the grid unit of the *r* threat layer; *Yr* refers to the sum of the grids on the *r* threat layer; *ω<sup>r</sup>* represents the weight of threat source *r*; *ry* is used to judge whether grid *y* is the source of threat source *r*; *β<sup>x</sup>* represents the protection of society, law, etc., which is not taken into consideration in this research. The accessibility level from threat source *r* to grid *x, Sjr* represents the sensitivity of habitat type *j* to threat source *r*. *irxy* represents the distance impact function of threat source *r* in the habitat of grid *x* on grid *y*. Assuming that the impact of the threat source on the habitat decays exponentially, the calculation is as follows:

$$d\_{rxy} = \exp\left[1 - \left(\frac{2.99}{d\_{rx\text{max}}}\right) d\_{xy}\right] \tag{2}$$

In Equation (2), *dxy* represents the linear distance between grid *x* and *y*, and *drmax* represents the maximum impact distance of threat source *r*.

Therefore, according to the investment model's calculation principle of habitat quality, the calculation is as follows:

$$Q\_{xj} = H\_{\bar{j}} \left[ 1 - \left( \frac{D\_{x\bar{j}}^Z}{D\_{x\bar{j}}^Z + k^Z} \right) \right] \tag{3}$$

In Equation (3), *Qxj* represents habitat quality; *Hj* represents the habitat suitability of land use type *j*; *k* is the half-saturation constant, which is usually set to half of the highest degradation value; *Z* is the default parameter of the model, *Z* = 2.5.

#### *3.3. LST Inversion Calculation*

As a key parameter in many basic disciplines and applications, LST can provide information about spatio-temporal variation of the surface energy balance. The atmospheric correction method adopted in this study is mainly based on the composition of the thermal radiation intensity observed by the remote sensor on the satellite to calculate LST [44]. The related calculation is as follows:

$$L = \text{gain} \times DN + bias\tag{4}$$

In Equation (4), *L* is the radiation value of the pixel in the ETM+ thermal infrared band at the sensor, *DN* is the gray value of the pixel, and *gain* and *bias* are the gain and bias values of the thermal infrared band, respectively, which can be obtained from the image header file.

$$T = K\_2 \, / \ln(K\_1 / L + 1) \tag{5}$$

In Equation (5), *T* is the temperature value at the sensor; *K*<sup>1</sup> and *K*<sup>2</sup> are calibration parameters, Landsat 7: *<sup>K</sup>*<sup>1</sup> = 666.093 W/(m2·sr·μm), *<sup>K</sup>*<sup>2</sup> = 1282.710 K; Landsat 8: *<sup>K</sup>*<sup>1</sup> = 774.885 W/(m2·sr·μm), *<sup>K</sup>*<sup>2</sup> = 1321.079 K. The sensor temperature (*T*) must be corrected for specific emissivity to estimate the final LST.

$$\text{LST} = T / \left[ 1 + (\lambda T / \rho) \ln \varepsilon \right]. \tag{6}$$

In Equation (6), *λ* is the center wavelength of the ETM+ 6 band (*λ* = 11.335 μm); *<sup>ρ</sup>* = 1.438 × <sup>10</sup>−<sup>2</sup> <sup>m</sup>·K; and *<sup>ε</sup>* is the surface emissivity, which is estimated using the *NDVI* threshold method [45].

#### *3.4. Bivariate Moran's I Calculation*

After preliminary analysis, we found that the spatial numerical distribution of habitat quality and LST were very similar, and both had a certain degree of spatial autocorrelation. Bivariate spatial autocorrelation has high applicability and effectiveness in describing the spatial correlation and dependence characteristics of two geographic elements. At present, this method has not been found to be used to explore the spatial relationship between habitat quality and LST. Therefore, we have innovatively attempt to adopt this method.

Bivariate Moran's *I* is an extension and expansion based on Moran's *I* index, which measures the correlation between the attribute values of spatial units and other attribute values in adjacent spaces [46,47]. It can be used as an effective method to analyze the correlation characteristics between urban habitat quality distribution and LST. Bivariate Moran's *I* is divided into two levels: global Moran's *I* and local Moran's *I*. The calculation formula is as follows:

$$I\_{ab} = \left(\frac{X\_{ma} - \overline{X}\_a}{\delta\_d}\right) \left(\frac{X\_{ab} - \overline{X}\_b}{\delta\_b}\right) \overline{\sum}\_{j=1}^{n} W\_{mo} \tag{7}$$

In Equation (7), *Xma* is the value of the variable *a* of the spatial unit *m*, *Xob* is the value of the variable *b* of the spatial unit *o*, *Xa* and *Xb* are the mean values of *a* and *b*, respectively, and *δ<sup>a</sup>* and *δ<sup>b</sup>* are the variances of *a* and *b*; *Wmo* is the spatial weight matrix between unit *m* and *o*. *Iab* is the Moran's *I* statistic and its value is between (−1, 1): less than 0 means negative correlation, equal to 0 means no correlation, and greater than 0 means positive correlation. Data processing is done in GeoDa 1.6.7.

#### **4. Results**

#### *4.1. Spatio-Temporal Variation of Habitat Quality*

Using the isometric classification method of ArcGIS, the calculation results of habitat quality index were divided into five intervals: 0.00–0.20, 0.20–0.40, 0.40–0.60, 0.60–0.80, and 0.80–1, which represent I, II, III, IV, and V (from low to high), respectively. The differences in habitat quality and the evolution law of spatio-temporal pattern in Zhengzhou were further analyzed (Figures 3 and 4).

From the perspective of the time scale, the mean habitat quality of Zhengzhou gradually decreased from 0.361 to 0.304 from 2000 to 2015, with II accounting for the highest proportion of the total area, with an annual mean of 62.94%. Especially, the continuum of habitats located at the intersection of the urban built-up area was gradually fragmented to form scattered and isolated island-like habitats or habitat fragments.

**Figure 3.** Spatial distribution of habitat quality during the different periods from 2000 to 2015.

**Figure 4.** Percentage of the five levels of habitat quality area from 2000 to 2015.

From the perspective of the spatial pattern, the spatial distribution of habitat quality in Zhengzhou is significantly different. II is the most widely distributed, mainly in the low-elevation plains in the east, where human farming is frequent, the ecosystem is single, the vegetation coverage rate is low, and the ecological environment is relatively fragile. I is concentrated in urban built-up areas; the land use structure in these places is dominated by construction land, human activities are disturbed, and biodiversity is low. III, IV, and V are mainly distributed in the high-altitude western mountain and hilly areas, including Fuxi Mountain, Changshou Mountain, Qinglong Mountain, and Song Mountain. These regions have a sparse population with very high green space coverage fraction ratio. Moreover, several ecological protection policies have been implemented, making the habitats there more suitable for many kinds of animals and plants. Further, there are also some IV or higher value habitats in the northern part of the city, thanks to the abundant waters and wetlands—resources suitable for the survival of aquatic animals, plants, and birds.

#### *4.2. Spatio-Temporal Variation of LST*

To eliminate the influence of dimension and magnitude on the final result, this study uses the range standardization method to process the LST results of Zhengzhou; the value range was (0, 1). Using ArcGIS's equal interval classification method, LST was divided into 10 gradients from low to high (Figure 5). In terms of time scale, LST dropped slightly from 2000 to 2005, rose markedly in 2010, and dropped slightly in 2015. The results show that in different stages of Zhengzhou's transformation from traditional industry and agriculture to modern service industry, its overall LST is in a dynamic change process.

**Figure 5.** Spatial distribution of LST during the different periods from 2000 to 2015.

From the spatial distribution perspective, high gradients (8, 9, 10) are distributed in the central city and southern airport area which indicates that the high island effect is mainly affected by human activities; low gradients (1, 2, 3, 4) showed an obvious spatial agglomeration effect in the western mountainous areas and northern waters which indicates that natural habitats have a significant effect on alleviating the heat island effect and are easy to become the center of the cold island effect. The medium gradient (5, 6, 7, 8) is widely distributed in Zhengzhou, indicating that the traditional urban heat island had exceed the boundary of the built-up areas and become a regional heat island.

#### *4.3. The Distribution of Habitat Quality along LST Gradients*

As seen in Figure 6, there is an apparent numerical gradient effect between habitat quality and LST. From the time scale, the response of habitat quality to LST in the four periods corresponds to four trend lines, which are obviously different. The fluctuation range in 2000 and 2005 is larger than that in 2010 and 2015.

**Figure 6.** Distribution of habitat quality along LST gradients during the different periods from 2000 to 2015.

From the perspective of spatial scale, high values of habitat quality are concentrated in low gradients (1, 2, 3), and low values are concentrated in high gradients (8, 9, 10), indicating that suitable LST can ensure the stability of habitats. High temperatures can lead

to restrictive natural factors such as reduced surface runoff, water shortages, and plants' evapotranspiration, which cause a decline in habitat quality and the ability to support biological communities.

#### *4.4. Bivariate Spatial Autocorrelation Analysis*

Using GeoDa software, the bivariate spatial autocorrelation analysis was carried out with LST as the first variable (X) and habitat quality (HQ) as the second variable (Y). Corresponding to different periods in 2000, 2005, 2010 and 2015, the global Moran's *I* were −0.430, −0.313, −0.323, and −0.311, respectively (Figure 7). Randomization 999 was selected in GeoDa for the significance test. The results showed that the *p* values were all 0.001, indicating a significant spatial negative correlation between LST and habitat quality under the confidence of 99.9%; that is, with the increase in LST, habitat quality in Zhengzhou showed a downward trend.

**Figure 7.** The Moran's *I* scatter diagram of LST and habitat quality during the different periods from 2000 to 2015.

The global Moran's *I* only represents the overall correlation trend of the two variables and cannot reflect the agglomeration differences in specific spatial locations. Therefore, the local Moran's *I* was further calculated, and the results (LST-habitat quality) were divided into four aggregation types: high-high (H-H), low-low (L-L), high-low (H-L), and low-high (L-H). The LISA (Local Indicators of Spatial Association) cluster map of LST and habitat quality clearly show the spatial agglomeration characteristics of regions that have passed the significance test (Figure 8).

**Figure 8.** LISA cluster map of LST and habitat quality during the different periods from 2000–2015.

The H-H type was scattered mainly in the western and northern regions of the study area, showing a significant reduction trend from 2000 to 2015, indicating that the rise of a certain LST in some areas in the initial stage can promote the flow of energy and materials in the habitat, but the continuous high LST straightforwardly led to different degrees of structural confusion, dysfunction, and other degradation trends in different habitats. The H-L type was mainly distributed in densely populated and actively built central urban areas, showing a continuous expansion trend, indicating that suburbanization of residential areas is bound to be accompanied by suburban urbanization. The gradual diffusion of manufacturing and industrial agglomeration centers from central urban areas to suburbs is the main reason for the rise of LST. The excessive LST destroys the structure and function of the original habitat. This has resulted in the decline of habitat quality.

The L-L type was mainly distributed in the urban-rural transition zone and small and medium-sized rural residential areas, showing a decreasing trend. The rural areas were dominated by cultivated land habitats in the initial stage of farming. Therefore, the traditional agricultural operation had little impact on the LST, the LST value remained at a low level, and the quality of cultivated land habitat itself was at a low level; thus, it presented a low-low agglomeration trend. However, with the development of mechanized farming technology, artificial heat source increases the LST, and the accumulation type has a great tendency to change to the H-L type.

The L-H type was mainly distributed in the western mountains and hills and the riparian areas of northern waters. It has remained stable in quantity and space for a long time, indicating that the LST in the natural environment was generally lower than that in the human activity area. It was suitable for the growth and reproduction of species.

The results show obvious differences in the spatial correlation of habitat quality and LST in Zhengzhou, and the differentiation results were also evolving with the development of Zhengzhou.

#### **5. Discussion**

From the perspective of LST, we analyzed the temporal and spatial differentiation of habitat quality characteristics. It reflects the gradient effect between habitat quality and LST, and reveals the agglomeration characteristics of the two in a specific spatial location.

According to the results of Section 4.2, we can speculate that the intensity and distribution of urban thermal environment is affected by changes in population aggregation, resource consumption, and industrial development, etc. In the initial stage (2000–2005), the rural population around Zhengzhou gathered in built-up areas, and the high temperature agglomerate in small areas, so the overall LST decreased. In the stage of rapid development (2005–2010), with more foreign population and their material demand, a large number

of industrial zones had to be built around the built-up area, which were bound to be accompanied by the release of heat energy, so the overall LST rose. In the stage of stable construction (2010–2015), with the strengthening of energy conservation and emission reduction policy, the government was strictly guiding industry to reduce the use of fossil energy, restricting the use of coal and oil, and encouraging subsidies for the development of wind energy, so the overall LST dropped.

According to the results of Section 4.3, we can speculate that in 2000, with increasing construction and production activities, the LST changed suddenly, resulting in a tremendous impact on the quality of urban habitat. After 2010, with the city entering a period of stable development, the government increased the supervision of nature reserves and strictly controlled intensity of urban development. Moreover, the public had also been consciously advocating ecological environment protection and actively responding to global warming. Therefore, The LST gradually returned to the controllable range, which had a weaker impact on the quality of most habitats compared to 2000 and 2005. The prediction was consistent with the variation characteristics of habitat quality in response to LST in Figure 6.

It should be noted that there are still many other possible factors affecting the urban habitat quality, including LUCC, topographic features, meteorology, socio-economic condition, vegetation, etc. The LST is the one factor affecting habitat quality in the region that may interact with other ecological factors. In follow-up studies, the following should be considered: to take grid units of different scales as habitats, to introduce more potential factors in combination with observation stations, and to deeply study the spatio-temporal dynamic evolution law of urban habitat quality to provide a reference for the government to take more accurate ecological management and construction measures to protect urban habitats. In addition, although the LST is easy to obtain through remote sensing inversion, there are random errors in the instantaneous data, which may be affected by the weather and time of the day. As the LST inversion needs the thermal infrared information of ground objects, the shielding of cloudy weather will affect the results. There are also certain differences in the surface temperature during the day, such as evening and noon, and the acquisition time of the existing remote sensing images is not fixed. Further, we will try to set up some field meteorological stations to acquire LST measured data to correct the possible errors caused by both weather and time. In the future, in the construction of urban human settlements and biodiversity protection, we can start from the following aspects:

(1) Carrying out scientific urban planning and building design to reduce the area of the impervious layer. Building green low-carbon buildings to reduce carbon emission reduction in the whole life cycle of buildings is the most ecological and effective means to reduce the "urban heat island" effect and increase urban biodiversity. For example, roof greening can provide a good habitat for some animals, such as butterflies and bees, and provide a certain food source for some birds. It can also use its special habitat ex situ protection box to breed some endangered plant species and increase urban biodiversity. In addition, the city's wind circulation plays a significant role in improving the LST of the city [48]. The reasonable shape and structure model of urban buildings can form a favorable environment for wind circulation. Through the atmospheric circulation, the heat island can exchange with the air in the surrounding areas, to reduce the temperature of the city. Therefore, we can effectively arrange the plane and facade of the building, and the volume and height should be organically combined, so that the wind can form a circulation within a certain range. Furthermore, the layout of the living area, office area, commercial area, and industrial area in the city should be reasonably considered to build a multi-functional mixed urban system; particularly, the industrial area should vigorously develop new energy to slow down the heat accumulation.

(2) Building urban green infrastructure systems with stable energy cycles. As an important part of green infrastructure, vegetation can fully absorb the solar radiation [49]. It converts most of the radiant energy into chemical energy in photosynthesis, so as to transfer a large amount of heat energy, which promotes energy conversion and material

circulation of the urban ecosystem [50,51]. More near natural habitats can be created through reasonable green infrastructure layout, increasing ventilation corridors, and reducing unnecessary hard environment to provide good living conditions for people to coexist harmoniously with other organisms.

(3) Developing new energy technologies. A large amount of man-made heat energy is one of the important factors in the rise of LST, such as the use of air-conditioning in summer, industrial exhaust gas, and automobile exhaust emissions. Therefore, it is necessary to reduce the use of nonrenewable energy, promote the use of new energy, and continue to develop new energy such as wind energy, solar energy, and other renewable energy sources for improving the urban habitat [52–54]. Among them, biomass energy has the advantage of being renewable, clean and low-carbon, and with an abundance of raw materials. The total amount of waste generated in Chinese cities every year is abundant. If it is not used rationally, it is easy to waste resources, even pollute the environment, and reduce the quality of urban habitat. Therefore, the development of biomass technology is particularly important. Currently, the bioenergy utilization technologies currently under development in China mainly include thermochemical conversion technology, biochemical conversion technology, biomass briquetting technology, etc. In addition, the use of geothermal technology is very common in some European countries, and we can use geothermal resources to serve construction. Through the development of ground source heat pump technology, the use of air conditioning in buildings can be greatly reduced.

(4) Optimizing the spatial pattern of biodiversity conservation according to local conditions. On the one hand, LST should be adjusted to reduce its negative effects on biological survival. On the other hand, the nature reserves should be optimized, and the protection and supervision of priority areas should be strengthened according to the life history, genes, physiological characteristics, and geographical scope of local organisms with regard to changes in LST [55,56]. For example, we can improve the ex situ conservation system of biodiversity and build ecological corridors that promote species migration and gene exchange, so as to solve the outstanding problems such as urban heat islands, fragmentation and isolated islands of natural habitat, and low ecological connectivity [57,58]. Moreover, sustainable utilization technologies of the biological resources at all levels and types of habitats should be deeply studied [59,60].

#### **6. Conclusions**

Based on the analysis of the spatio-temporal differentiation characteristics of habitat quality in Zhengzhou from 2000 to 2015, the critical indicator of LST was introduced to study whether there is a gradient effect and spatial correlation between habitat quality and LST. The main conclusions are as follows:

(1) From 2000 to 2015, the urbanization of Zhengzhou was in a period of accelerated development. The mean value of habitat quality gradually decreased from 0.361 to 0.304, and a large number of middle and middle-low level habitats (II and III) turned into low habitats (I), but the degradation trend slowed down in 2010. Spatially, middle-low level habitats (II) were the most widely distributed, mainly in the low-elevation plains in the east, and low-level habitats (I) were concentrated in the urban built-up areas; middle- and high-level habitats (III, IV, and V) were mainly distributed in the high-altitude western mountainous and hilly regions and northern waters.

(2) LST dropped slightly from 2000 to 2005, rose significantly in 2010, and dropped slightly again in 2015. Spatially high gradients (8, 9, 10) were distributed in the central urban area and the southern airport area, medium gradients (5, 6, 7, 8) were widely distributed, and low gradients (1, 2, 3, 4) were concentrated in the higher elevations of the western mountains and northern waters.

(3) Habitat quality in different periods showed different trends along the LST gradient. High values of habitat quality were concentrated in low gradients (2, 3, 4). Low values were concentrated in high gradients (8, 9, 10), indicating that a slightly lower LST can ensure habitat stability. The excessively high LST may have reduced surface runoff, water

shortage, and other natural restrictive factors, resulting in the decline of the functional structure of the habitat and the ability to support the biological community.

(4) The global Moran's *I* of the four periods were −0.430, −0.313, −0.323, and −0.311. Overall, there was a significant spatial negative correlation: the higher the LST, the lower the habitat quality. However, at different spatial locations, the distribution and agglomeration characteristics of LST and habitat quality can be divided into the following four types of agglomeration: the high-high type was scattered mainly in the western and northern regions of the study area; the high-low type was mainly distributed in densely populated and actively constructed central urban areas; the low-low type was mainly distributed in the urban-rural intersection and small- and medium-sized rural settlements; and the low-high type was mainly distributed in the western mountain hills and the northern waters and river banks.

(5) Combined with the research results and the existing policies of local authorities, we put forward the following suggestions to improve habitat and biodiversity: carrying out scientific urban planning and building design; building the urban green infrastructure systems with stable energy cycles; developing new energy technologies; and optimizing the spatial pattern of biodiversity conservation. Our innovative research has drawn some laws about the spatial correlation between Zhengzhou's habitat quality and LST, but it is relatively limited. More potential factors will be added in the follow-up research to explore the driving mechanism of urban habitat quality to provide systematic theoretical support for habitat and biodiversity protection in other similar cities.

**Author Contributions:** Conceptualization, Y.H. and E.X.; methodology, Y.H. and E.X.; software, E.X. and C.L.; validation, Y.H., E.X., G.K. and G.T.; formal analysis, Y.H. and E.X.; investigation, Y.H. and C.L.; resources, Y.H.; data curation, G.K.; writing—original draft preparation, Y.H. and G.K.; writing—review and editing, Y.H., E.X. and G.K.; visualization, C.L. and G.K.; supervision, G.K. and G.T.; project administration, G.T.; funding acquisition, G.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was supported by Urban–Rural Green Space Resources Control and Landscape Ecological Design Disciplinary Innovation and Talents Introduction Centre Program of Henan, China (GXJD006) and the Key Technology R&D Program of Henan Province (212102310838), the Special Fund for Young Talents in Henan Agricultural University (30500930 and 30501053), the Youth Fund of Ministry of Education Laboratory for Earth Surface Processes of Peking University.

**Data Availability Statement:** The data presented in this study are available on request from the first author.

**Acknowledgments:** We would like to thank the International Joint Laboratory of Landscape Architecture for their support and Henan Agricultural University for their great help. We also thank the Institute of Geographical Sciences and Natural Resource Research, Chinese Academy of Sciences, and United States Geological Survey.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Exploring the Determinants of Urban Green Space Utilization Based on Microblog Check-In Data in Shanghai, China**

**Dan Chen 1, Xuewen Long 1, Zhigang Li 2, Chuan Liao 3, Changkun Xie <sup>1</sup> and Shengquan Che 1,\***


**Abstract:** Urban green space has significant social, ecological, cultural and economic value. This study uses social media data to examine the spatiotemporal utilization of major parks in Shanghai and explore the determinants of their recreational attraction. Methods: Based on microblog check-in data between 2012 and 2018 across 17 parks in Shanghai, we investigated the patterns at different temporal scales (weekly, seasonal and annual) and across workdays and weekends by using log-linear regression models. Results: Our findings indicate that both internal and external factors affect park utilization. In particular, the presence of sports facilities significantly contributes to higher visit frequency. Factors such as the number of subway stations nearby, scenic quality and popularity have a positive impact on check-in numbers, while negative factors affecting park use are number of roads, ticket price and average surrounding housing price. Across different temporal scales, the use patterns of visitors have obvious seasonal and monthly tendencies, and the differences of workday and weekend models lie in external factors' impacts. Conclusions: In order to achieve the goal of better serving the visitors, renewal of urban green spaces in megacities should consider these influential factors, increase sports facilities, subway stations nearby and improve scenic quality, popularity and water quality. This study on spatiotemporal utilization of urban parks can help enhance comprehensive functions of urban parks and be helpful for urban renewal strategies.

**Keywords:** social media data; urban green space; spatiotemporal utilization; Shanghai

#### **1. Introduction**

Urban green space plays a crucial role in urban residents' leisure and recreational activities [1]. Services provided by urban green space contribute to both physical and mental health of park users [2–4], enhance the value of surrounding real property [5] and mitigate the effect of urban heat islands [6,7]. For megacities such as Shanghai, there are few opportunities to create new urban green space in its core urban area due to high building density. In order to maximize the benefits of urban green space, it is crucial for urban planners and decision makers to improve the existing parks to better serve citizens and encourage park utilization. To achieve this goal, it is necessary to understand the current situation of park utilization and its determinants.

Previous studies have taken both qualitative and quantitative approaches to investigate urban green space utilization. Qualitative studies such as interviews, observations and records often involve purposeful sampling of participants and settings [2] and convert the interpretations from language and actions to data and conclusions. Surveys are the most popular approach used in quantitative studies to acquire park use data [8,9]. A combination of behavior mapping and GIS (Geographic Information System) techniques can be used to help generate park use patterns by using spatial analysis and visualization [10].

**Citation:** Chen, D.; Long, X.; Li, Z.; Liao, C.; Xie, C.; Che, S. Exploring the Determinants of Urban Green Space Utilization Based on Microblog Check-In Data in Shanghai, China. *Forests* **2021**, *12*, 1783. https:// doi.org/10.3390/f12121783

Academic Editors: Manuel Esperon-Rodriguez and Tina Harrison

Received: 13 November 2021 Accepted: 14 December 2021 Published: 16 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

These methods can be used for in-depth evaluation of recreation space, recreation type and behavior in urban parks. Senetra et al. (2018) analyzed spatial distribution, influence and quality of urban green space, finding that green spaces were unevenly distributed and not all residents have equal access to high-quality public greens [11]. However, it is difficult to conduct comprehensive evaluation of multiple urban parks at a broader scale due to intensive labor requirements and the short time span of the survey data.

Based on findings from previous studies, major factors that influence urban green space utilization include both internal and external characteristics of the parks. The internal characteristics of urban parks reflect the quality of parks, which is related to the planning and design, construction quality and maintenance management of parks. A review study by McCormack et al. (2010) indicates that safety, aesthetics, amenities and maintenance are important for encouraging park use. Several studies found that well-designed public open space can promote physical activity in the community and may contribute to the health of local residents [10,12]. Research has shown that the quality of paths in parks, the allocation of infrastructure, the proportion of water in green space and the surrounding environment, among many other factors, have impacts on the population's physical activity. Meanwhile, aesthetic value, safety and other indicators also have positive impacts [13,14]. Besenyi et al.'s research has shown that there is a positive correlation between the environmental elements of urban parks and residents' outdoor physical exercise; that is, parks and green spaces that are high quality of larger scales have a positive impact on people's physical activity [8].

The external factors of a park are mainly related to the location, traffic convenience, economic development status and population density as well as land use types in the surrounding areas. Accessibility of parks is a very influential determinant [2]. Besenyi et al., based on an online survey, concluded that higher accessibility would encourage people to engage in physical activity [8]. Research has shown perceptions of park availability and usage by friends are associated with greater park use [8]. Few studies have reported on the impact of external factors of parks use in addition to accessibility; however, subjective outdoor thermal comfort has been found to impact urban park utilization in Hong Kong [15].

Over the past decade, a considerable amount of data on peoples' location and movement have been collected due to mobile communication devices, position perception technologies and social media. Social media data are widely used, including that from platforms such as Twitter [16,17], Foursquare, Sina microblog [18–20], Flickr [21] and Panoramio [22]. Compared with data obtained through surveys, digital data include the advantage of having a larger amount of information that is highly accurate and richer. This provides opportunities for innovation in urban research. Existing studies have identified the practical significance of using social media data analysis in urban residents' activities. Social media data can be used as a valuable data source for urban space research and analysis of residents' activities and, thus, address shortcomings such as small sample size and short time span in survey research.

Recent research conducted spatiotemporal analysis on urban park recreation verified the feasibility of using Big Data. Most relevant applications of social media data mainly explore the spatial pattern of urban residents' gathering, dispersion and flow relating with time, space, activities and several other aspects [23,24]. The applications of social media data in urban areas are mainly based on GIS platforms, using cluster analysis, kernel density analysis and other spatial analysis and calculation methods to identify the spatial structure and characteristics [25]. The hourly real-time Tencent user density (RTUD) data from social media are used to analyze the spatiotemporal distribution of urban park users in the study by Chen et al., and the factors that may be associated with the user density in urban parks were studied by a group of linear regression models in Shenzhen, China [5]. Roberts proposed a method of crowdsourcing in urban green space using Twitter data and identified the events in urban green space that bring cultural benefits, such as individual and community identity, as well as cultural activities participation [26]. In Spain, popular public spaces in the urban fabric of the city of Murcia were studied, with

data sources from Foursquare and Twitter [27]. Roberts et al. used Twitter data to assess the variance of physical activity engagement between summer and winter seasons [28]. Another study analyzed mobile phone data derived from 10 million daily active users to explore spatiotemporal activity patterns of users in Central Park, New York, USA, and found that regions with established amenities and points of interest demonstrate a higher record of shared experience [29].

However, most previous social media studies focused on spatiotemporal analysis and city-scale issues rather than exploring the determinants of urban parks utilization. Using social media data, this paper explores the determinants of urban park utilization. Based on our findings, we propose optimization countermeasures, suggestions for the planning and design of the parks in terms of each determinant and the advantages and shortcomings of social media data in research of urban public space service functions.

The purpose of this study is to take advantage of social media data to indicate the internal and external characteristic influencing factors of urban parks on park utilization. Based on the interactive relationship between recreation space and environmental behavior, influencing factors of the use of public green space in a high-density metropolis can be identified. This puts forward the corresponding urban renewal optimization countermeasures, which also has reference significance for planning and design of urban green space in other megacities.

#### **2. Materials and Methods**

#### *2.1. Study Area*

In 2020, Shanghai had a total population of 24 million, and the per capita area of park green space was 8.5 square meters. Urban public green space is a scarce commodity in the high-density urban center of Shanghai. There are at least 2–5 comprehensive parks in each administration district, which mainly serve the residents of the district and attract visitors from other districts as well. These parks generally cover an area of more than 5 hectares and have a large number of visitors. They have very important social, ecological and recreational services. Previous studies related to urban green spaces focused on spatial layout and accessibility [8]. However, the urban layout of Shanghai has been basically stable and green space is in the stage of slow growth. We should review green spaces from the perspective of urban renewal and micro design. Social media data provide a wide range and high amount of data for the study of public space in a high-density metropolis such as Shanghai and help to scientifically analyze the influencing factors of urban park green space use.

In this research, we studied the central area of Shanghai, China. According to the "Master Urban Planning of Shanghai," the central area of Shanghai is within the outer ring road, with an area of about 660 square kilometers (as shown in Figure 1, including Zone 1 to Zone 8, and Table 1).

Using the "Regulations on Administration of Parks in Shanghai," "Guidance on the Implementation of Classification and Grading of Urban Parks in Shanghai (trial)" and "Classification Standards of Urban Green Space (by central government)," we assessed the urban parks in the central area of Shanghai from 3 aspects, namely park type, park area and park influence. For park type, we chose comprehensive parks rather than rural parks; for park area, most of parks we chose had an area large than 5 ha; and for park influence, parks with diverse facilities for recreation and with a certain number of user in terms of microblog check-in data were used. We then identified 17 comprehensive parks as our research subjects (Figure 2).

**Figure 1.** Scope of Central Shanghai.

**Table 1.** List of 17 comprehensive parks in downtown Shanghai.


**Figure 2.** Distribution of 17 comprehensive parks in central Shanghai.

#### *2.2. Selection of Dependent Variable*

Sina microblog is one of the social network platforms with the largest number of users in China. According to the 2017 microblog user development report released by the microblog data center, the number of monthly active users on the microblog totaled at 376 million as of September 2017 (https://data.weibo.com/report/reportDetail?id=404, last accessed date (10 December 2021)). The check-ins in the microblog are used as dependent variable and completely records the relevant contents, including geographic information (latitude and longitude coordinates), time, texts and sex of microblog users, reflecting part of the spatiotemporal behavior of users in the city. From the texts of microblog users, it can be found that most of them visited parks for recreation, socialization and relaxation.

#### *2.3. Data Collection*

This paper uses Sina microblog check-in data (from the time period 1 January 2012 to 30 June 2018), obtained through the Sina microblog API (Application Programming Interface Application Data Interface); Shanghai downtown comprehensive park data and house price transaction data used in this study were obtained by using web crawler; Shanghai administrative boundary and Shanghai central comprehensive parks data were from the official website of Shanghai Greening and Urban Management Bureau. Parks are classified into two-star, three-star, four-star and five-star by Shanghai Greening and Appearance Bureau, which is the meaning of Park Ranking in Table 1. Number of Sports

Venues was counted as number of sports and fitness venues in each park, including trails, basketball courts, tennis courts, football courts and other sports venues (1 point per item). Popularity is the average number of internet users searching park names by using Baidu over a certain period of time. Scores of Scenic Quality given for the landscape effect of the comprehensive park can be divided into 5 levels: extremely poor (1 point), poor (2 points), general (3 points), good (4 points) and great (5 points). Water area (ha) means the area of water bodies in each park. Green Coverage Ratio (%) means the vertical projection area of vegetation in each park.

#### Quantification of Variables

A total of 13 independent variables were selected to quantify the check-ins. Summary statistics of the variables are included in Table 2.


#### **Table 2.** Summary statistic of variables used.

#### *2.4. Research Design*

In this study, we collected 214,068 Sina microblog check-in points in 17 comprehensive parks of central Shanghai from 2012 to 2018, among which 210,993 of points are valid (removing those with missing values). Workday and weekend data are separately analyzed to determine differences of park utilization.

The spatiotemporal distribution of microblog checked-ins were analyzed at various temporal scales, such as annual, monthly and intra-day scales, to help identify potential determinants of park use. The spatial and vector data in 17 studied comprehensive parks, the geographic coordinate information in the Sina microblog check-ins and the data on road density, public transportation number and housing price around the parks were input into GIS 10.6 and analyzed by using spatial analysis methods.

In this study, Sina microblog check-ins in comprehensive parks were used as the quantitative indicator of park use, while green coverage ratios, ticket price, area of park, area of water, popularity (Baidu Index), number of bus stations, average housing prices, area of commercial lands, scenic quality, parking ranking, number of sports venues and number of roads were used as the independent variables for analysis with multivariate regression models.

The urban parks utilization data were classified into weekday use data and weekend use data, and the check-ins were divided into yearly, seasonal and monthly groups, which served as the dependent variables of the subsequent stepwise regression model.

#### *2.5. Model Selection*

In this study, four linear regression models were considered, which included linear form, logarithmic form, logarithmic linear form and semi-logarithmic form, with the selected variables to test which is the most suitable model. Since some independent variables were coded to have a value of 0 to 1, only linear form and log-linear form were tested. Table 3 indicates that the log-linear model has the highest degree of fitting and the strongest explanatory power. Therefore, log-linear regression was chosen to carry out the multivariate linear regression analysis of comprehensive parks utilization.


**Table 3.** Summary of four functional forms of linear regression models.


**Table 3.** *Cont.*

#### Dummy Variables

From descriptive spatiotemporal distribution of visitors' analysis, we found that year, season and month contributed to the total check-ins of visitors to comprehensive parks. Therefore, dummy variables were created to identify the influences by yearly, seasonal and monthly factors.

#### **3. Results**

Table 4 shows yearly, seasonal and monthly regression results of comprehensive park utilization models using Microblog check-ins from 2012 to 2018.



#### *3.1. Yearly Comprehensive Park Utilization Model Results*

A total of 102 independent variables were calculated in the yearly comprehensive park utilization model. The correlation coefficient R is 0.777, the judgment coefficient R<sup>2</sup> is 0.604 and the adjusted R<sup>2</sup> is 0.558.

According to Table 4, the most influential determinant is the number of sports venues, followed by average housing price, number of roads, area of commercial lands, number of subway stations and area of water. The regression results shows that yearly average check-ins will increase 124% for every increase in the number of sports venues; with every CNY 1000 increase in average house price around the parks, park utilization will increase 4.8%. For each hectare increase in commercial land area, the check-ins will increase 2.3%. For every one station increase in the number of subway stations, park use will increase by 69%. Regarding negative impacts, one more road around the park will contribute to a 6.2% decrease in check-ins, and a per hectare water area increase will result in a 3.5% decrease in park use.

In the yearly workday and weekend models, variables such as the number of sports venues, average housing price and area of commercial land had positive influences on check-ins, while the number of roads had negative impacts.

#### *3.2. Seasonal Comprehensive Park Utilization Model Results*

For the seasonal comprehensive park usage model, correlation coefficient R was 0.818, the judgment coefficient R<sup>2</sup> was 0.669 and the adjusted R<sup>2</sup> was 0.656.

According to Table 4, seasonal check-ins will increase by 139% for every increase in the number of sports venues. With every CNY 1000 increase in average house price around the park, check-ins will increase by 9.2%, and for each hectare increase in commercial property, park use will increase by 5.1%. With every one increase in the number of metro stations, check-ins will increase by 91%. A 0.01 increase in scenic quality was associated with an 8.62% increase in check-ins. In contrast, some studied variables had negative impacts: One more road in the area around the park will cause a 7.7% decrease in check-in numbers. Ticket price also has a negative influence on park use, with a 40.5% decrease per CNY increase, and every bus station contributes to a 49% decrease in check-ins. Furthermore, seasonality is a factor in check-ins, with summer and winter having fewer negative effects on check-ins.

For seasonal workday and weekend models, determinants such as number of sports venues, park ranking, Baidu Index, scenic quality and number of subway stations all have positive influences on park use, while the number of roads and green coverage ratio induce negative impacts. In addition, spring and autumn are positive for check-ins in the workday model, and summer and winter seasons are negative for park use in the weekend model. Average housing price is positive in the workday model, and area of commercial lands is positive in the weekend model.

#### *3.3. Monthly Comprehensive Park Utilization Model Results*

After removing outliers, a total of 1320 observations were used in the monthly comprehensive park utilization models. The correlation coefficient R was 0.840, the judgment coefficient R<sup>2</sup> was 0.705 and the adjusted R<sup>2</sup> was 0.699.

The regression results show that monthly microblog check-ins will increase 486% for every increase in the number of sports venues. With each hectare increase in commercial land area, park utilization will increase by 1.1%, and every one increase in the number of Metro stations, park use will increase by 448%. The check-ins with a 0.01 increase in scenic quality will increase by 14.11% more and a value of 0.01 increases in park ranking will contribute to 5.33% more check-ins. The check-ins will increase by 1.2% with each increase in Baidu Index; every bus station brings a 41.9% decrease in park utilization. For negative impacts, one additional road around the park will cause a 25.7% decrease in check-ins, and each hectare increase in water area will result in a 72.8% decrease in park use. Ticket price also has a negative influence on check-ins with a 32.4% decrease for CNY 1 increase. Lastly for every CNY 1000 increase in average house price around the parks, the check-ins will decrease by 2.7%.

Monthly workday and weekend models indicate that variables such as the number of sports venues, park ranking, scenic quality, Baidu Index, number of bus stations, number of subway stations and month have positive influences on park use, while ticket price, area of water and number of roads have negative impacts. In addition, spring and autumn were found to be correlated with park use in the workday model, while summer and winter seasons were correlated with use in the weekend model. Average housing price had a positive correlation in the workday model, while the green coverage ratio had a negative correlation. Area of commercial lands had a positive correlation in the weekend model.

#### *3.4. Daily Comprehensive Park Utilization Model Results*

According to Figure 3, the time of 1:00–6:00 a.m. is the sleeping time. Numbers of visitors to comprehensive parks increases from 7:00 a.m. to 23:00 p.m. and peaks around 14:00–15:00 p.m., followed by another small check-in peak from 19:00 to 21:00 p.m.

**Figure 3.** Intraday distributions of check-in data.

The comparison of check-in time between weekdays and weekends is shown in Figure 4. The graphs indicate that recreational time for visitors was generally between 9:00 and 21:00. The check-in peak on weekdays appeared at 19:00, while the peak on weekends was around 15:00, and the number of average daily visitors on weekends significantly increased compared with that on weekdays.

**Figure 4.** Cumulative daily check-ins in comprehensive parks from 2012 to 2018. (**a**) Cummulative check-ins; (**b**) Average daily check-ins.

#### **4. Discussion**

Both internal and external factors of urban parks play significant roles in determining park utilization. The number of sports venues is the most influential determinant, followed by the number of subway stations, scenic quality, park ranking, Baidu Index, number of bus stations and area of commercial lands. Factors with the most negative influence on park check-ins are number of roads, area of water, ticket price and average housing price. Seasons, such as spring and autumn, had a positive influence on park use, while check-ins during January and February decreased.

Number of sport venues is the most influential element and had a positive impact on check-ins, while number of roads adjacent to the park had the most negative impact. The demand for sports venues is in line with the health motivation of visitors. The more sports venues there are, the more attractive urban green space will be to visitors.

In general, the number of subway stations, the area of commercial zones and the average housing price were associated with more urban parks utilization. The number of subway stations makes parks more accessible. A larger area of commercial lands around the park creates more attractions for visitors' recreation. This type of mixed land use has been used more for adapted land use planning in recent years and is viewed as a healthier and dynamic land use pattern [30]. At present, conventional commercial facilities, such as restaurants, teahouses, bookstores and snack shops, in parks are insufficient to meet the needs of park visitors, and the commercial office space around the park can make up for this defect. A large number of studies have shown that urban parks have a value-added effect on housing prices [31]. Housing close to urban parks with high access to green space and scenery is more expensive. Dunse et al. (2007) quantified the economic benefits of open space on households in Aberdeen, Scotland, and confirmed the positive influence of public parks on housing price [32]. Therefore, urban parks attract urban residents to gather while simultaneously promoting the value of the surrounding property.

The size of the park is not statistically significant, and the area of water is not always significant; if it is, its impact is negative. Previous studies believe that visitors prefer an urban green space with water body to spaces without any water elements [13,33]. Data from this study indicate that it having more water does not result in more utilization. There are two probable reasons: (1) more water means less space for other use or a large area of water separating the use of the park; (2) water quality is another consideration. The decline in water quality in urban parks in Shanghai [34] may result in less utilization, which is consistent with the study in London [35]. There is a correlation between a park's area and water area and check-ins and, more specifically, in the internal diversity of sites, infrastructure diversity and spatial diversity [11,36].

For seasonal and monthly models, additional variables had influences. Scenic quality always plays an important positive role; park ranking and Baidu Index have positive functions as well, while ticket price and green coverage ratio have a negative effect on attendance. The variable "scenic quality" properly reflects the quantitative relationship with

recreational attraction; it had a positive effect on park use. Previous research indicates that landscape naturalness, landscape characteristics, plant collocation and plant richness are all related to aesthetic degree [37]. Therefore, the aesthetic design of the landscape should pay more attention to the visual attraction of the site in order to optimize visitor satisfaction.

Park rankings are related to management quality. The results of Gui (2016) proposed that Shanghai residents place more value on the number of recreational spaces and facilities, the degree of maintenance of the facilities, environmental quality and the functional complexity of recreational space [38]. Urban residents have certain requirements for the management quality of parks, and park rankings help to summarize this management quality. The cost of park tickets is an obstacle to visitors' recreation. Zeng (2015) studied visitor data released by the Shanghai Municipal Administration of Greening and Appearance, and the findings suggest that the rapid growth of the number of visitors after parks offered free admission increased the difficulty of park management. Thus, there is still a balance to be considered on park admission fees.

For seasonal and monthly models, spring and autumn exerted positive influences on the workday model, while summer and winter caused negative influences. This result is consistent with simple descriptive spatiotemporal distribution analysis; that is, visitors prefer to attend parks in the spring and autumn seasons when the weather is nice. For the monthly models, October was found to have a significant positive influence, which is also reasonable because of the National Holiday in China and typically nice weather. The weather in spring and autumn is suitable for recreation, and autumn coincides with the National Day holiday, resulting in higher check-ins; therefore, outstanding seasonal features and improvements to summer and winter landscapes in urban parks may help balance park utilization.

In terms of the seasonal and monthly workday models, spring, autumn seasons and October showed positive influences on check-ins. According to the seasonal and monthly weekends models, the area of commercial land is significant and positive; summer and winter exert negative influences; and for the monthly weekend model, March, October and December are positive. On workdays, nice weather has a significant positive effect on park visitation. The number of bus stations is also important. During weekends, bad weather has significant negative effects, and the area of commercial lands is important. Therefore, we conclude that the differences of workday and weekend models lie in external factors, and the surrounding commercial conditions and good weather are important for park weekend utilization.

Regarding monthly characteristics of visitors' use in comprehensive parks, the peak is in October, which may be related to the impact of collective travel of urban residents during the National Day Golden Week. The check-ins are typically the highest in autumn, leading us to the suggestion that autumn has suitable weather and coincides with the National Day holiday, when people have more leisure time to spend in parks. The climate conditions during spring and autumn encourage park use, while winter and summer have fewer ideal conditions and limit visitors.

Daily analysis provides some interesting findings regarding visitors' behaviors. More people chose to visit the park in the afternoon, and some people visited parks after dinner (Figure 4). Check-ins at 12:00 and 18:00 formed two troughs, indicating that parks have fewer visitors during lunch and dinner hours. We also find that visiting time tendencies are predictable. Restricted by work, the leisure time of visitors is constrained. Overall, park recreation time is concentrated on weekends and golden week holidays. The average number of visitors on weekends is nearly twice compared with that of weekdays. The peak of park use on weekdays is at 19:00 after dinner, and the peak of recreation on weekends is around 15:00 p.m. On weekends, visitors have a higher demand for the recreational landscape environments.

An interesting finding is that the intraday check-in curve of Shanghai Expo Park is significantly different from that of other parks (Figure 3). Check-ins to the park were mainly distributed between 16:00 and 23:00 p.m. because Shanghai Expo Park often holds outdoor activities, such as the Strawberry Music Festival, Jazz Festival and Shanghai Magic Lantern Carnival, which attracts a large number of people from inside and outside Shanghai. As a result, the park had a low local visitor rate.

Social media use is growing fast with technological innovation and subsequent data collection. These large, efficient and growing datasets bring massive information to researchers, which help to inspire research questions and methods. Spatiotemporal datasets from social media allow researchers to study travel behaviors, assess location characteristics, identify tourist spots, assess attractiveness and reveal land use patterns via approaches that were impossible with traditional methods of data collection [39]. In this study, social media data, such as Microblog check-ins, provide researchers a large volume data to analyze yearly, seasonal, monthly and daily utilization situations for several parks in Shanghai, which is difficult to fulfill by using traditional questionnaires and observation methods. It is believed that social media data benefit spatiotemporal studies, especially when the studies include more than one location.

However, it is important to keep in mind the shortcomings of social media. First, the location information may not be exact, resulting in incorrect locational check-ins, especially for the check-ins at the boundary of different land use types. Second, social media users are not evenly distributed between socioeconomic and demographic groups. For instance, Microblog users comprise mostly younger generations; thus, children and the elderly may be excluded from the study. Third, compared to traditional methods such as questionnaires and interviews, it is hard to obtain personal and subjective views from social media data. Ries et al. (2009) indicated that efforts to promote park use should increase awareness of park availability, improve perceptions of park quality and utilize social networks [9]. Social media data are lacking on the "awareness" part. Fourth, social media data generate results on overall trend or patterns while ignoring individual differences. Qualitative methods (e.g., in-depth individual interviews, focus group interviews, direct observation and participant observation) rely heavily on interpretations from participant language and actions and involve purposeful sampling of participants and settings. The contextualization in qualitative research helps to show such individual attributes. Data such as the Baidu vocabulary cloud may help overcome this weakness. Qualitative research may also be less amenable to standardized research procedures and more difficult to synthesize.

#### **5. Conclusions**

Urban parks are essential to the wellbeing of humans and are important places for the public to interact with nature. The use of urban parks is a dynamic process. For a single space, various activities may take place in different times of day, month, season and year. Exploring the determinants of urban parks utilization is helpful for improving the renewal strategy of urban parks, providing elastic space design schemes, promoting citizens' recreation, improving public physical and mental health and encouraging comprehensive benefits of urban parks. The use of new technique, new data and new analysis methods provide guidance for future renewal planning of urban parks, transportation and land use. The proposed methodology can be applied in similar high-density urban areas all over the world to enhance social, ecological, cultural and economic service functions of urban parks.

This study explores the determinants of urban parks utilization using microblog checkin data. Logarithmic linear regression models are constructed based on social media data, park basic data and social life data. Taking 17 comprehensive parks in central Shanghai as the research objects, this paper examines determinants affecting park utilization and puts forward corresponding optimization countermeasures from design and planning perspectives based on the results from regression models.

Our findings suggest that spring and autumn are peak seasons, while summer and winter are slack seasons. Seasonal climate has an apparent impact on park utilization. The frequency of visitors' park recreational activities is high in spring and autumn and low in summer and winter. When planning and designing recreational space for visitors, more emphasis should be placed on the design of various green outdoor recreational spaces, and the planning and allocation of sufficient recreational facilities to satisfy the peak demands.

Regarding the determinants of urban parks utilization, we find that both internal and external factors of parks contribute to urban park utilization. Participation in physical exercise is in high demand by visitors as a result of the social advocacy for a healthy life. In addition to sports venues, more elastic spaces should be designed to encourage diverse healthy activities, such as square dancing, Tai Chi, walking and even sitting. In addition, the inner quality of a park should be improved by adding attractive views such as water bodies, thoughtfully arranged plant scenes, comfortable sitting areas and social events in order to make parks more popular and improve their ranking. For external factors, we suggest enhancing surrounding transportation facilities and promoting mixed land use (residential areas, parks, commercial areas, other public service lands, etc.) for better urban parks utilization. Breaking the boundary between the city and parks, as well as increasing subway stations and bus stations to parks, can make parks more accessible and perceived as nice choices.

**Author Contributions:** Conceptualization, D.C.; methodology, Z.L. and D.C.; formal analysis, X.L.; writing—original draft preparation, D.C. and X.L.; writing—review and editing, C.L., D.C. and C.X.; supervision, S.C. and Z.L.; funding acquisition, S.C. and D.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by National Natural Science Foundation of China, grant number 32001361.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Concatenated Residual Attention UNet for Semantic Segmentation of Urban Green Space**

**Guoqiang Men 1,2, Guojin He 1,2,3,\* and Guizhou Wang 1,3**

	- Sanya 572029, China

**Abstract:** Urban green space is generally considered a significant component of the urban ecological environment system, which serves to improve the quality of the urban environment and provides various guarantees for the sustainable development of the city. Remote sensing provides an effective method for real-time mapping and monitoring of urban green space changes in a large area. However, with the continuous improvement of the spatial resolution of remote sensing images, traditional classification methods cannot accurately obtain the spectral and spatial information of urban green spaces. Due to complex urban background and numerous shadows, there are mixed classifications for the extraction of cultivated land, grassland and other ground features, implying that limitations exist in traditional methods. At present, deep learning methods have shown great potential to tackle this challenge. In this research, we proposed a novel model called Concatenated Residual Attention UNet (CRAUNet), which combines the residual structure and channel attention mechanism, and applied it to the data source composed of GaoFen-1 remote sensing images in the Shenzhen City. Firstly, the improved residual structure is used to make it retain more feature information of the original image during the feature extraction process, then the Convolutional Block Channel Attention (CBCA) module is applied to enhance the extraction of deep convolution features by strengthening the effective green space features and suppressing invalid features through the interdependence of modeling channels.-Finally, the high-resolution feature map is restored through upsampling operation by the decoder. The experimental results show that compared with other methods, CRAUNet achieves the best performance. Especially, our method is less susceptible to the noise and preserves more complete segmented edge details. The pixel accuracy (PA) and mean intersection over union (MIoU) of our approach have reached 97.34% and 94.77%, which shows great applicability in regional large-scale mapping.

**Keywords:** urban green space; remote sensing; deep learning; convolutional neural network; residual structure; attention mechanism

#### **1. Introduction**

Regarding the concept of urban green space, different regions have their own interpretation of its definition and scope. Compared with urban green space, western countries use the concept of urban open space more in land use planning [1–3]. Urban open space is an open space area reserved for parks and other "green spaces", which includes water and other natural environments in addition to vegetation [4]. In China, in order to standardize the management process of urban greening, the government has issued the "Urban Green Space Classification Standard", which divides urban green space into two parts, including green space within urban construction land and square land and regional green space outside urban construction land [5]. In this study, the above definition of urban green space is followed.

**Citation:** Men, G.; He, G.; Wang, G. Concatenated Residual Attention UNet for Semantic Segmentation of Urban Green Space. *Forests* **2021**, *12*, 1441. https://doi.org/10.3390/ f12111441

Academic Editors: Manuel Esperon-Rodriguez and Tina Harrison

Received: 3 October 2021 Accepted: 21 October 2021 Published: 22 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Urban green space is an indispensable element in the urban ecosystem which is always considered to be an important component to improve the quality of the urban ecological environment [6]. It provides protection for the sustainable development of the city in various aspects of ecological service functions, such as reducing greenhouse gases, regulation of urban climate, reduction of energy consumption, maintenance of ecological security, etc. [7–10]. However, with the rapid development of urbanization, urban built-up areas continue to expand, and green spaces are severely damaged, affecting the quality of life of residents. The unreasonable planning and construction of urban green space restrict the healthy development of the city. Therefore, good urban green space monitoring is a necessity for the sustainable development and management of cities [11]. How to accurately and dynamically obtain urban green space information has arisen the interest of researchers.

Since the 21st century, with the rapid development of Earth observation technology, the acquisition ability of satellite remote sensing data has been greatly improved, suggesting that it has entered a new era of multi-platform, multi angle, multi-sensor, all-time and allweather Earth observation. Remote sensing technology can quickly and accurately monitor the dynamic changes of green space in a study area, so itis suitable for large-scale resource investigation and research. At present, a variety of optical and radar remote sensing data sources including Landsat series data [12], GaoFen series data [13] and Sentinel series data [14] have been employed for urban green space extraction. For the application of remote sensing technology in urban green space research, the key technology is how to quickly and accurately obtain the surface vegetation coverage information. Using remote sensing image to extract urban green space is to identify and classify the types of land cover, so as to obtain the vegetation cover map of the real situation of land cover. The methods for urban green space extraction from remote sensing images can be divided into four kinds: threshold method [15,16], pixel-based classification method represented by machine learning [17–20], object-oriented classification method [21,22] and deep learning method [23–26]. The threshold method selecta an appropriate threshold to distinguish the green space through the difference of the spectral response of vegetation and other ground objects in one or more bands. Due to the difference between the reflection of vegetation in the visible and near-infrared bands and the soil background, the researchers improved the combination of bands and proposed a series of vegetation indices to extract surface vegetation coverage information [16]. However, because of the different complexity of the environmental background in urban areas, the representation of vegetation on remote sensing images is easily disturbed by other features in the built-up area, especially in the classification of vegetation in areas with high-density buildings in cities. With the popularity of machine learning technology, more machine learning-based algorithms have attracted great attention from researchers, support vector machines [17], decision trees [18], random forests [19], artificial neural network [20] and other algorithms. These approaches are widely used in the research of urban green space extraction. However, there are a large number of similar objects with different spectra in high-resolution remote sensing images, which makes the traditional pixel-based classification methods unable to accurately distinguish different types of ground objects. As the traditional methods have become mature, it is difficult to improve at the technical level. In order to improve accuracy, traditional urban green space machine learning algorithms require manual design of features including texture and terrain [27]. This process is time-consuming and laborious, and there are certain challenges for accurately extracting urban green space information.

Deep learning has become a popular method for remote sensing information extraction. The principle of applying deep learning technology to urban green space information extraction is that the most original remote sensing image passes through multiple hidden layers and abstracts from low level to high level, aiming to automatically select the characteristics of the target to discover the distributed feature representation of green space. [28]. Compared with traditional machine learning, the advantage of deep learning is that without manual design and acquisition of features, unsupervised or semi-supervised

feature learning and efficient hierarchical feature extraction algorithms are implemented for feature extraction [29]. The convolutional neural network (CNN) is one of the most successful network architectures in deep learning algorithm which consists of input layer, convolution layer, pooling layer, full connection layer and output layer [30]. The input layer is used to input the original data; the convolution layer is used for feature extraction; the pooling layer compresses the input feature map; the full connection layer connects all features and sends the output value to the classifier; and the output layer outputs the classification result. The reasonable architecture ensures that the feature learning process is efficient, so it has become the main algorithm to extract urban green space coverage information. Nijhawan proposed a framework that combines local binary pattern (LBP) and GIST features with multiple parallel CNNs for feature extraction, and then combined with SVM to extract vegetation in the city. As the number of parallel CNNs increases, the accuracy increases significantly [23]. Moreno-Armendáriz built a deep neural network system based on CNN and multi-layer perceptron (MLP) to evaluate the health of urban green spaces and promote the realization of the sustainable development goals of smart cities [24]. Timilsina proposed an object-based convolutional neural network (OB-CNN) for extracting the coverage changes of the number of cities with an accuracy of over 95%, indicating that object-based CNN technology can effectively achieve urban tree coverage Mapping and monitoring [25]. Hartling tested the ability of densely connect convolutional networks (DenseNet) to identify the main tree species in complex urban environments in the fusion image of WorldView-2 Visible-to-Near Infrared (VNIR), Worldview-3 Selective Wavelength Infrared (SWIR) and LiDAR data sets. The study showed that, regardless of the size of the training sample, DenseNet is superior to RF and SVM technologies when processing highly complex image scenes, so it is more effective for urban tree species classification [26].

Compared with the above methods, urban green space is not only often blocked by shadows on high-resolution remote sensing images, but misclassified owing to the spectrum similarity of farmland, or-chards, etc. Therefore, it is difficult to extract urban green space using only spectral information. At the same time, due to the complex background of the ground features and irregular boundaries, the existing methods will produce some misclassifications and omissions during extraction. Therefore, in order to solve the above problems, this paper proposes an improved fully convolutional neural network based on the encoding and decoding structure to extract urban green space from the Gaofen-1 remote sensing image which called concatenated residual attention UNet (CRAUnet). The work presented in this article focuses on the following three aspects: (1) A residual module with feature concatenation mechanism is proposed to improve the loss of original image features. (2) In order to improve the feature expression ability of the network, attention mechanism is embedded to the model, and convolutional block channel attention (CBCA) module is proposed. (3) To illustrate the applicability of the network, we compare with other classical networks to evaluate the efficiency of the network structure.

#### **2. Materials and Methods**

#### *2.1. Study Area and Data Sources*

As a national economic center city and an international city, Shenzhen is China's first Special Economic Zone [31]. Its geographical location is shown in Figure 1. As one of the first cities to develop and endure environmental pressure earlier, Shenzhen has always adhered to both protection and development, and vigorously created a good forest ecological environment. Especially since 2015, Shenzhen has fully initiated the creation of a national forest city. By 2019, the green area coverage exceeded 45% [32].

The Gaofen-1 satellite was successfully launched into orbit on 26 April 2013. The selected panchromatic multispectral camera image includes a panchromatic band with a spatial resolution of 2 m and four multispectral bands with a spatial resolution of 8m. In this research, we selected 12 scenes multi temporal Gaofen-1, 1-C and 1-D remote sensing data of Shenzhen for 2017 and 2020, of which 9 scenes are used as training data and 3 scenes are used as testing data. The information about images of the study is shown in Table 1. The PCI Geo Imaging Accelerator software was employed to conduct the remote sensing data preprocessing. The rational polynomial coefficient model was adopted to geometrically correct the remote sensing image [33], then the PANSHARP method was employed to fuse the multi-spectral image and the pan-chromatic image [34], and finally all the data was unified into 8-bit. After the preprocessing, the geometric error of the remote sensing image is limited to within 1 pixel. The flow chart of preprocessing is shown in Figure 2.

**Figure 1.** Schematic diagram of the geographical location of the study area. (**a**) Shenzhen's location in China. (**b**) Overview of Shenzhen.



#### *2.2. Methods*

#### 2.2.1. Sample Generation

Since deep learning requires a large amount of labelled data related to the classification target for training, and the existing open-source datasets cannot meet the requirements of this article, it was necessary to manually establish an urban green space labelled dataset. Generally, the boundary of green space in the remote sensing image of the experimental area is directly delineated by manual visual interpretation. However, due to the fragmented distribution, diverse types and complex background of green space, this method is timeconsuming and laborious [22]. Therefore, this paper adopts the object-oriented method for data annotation. The process mainly realized through three steps including selecting

appropriate parameters to segment the image, classifying the segmented image, and manually correcting the result. The final result is stored as 8-bit ground truth binary label, where the value 1 represents the urban green space while 0 denotes the background. The sample data covers different types of green space, as shown in Figure 3. Then the processed remote sensing image data and label data are divided into training set and test set according to the area ratio of 4:1 to ensure that the training set and the test set are independent and identically distributed. At the same time, to increase the diversity of training samples and reduce the under-fitting problem caused by insufficient samples during the training process, random cropping strategy is utilized to generate the samples with the size of 256 × 256 pixels by selecting different repetition rates, and random enhancement is adopted ed to expand the training set and the test set to weaken the background noise and enhance the robustness of the model. In this paper, we mainly use flip, rotation, scaling, adding Gaussian noise and fuzzy to enhance the samples, and finally there are in total 8000 training samples and 2000 test samples, respectively.

**Figure 2.** Flow chart of remote sensing image preprocessing.

**Figure 3.** For samples of different green space types, the left side of each group is the original image and the right side is the label image. (**a**) High density building area; (**b**) countryside; (**c**) Suburban Forest; (**d**) Bare land; (**e**) Golf Course; (**f**) Farmland; (**g**) Land for parks and squares; (**h**) Port.

#### 2.2.2. Improved Residual Structure

In deep learning, based on the network training process of stochastic gradient descent, the error signal is prone to gradient dispersion or gradient explosion through multi-layer back propagation [35]. Therefore, as the network depth increases, training becomes more difficult. In order to solve the above degradation phenomenon, He and others proposed residual neural network (ResNet) in 2015, which greatly eliminated the problem of training difficulties caused by the excessive depth of the neural network [36]. The main contribution of ResNet is to invent a building block containing direct mapping for the phenomenon of network degradation [37]. This process can be expressed as follows:

$$\mathbf{x}\_{l+1} = \mathbf{x}\_l + F(\mathbf{x}\_l, \mathbf{w}\_l) \; \mathbf{x}\_l \tag{1}$$

where *xl* and *xl*<sup>+</sup><sup>1</sup> are input and output of the *l*-*th* unit, and *F* is a residual function.

The residual building block is divided into an identity mapping part and a residual part, as shown in Figure 4a. *xl* is the identity map which is reflected in the straight line on the left side of the Figure 4a. In the convolution neural network, the number of feature maps of *xl* and *xl*<sup>+</sup><sup>1</sup> may be different. At this time, one needs to use a 1 × 1 convolution for dimensionality increase or dimensionality reduction. *F*(*xl*, *wl*), presented at the right side of the Figure 4a, represents the residual part, which is usually composed of two or three convolution operations. The residual structure can alleviate the gradient dispersion problem of deep neural network to a certain extent, solve the degradation problem of deep neural networks, which can make the forward and backward propagation of information smoother.

**Figure 4.** Comparison of different residual structures. (**a**) The original residual structure; (**b**) The proposed residual structure.

Different from other ground objects, the types of urban green space are diverse and its distribution is highly fragmented. At the same time, they are often misclassified as a result of shadow occlusion in remote sensing images and the similarity of spectral features with cultivated land usually leads to confusion between targets, so effective use of any feature of the original image is particularly important for the accurate extraction of information. Therefore, it is necessary to introduce the residual structure into the extraction of urban green space to solve the above problems. In this research, we propose a new residual structure based on the idea of identity mapping in residual neural network, as shown in Figure 4b. This building block passes through a convolutional layer with a kernel size of 1 × 1 in the identity mapping part to increase the number of features to the required output feature size. The 1 × 1convolution kernel is used because it summarizes the features between pixels with a smaller kernel to avoid the loss of initial image information. In the residual part, in order to make full use of different levels of semantic features, while reducing the amount of network parameters and improving computational efficiency, the input image is convolved by halving the number of two consecutive features. Then the output results of each convolution are concatenated to obtain the semantic information of urban green space from multiple feature dimensions. At the same time, batch norm (BN) is used as normalization after each convolution kernel, and then rectified linear unit (ReLU) is used as activation function.

#### 2.2.3. Convolutional Block Channel Attention Mechanism

In the convolutional neural network, each convolutional layer contains several filters, which can learn to express the local spatial connection pattern including all channels. However, the local channel features of adjacent spatial locations in the feature map often have a high correlation due to their overlapping receptive fields. The attention mechanism in deep learning originates from the attention mechanism of human brain. When the human brain receives external information, such as visual information and auditory information, it often does not process and understand all information, but only focuses on some significant or interesting information, which is conducive to filtering out unimportant information to improve the efficiency of information processing [38]. Introducing the attention mechanism

into the convolutional neural network does not significantly increase the amount of network parameters while making selective and finer adjustments on the existing feature maps to improve the performance of the network and increase the interpretability of the neural network structure [39]. Therefore, in this study, to emphasize the spectral relationship of remote sensing image feature map, we introduce channel attention mechanism and propose a new network unit—a convolution block channel attention (CBCA) module—to replace the skip connection in the original UNET structure. The structure of the module is shown in Figure 5.

**Figure 5.** The structure of the convolution block channel attention module.

First, the feature map output by the residual module is subjected to a standard convolution to further enhance the feature expression ability. Then, the pooling layer is used to compress each channel into a single-dimensional digital vector. Here, we use max pooling and average pooling to compress the feature map in the channel dimension to obtain two different channel feature descriptors. Then, the above results are input into the fully connected layer with ReLU activation to reduce the dimension and introduce new nonlinearity, and finally a sigmoid activation function is employed to provide each channel a smooth weight, and output the weight value. The range of values is normalized to be between 0–1. Finally, the weights are multiplied by the features of the original encoder output, and then transferred to the decoder. The process can be described by the following formula:

$$
\lambda\_{weight} = \sigma \left( F\_{FC} \left( P\_{avg} \left( F\_{\ $\times\$ } (\mathbf{x}\_{l}) \right) \right) + F\_{FC} \left( P\_{max} \left( F\_{\ $\times\$ } (\mathbf{x}\_{l}) \right) \right) \right) \tag{2}
$$

where *σ*, *FFC*, and *P* represent the *Sigmoid*, *FC*, and *Pooling Operation*, respectively.

#### 2.2.4. Network Structure

In this paper, a convolution neural network for extracting urban green space is proposed, we named it concatenated residual attention UNet (CRAUNet). The network chooses UNet as the backbone of the network, that is, using the encoder and decoder structure combined with shallow semantic information to achieve a smooth transformation from the image to the segmentation mask [40]. As shown in Figure 6, the architecture of the network can be divided into the following three parts: encoder, skip connection with attention mechanism, and decoder.

In the encoder part, in order to eliminate the gradient dispersion and explosion phenomenon in the deep neural network and make the forward and backward propagation of information smoother, this part of the building block is replaced with an improved convolution residual module. In addition to the residual structure proposed in this paper, this module also includes a max pooling layer after it. The size of the convolution kernel of the convolution operation is 3 × 3, followed by a BN layer and a ReLU. The max pooling layer downsamples the input image to half of the original size. After each encoder building block, the number of input feature channels will be doubled, and the image size will be halved. The structure of the decoder is similar to that of the encoder. The residual structure

is not adopted and the maximum pooling layer is replaced with an upsampling layer. Here, we use the bilinear interpolation method to finally restore the feature map to its original size. In each decoder building block, to ensure that the output is the same size as the encoder output, the upsampling rate is set to 2. In the skip connection part, in order to make better use of the information of the multi-resolution feature map, the CBCA mode is embedded to enhance the useful green space features in the channel dimension and suppress the invalid background features, thereby improving the computational efficiency of the network model. Then, the output of this part is concatenated with the output of the previous layer of the decoder and input to the decoder of the current layer. The process is formulated as follows:

$$D\_i = F\_{\text{Conv}}(F\_{up}D\_{i+1}) + \lambda\_{\text{newlight}}E\_i \tag{3}$$

where *Di* indicates the hidden feature of decoder part in *i* layers, *Di*+<sup>1</sup> denotes the lower hidden feature of *Di*, and *Fup* represents the up-sampling operation.

**Figure 6.** The structure of the concatenated residual attention UNet.

This approach for increasing the resolution of the convolution features can provide more fine features for segmentation on the region of the edge and avoid chessboard effect [41].

#### 2.2.5. Inference Methodology

The advantage of a fully convolutional neural network is that it can make full use of the contextual information of the image to improve its performance. However, due to the large size of the high-resolution remote sensing image, directly inputting the data without preprocessing is likely to cause insufficient graphics card memory. Therefore, before the large-scale remote sensing images are input into the network architecture, the images need to be cropped into several smaller images of fixed size and predicted, respectively. After the results of each smaller tile are obtained, the whole images predicted by the network architecture are obtained by stitching operation. However, due to the large number of convolution processing in the network model, the center pixel can obtain more context information while the edge pixels gain limited information, so the pixels close to the edge of the image will not be classified as accurately as the pixels close to the center. After stitching the results of several small images into the original image size, obvious stitching traces will appear in some positions [42]. In order to alleviate these problems, this experiment adopts the sliding window method in the network inference phase. In our experiments, a large-size remote sensing image is cropped into multiple small images of 256 × 256 pixels. Notably, during the cropping process, a fixed numerical area smaller than the image size is set to control the prediction area range of each input to the network frame. In this experiment, the size of the sliding window is set to 64.

#### 2.2.6. Accuracy Assessment

In this study, seven evaluation indexes were selected to evaluate the performance, including pixel accuracy (PA), the precision, the recall, *F*1-Score (*F*1), and the mean intersection over union (MIoU), floating point of operations (FLOPs). The definitions and formulas of these indicators are listed in Table 2.


#### **Table 2.** Evaluation metrics for the accuracy assessment.

Where TP FP FN and TN are the is true positive, false positive, false negative and true negative classifications, respectively.

#### **3. Experiment**

In this section, in order to evaluate the performance of the various modules used in the designed network on the constructed urban green space dataset, we conducted two ablation experiments. In terms of evaluation indicators, we mainly use pixel accuracy (PA), and present the number of parameters and FLOPs for each network. Because it is time-consuming to test the final performance of the network, we need to reduce the learning rate and fine tune repeatedly in the process, so we only care about the performance of different network structures in the fixed hyperparameters and epoch training.

#### *3.1. Effectiveness of Concatenated Residual Structure*

In the first experiment, we tested the convergence of the proposed residual structure. In order to ensure the consistency, we choose UNet as our baseline model without a dropout layer, and the bilinear interpolation method is selected for upsampling. Then, we modify the baseline model by adding identity mapping for each layer of the UNet network decoder part, thereby increasing the residual structure (ResUNet). It should be clear that we have added a 1 × 1 convolution kernel to the identity mapping part to ensure that the number of features in this part is increased to the required output feature size, so there areslightly more parameters than the Unet model. Then we reduce the number of network parameters by reducing the number of convolution channels in the residual block by half and connecting the output of each convolution kernel (CRUNet). At the same time, for comparative analysis, we also do the above operations on the baseline model without adding a residual structure (CUNet). In Figure 7, we plot the pixel accuracy curves of all models in the validation set. It can be seen that with the modification of each

module, the performance difference of the convergence rate among the model is increasing. The baseline UNet needs about 120 epochs to reach the same level of performance that CRUNET achieved in 70 epochs. Only by modifying the convolution in the UNet decoder and concatenating the features of different convolution kernels, the convergence speed of the model can be significantly improved. In addition, it can be seen from the comparison that if only the residual module is added to the baseline model, the convergence speed of the model can be improved while the accuracy of the model will fluctuate more during the training process, and the training will be unstable. Moreover, we also compared the number of parameters and calculations of different network structures. As presented in Table 3, the introduced residual structure can improve the pixel accuracy while reducing the complexity of the model. The results show that when the convolution kernel of the decoder in the baseline network is modified and the residual module is added, the model can learn more features of urban green space, and the convergence performance of the model is significantly improved.

**Figure 7.** Convergence performance of the CRUNet architecture. Starting from a baseline UNet, we add components keeping all training hyperparameters identical.


**Table 3.** The efficiency comparison of the various methods.

#### *3.2. Performance of Convolution Block Channel Attention Module*

In the second experiment, we explored the effect of introducing the attention mechanism into skip connections on the performance of network feature extraction. Similarly, we also adopted UNet as the baseline model of this experiment to illustrate the role played by the attention mechanism. Then, we use the channel attention module (CAM) and the CBCA module proposed in previous section to replace the original skip connection, and we also applied the CBCA module to the CRUNet model mentioned above. As can be seen from Figure 8, the attention mechanism is introduced into the model, which makes use of the relationship between channels to improve the feature extraction ability of the model and compress the redundant features on the channel dimension. After adding the attention mechanism, the performance of the model is improved from 96.5% to 97.1%, obtaining a rise of 0.6%. This demonstrates that, compared with the original skip connection, the skip connection with attention mechanism can extract green space features more effectively. After feature fusion, it can recover high-resolution details better when upsampling, and the combination with residual structure can improve the convergence speed of the model.

#### **4. Results**

In this section, we will explore the performance of the proposed model and compare the performance with other structures. The training process is implemented on an NVIDIA Titan V GPU using Python 3.6, PyTorch deep learning framework, accelerated by cuDNN 10.0. Aftereach training epoch, the classification accuracy on the training data set and the validation data set is calculated. Our goal is to gain the best convergence model to the best and apply it for the extraction of urban green space.

#### *4.1. Comparative Experiment with Other Networks*

In order to illustrate the performance of CRUNet, we selected several typical semantic segmentation networks including FCN8s, UNet, Deeplab for comparison in the urban green space data set. The experimental results are shown in Table 4. Regardless of whether the CBCA module is added, the performance of our proposed network is better than other networks. Specifially, our network achieved the highest PA (97.34%) and MIoU (94.77%). DeepLabV3+ contains the highest number of parameters and calculations, and the training time is the longest, but its performance is lower than FCN8s, which has the least number of parameters, indicating that overfitting occurs during the training process. Compared with UNet, our network reduces a certain number of parameters and FLOPs, shortening training time while improving performance. After adding the CBCA module to our network, the PA and *F*1-Score did not change significantly, but the MIoU improved slightly, indicating that the addition of this module has improved the region smoothness [43]. The above results all verify the effectiveness of our proposed network. Meanwhile, it can be concluded from the number of parameters and FLOPs in the Table 4 that the improvement of our network performance is not simply at the expense of increasing the number of parameters and FLOPs.



We selected typical urban green space areas to clarify the reasons for the above accuracy differences more clearly. Figure 9 shows several examples of segmentation results of different semantic segmentation networks in different categories of remote sensing images.

**Figure 9.** The results classified by the five CNNs on the test set. (**a**) The original images; (**b**) Corresponding labels; (**c**) The predicted maps of FCN8s, (**d**) The predicted maps of DeepLabV3+; (**e**) The predicted maps of UNet; (**f**) The predicted maps of CRUNet; (**g**) The predicted maps of CRAUNet.

It can be seen that whether in urban areas, suburbs or mountains, our network extraction results are closest to the ground truth figure. In the first and second rows, the proposed network removes the influence of shadow and other interference in complex background, indicating that the network has advantages in eliminating noise while other networks are affected by different degrees of noise. In the third and fourth lines, our network extracts green space information more completely. The reason is that the residual module and the channel attention mechanism rely on each other between channels, enhancing the network's feature learning ability. Other CNNs can accurately extract green space but the boundaries extracted by DeeplabV3+ and FCN8s are smoother. To a certain extent, the confusion phenomenon caused similarity or heterogeneity in the remote sensing image is alleviated. Combined with the results of the fifth row, the edges in the prediction results we extracted are more precise, especially at the junction of buildings and green spaces. Compared with the results of the other four pictures, our edges are more detailed. The results show that compared with other networks, our proposed network segmentation results are more complete, and it also has advantages in small target recognition, so it achieved the highest accuracy. To further illustrate the performance of our proposed network for extracting urban green space, Figure 10 shows the example results of CRUNet and CRAUNet on the test set. It can be seen from the figure that the network with or without the CBCA module can identify larger green space targets, but the latter is more susceptible to noise, and there are obvious defects in the identification of smaller feature information. The above results prove that our network has strong feature extraction capabilities and

high-resolution detail recovery capabilities for high-resolution remote sensing images. Specifically, the residual structure enhances the feature expression ability of the network and provides richer feature information for the decoder. The CBCA module strengthens the expression of effective features, suppresses invalid features, and provides location guidance for the recovery of high-resolution remote sensing images.

**Figure 10.** Results comparison between CRUNet and CRAUNet. (**a**) The original images; (**b**) Corresponding labels; (**c**) The predicted maps of CRUNet; (**d**)The predicted maps of CRAUNet.

#### *4.2. Performance Comparison on Different Landcover*

In order to further evaluate the generality of CRAUNet model, we extracted different types of urban green spaces and easily confused areas from the results, and compared the results extracted from FCN8s, UNet and DeepLabV3+ based on visual inspection and overall accuracy to explain the results. Here, we mainly choose the park green space, residential green space, protective green space, farmland and sports ground and other types of landcovers. The specific features of these landcovers shown in Figure 10 are: forest, golf course, bare land, sports ground, farmland and aquaculture areas

The comparison result is shown in Figure 11 and Table 5. For large areas of green space such as forests and artificial grasses such as golf courses, the four algorithms can accurately identify them. But for bare land with sparse vegetation, FCN8s lost some small patches of green space information. For high-density building areas, the results extracted by FCN8s and DeepLabV3+ algorithms lose a lot of boundary information. And for sports fields containing artificial grass, all algorithms have a certain degree of confusion. But it can be seen from the figure that our network has the lowest error rate. In terms of distinguishing between green space and cultivated land, since more attention was paid to cultivated land during the training process, each algorithm showed better performance. In addition, for built-up areas and aquaculture areas, which are disturbed by shadows and nutrients in the water, compared with our proposed network, the results extracted by other algorithms contain more noise.

**Figure 11.** Typical Urban green space classification results. (**a**) The original images; (**b**) corresponding labels; (**c**) the predicted maps of FCN8s, (**d**) the predicted maps of DeepLabV3+; (**e**) the predicted maps of UNet; (**f**) the predicted maps of CRAUNet.

**Table 5.** The accuracy of different landcover.


From Figure 10 and Table 5 we can see that the performance of CRAUNet is better than that of the other networks, and it achieves high accuracy for each picture. FCN8s and DeepLabV3+ algorithm lost a lot of detailed information of urban green space in the extraction of green space, resulting in that some small green spaces that could not be identified and the boundaries were more blurred. Although UNet can more accurately extract the information of urban green space, it is susceptible to the influence of features with similar spectral characteristics and produces more noise. Our network has shown better performance for extracting different types of green space, and the results are better than other algorithms.

#### *4.3. Urban Green Space Mapping and Accuracy Evaluation*

In this section, we used CRAUNet to extract the urban green space of Shenzhen for 2020 based the seven images mentioned in Section 2.1. Compared with common images, remote sensing images have larger data volume. The CNN cannot process such a large image directly. In order to enable CRAUNet to extract the urban green space of a whole remote sensing image, we employed CRAUNet to obtain urban green space information based on a sliding-window method, and the size of a sliding window is 512 × 512 pixels. After extracting urban green space from the seven satellite images, we produced the map of urban green space in Shenzhen with 2 m spatial resolution. The result is shown in Figure 12.

**Figure 12.** Shenzhen urban green space coverage map in 2020.

In the accuracy evaluation part, we used two kinds of sampling methods to select validation points by ArcGIS and ENVI software. The first method is randomly selecting 500 validation points, and the second method is selecting 491 points for verification at an equal distance. The distribution of sampling points is shown in Figure 13. The confusion matrix and the accuracy evaluation indicators mentioned in Section 2.2.6 were employed to evaluate the mapping accuracy of urban green space extraction. The results are shown in Tables 6 and 7. The accuracy evaluation results of the two sampling methods are different. The accuracy evaluation index of equidistant sampling method is higher than that of random sampling method. Judging from the results in the table, the method has shown high accuracy in the extraction of regional urban green space.

**Figure 13.** Comparison of different sampling methods for accuracy verification (**a**) Random sampling method; (**b**) Equal distance sampling method.


**Table 6.** Confusion matrix for urban green space mapping.

**Table 7.** Accuracy assessment for mapping.


#### **5. Discussion**

The high-resolution remote sensing image provides a reliable data support for extracting urban green space accurately. Due to the self-learning ability of deep learning methods, the CNN has been widely used in urban green space extraction. Compared with traditional methods, the CNN can obtain features independently without selecting features artificially. However, the existing network still has some problems in extracting urban green space. In order to solve these problems, we made the following improvements to CRAUNet. According to the analysis in the previous sections, we can conclude that compared with other semantic segmentation algorithms, CRAUNet has the following three advantages in urban green space extraction:

(1) Some networks have made contributions to solving the gradient disappearance phenomenon in the training process, like DenseNet [44]. The core idea of DenseNet is to establish the connection relationship between different layers, make full use of the feature, and further reduce the problem of gradient disappearance., but the price in exchange is a sharp increase in memory usage. In order to make full use of the characteristics of different levels to solve the degradation problem of neural network, and to avoid the increase of memory, we introduce the residual model. In CRAUNet, the residual module we proposed makes it retain more feature information of the

original image during the feature extraction process, which solves the degradation problem of the neural network to a certain extent and has advantages in identifying small target green spaces. At the same time, this structure does not cause memory usage to rise sharply.


In general, CRAUNet has demonstrated powerful feature extraction capabilities and high-resolution details recovery capabilities in t urban green space extraction based on high-resolution remote sensing image. However, CRAUNet still has room for further improvement. In the future, we will do further work on restoring high-resolution detailed information.

#### **6. Conclusions**

In this article, we propose a new convolutional neural network CRAUNet for urban green space extraction from GF-1 high-resolution satellite images to better solve the problems in traditional methods. The idea of residual structure and attention mechanism are introduced into the network, and the residual module and CBCA module are proposed to enhance the feature extraction ability while reducing the amount of network parameters and calculations. The results shows that our proposed network achieved the highest performance, the PA and MIoU of our model were 97.34% and 94.77%. Based on the accuracy evaluation result and the visual comparison we can conclude that the performance of CRAUNet is better than that of FCN8s, UNet and DeepLabV3+. In addition, the research in this article is conducive to the use of a large number of high-resolution remote sensing images to dynamically monitor urban green spaces and provide decision-making support for the sustainable development of the urban environment.

**Author Contributions:** G.M., G.H., G.W. conceived of and designed the experiments. G.W. provided the original remote sensing data. G.M. made the dataset and performed the experiments. G.M. wrote the whole paper. All authors revised this paper and gave some appropriate suggestions. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was financially supported by the National Natural Science Foundation of China (61731022), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA19090300) and the National Key Research and Development Program of China—rapid production method for large-scale global change products (2016YFA0600302).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

