1. Introduction
Different regions interpret the definition and scope of urban green space in their own ways. In land use planning, Western countries more commonly use the concept of urban open space than that of urban green space [
1,
2,
3]. Urban open space is open land reserved for parks and other “green spaces”; in addition to vegetation, it includes water and other natural environments [
4]. In China, to standardize the management of urban greening, the government has issued the “Urban Green Space Classification Standard”, which divides urban green space into two parts: green space and square land within urban construction land, and regional green space outside urban construction land [
5]. This study follows the above definition of urban green space.
Urban green space is an indispensable element of the urban ecosystem and is widely regarded as an important component for improving the quality of the urban ecological environment [
6]. It supports the sustainable development of cities through a range of ecological service functions, such as reducing greenhouse gases, regulating the urban climate, reducing energy consumption, and maintaining ecological security [
7,
8,
9,
10]. However, with rapid urbanization, urban built-up areas continue to expand and green spaces are severely damaged, which affects residents' quality of life. Unreasonable planning and construction of urban green space also restricts the healthy development of cities. Effective monitoring of urban green space is therefore necessary for the sustainable development and management of cities [
11]. How to obtain urban green space information accurately and dynamically has aroused the interest of researchers.
Since the beginning of the 21st century, the rapid development of Earth observation technology has greatly improved the ability to acquire satellite remote sensing data, and Earth observation has entered a new era of multi-platform, multi-angle, multi-sensor, all-time and all-weather operation. Remote sensing technology can quickly and accurately monitor the dynamic changes of green space in a study area, so it is suitable for large-scale resource investigation and research. At present, a variety of optical and radar remote sensing data sources including Landsat series data [
12], GaoFen series data [
13] and Sentinel series data [
14] have been employed for urban green space extraction. The key to applying remote sensing technology in urban green space research is obtaining surface vegetation coverage information quickly and accurately. Extracting urban green space from remote sensing images amounts to identifying and classifying land cover types, so as to obtain a vegetation cover map that reflects the actual land cover. The methods for urban green space extraction from remote sensing images can be divided into four kinds: threshold method [
15,
16], pixel-based classification method represented by machine learning [
17,
18,
19,
20], object-oriented classification method [
21,
22] and deep learning method [
23,
24,
25,
26]. The threshold method selects an appropriate threshold to distinguish green space based on the difference in spectral response between vegetation and other ground objects in one or more bands. Because vegetation and the soil background reflect differently in the visible and near-infrared bands, researchers have refined band combinations and proposed a series of vegetation indices to extract surface vegetation coverage information [
16]. However, because the environmental background of urban areas is complex, the appearance of vegetation in remote sensing images is easily disturbed by other features in built-up areas, especially where building density is high. With the popularity of machine learning technology, machine learning-based algorithms have attracted great attention from researchers, including support vector machines [
17], decision trees [
18], random forests [
19], artificial neural network [
20] and other algorithms. These approaches are widely used in research on urban green space extraction. However, high-resolution remote sensing images contain many objects of the same type that show different spectra, which prevents traditional pixel-based classification methods from accurately distinguishing different types of ground objects. As the traditional methods have matured, further technical improvement has become difficult. To improve accuracy, traditional machine learning algorithms for urban green space extraction require manually designed features, including texture and terrain [
27]. This process is time-consuming and laborious, which makes accurate extraction of urban green space information challenging.
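As an illustration of the threshold method described above, the following minimal Python sketch computes a vegetation index per pixel and applies a fixed threshold. It assumes the normalized difference vegetation index (NDVI) as a representative index; the function names, the threshold of 0.3, and the toy reflectance values are hypothetical choices for illustration and are not taken from the cited studies.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for a single pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero on empty pixels
    return (nir - red) / total


def extract_green_mask(nir_band, red_band, threshold=0.3):
    """Return a binary mask: 1 where NDVI exceeds the threshold, else 0.

    The default threshold is an assumed illustrative value; in practice it
    would be tuned for the sensor, season, and study area.
    """
    return [
        [1 if ndvi(n, r) > threshold else 0 for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Toy 2 x 3 reflectance grids (hypothetical values, not real imagery).
nir = [[0.50, 0.45, 0.20], [0.10, 0.60, 0.30]]
red = [[0.10, 0.08, 0.18], [0.09, 0.05, 0.25]]

mask = extract_green_mask(nir, red)
green_fraction = sum(map(sum, mask)) / (len(mask) * len(mask[0]))
print(mask, green_fraction)
```

A single global threshold of this kind is exactly what becomes unreliable in dense built-up areas, where shadows and mixed pixels shift the index values of vegetated and non-vegetated surfaces toward each other.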
Deep learning has become a popular method for remote sensing information extraction. When deep learning is applied to urban green space information extraction, the original remote sensing image passes through multiple hidden layers that abstract it from low-level to high-level features, so that the characteristics of the target are selected automatically and a distributed feature representation of green space is learned [
28]. Compared with traditional machine learning, the advantage of deep learning is that features do not need to be designed and acquired manually; instead, unsupervised or semi-supervised feature learning and efficient hierarchical feature extraction algorithms are used [
29]. The convolutional neural network (CNN) is one of the most successful network architectures in deep learning; it consists of an input layer, a convolution layer, a pooling layer, a fully connected layer and an output layer [
30]. The input layer receives the original data; the convolution layer extracts features; the pooling layer compresses the input feature map; the fully connected layer connects all features and sends the output value to the classifier; and the output layer outputs the classification result. This architecture makes the feature learning process efficient, so the CNN has become the main algorithm for extracting urban green space coverage information. Nijhawan proposed a framework that combines local binary pattern (LBP) and GIST features with multiple parallel CNNs for feature extraction and then uses a support vector machine (SVM) to extract vegetation in the city. As the number of parallel CNNs increases, the accuracy increases significantly [
23]. Moreno-Armendáriz built a deep neural network system based on a CNN and a multi-layer perceptron (MLP) to evaluate the health of urban green spaces and promote the realization of the sustainable development goals of smart cities [
24]. Timilsina proposed an object-based convolutional neural network (OB-CNN) for extracting changes in urban tree cover with an accuracy of over 95%, indicating that object-based CNN technology can effectively achieve urban tree cover mapping and monitoring [
25]. Hartling tested the ability of densely connected convolutional networks (DenseNet) to identify the main tree species in complex urban environments using fused WorldView-2 Visible-to-Near Infrared (VNIR), WorldView-3 Shortwave Infrared (SWIR) and LiDAR data sets. The study showed that, regardless of the size of the training sample, DenseNet outperforms random forest (RF) and SVM when processing highly complex image scenes, so it is more effective for urban tree species classification [
26].
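The CNN layer sequence described earlier in this paragraph can be sketched as a single forward pass in plain Python. This is an illustrative toy under stated assumptions, not the network of any cited study: the 5 x 5 input patch, the 2 x 2 edge kernel, and the fully connected weights are hand-picked hypothetical values rather than trained parameters, a ReLU activation and a softmax output are assumed, and, as in most deep learning frameworks, the convolution is implemented as a cross-correlation.

```python
import math


def conv2d(image, kernel):
    """Convolution layer: slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Activation (assumed here): set negative responses to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2x2(feature_map):
    """Pooling layer: keep the maximum of each non-overlapping 2 x 2 window."""
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, len(feature_map[0]) - 1, 2)
        ]
        for i in range(0, len(feature_map) - 1, 2)
    ]


def fully_connected(features, weights, biases):
    """Fully connected layer: one weighted sum of all features per class."""
    return [sum(f * w for f, w in zip(features, ws)) + b
            for ws, b in zip(weights, biases)]


def softmax(scores):
    """Output layer: turn class scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Input layer: a hypothetical single-band patch with a bright left strip.
patch = [[1, 1, 0, 0, 0] for _ in range(5)]
kernel = [[1, -1], [1, -1]]                     # hand-picked vertical-edge filter
fc_weights = [[0.5, 0.0, 0.5, 0.0],             # hypothetical class 0 weights
              [-0.5, 0.0, -0.5, 0.0]]           # hypothetical class 1 weights
fc_biases = [0.0, 0.0]

pooled = max_pool2x2(relu(conv2d(patch, kernel)))
flat = [v for row in pooled for v in row]       # flatten before the dense layer
probs = softmax(fully_connected(flat, fc_weights, fc_biases))
predicted = probs.index(max(probs))
print(pooled, probs, predicted)
```

In a trained network the kernel and the fully connected weights would be learned from labeled samples, which is what removes the manual feature design required by the traditional methods.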
Despite the progress of the above methods, urban green space in high-resolution remote sensing images is often obscured by shadows and is also easily confused with farmland, orchards and other classes that have similar spectra. It is therefore difficult to extract urban green space using spectral information alone. At the same time, because ground features have complex backgrounds and irregular boundaries, existing methods produce misclassifications and omissions during extraction. To address these problems, this paper proposes an improved fully convolutional neural network based on an encoder-decoder structure, called concatenated residual attention UNet (CRAUnet), to extract urban green space from Gaofen-1 remote sensing images. The work presented in this article focuses on the following three aspects: (1) A residual module with a feature concatenation mechanism is proposed to reduce the loss of original image features. (2) To improve the feature expression ability of the network, an attention mechanism is embedded in the model, and a convolutional block channel attention (CBCA) module is proposed. (3) To illustrate the applicability of the network, we compare it with other classical networks to evaluate the efficiency of the network structure.