Article

Concatenated Residual Attention UNet for Semantic Segmentation of Urban Green Space

Guoqiang Men, Guojin He and Guizhou Wang
1 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
2 College of Resource and Environment, University of Chinese Academy of Sciences, Beijing 100049, China
3 Satellite Remote Sensing Technology Department, Key Laboratory of Earth Observation Hainan Province, Sanya 572029, China
* Author to whom correspondence should be addressed.
Forests 2021, 12(11), 1441; https://doi.org/10.3390/f12111441
Submission received: 3 October 2021 / Revised: 19 October 2021 / Accepted: 21 October 2021 / Published: 22 October 2021
(This article belongs to the Special Issue Urban Forests and Landscape Ecology)

Abstract

Urban green space is generally considered a significant component of the urban ecological environment, serving to improve the quality of the urban environment and to support the sustainable development of the city. Remote sensing provides an effective method for real-time mapping and monitoring of urban green space changes over large areas. However, as the spatial resolution of remote sensing images continues to improve, traditional classification methods cannot accurately capture the spectral and spatial information of urban green spaces. Owing to the complex urban background and numerous shadows, cultivated land, grassland and other ground features are easily confused during extraction, revealing the limitations of traditional methods. Deep learning methods have shown great potential to tackle this challenge. In this research, we propose a novel model called the Concatenated Residual Attention UNet (CRAUNet), which combines a residual structure with a channel attention mechanism, and apply it to GaoFen-1 remote sensing images of Shenzhen City. First, an improved residual structure is used to retain more feature information of the original image during feature extraction; then, a Convolutional Block Channel Attention (CBCA) module is applied to enhance the extraction of deep convolutional features by modeling the interdependence of channels, strengthening effective green space features and suppressing invalid ones. Finally, the decoder restores the high-resolution feature map through upsampling. The experimental results show that, compared with other methods, CRAUNet achieves the best performance: it is less susceptible to noise, preserves more complete segmentation edge details, and reaches a pixel accuracy (PA) of 97.34% and a mean intersection over union (MIoU) of 94.77%, showing great applicability for regional large-scale mapping.

1. Introduction

Regarding the concept of urban green space, different regions have their own interpretations of its definition and scope. Compared with urban green space, western countries more often use the concept of urban open space in land use planning [1,2,3]. Urban open space is an open area reserved for parks and other “green spaces”, which includes water and other natural environments in addition to vegetation [4]. In China, in order to standardize the management of urban greening, the government has issued the “Urban Green Space Classification Standard”, which divides urban green space into two parts: green space and square land within urban construction land, and regional green space outside urban construction land [5]. In this study, the above definition of urban green space is followed.
Urban green space is an indispensable element of the urban ecosystem and is widely considered an important component for improving the quality of the urban ecological environment [6]. It supports the sustainable development of the city through various ecological service functions, such as reducing greenhouse gases, regulating the urban climate, reducing energy consumption and maintaining ecological security [7,8,9,10]. However, with rapid urbanization, urban built-up areas continue to expand and green spaces are severely damaged, affecting the quality of life of residents. Unreasonable planning and construction of urban green space restrict the healthy development of the city. Therefore, good urban green space monitoring is a necessity for the sustainable development and management of cities [11], and how to accurately and dynamically obtain urban green space information has attracted the interest of researchers.
Since the beginning of the 21st century, with the rapid development of Earth observation technology, the acquisition capability of satellite remote sensing data has greatly improved, ushering in a new era of multi-platform, multi-angle, multi-sensor, all-time and all-weather Earth observation. Remote sensing technology can quickly and accurately monitor the dynamic changes of green space in a study area, so it is suitable for large-scale resource investigation and research. At present, a variety of optical and radar remote sensing data sources, including Landsat series data [12], GaoFen series data [13] and Sentinel series data [14], have been employed for urban green space extraction. For the application of remote sensing technology in urban green space research, the key issue is how to quickly and accurately obtain surface vegetation coverage information. Extracting urban green space from remote sensing images means identifying and classifying land cover types so as to obtain a vegetation cover map that reflects the real land cover situation. The methods for urban green space extraction from remote sensing images can be divided into four kinds: the threshold method [15,16], pixel-based classification methods represented by machine learning [17,18,19,20], object-oriented classification methods [21,22] and deep learning methods [23,24,25,26]. The threshold method selects an appropriate threshold to distinguish green space based on the difference in spectral response between vegetation and other ground objects in one or more bands. Because vegetation reflects differently from the soil background in the visible and near-infrared bands, researchers have refined band combinations and proposed a series of vegetation indices to extract surface vegetation coverage information [16]. However, because of the varying complexity of the environmental background in urban areas, the representation of vegetation in remote sensing images is easily disturbed by other features in built-up areas, especially when classifying vegetation in areas with high-density buildings. With the popularity of machine learning, algorithms such as support vector machines [17], decision trees [18], random forests [19] and artificial neural networks [20] have attracted great attention from researchers and are widely used in urban green space extraction. However, high-resolution remote sensing images contain a large number of objects of the same type with different spectra, which makes traditional pixel-based classification methods unable to accurately distinguish different types of ground objects. As these traditional methods have matured, further improvement at the technical level is difficult. To improve accuracy, traditional machine learning algorithms for urban green space require manual design of features, including texture and terrain [27]. This process is time-consuming and laborious, and accurately extracting urban green space information remains challenging.
Deep learning has become a popular method for remote sensing information extraction. The principle of applying deep learning to urban green space information extraction is that the original remote sensing image passes through multiple hidden layers and is abstracted from low level to high level, automatically selecting the characteristics of the target to discover the distributed feature representation of green space [28]. Compared with traditional machine learning, the advantage of deep learning is that, without manual design and acquisition of features, unsupervised or semi-supervised feature learning and efficient hierarchical feature extraction are implemented [29]. The convolutional neural network (CNN) is one of the most successful network architectures in deep learning and consists of an input layer, convolution layers, pooling layers, a fully connected layer and an output layer [30]. The input layer receives the original data; the convolution layers perform feature extraction; the pooling layers compress the input feature maps; the fully connected layer connects all features and sends the output value to the classifier; and the output layer outputs the classification result. This architecture makes the feature learning process efficient, so CNNs have become the main algorithm for extracting urban green space coverage information. Nijhawan et al. proposed a framework that combines local binary pattern (LBP) and GIST features with multiple parallel CNNs for feature extraction, combined with an SVM, to extract vegetation in the city; as the number of parallel CNNs increases, the accuracy increases significantly [23]. Moreno-Armendáriz et al. built a deep neural network system based on a CNN and a multi-layer perceptron (MLP) to evaluate the health of urban green spaces and promote the sustainable development goals of smart cities [24]. Timilsina et al. proposed an object-based convolutional neural network (OB-CNN) for extracting urban tree cover changes with an accuracy of over 95%, indicating that object-based CNN technology can effectively achieve urban tree cover mapping and monitoring [25]. Hartling et al. tested the ability of densely connected convolutional networks (DenseNet) to identify the main tree species in complex urban environments on fused WorldView-2 visible-to-near-infrared (VNIR), WorldView-3 shortwave infrared (SWIR) and LiDAR data sets. The study showed that, regardless of the size of the training sample, DenseNet is superior to RF and SVM when processing highly complex image scenes and is therefore more effective for urban tree species classification [26].
Despite the progress of the above methods, urban green space in high-resolution remote sensing images is often occluded by shadows and misclassified owing to its spectral similarity to farmland, orchards, etc. It is therefore difficult to extract urban green space using spectral information alone. At the same time, due to the complex background of ground features and irregular boundaries, existing methods produce misclassifications and omissions during extraction. To address these problems, this paper proposes an improved fully convolutional neural network based on an encoding and decoding structure, called the concatenated residual attention UNet (CRAUNet), to extract urban green space from Gaofen-1 remote sensing images. The work presented in this article focuses on the following three aspects: (1) A residual module with a feature concatenation mechanism is proposed to reduce the loss of original image features. (2) To improve the feature expression ability of the network, an attention mechanism is embedded in the model and a convolutional block channel attention (CBCA) module is proposed. (3) To illustrate the applicability of the network, we compare it with other classical networks to evaluate the efficiency of the network structure.

2. Materials and Methods

2.1. Study Area and Data Sources

As a national economic center and an international city, Shenzhen is China’s first Special Economic Zone [31]. Its geographical location is shown in Figure 1. As one of the earliest cities to develop and to face environmental pressure, Shenzhen has always pursued protection alongside development and has vigorously created a good forest ecological environment. In particular, since 2015 Shenzhen has fully initiated the creation of a national forest city, and by 2019 the green area coverage exceeded 45% [32].
The Gaofen-1 satellite was successfully launched into orbit on 26 April 2013. The selected panchromatic/multispectral camera imagery includes a panchromatic band with a spatial resolution of 2 m and four multispectral bands with a spatial resolution of 8 m. In this research, we selected 12 scenes of multi-temporal Gaofen-1, Gaofen-1B and Gaofen-1C remote sensing data over Shenzhen for 2017 and 2020, of which 9 scenes are used as training data and 3 scenes as testing data. Information on the images used in this study is given in Table 1. The PCI Geo Imaging Accelerator software was employed for remote sensing data preprocessing. The rational polynomial coefficient model was adopted to geometrically correct the remote sensing images [33], then the PANSHARP method was employed to fuse the multispectral and panchromatic images [34], and finally all the data were converted to 8-bit. After preprocessing, the geometric error of the remote sensing images is limited to within 1 pixel. The flow chart of preprocessing is shown in Figure 2.

2.2. Methods

2.2.1. Sample Generation

Since deep learning requires a large amount of labelled data related to the classification target for training, and existing open-source datasets cannot meet the requirements of this article, it was necessary to manually establish an urban green space labelled dataset. Generally, the boundary of green space in the remote sensing image of the experimental area is delineated directly by manual visual interpretation. However, due to the fragmented distribution, diverse types and complex background of green space, this method is time-consuming and laborious [22]. Therefore, this paper adopts an object-oriented method for data annotation. The process is mainly realized through three steps: selecting appropriate parameters to segment the image, classifying the segmented image, and manually correcting the result. The final result is stored as an 8-bit ground truth binary label, where the value 1 represents urban green space and 0 denotes the background. The sample data cover different types of green space, as shown in Figure 3. The processed remote sensing image data and label data are then divided into a training set and a test set according to an area ratio of 4:1 to ensure that the training set and the test set are independent and identically distributed. At the same time, to increase the diversity of training samples and reduce the under-fitting caused by insufficient samples during training, a random cropping strategy with different repetition rates is used to generate samples of 256 × 256 pixels, and random augmentation is adopted to expand the training set and the test set, weakening background noise and enhancing the robustness of the model. In this paper, we mainly use flipping, rotation, scaling, adding Gaussian noise and blurring to augment the samples, yielding a total of 8000 training samples and 2000 test samples. A minimal sketch of this sample generation step is given below.
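The following NumPy sketch illustrates the random cropping and part of the augmentation described above (flips, 90-degree rotations and Gaussian noise); scaling and blurring are omitted for brevity. Patch size follows the text, while the noise level, the assumption of float images scaled to [0, 1], and the function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def random_crop(image, label, size=256):
    """image: (H, W, C) float array in [0, 1]; label: (H, W) binary mask, 1 = green space."""
    h, w = label.shape
    y = np.random.randint(0, h - size + 1)
    x = np.random.randint(0, w - size + 1)
    return image[y:y + size, x:x + size], label[y:y + size, x:x + size]

def augment(patch, mask, noise_std=0.02):
    if np.random.rand() < 0.5:                     # horizontal flip
        patch, mask = patch[:, ::-1], mask[:, ::-1]
    if np.random.rand() < 0.5:                     # vertical flip
        patch, mask = patch[::-1, :], mask[::-1, :]
    k = np.random.randint(4)                       # rotation by a multiple of 90 degrees
    patch, mask = np.rot90(patch, k), np.rot90(mask, k)
    patch = patch + np.random.normal(0.0, noise_std, patch.shape)  # additive Gaussian noise
    return patch, mask
```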

2.2.2. Improved Residual Structure

In deep learning, with network training based on stochastic gradient descent, the error signal is prone to gradient dispersion or gradient explosion through multi-layer back propagation [35]. Therefore, as the network depth increases, training becomes more difficult. To address this degradation phenomenon, He et al. proposed the residual neural network (ResNet) in 2015, which largely eliminated the training difficulties caused by excessive network depth [36]. The main contribution of ResNet is a building block containing an identity mapping that counteracts network degradation [37]. This process can be expressed as follows:
x_{l+1} = x_l + F(x_l, w_l)
where x_l and x_{l+1} are the input and output of the l-th unit, and F is the residual function.
The residual building block is divided into an identity mapping part and a residual part, as shown in Figure 4a. x_l is the identity mapping, reflected in the straight line on the left side of Figure 4a. In a convolutional neural network, the numbers of feature maps of x_l and x_{l+1} may differ; in this case, a 1 × 1 convolution is needed to increase or reduce the dimensionality. F(x_l, w_l), presented on the right side of Figure 4a, represents the residual part, which is usually composed of two or three convolution operations. The residual structure can alleviate the gradient dispersion problem of deep neural networks to a certain extent and solve the degradation problem, making the forward and backward propagation of information smoother.
Different from other ground objects, urban green space is diverse in type and highly fragmented in distribution. At the same time, it is often misclassified as a result of shadow occlusion in remote sensing images, and its spectral similarity to cultivated land usually leads to confusion between targets, so making effective use of every feature of the original image is particularly important for accurate information extraction. Therefore, it is necessary to introduce the residual structure into urban green space extraction to address these problems. In this research, we propose a new residual structure based on the idea of identity mapping in residual neural networks, as shown in Figure 4b. In the identity mapping part, the building block passes through a convolutional layer with a kernel size of 1 × 1 to increase the number of features to the required output feature size. The 1 × 1 convolution kernel is used because it summarizes the features between pixels with a small kernel and avoids the loss of initial image information. In the residual part, in order to make full use of different levels of semantic features while reducing the number of network parameters and improving computational efficiency, the input is passed through two consecutive convolutions, each with half the number of output features. The outputs of the two convolutions are then concatenated to obtain the semantic information of urban green space from multiple feature dimensions. Batch normalization (BN) is applied after each convolution, followed by a rectified linear unit (ReLU) activation function.
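The PyTorch sketch below summarizes the proposed block under the assumptions stated above: a 1 × 1 convolution on the identity branch to match the output channels, and two consecutive 3 × 3 convolutions with half the output channels each on the residual branch, whose outputs are concatenated before being added to the identity branch. Kernel padding, the exact BN/ReLU ordering, the class name and an even output channel count are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ConcatResidualBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        half = out_ch // 2
        # Identity branch: 1x1 convolution raises the channel count to the output size
        self.identity = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        # Residual branch: two consecutive convolutions, each with half the output channels
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_ch, half, kernel_size=3, padding=1),
            nn.BatchNorm2d(half), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(
            nn.Conv2d(half, half, kernel_size=3, padding=1),
            nn.BatchNorm2d(half), nn.ReLU(inplace=True))

    def forward(self, x):
        identity = self.identity(x)
        f1 = self.conv1(x)
        f2 = self.conv2(f1)
        # Concatenating both convolution outputs fuses multi-level semantic features
        residual = torch.cat([f1, f2], dim=1)
        return identity + residual
```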

2.2.3. Convolutional Block Channel Attention Mechanism

In a convolutional neural network, each convolutional layer contains several filters, which learn to express local spatial connection patterns across all channels. However, the local channel features of adjacent spatial locations in the feature map often have a high correlation because of their overlapping receptive fields. The attention mechanism in deep learning originates from the attention mechanism of the human brain. When the human brain receives external information, such as visual or auditory information, it does not process and understand all of it, but focuses only on salient or interesting information, which helps filter out unimportant information and improves the efficiency of information processing [38]. Introducing the attention mechanism into a convolutional neural network does not significantly increase the number of network parameters, while making selective and finer adjustments to the existing feature maps to improve the performance of the network and increase the interpretability of the network structure [39]. Therefore, in this study, to emphasize the spectral relationships in remote sensing image feature maps, we introduce a channel attention mechanism and propose a new network unit, the convolutional block channel attention (CBCA) module, to replace the skip connection in the original UNet structure. The structure of the module is shown in Figure 5.
First, the feature map output by the residual module is subjected to a standard convolution to further enhance the feature expression ability. Then, pooling layers compress each channel into a one-dimensional vector: we use max pooling and average pooling over the spatial dimensions to obtain two different channel feature descriptors. These results are input into a fully connected layer with ReLU activation to reduce the dimension and introduce new nonlinearity, and finally a sigmoid activation function assigns each channel a smooth weight normalized to the range 0–1. Finally, the weights are multiplied by the features of the original encoder output, which are then transferred to the decoder. The process can be described by the following formula:
λ_weight = σ(F_FC(P_avg(F_{3×3}(x_l))) + F_FC(P_max(F_{3×3}(x_l))))
where σ, F_FC and P represent the sigmoid function, the fully connected layer and the pooling operation, respectively.
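A minimal PyTorch sketch of the CBCA module as described above: a 3 × 3 convolution, global max and average pooling, a shared fully connected bottleneck with ReLU, and a sigmoid that yields per-channel weights applied to the encoder features. The reduction ratio, the sharing of the FC layers between the two pooling branches, and the class name are assumptions.

```python
import torch
import torch.nn as nn

class CBCA(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        # Shared FC bottleneck applied to both pooled channel descriptors
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        f = self.conv(x)                                   # F_3x3(x_l)
        b, c, _, _ = f.shape
        avg = self.fc(f.mean(dim=(2, 3)))                  # average-pooled branch, P_avg
        mx = self.fc(f.amax(dim=(2, 3)))                   # max-pooled branch, P_max
        weight = self.sigmoid(avg + mx).view(b, c, 1, 1)   # per-channel weights in (0, 1)
        return x * weight                                  # re-weight the encoder features
```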

2.2.4. Network Structure

In this paper, a convolutional neural network for extracting urban green space is proposed, named the concatenated residual attention UNet (CRAUNet). The network adopts UNet as its backbone, that is, an encoder and decoder structure combined with shallow semantic information to achieve a smooth transformation from the image to the segmentation mask [40]. As shown in Figure 6, the architecture of the network can be divided into three parts: the encoder, the skip connections with attention mechanism, and the decoder.
In the encoder part, in order to eliminate gradient dispersion and explosion in the deep neural network and make the forward and backward propagation of information smoother, the building block is replaced with the improved convolutional residual module. In addition to the residual structure proposed in this paper, this module is followed by a max pooling layer. The convolution kernel size is 3 × 3, followed by a BN layer and a ReLU. The max pooling layer downsamples the input to half its original size. After each encoder building block, the number of feature channels is doubled and the image size is halved. The structure of the decoder is similar to that of the encoder, except that the residual structure is not adopted and the max pooling layer is replaced with an upsampling layer; here, bilinear interpolation is used to finally restore the feature map to its original size. In each decoder building block, to ensure that the output matches the size of the corresponding encoder output, the upsampling rate is set to 2. In the skip connection part, in order to make better use of the information in the multi-resolution feature maps, the CBCA module is embedded to enhance useful green space features in the channel dimension and suppress invalid background features, thereby improving the computational efficiency of the network model. The output of this part is then concatenated with the output of the previous decoder layer and input to the decoder of the current layer. The process is formulated as follows:
D_i = F_Conv(F_up(D_{i+1})) + λ_weight · E_i
where D_i indicates the hidden feature of the decoder at layer i, D_{i+1} denotes the hidden feature of the layer below D_i, F_up represents the upsampling operation, and E_i is the corresponding encoder feature at layer i.
Increasing the resolution of the convolutional features in this way provides finer features for segmentation of edge regions and avoids the checkerboard effect [41]. A minimal sketch of one decoder step follows.
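The PyTorch sketch below shows one decoder step of CRAUNet under the assumptions stated above: bilinear upsampling by a factor of 2, a 3 × 3 convolution, and fusion with the CBCA-weighted encoder feature by concatenation, following the textual description rather than the additive form of the equation. CBCA refers to the sketch given earlier; channel arguments and the class name are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    def __init__(self, dec_ch, enc_ch, out_ch):
        super().__init__()
        self.attention = CBCA(enc_ch)                      # skip connection with channel attention
        self.conv = nn.Sequential(
            nn.Conv2d(dec_ch + enc_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, d_lower, e_skip):
        # Bilinear upsampling restores the spatial size of the lower decoder feature
        d_up = F.interpolate(d_lower, scale_factor=2, mode="bilinear", align_corners=False)
        e_att = self.attention(e_skip)                     # CBCA-weighted encoder feature
        return self.conv(torch.cat([d_up, e_att], dim=1))  # fuse and refine
```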

2.2.5. Inference Methodology

The advantage of a fully convolutional neural network is that it can make full use of the contextual information of the image to improve its performance. However, because high-resolution remote sensing images are large, directly inputting the data without preprocessing is likely to exhaust graphics card memory. Therefore, before a large remote sensing image is input into the network, it needs to be cropped into several smaller images of fixed size, which are predicted separately. After the results for each smaller tile are obtained, the prediction for the whole image is obtained by a stitching operation. However, because of the many convolution operations in the network model, the center pixels can obtain more context information while the edge pixels gain limited information, so pixels close to the edge of a tile are not classified as accurately as pixels close to its center. After stitching the results of the small images back to the original image size, obvious stitching traces appear in some positions [42]. To alleviate these problems, this experiment adopts a sliding window method in the network inference phase. In our experiments, a large remote sensing image is cropped into multiple small images of 256 × 256 pixels. Notably, during the cropping process, a fixed area smaller than the tile size is set to control the prediction area of each input to the network. In this experiment, the size of the sliding window is set to 64.
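The sketch below illustrates sliding-window inference of this kind: the image is tiled into 256 × 256 patches and only a central region of each prediction is kept before stitching. Interpreting the 64-pixel value as the margin discarded on each side is an assumption, as are the resulting stride, the omission of border padding, and the function name; this is not the authors' exact procedure.

```python
import numpy as np
import torch

def sliding_window_predict(model, image, tile=256, margin=64):
    """image: float32 array of shape (C, H, W); returns an (H, W) class map.
    Border tiles and padding are omitted for brevity."""
    c, h, w = image.shape
    stride = tile - 2 * margin
    out = np.zeros((h, w), dtype=np.uint8)
    model.eval()
    with torch.no_grad():
        for y in range(0, max(h - tile, 0) + 1, stride):
            for x in range(0, max(w - tile, 0) + 1, stride):
                patch = torch.from_numpy(image[:, y:y + tile, x:x + tile]).unsqueeze(0)
                pred = model(patch).argmax(dim=1).squeeze(0).numpy()
                # Keep only the central area, where context is available on all sides
                out[y + margin:y + tile - margin,
                    x + margin:x + tile - margin] = pred[margin:tile - margin,
                                                         margin:tile - margin]
    return out
```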

2.2.6. Accuracy Assessment

In this study, seven evaluation indexes were selected to evaluate the performance: pixel accuracy (PA), precision, recall, F1-score (F1), mean intersection over union (MIoU), the number of parameters (Params) and floating point operations (FLOPs). The definitions and formulas of these indicators are listed in Table 2.
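As a concrete illustration of the pixel-level formulas listed in Table 2, the following NumPy sketch computes PA, precision, recall, F1 and MIoU for the binary green space/background case, assuming predictions and labels are 0/1 arrays of the same shape; zero-division handling is omitted for brevity.

```python
import numpy as np

def evaluate(pred, label):
    pred, label = pred.astype(bool), label.astype(bool)
    tp = np.sum(pred & label)           # true positives
    tn = np.sum(~pred & ~label)         # true negatives
    fp = np.sum(pred & ~label)          # false positives
    fn = np.sum(~pred & label)          # false negatives
    pa = (tp + tn) / (tp + tn + fp + fn)            # pixel accuracy
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou_fg = tp / (tp + fn + fp)                    # IoU of the green space class
    iou_bg = tn / (tn + fn + fp)                    # IoU of the background class
    miou = (iou_fg + iou_bg) / 2                    # mean IoU over the two classes
    return dict(PA=pa, Precision=precision, Recall=recall, F1=f1, MIoU=miou)
```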

3. Experiment

In this section, in order to evaluate the performance of the various modules used in the designed network on the constructed urban green space dataset, we conducted two ablation experiments. In terms of evaluation indicators, we mainly use pixel accuracy (PA), and we also report the number of parameters and FLOPs of each network. Because fully testing the final performance of a network is time-consuming, requiring the learning rate to be reduced and the model to be fine-tuned repeatedly, we only compare the performance of the different network structures under fixed hyperparameters and a fixed number of training epochs.

3.1. Effectiveness of Concatenated Residual Structure

In the first experiment, we tested the convergence of the proposed residual structure. To ensure consistency, we chose UNet without a dropout layer as our baseline model, and bilinear interpolation was selected for upsampling. Then, we modified the baseline model by adding an identity mapping to each layer of the UNet decoder, thereby introducing the residual structure (ResUNet). It should be noted that we added a 1 × 1 convolution kernel to the identity mapping part to ensure that the number of features in this part is increased to the required output feature size, so there are slightly more parameters than in the UNet model. We then reduced the number of network parameters by halving the number of convolution channels in the residual block and concatenating the output of each convolution kernel (CRUNet). At the same time, for comparative analysis, we also applied the above operations to the baseline model without adding a residual structure (CUNet). In Figure 7, we plot the pixel accuracy curves of all models on the validation set. It can be seen that, with each modification, the difference in convergence rate among the models increases. The baseline UNet needs about 120 epochs to reach the level of performance that CRUNet achieves in 70 epochs. Simply by modifying the convolutions in the UNet decoder and concatenating the features of the different convolution kernels, the convergence speed of the model can be significantly improved. In addition, the comparison shows that if only the residual module is added to the baseline model, the convergence speed improves but the accuracy fluctuates more during training, making training unstable. Moreover, we also compared the number of parameters and computations of the different network structures. As presented in Table 3, the introduced residual structure improves pixel accuracy while reducing the complexity of the model. The results show that when the convolution kernels of the decoder in the baseline network are modified and the residual module is added, the model can learn more features of urban green space and its convergence is significantly improved.

3.2. Performance of Convolution Block Channel Attention Module

In the second experiment, we explored the effect of introducing the attention mechanism into the skip connections on the feature extraction performance of the network. Similarly, we adopted UNet as the baseline model to illustrate the role played by the attention mechanism. We then used a channel attention module (CAM) and the CBCA module proposed in the previous section to replace the original skip connection, and we also applied the CBCA module to the CRUNet model mentioned above. As can be seen from Figure 8, introducing the attention mechanism into the model exploits the relationships between channels to improve the feature extraction ability of the model and compress redundant features in the channel dimension. After adding the attention mechanism, the pixel accuracy improves from 96.5% to 97.1%, a rise of 0.6 percentage points. This demonstrates that, compared with the original skip connection, the skip connection with attention mechanism can extract green space features more effectively; after feature fusion, it recovers high-resolution details better during upsampling, and its combination with the residual structure improves the convergence speed of the model.

4. Results

In this section, we explore the performance of the proposed model and compare it with other structures. The training process is implemented on an NVIDIA Titan V GPU using Python 3.6 and the PyTorch deep learning framework, accelerated by cuDNN 10.0. After each training epoch, the classification accuracy on the training set and the validation set is calculated. Our goal is to obtain the best-converged model and apply it to the extraction of urban green space.
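The following PyTorch sketch mirrors this setup (training followed by per-epoch validation of pixel accuracy). The loss function, optimizer, learning rate and number of epochs are not specified in the text, so the choices below are assumptions for illustration only, not the authors' configuration.

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=120, lr=1e-3, device="cuda"):
    """train_loader/val_loader yield (images, labels); labels: (B, H, W) LongTensor, 1 = green space."""
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()                  # assumed loss for 2-class segmentation
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # assumed optimizer
    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        # Pixel accuracy on the validation set after each epoch
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in val_loader:
                preds = model(images.to(device)).argmax(dim=1).cpu()
                correct += (preds == labels).sum().item()
                total += labels.numel()
        print(f"epoch {epoch + 1}: val PA = {correct / total:.4f}")
```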

4.1. Comparative Experiment with Other Networks

In order to illustrate the performance of CRUNet, we selected several typical semantic segmentation networks, including FCN8s, UNet and DeepLabV3+, for comparison on the urban green space dataset. The experimental results are shown in Table 4. Regardless of whether the CBCA module is added, the performance of our proposed network is better than that of the other networks. Specifically, our network achieved the highest PA (97.34%) and MIoU (94.77%). DeepLabV3+ contains the highest number of parameters and computations and has the longest training time, but its performance is lower than that of FCN8s, which has the fewest parameters, indicating that overfitting occurs during training. Compared with UNet, our network reduces the number of parameters and FLOPs, shortening training time while improving performance. After adding the CBCA module to our network, the PA and F1-score did not change significantly, but the MIoU improved slightly, indicating that this module improves region smoothness [43]. The above results verify the effectiveness of our proposed network. Meanwhile, the numbers of parameters and FLOPs in Table 4 show that the improvement of our network’s performance is not simply achieved at the expense of increasing parameters and FLOPs.
We selected typical urban green space areas to clarify the reasons for the above accuracy differences more clearly. Figure 9 shows several examples of segmentation results of different semantic segmentation networks in different categories of remote sensing images.
It can be seen that, whether in urban areas, suburbs or mountains, our network’s extraction results are closest to the ground truth. In the first and second rows, the proposed network removes the influence of shadows and other interference in complex backgrounds, indicating that the network has advantages in eliminating noise, while the other networks are affected by noise to different degrees. In the third and fourth rows, our network extracts the green space information more completely. The reason is that the residual module and the channel attention mechanism exploit the interdependence between channels, enhancing the network’s feature learning ability. The other CNNs can extract green space, but the boundaries extracted by DeepLabV3+ and FCN8s are smoother. To a certain extent, the confusion caused by spectral similarity or heterogeneity in the remote sensing image is alleviated. Combined with the results of the fifth row, the edges in our prediction results are more precise, especially at the junction of buildings and green spaces; compared with the results of the other four networks, our edges are more detailed. The results show that, compared with the other networks, our proposed network produces more complete segmentation results and has advantages in small target recognition, so it achieves the highest accuracy. To further illustrate the performance of our proposed network for extracting urban green space, Figure 10 shows example results of CRUNet and CRAUNet on the test set. It can be seen from the figure that both networks, with or without the CBCA module, can identify larger green space targets, but the network without the CBCA module is more susceptible to noise and shows obvious defects in identifying smaller features. The above results prove that our network has strong feature extraction capabilities and high-resolution detail recovery capabilities for high-resolution remote sensing images. Specifically, the residual structure enhances the feature expression ability of the network and provides richer feature information for the decoder, while the CBCA module strengthens the expression of effective features, suppresses invalid features, and provides location guidance for the recovery of high-resolution details.

4.2. Performance Comparison on Different Landcover

In order to further evaluate the generality of the CRAUNet model, we extracted different types of urban green spaces and easily confused areas from the results, and compared them with the results extracted by FCN8s, UNet and DeepLabV3+ based on visual inspection and overall accuracy. Here, we mainly choose park green space, residential green space, protective green space, farmland, sports grounds and other landcover types. The specific landcovers shown in Figure 11 are forest, golf course, bare land, sports ground, farmland and aquaculture areas.
The comparison results are shown in Figure 11 and Table 5. For large areas of green space such as forests, and for artificial grass such as golf courses, all four algorithms can identify them accurately. However, for bare land with sparse vegetation, FCN8s loses some small patches of green space information. For high-density building areas, the results extracted by the FCN8s and DeepLabV3+ algorithms lose a lot of boundary information. For sports fields containing artificial grass, all algorithms show a certain degree of confusion, but as can be seen from the figure, our network has the lowest error rate. In terms of distinguishing between green space and cultivated land, since more attention was paid to cultivated land during the training process, each algorithm shows good performance. In addition, for built-up areas and aquaculture areas, which are disturbed by shadows and by nutrients in the water, the results extracted by the other algorithms contain more noise than those of our proposed network.
From Figure 11 and Table 5 we can see that the performance of CRAUNet is better than that of the other networks, and it achieves high accuracy for each image. The FCN8s and DeepLabV3+ algorithms lose a lot of detailed information in the extraction of green space, with the result that some small green spaces cannot be identified and the boundaries are blurred. Although UNet can extract urban green space information more accurately, it is susceptible to features with similar spectral characteristics and produces more noise. Our network shows better performance in extracting different types of green space, and its results are better than those of the other algorithms.

4.3. Urban Green Space Mapping and Accuracy Evaluation

In this section, we used CRAUNet to extract the urban green space of Shenzhen for 2020 based on the seven images mentioned in Section 2.1. Compared with common images, remote sensing images have a much larger data volume, and the CNN cannot process such large images directly. To enable CRAUNet to extract urban green space from a whole remote sensing image, we applied it with a sliding-window method, with a sliding window of 512 × 512 pixels. After extracting urban green space from the seven satellite images, we produced a map of urban green space in Shenzhen with 2 m spatial resolution. The result is shown in Figure 12.
For the accuracy evaluation, we used two sampling methods to select validation points with the ArcGIS and ENVI software. The first method randomly selects 500 validation points, and the second selects 491 points at equal distances for verification. The distribution of sampling points is shown in Figure 13. The confusion matrix and the accuracy evaluation indicators mentioned in Section 2.2.6 were employed to evaluate the mapping accuracy of the urban green space extraction. The results are shown in Table 6 and Table 7. The accuracy evaluation results of the two sampling methods differ: the indexes of the equidistant sampling method are higher than those of the random sampling method. Judging from the results in the tables, the method achieves high accuracy in the extraction of regional urban green space.

5. Discussion

High-resolution remote sensing images provide reliable data support for extracting urban green space accurately. Due to the self-learning ability of deep learning methods, CNNs have been widely used in urban green space extraction. Compared with traditional methods, a CNN can obtain features automatically without manually selected features. However, existing networks still have some problems in extracting urban green space. To address these problems, we made the following improvements in CRAUNet. According to the analysis in the previous sections, we can conclude that, compared with other semantic segmentation algorithms, CRAUNet has the following three advantages in urban green space extraction:
(1)
Some networks have contributed to solving the gradient vanishing phenomenon in the training process, such as DenseNet [44]. The core idea of DenseNet is to establish connections between different layers and make full use of the features, further reducing the problem of gradient vanishing, but the price is a sharp increase in memory usage. In order to make full use of features at different levels to solve the degradation problem of the neural network while avoiding this increase in memory, we introduce the residual module. In CRAUNet, the proposed residual module enables the network to retain more feature information of the original image during feature extraction, which solves the degradation problem of the neural network to a certain extent and is advantageous for identifying small green space targets. At the same time, this structure does not cause memory usage to rise sharply.
(2)
Existing networks are all affected by noise to varying degrees, and the extracted green space edges are overly smooth; DeepLabV3+ in particular is not sensitive to the boundaries of urban green space. In order to solve this problem, we introduce the attention mechanism, whose function is to quickly locate and highlight important information, suppress unimportant information and obtain information that is not easy to mine. Our network can obtain more complete targets and real edge information during segmentation and is less affected by noise. This is because the added CBCA module enhances deep-level feature extraction through the interdependence of the modeled channels, strengthening effective green space features and suppressing invalid features.
(3)
In the studies mentioned in the introduction of this paper, and in other networks built around the residual structure such as R2UNet [45] and ResUNet-a [28], high accuracy is usually obtained by increasing the network depth or adding other modules, which increases the number of parameters. The improvement of our network’s performance does not come at the cost of simply increasing the amount of parameters and computation; instead, concatenating the features of different convolution kernels makes the network lightweight. That is to say, the proposed network not only improves the accuracy but also reduces the number of parameters.
In general, CRAUNet has demonstrated powerful feature extraction capabilities and high-resolution detail recovery capabilities in urban green space extraction from high-resolution remote sensing images. However, CRAUNet still has room for further improvement, and in future work we will focus on restoring high-resolution detailed information.

6. Conclusions

In this article, we propose a new convolutional neural network, CRAUNet, for urban green space extraction from GF-1 high-resolution satellite images, to better solve the problems of traditional methods. The ideas of residual structure and attention mechanism are introduced into the network, and the residual module and CBCA module are proposed to enhance the feature extraction ability while reducing the number of network parameters and computations. The results show that our proposed network achieved the highest performance; the PA and MIoU of our model were 97.34% and 94.77%, respectively. Based on the accuracy evaluation results and the visual comparison, we can conclude that the performance of CRAUNet is better than that of FCN8s, UNet and DeepLabV3+. In addition, the research in this article is conducive to the use of large numbers of high-resolution remote sensing images to dynamically monitor urban green spaces and provide decision-making support for the sustainable development of the urban environment.

Author Contributions

G.M., G.H., G.W. conceived of and designed the experiments. G.W. provided the original remote sensing data. G.M. made the dataset and performed the experiments. G.M. wrote the whole paper. All authors revised this paper and gave some appropriate suggestions. All authors have read and agreed to the published version of the manuscript.

Funding

This research was financially supported by the National Natural Science Foundation of China (61731022), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA19090300) and the National Key Research and Development Program of China—rapid production method for large-scale global change products (2016YFA0600302).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Brander, L.M.; Koetse, M.J. The value of urban open space: Meta-analyses of contingent valuation and hedonic pricing results. J. Environ. Manag. 2011, 92, 2763–2773. [Google Scholar] [CrossRef]
  2. Kabisch, N.; Strohbach, M.; Haase, D.; Kronenberg, J. Urban green space availability in European cities. Ecol. Indic. 2016, 70, 586–596. [Google Scholar] [CrossRef]
  3. Völker, S.; Kistemann, T. Developing the urban blue: Comparative health responses to blue and green urban open spaces in Germany. Health Place 2015, 35, 196–205. [Google Scholar] [CrossRef] [PubMed]
  4. Taylor, L.; Hochuli, D.F. Defining greenspace: Multiple uses across multiple disciplines. Landsc. Urban Plan. 2017, 158, 25–38. [Google Scholar] [CrossRef] [Green Version]
  5. Standard for Classification of Urban Green Space 2017. CJJ/T 85-2017. Available online: http://www.mohurd.gov.cn/wjfb/201806/t20180626_236545.html (accessed on 15 September 2021).
  6. Wolch, J.R.; Byrne, J.; Newell, J.P. Urban green space, public health, and environmental justice: The challenge of making cities ‘just green enough’. Landsc. Urban Plan. 2014, 125, 234–244. [Google Scholar] [CrossRef] [Green Version]
  7. Strohbach, M.W.; Arnold, E.; Haase, D. The carbon footprint of urban green space—A life cycle approach. Landsc. Urban Plan. 2012, 104, 220–229. [Google Scholar] [CrossRef]
  8. Heidt, V.; Neef, M. Benefits of urban green space for improving urban climate. In Ecology, Planning, and Management of Urban Forests; Springer: Berlin/Heidelberg, Germany, 2008; pp. 84–96. [Google Scholar] [CrossRef]
  9. Zhang, B.; Gao, J.-X.; Yang, Y. The cooling effect of urban green spaces as a contribution to energy-saving and emission-reduction: A case study in Beijing, China. Build. Environ. 2014, 76, 37–43. [Google Scholar] [CrossRef]
  10. Fuller, R.A.; Irvine, K.N.; Devine-Wright, P.; Warren, P.H.; Gaston, K.J. Psychological benefits of greenspace increase with biodiversity. Biol. Lett. 2007, 3, 390–394. [Google Scholar] [CrossRef] [PubMed]
  11. Villeneuve, P.J.; Jerrett, M.; Su, J.G.; Burnett, R.T.; Chen, H.; Wheeler, A.J.; Goldberg, M.S. A cohort study relating urban green space with mortality in Ontario, Canada. Environ. Res. 2012, 115, 51–58. [Google Scholar] [CrossRef]
  12. Gong, C.; Yu, S.; Joesting, H.; Chen, J. Determining socioeconomic drivers of urban forest fragmentation with historical remote sensing images. Landsc. Urban Plan. 2013, 117, 57–65. [Google Scholar] [CrossRef]
  13. Wang, H.; Wang, C.; Wu, H. Using GF-2 imagery and the conditional random field model for urban forest cover mapping. Remote Sens. Lett. 2016, 7, 378–387. [Google Scholar] [CrossRef]
  14. Zhou, X.; Li, L.; Chen, L.; Liu, Y.; Cui, Y.; Zhang, Y.; Zhang, T. Discriminating urban forest types from Sentinel-2A image data through linear spectral mixture analysis: A case study of Xuzhou, East China. Forests 2019, 10, 478. [Google Scholar] [CrossRef] [Green Version]
  15. Yao, Z.; Liu, J.; Zhao, X.; Long, D.; Wang, L. Spatial dynamics of aboveground carbon stock in urban green space: A case study of Xi’an, China. J. Arid Land 2015, 7, 350–360. [Google Scholar] [CrossRef]
  16. Myeong, S.; Nowak, D.J.; Duggin, M.J. A temporal analysis of urban forest carbon storage using remote sensing. Remote Sens. Environ. 2006, 101, 277–282. [Google Scholar] [CrossRef]
  17. Shojanoori, R.; Shafri, H.Z. Review on the use of remote sensing for urban forest monitoring. Arboric Urban 2016, 42, 400–417. [Google Scholar] [CrossRef]
  18. Shen, C.; Li, M.; Li, F.; Chen, J.; Lu, Y. Study on urban green space extraction from QUICKBIRD imagery based on decision tree. In Proceedings of the 2010 18th International Conference on Geoinformatics, Beijing, China, 18–20 June 2010; pp. 1–4. [Google Scholar] [CrossRef]
  19. Feng, Q.; Liu, J.; Gong, J. UAV remote sensing for urban vegetation mapping using random forest and texture analysis. Remote Sens. 2015, 7, 1074–1094. [Google Scholar] [CrossRef] [Green Version]
  20. Diamantopoulou, M.J. Filling gaps in diameter measurements on standing tree boles in the urban forest of Thessaloniki, Greece. Environ. Model. Softw. 2010, 25, 1857–1865. [Google Scholar] [CrossRef]
  21. Walker, J.S.; Briggs, J.M. An object-oriented approach to urban forest mapping in Phoenix. Photogramm. Eng. Remote Sens. 2007, 73, 577–583. [Google Scholar] [CrossRef]
  22. Ardila, J.P.; Bijker, W.; Tolpekin, V.A.; Stein, A. Context-sensitive extraction of tree crown objects in urban areas using VHR satellite images. Int. J. Appl. Earth Obs. Geoinf. 2012, 15, 57–69. [Google Scholar] [CrossRef] [Green Version]
  23. Nijhawan, R.; Sharma, H.; Sahni, H.; Batra, A. A deep learning hybrid CNN framework approach for vegetation cover mapping using deep features. In Proceedings of the 2017 13th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), Jaipur, India, 4–7 December 2017; pp. 192–196. [Google Scholar] [CrossRef]
  24. A Novel Software for Analyzing the Natural Landscape Surface Using Deep Learning. Available online: https://openreview.net/forum?id=HJgWtnd9kr (accessed on 15 September 2021).
  25. Timilsina, S.; Aryal, J.; Kirkpatrick, J.B. Mapping urban tree cover changes using object-based convolution neural network (OB-CNN). Remote Sens. 2020, 12, 3017. [Google Scholar] [CrossRef]
  26. Hartling, S.; Sagan, V.; Sidike, P.; Maimaitijiang, M.; Carron, J. Urban tree species classification using a WorldView-2/3 and LiDAR data fusion approach and deep learning. Sensors 2019, 19, 1284. [Google Scholar] [CrossRef] [Green Version]
  27. Blaschke, T.; Hay, G.J.; Kelly, M.; Lang, S.; Hofmann, P.; Addink, E.; Feitosa, R.Q.; Van der Meer, F.; Van der Werff, H.; Van Coillie, F. Geographic object-based image analysis–towards a new paradigm. ISPRS J. Photogramm. Remote Sens. 2014, 87, 180–191. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  28. Diakogiannis, F.I.; Waldner, F.; Caccetta, P.; Wu, C. ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data. ISPRS J. Photogramm. Remote Sens. 2020, 162, 94–114. [Google Scholar] [CrossRef] [Green Version]
  29. Kemker, R.; Salvaggio, C.; Kanan, C. Algorithms for semantic segmentation of multispectral remote sensing imagery using deep learning. ISPRS J. Photogramm. Remote Sens. 2018, 145, 60–77. [Google Scholar] [CrossRef] [Green Version]
  30. Albawi, S.; Mohammed, T.A.; Al-Zawi, S. Understanding of a convolutional neural network. In Proceedings of the 2017 International Conference on Engineering and Technology (ICET), Antalya, Turkey, 21–23 August 2017; pp. 1–6. [Google Scholar] [CrossRef]
  31. Yuan, Y.; Guo, H.; Xu, H.; Li, W.; Luo, S.; Lin, H.; Yuan, Y. China’s first special economic zone: The case of Shenzhen. In Building Engines for Growth and Competitiveness in China: Experience with Special Economic Zones and Industrial Clusters; Zeng, D.Z., Ed.; World Bank Publications: Washington, DC, USA, 2010; Chapter 2; pp. 55–86. [Google Scholar]
  32. Chen, X.; Li, F.; Li, X.; Hu, Y.; Hu, P. Evaluating and mapping water supply and demand for sustainable urban ecosystem management in Shenzhen, China. J. Clean. Prod. 2020, 251, 119754. [Google Scholar] [CrossRef]
  33. Tengfei, L.; Weili, J.; Guojin, H. Nested regression based optimal selection (NRBOS) of rational polynomial coefficients. Photogramm. Eng. Remote Sens. 2014, 80, 261–269. [Google Scholar] [CrossRef]
  34. Xiao, P.; Zhang, X.; Zhang, H.; Hu, R.; Feng, X. Multiscale optimized segmentation of urban green cover in high resolution remote sensing image. Remote Sens. 2018, 10, 1813. [Google Scholar] [CrossRef] [Green Version]
  35. Yang, G.; Pennington, J.; Rao, V.; Sohl-Dickstein, J.; Schoenholz, S.S. A mean field theory of batch normalization. arXiv 2019, arXiv:1902.08129. [Google Scholar]
  36. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
  37. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 630–645. [Google Scholar] [CrossRef] [Green Version]
  38. Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Li, C.; Zhang, H.; Wang, X.; Tang, X. Residual attention network for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3156–3164. [Google Scholar]
  39. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  40. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar] [CrossRef] [Green Version]
  41. Odena, A.; Dumoulin, V.; Olah, C. Deconvolution and checkerboard artifacts. Distill 2016, 1, e3. [Google Scholar] [CrossRef]
  42. Yu, L.; Wu, H.; Zhong, Z.; Zheng, L.; Deng, Q.; Hu, H. TWC-Net: A SAR ship detection using two-way convolution and multiscale feature mapping. Remote Sens. 2021, 13, 2558. [Google Scholar] [CrossRef]
  43. Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef] [PubMed]
  44. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  45. Alom, M.Z.; Yakopcic, C.; Hasan, M.; Taha, T.; Asari, V. Recurrent residual U-Net for medical image segmentation. J. Med Imaging 2019, 6, 014006. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Schematic diagram of the geographical location of the study area. (a) Shenzhen’s location in China. (b) Overview of Shenzhen.
Figure 2. Flow chart of remote sensing image preprocessing.
Figure 3. Samples of different green space types; the left side of each group is the original image and the right side is the label image. (a) High-density building area; (b) Countryside; (c) Suburban forest; (d) Bare land; (e) Golf course; (f) Farmland; (g) Land for parks and squares; (h) Port.
Figure 4. Comparison of different residual structures. (a) The original residual structure; (b) The proposed residual structure.
Figure 5. The structure of the convolution block channel attention module.
Figure 6. The structure of the concatenated residual attention UNet.
Figure 7. Convergence performance of the CRUNet architecture. Starting from a baseline UNet, we add components keeping all training hyperparameters identical.
Figure 8. Convergence performance of the CBCA module. We compare two modules, a CAM and a CBCA module, and introduce the CBCA module to the CRUNet.
Figure 9. The results classified by the five CNNs on the test set. (a) The original images; (b) Corresponding labels; (c) The predicted maps of FCN8s; (d) The predicted maps of DeepLabV3+; (e) The predicted maps of UNet; (f) The predicted maps of CRUNet; (g) The predicted maps of CRAUNet.
Figure 10. Results comparison between CRUNet and CRAUNet. (a) The original images; (b) Corresponding labels; (c) The predicted maps of CRUNet; (d) The predicted maps of CRAUNet.
Figure 11. Typical urban green space classification results. (a) The original images; (b) corresponding labels; (c) the predicted maps of FCN8s; (d) the predicted maps of DeepLabV3+; (e) the predicted maps of UNet; (f) the predicted maps of CRAUNet.
Figure 12. Shenzhen urban green space coverage map in 2020.
Figure 13. Comparison of different sampling methods for accuracy verification. (a) Random sampling method; (b) Equal distance sampling method.
Table 1. Detailed information of the study data.
Satellite | Central Longitude (°E) | Central Latitude (°N) | Imaging Time | Image Size (Pixel × Pixel) | Image Usage
Gaofen-1 | 114.2 | 22.4 | 11 December 2017 | 21,196 × 21,103 | Training
Gaofen-1 | 114.2 | 22.7 | 11 December 2017 | 21,200 × 21,106 | Training
Gaofen-1 | 113.7 | 22.7 | 19 December 2017 | 20,599 × 20,456 | Training
Gaofen-1 | 113.9 | 22.4 | 15 February 2017 | 23,436 × 22,351 | Training
Gaofen-1 | 114.0 | 22.6 | 15 February 2017 | 23,976 × 22,962 | Training
Gaofen-1 | 114.5 | 22.4 | 11 December 2017 | 21,476 × 21,347 | Testing
Gaofen-1 | 114.6 | 22.7 | 11 December 2017 | 21,470 × 21,348 | Testing
Gaofen-1B | 114.1 | 22.4 | 22 February 2020 | 41,008 × 40,888 | Training
Gaofen-1B | 114.2 | 23.0 | 22 February 2020 | 41,008 × 40,890 | Training
Gaofen-1B | 114.6 | 22.4 | 12 January 2020 | 40,936 × 40,837 | Training
Gaofen-1C | 113.7 | 22.4 | 26 October 2020 | 40,003 × 39,699 | Training
Gaofen-1C | 113.9 | 23.0 | 26 October 2020 | 39,986 × 39,678 | Testing
Table 2. Evaluation metrics for the accuracy assessment.
Accuracy Evaluation Criteria | Definition | Formula
PA | The ratio of the number of correctly classified pixels to the total number of pixels | PA = (TP + TN) / (TP + TN + FP + FN)
Precision | The ratio of the number of correctly classified pixels to the number of labeled pixels | Precision = TP / (TP + FP)
Recall | The ratio of the number of correctly classified pixels to the number of actual target feature pixels | Recall = TP / (TP + FN)
IoU | The ratio of the intersection to the union of the ground truth and the predicted area | IoU = TP / (TP + FN + FP)
MIoU | The average IoU over all classes | MIoU = (1 / (n + 1)) Σ_{i=0}^{n} IoU_i
F1-Score | The harmonic mean of precision and recall | F1 = 2 × Precision × Recall / (Precision + Recall)
Params | The total weight parameters of all parameterized layers of the model |
FLOPs | The number of multiplication and addition operations in the model |
where TP, FP, FN and TN are the true positive, false positive, false negative and true negative classifications, respectively.
Table 3. The efficiency comparison of the various methods.
Model | PA (%) | Params (M) | FLOPs (G)
UNet | 96.5 | 31.04 | 46.20
CUNet | 96.9 | 27.91 | 37.03
CRUNet | 97.1 | 28.09 | 37.46
ResUNet | 97.0 | 32.80 | 52.51
Table 4. Results on Green Space testing set. (Bold represents the best result).
Model | PA (%) | F1-Score | MIoU (%) | Params (M) | FLOPs (G) | Train Time (min)
FCN8s | 95.35 | 94.60 | 91.49 | 18.64 | 20.14 | 805
UNet | 96.60 | 95.71 | 94.61 | 31.04 | 46.20 | 795
DeepLabV3+ | 93.87 | 92.48 | 89.79 | 55.70 | 82.67 | 1150
CRUNet | 97.33 | 96.32 | 94.68 | 28.09 | 37.46 | 780
CRAUNet | 97.34 | 96.26 | 94.77 | 28.44 | 37.46 | 806
Table 5. The accuracy of different landcover.
Type | FCN8s | DeepLabV3+ | UNet | CRAUNet
Forest | 96.31% | 95.89% | 97.42% | 97.81%
Golf Course | 93.72% | 94.34% | 98.26% | 97.98%
Bare Land | 93.68% | 95.12% | 96.49% | 97.06%
Sports Ground | 83.91% | 80.82% | 89.17% | 92.33%
Farmland | 93.26% | 94.05% | 96.42% | 97.27%
Aquaculture Area | 96.32% | 95.14% | 96.54% | 98.36%
Table 6. Confusion matrix for urban green space mapping.
Method | Reference Data | Classified as Urban Green Space | Classified as Background | Total
Random sampling | Urban Green Space | 274 | 19 | 293
Random sampling | Background | 14 | 193 | 207
Random sampling | Total | 288 | 212 | 500
Equal distance sampling | Urban Green Space | 258 | 15 | 273
Equal distance sampling | Background | 7 | 211 | 218
Equal distance sampling | Total | 265 | 226 | 491
Table 7. Accuracy assessment for mapping.
Method | PA | MIoU | Precision | Recall | F1-Score
Random sampling | 93.40% | 87.32% | 93.52% | 95.14% | 94.32%
Equal distance sampling | 95.52% | 91.35% | 94.51% | 97.36% | 95.90%