**1. Introduction**

The traffic road network is one of the essential geographic elements of the urban system and has critical applications in many fields, such as intelligent transportation, automobile navigation, and emergency support [1]. With the development of remote sensing technology and advances in remote sensing data processing methods, remote sensing data with high temporal and spatial resolution can provide high-precision ground information and permit large-scale monitoring of roads. Remote sensing imagery has therefore quickly become the primary data source for the automatic extraction of road networks [2]. Automatic road extraction plays a vital role in dynamic spatial development, and extracting roads in urban areas is a significant concern for research on transportation, surveying, and mapping [3]. However, remote sensing images usually have complex, heterogeneous regional features with considerable intra-class differences and small inter-class differences. Extraction is especially challenging in urban areas, where the many buildings and roadside trees cast shadows that are visible in high-resolution images, leading to shadow problems and a large number of segmented objects. Consequently, it is difficult to obtain high-precision road network information when automatically extracting road networks from remote sensing images.

Many image segmentation methods, based on either conventional techniques or machine learning algorithms, have been proposed for these problems. These methods fall mainly into two categories: road centerline extraction and road area extraction. This paper focuses on extracting road areas from high-resolution remote sensing images. The road centerline is a linear element whose spatial geometry is a line formed by a series of ordered nodes, and it is an essential characteristic line of the road. It is generally obtained from the binary road map produced by segmentation, through morphological operations or the Medial Axis Transform (MAT) [4]. The road area is a surface element generated by image segmentation, and different spatial shapes of the boundary lines give rise to a variety of surface element shapes [5]. Road centerline extraction [6,7] detects the skeleton of the road, while road area extraction [8–13] generates pixel-level road labels. Some methods extract the road area and the road centerline at the same time [14]. Huang et al. [8] extracted road networks from light detection and ranging (LiDAR) data. Mnih et al. [9] used a Deep Belief Network (DBN) model to identify road targets in airborne images. Unsalan et al. [10] integrated three modules, namely a road shape extraction module, a road center probability detection module, and a graph-based module, to extract road networks from high-resolution satellite images. Cheng et al. [11] automatically extracted road network information from complex remote sensing images using a graph-cut-based probability propagation method. Saito et al. [12] proposed a new CNN-based semantic labeling method built on a channel-wise output function. Alshehhi et al. [13] proposed an unsupervised road segmentation method based on a hierarchical graph.

Road area extraction can be treated as a pixel-level classification or image segmentation problem. Song et al. [15] proposed a road area detection method using shape index features with a support vector machine (SVM). Wang et al. [16] presented a road detection method based on salient features and a gradient vector flow (GVF) Snake. Rianto et al. [17] proposed a method to detect main roads from SPOT satellite images. Traditional road extraction methods depend on the selected features. Zhang et al. [18] selected seed points on the road, determined the direction, width, and starting point of the road segment with a radial wheel algorithm, and proposed a semi-automatic method for road network tracking in remote sensing images. Movaghati et al. [19] proposed a road network extraction framework that combines an extended Kalman filter (EKF) with a special particle filter (PF) to recover road tracks where they are obstructed. Gamba et al. [20] used adaptive filtering steps to extract the main road directions and then proposed a road extraction method based on prior information about the distribution of road directions. Li et al. [21] determined the region of interest in a high-resolution remote sensing image, represented it as a binary segmentation tree, and gradually extracted the road from that tree.
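The relationship between a road area (a binary pixel mask) and a road centerline (its skeleton) can be illustrated with a minimal sketch. The code below is not the procedure of [4] or of any cited method. It is a crude medial-axis approximation that keeps road pixels whose city-block distance to the background is a local maximum. The function names, the toy mask, and the choice to treat everything outside the image as non-road are all assumptions made for illustration.

```python
# Illustrative sketch only: a crude medial-axis approximation on a binary road mask.
# mask[r][c] is 1 for a road pixel and 0 for a non-road pixel.

def distance_to_background(mask):
    """City-block distance from each road pixel to the nearest non-road pixel.

    Assumption: everything outside the image counts as non-road.
    """
    h, w = len(mask), len(mask[0])
    far = h + w  # larger than any city-block distance that can occur in the image
    dist = [[far if mask[r][c] else 0 for c in range(w)] for r in range(h)]
    # Forward pass: propagate distances from the top and left neighbours.
    for r in range(h):
        for c in range(w):
            if dist[r][c]:
                up = dist[r - 1][c] if r > 0 else 0
                left = dist[r][c - 1] if c > 0 else 0
                dist[r][c] = min(dist[r][c], up + 1, left + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for r in range(h - 1, -1, -1):
        for c in range(w - 1, -1, -1):
            if dist[r][c]:
                down = dist[r + 1][c] if r < h - 1 else 0
                right = dist[r][c + 1] if c < w - 1 else 0
                dist[r][c] = min(dist[r][c], down + 1, right + 1)
    return dist


def centerline_from_mask(mask):
    """Keep road pixels whose distance value is >= that of all four neighbours."""
    dist = distance_to_background(mask)
    h, w = len(mask), len(mask[0])

    def at(r, c):
        # Out-of-image positions are treated as background (distance 0).
        return dist[r][c] if 0 <= r < h and 0 <= c < w else 0

    neighbours = ((-1, 0), (1, 0), (0, -1), (0, 1))
    return [
        [
            1 if dist[r][c] > 0
            and all(dist[r][c] >= at(r + dr, c + dc) for dr, dc in neighbours)
            else 0
            for c in range(w)
        ]
        for r in range(h)
    ]


# Toy road area: a horizontal road three pixels wide crossing a 5 x 7 image.
road_mask = [[0] * 7, [1] * 7, [1] * 7, [1] * 7, [0] * 7]
skeleton = centerline_from_mask(road_mask)
for row in skeleton:
    print("".join(str(v) for v in row))
```

On this toy mask the middle row of the road is recovered as the centerline. The isolated responses at the four corners of the road are the kind of spur that a practical pipeline would prune.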

However, a manually selected feature set is sensitive to many threshold parameters and to imaging conditions such as lighting and atmosphere. This empirical design approach only handles specific data, which limits its application to large-scale datasets. Deep learning is a representation learning method with multiple levels of representation, obtained by composing simple but nonlinear modules that each transform the representation at one level into a representation at a higher, slightly more abstract level. It allows raw data to be supplied to the machine and the required representations to be discovered automatically. In recent years, deep convolutional networks have been widely used to solve quite complex tasks, such as image classification [22,23], semantic segmentation [24,25], and natural language processing [26,27].

Most importantly, these methods have proven to be highly robust to variations in image appearance, which prompted us to apply them to fully automated road segmentation in high-resolution remote sensing images. Long et al. proposed the fully convolutional network (FCN) and applied it to semantic segmentation. Likewise, new segmentation methods based on deep neural networks and FCNs have been developed to extract roads from high-resolution remote sensing images. Mnih [28] put forward a method that uses context information to detect road areas in aerial images.

He et al. [29] improved the performance of road extraction networks by integrating atrous spatial pyramid pooling (ASPP) with an encoder-decoder network to enhance the extraction of detailed road features. Zhang et al. [30] enhanced the propagation efficiency of information flow by fusing dense connections with convolutional layers of various scales. Targeting the rich details of remote sensing images, Li et al. [31] proposed a Y-type convolutional neural network for road segmentation of high-resolution visible remote sensing images. Their network not only avoids background interference but also makes full use of complex details and semantic features to segment multi-scale roads. RSRCNN [32] extracts roads based on their geometric features and spatial correlation. Su et al. [33] improved the U-Net model to address existing problems. Considering the small sample sizes typical of aerial images, Zhang et al. [34] proposed a road extraction framework based on an improved network. By refining the CNN architecture, Gao et al. [35] proposed the refined deep residual convolutional neural network (RDRCNN) to detect the road area more accurately. To address noise, occlusion, and complex backgrounds, Yang et al. [36] designed an RCNN unit and integrated it into the U-Net architecture. The significant advantage of this unit is that it retains detailed low-level spatial characteristics. Zhang et al. [37] proposed ResU-Net, which extracts road information by combining the advantages of residual units and U-Net. Because roads are narrow, connected, and complex, Zhou et al. [38] proposed the D-LinkNet model, which integrates multi-scale features of high-resolution satellite images while preserving road information. Based on an iterative search process guided by a CNN decision function, Bastani et al. [39] proposed RoadTracer, which automatically constructs accurate road network maps directly from aerial images. To address the irregular footprint problem between road areas and images, Li et al. [40] proposed a semantic segmentation method that combines GANs with multi-scale context aggregation for road extraction from UAV remote sensing images. Xu et al. [41] put forward a road extraction method based on local and global information to effectively extract road information from remote sensing images.

Inspired by densely connected convolutional networks (DenseNet) and U-Net, we propose DenseUNet, an architecture that combines the advantages of both. The proposed deep convolutional neural network is based on the U-Net architecture, and there are three differences between DenseUNet and U-Net. First, the model uses dense units rather than ordinary neural units as its basic building blocks. Second, the proportions of road and non-road pixels in remote sensing images are seriously unbalanced, so this paper analyzes this issue and proposes ways to address it. Finally, the performance of the proposed method is validated by comparison with three classical semantic segmentation methods.
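The dense connectivity behind the dense units mentioned above can be sketched in a few lines. This is not the DenseUNet implementation. It only illustrates the pattern from densely connected convolutional networks, in which each layer receives the concatenation of all earlier feature maps and contributes a fixed number of new channels. The toy layer, the growth rate of 2, and the use of a single float per channel are assumptions made to keep the example self-contained.

```python
# Illustrative sketch only: dense connectivity, with each "channel" reduced to one float.

def make_toy_layer(growth_rate, seen):
    """Return a hypothetical stand-in for a convolutional layer.

    The layer averages its input channels, applies ReLU, and emits
    `growth_rate` new channels. `seen` records the input width of each call.
    """
    def layer(channels):
        seen.append(len(channels))       # how many channels this layer received
        mean = sum(channels) / len(channels)
        activated = max(0.0, mean)       # ReLU nonlinearity
        return [activated] * growth_rate
    return layer


def dense_unit(inputs, layers):
    """Apply layers so that each one sees all earlier feature maps."""
    features = list(inputs)
    for layer in layers:
        new_channels = layer(features)          # input: everything produced so far
        features = features + new_channels      # concatenate along the channel axis
    return features


seen_widths = []
toy_layers = [make_toy_layer(2, seen_widths) for _ in range(4)]
output = dense_unit([0.5, -1.0, 2.0], toy_layers)

print(seen_widths)   # input widths grow by the growth rate at every layer
print(len(output))   # 3 input channels + 4 layers * 2 new channels
```

With 3 input channels, 4 layers, and a growth rate of 2, the layers receive 3, 5, 7, and 9 channels, and the unit outputs 11 channels. The original inputs are preserved at the front of the output, which is what lets later layers reuse low-level features directly.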
