**2. Methods**

#### *2.1. Encoder–Decoder Architecture*

State-of-the-art semantic image segmentation methods are mostly based on Encoder–Decoder architectures such as FCN [42], U-Net [43], and SegNet [44]. An end-to-end trainable neural network recognizes roads in images and segments them accurately at the pixel level. Encoders usually use pre-trained models (such as VGG, Inception, and ResNet), and each encoding layer includes a convolution, batch normalization (BN), a ReLU activation, and a max-pooling layer. Each convolutional layer extracts features from all the maps in the previous layer, which gives the architecture a simple structure and strong adaptability. Batch normalization [45] normalizes the input of each layer to reduce the internal covariate shift; it accelerates training and acts as a regularizer. Prior results show that estimators built from deep neural networks with ReLU activations perform well when the network architecture is chosen appropriately. The pooling layer compresses the input feature map, which reduces the number of parameters in the training process and the degree of overfitting of the model. The main task of the Decoder is to map the distinguishable features back to the pixel space for dense classification.

Road network density refers to the ratio of the total mileage of the road network to the area of a given region. For the extraction of relatively dense urban roads (i.e., more roads within the same area), especially from high-resolution images, significant obstacles lead to unreliable extraction results: complex image scenes and road models, as well as occlusion caused by tall buildings and their shadows. Moreover, because of the complexity of road scenes, U-Net cannot identify road features at a deeper level, and its limited ability to generalize multi-scale information means that scale information is not adequately conveyed.

To address these problems, this paper proposes DenseUNet, which is also based on the Encoder–Decoder architecture but designs a denser connection mechanism for the Encoder layers. DenseUNet is a network architecture in which, within each dense block, every layer feeds forward directly to all subsequent layers: each layer takes the feature maps of all preceding layers as separate inputs, and its own feature maps are passed as input to all subsequent layers. Additionally, our approach has far fewer parameters owing to this construction of the model. This network design not only extracts low-level features such as road edges and textures but also identifies the deep contour and location information of the road.
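As an illustration of the dense connectivity described above, the following is a minimal PyTorch sketch of one dense block in an encoder. The growth rate, layer count, and the pre-activation (BN–ReLU–convolution) ordering are illustrative assumptions, not the exact DenseUNet configuration from this paper.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One encoding unit: batch normalization -> ReLU -> convolution."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(self.relu(self.bn(x)))

class DenseBlock(nn.Module):
    """Each layer receives the concatenated feature maps of all preceding
    layers and passes its own output on to all subsequent layers."""
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList([
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))  # dense connectivity
            features.append(out)
        return torch.cat(features, dim=1)

# A max-pooling step would follow each block in the encoder:
block = DenseBlock(in_channels=64, growth_rate=16, num_layers=4)
pool = nn.MaxPool2d(kernel_size=2)
y = pool(block(torch.randn(1, 64, 128, 128)))  # -> (1, 64 + 4*16, 64, 64)
```

Because each layer contributes only `growth_rate` new feature maps while reusing all earlier ones, the block stays narrow, which is one reason densely connected designs can have far fewer parameters than comparably deep plain networks.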

#### *2.2. Backpropagation to Train Multilayer Architectures*

Multilayer architectures can be trained by stochastic gradient descent. As long as the modules are relatively smooth functions of their inputs and internal weights, the gradient can be computed by the backpropagation procedure. Backpropagation, which computes the gradient of the objective function with respect to the weights of a stack of multilayer modules, is nothing more than a practical application of the chain rule of derivatives. The key idea is that the derivative (or gradient) of the objective with respect to a module's input can be computed by working backward from the gradient with respect to that module's output [46].
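The following NumPy sketch makes this chain-rule computation concrete for a two-layer network trained with stochastic gradient descent; the squared-error loss, layer sizes, and learning rate are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))            # a mini-batch of 4 inputs
t = rng.normal(size=(4, 2))            # targets
W1 = 0.1 * rng.normal(size=(3, 5))     # weights of the first module
W2 = 0.1 * rng.normal(size=(5, 2))     # weights of the output module
lr = 0.1                               # learning rate

for step in range(100):
    # Forward pass: each module transforms the previous representation.
    h_pre = x @ W1                     # linear module
    h = np.maximum(h_pre, 0.0)         # ReLU module
    y = h @ W2                         # linear output module
    loss = 0.5 * np.sum((y - t) ** 2) / len(x)

    # Backward pass: apply the chain rule module by module,
    # working backward from the gradient at the output.
    dy = (y - t) / len(x)              # dLoss/dy
    dW2 = h.T @ dy                     # gradient w.r.t. W2
    dh = dy @ W2.T                     # propagate through the output module
    dh_pre = dh * (h_pre > 0)          # derivative of the ReLU
    dW1 = x.T @ dh_pre                 # gradient w.r.t. W1

    # Stochastic gradient descent update.
    W1 -= lr * dW1
    W2 -= lr * dW2
```

Each backward line is one application of the chain rule: the gradient at a module's output is transformed into the gradient at its input and at its weights, so the whole computation runs in a single backward sweep through the stack.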

Figure 1 shows how the input space is iteratively warped as the data flow through successive layers of the network, until the data points become distinguishable. In this way, the network can learn highly complex functions. Deep learning is a form of representation learning—the machine is fed raw data and develops the representations needed for pattern recognition—that consists of multiple layers of representation. These layers are usually arranged sequentially and composed of a large number of simple nonlinear operations, such that the representation at one layer (starting with the raw data input) is fed into the next layer and transformed into a more abstract representation [47]. The output layer uses the softmax activation function to classify the image into one of the classes, and fine-tuned CNNs can be used as feature extractors to achieve better results.

**Figure 1.** As data flow from one layer of the neural network to the next, they are iteratively warped until they become linearly separable. The final output layer gives the probability of each class. This example illustrates the basic concept of using large-scale networks.
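To illustrate the softmax output and the use of a pre-trained CNN as a feature extractor mentioned above, here is a minimal PyTorch sketch. The choice of ResNet-18 from torchvision and the two-class head are illustrative assumptions, not the configuration used in this paper.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a pre-trained backbone and strip its original classifier head.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()
for p in backbone.parameters():
    p.requires_grad = False            # freeze the pre-trained features

# A new softmax classifier on top of the 512-dim ResNet-18 features;
# two classes here (e.g., road vs. background) is a hypothetical choice.
classifier = nn.Linear(512, 2)

x = torch.randn(1, 3, 224, 224)        # one input image
with torch.no_grad():
    features = backbone(x)             # (1, 512) feature vector
logits = classifier(features)
probs = torch.softmax(logits, dim=1)   # probability of each class
```

Unfreezing the last few backbone layers and training them with a small learning rate would turn this frozen feature extractor into the fine-tuning setup the text refers to.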
