Article

Generating Urban Road Networks with Conditional Diffusion Models

1 Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2024, 13(6), 203; https://doi.org/10.3390/ijgi13060203
Submission received: 6 February 2024 / Revised: 11 June 2024 / Accepted: 14 June 2024 / Published: 16 June 2024

Abstract:
The automatic generation of urban roads can greatly improve efficiency and productivity in urban planning and design, and it has drawn growing attention from researchers over the past decade. In this paper, we present an image-based urban road network generation framework using conditional diffusion models. We first trained a diffusion model capable of generating road images with characteristics similar to the ground truth using four context factors. Then, we used the trained model as the generator to synthesize road images conditioned on the geospatial context. Finally, we converted the generated road images into road networks through several post-processing steps. Experiments conducted in five cities of the United States showed that our model can generate reasonable road networks that maintain the layouts and styles of real examples. Moreover, our model is able to reproduce the obstructive effect of geographic barriers on urban roads. By comparing models with different context factors as input, we find that the model considering all four factors generally performs best. The most important factor in guiding the shape of road networks is intersections, implying that the development of urban roads is not only restricted by the natural environment but is more strongly influenced by human design.

1. Introduction

As a fundamental city infrastructure, road networks greatly influence urban traffic and human movement. The length, density, and hierarchical structure of urban roads have been investigated in previous empirical studies to show their relations with the economic development and commuting efficiency of cities [1,2,3,4]. Manual modeling of large-scale road networks is laborious and tedious: a road network is a complex structured system composed of numerous road segments and intersections, and its layout varies across regions. An efficient and reasonable road network generation model can be applied in many fields, including urban planning and management, traffic simulation, autonomous driving, and video game design. With the advances in generative techniques in computer science, the automatic generation of road networks has gained attention in the past decade.
Generally, the automatic generation model of road networks should meet two basic requirements: fidelity and diversity. The former means that the algorithm should be able to generate plausible road networks that maintain a similar style to the original one. The latter means there should be several attributes that can be customized to produce more diversified structures [5,6]. Researchers have presented a series of generative models for constructing road networks. Those models can be roughly divided into two categories: procedural modeling and deep generative models. Procedural modeling defines a set of carefully designed rules and gradually constructs a whole road network through iterations. It can perform well in specific regions but may not be adaptable to new scenarios because it is more specialized and strongly dependent on the rule sets. Deep generative models learn road information and properties from real cases and generate road segments with similar patterns. Compared with procedural modeling, they are more reliable and flexible to use.
Deep generative models have advanced significantly in recent years. Techniques including the variational autoencoder (VAE), recurrent neural network (RNN), and generative adversarial network (GAN) have greatly expanded the capabilities for road network generation. Recently, diffusion models have rapidly gained attention since the release of Ho et al.'s paper [7]. As a new class of generative models, diffusion models have proven surprisingly effective not only in image synthesis but also in tasks such as text-to-image and text-to-video generation [8,9,10].
This paper introduces diffusion models for the generation of urban road networks. To evaluate the quality of the generated results, experiments were conducted in five cities across the United States: Chicago, Los Angeles, New York, Phoenix, and Washington. Unlike previous studies that solely rely on AI-based technologies, we emphasize the importance of the geospatial context in generating road networks. According to Goodchild’s definition, the geospatial context refers to the surroundings or neighborhoods of events and features [11]. In this study, we specifically consider environmental and artificial factors related to the construction and expansion of urban roads. Existing studies in the literature have shown that the underlying terrain and human design play a role in determining street network structures [12,13]. Therefore, we selected four factors to represent these aspects: land use, elevation, slope (underlying terrain), and road intersections (human design).
The contributions of this paper are two-fold. First, we present a new generation framework that combines diffusion models with the geospatial context. By inputting the geospatial context of a region, our conditional diffusion model can generate road networks with characteristics and styles similar to the ground truth. Second, we evaluate the performance of our generation model using five metrics: Fréchet inception distance (FID), F1 score (F1), intersection over union (IOU), difference of average degree (DAD), and difference of average road length (DARL). We also analyze the impact of natural obstacles such as water bodies, mountains, and vegetation on urban roads. Our work provides urban designers and planners with an efficient method to generate realistic road configurations that match specific cities.
The remainder of this paper is organized as follows: Section 2 presents a review of related work. Section 3 describes the details of the methodological framework. Section 4 shows experiments in several cities and the corresponding results. Section 5 provides the discussion and conclusions for this paper.

2. Related Work

2.1. Procedural Modeling

Parish and Müller were the first to implement a procedural system, CityEngine, to model streets and buildings based on grammar rules [14]. Aliaga et al. performed random walks to generate a street layout considering the statistical characteristics of street intersections [5]. Galin et al. applied a weighted anisotropic shortest path algorithm to create road trajectories by minimizing the cost function [15]. Beneš et al. generated major and minor roads for cities based on traffic simulation and formed a rich road structure for multi-city scenes [16]. Lyu et al. presented a pattern-based algorithm, which contains two pattern types, radial and checker, to generate a road network consisting of three layers: highway, arterial, and distributor [6]. Teng and Bidarra considered semantic information when creating complex structured roads and used a parametric method to allow the generation of standard structures, e.g., grids and suburban styles [17]. Nishida et al. proposed an interactive road design system using statistical information obtained from real examples to create roads at a large scale [18]. Ding et al. proposed a related neighbor graph (RNG)-based growth model, where two processes—expansion and densification—were performed in each turn [19]. Their results showed a node degree distribution that was similar to the observed empirical patterns. Lima et al. presented a shape grammar-based street design strategy with the goal of minimizing urban street length and maximizing facility service accessibility [20]. However, this method can only construct grid-style street layouts and struggles to generate irregular road networks. In general, procedural modeling is strongly dependent on pre-defined rules, which require manual modification to adapt to different regions.

2.2. Deep Generative Models

The recent development of deep learning has powered the application of generative models in road network synthesis. Deep generative models of road networks can be categorized into two types according to how input data are processed. The first type of model generates a road network as a graph consisting of vertices and edges. These models take geometric data of roads as input, e.g., node coordinates and road length, then generate new nodes and construct a valid topology gradually during the inference phase. Previous researchers have proposed several generative models for graph generation, such as GraphRNN [21], NetGAN [22], and GRAN [23]. Owaki and Machida extended the generation of graphs to road networks by using a generator to produce not only the node sequences but also the displacement attributes in their RoadNetGAN model [24]. Chu et al. used an encoder RNN to encode the neighboring information of vertices and a decoder RNN to predict the coordinates of the next vertices [25]. Mi et al. employed a two-level hierarchical graph generation model called HDMapGen to produce high-definition maps [26].
Another type of model is the image-based model, which treats road network generation as an image synthesis or image inpainting task. Hartmann et al. converted road network patches into binary images and used a GAN to synthesize street network images [27]. Then, they extracted the road vectors from the generated images. Fang et al. employed a GAN-based image inpainting technique to predict the missing road segments in pre-defined regions [13,28]. By considering local context and topological information, they produced more realistic results in hilly areas. Birsak et al. leveraged vector-quantized VAE and auto-regressive transformer to create city-scale road layouts under the input of condition maps, including land–water maps and road density [29]. There are also methods utilizing multi-source data for the generation of road networks. For example, Yang et al. constructed a framework called TR2RM to extract road networks by incorporating high-resolution remote sensing images and big trajectory data [30].
Our framework belongs to the class of image-based models. We focus on predicting and reconstructing the road network for the testing regions with layouts in the style of the provided training regions. Compared with graph-based models, we utilize more geospatial context factors that affect the generation of road networks. Unlike previous image-inpainting-based works [13,28], we do not need the surrounding road information but only include the geospatial context to generate the road networks for new areas.

2.3. Factors Affecting Urban Road Networks

The evolution of urban road layouts is shaped by geographical, social-economic, and historical factors. For geographical factors, researchers have indicated that land use and terrain exhibit strong effects on road network structure and pattern [3,31,32]. By taking 100 cities around the world as the study cases, Boeing found that Caracas, Hong Kong, and Sarajevo had the largest road circuities due to topographical constraints [12]. Strano et al. found that although the mean and total road length of urban areas were quite different from those of croplands, the rescaled road length distributions were indistinguishable at the global scale [3]. Song et al. explored the relationship between street centrality and land-use intensity (LUI) in the urban area of Jinan, China. Their analysis showed that LUI had a positive effect on closeness centrality and straightness centrality and a negative impact on betweenness centrality [33]. Researchers have also conducted empirical analyses of the interactive relationship between road density and vegetation [34,35].
Other studies take social–economic factors as the explainable variables, such as population and GDP. Researchers have demonstrated that as population grows, the geometric and structural fractal dimensions of urban streets also increase [36,37]. Cao et al. examined the scaling law between road length and population based on datasets in Shenzhen [38]. In terms of the effect of historical events, Barthelemy et al. revealed abrupt changes in the street network pattern of Paris arising from Haussmann transformations [39].
Although previous studies have presented various road network generation models, there still lacks an efficient model that can predict high-quality road networks based solely on geospatial context factors. We only consider land use, elevation, slope, and road intersections as natural geographical factors in this paper. Elevation and slope data characterize the topography of the study areas; together with land use data, they have a significant impact on urban road networks, as mentioned in previous studies. Intersections represent human planning and design, which is also important in the generation of road networks. We currently do not consider social–economic factors because geographical factors are relatively more stable and can provide enough guiding information for this research. However, we can incorporate them as an extension in our future research.

3. Methodology

The main idea of our method is to use road images as training input and context factors as extra conditions so that the urban road layouts can be guided to fit real situations. The overall workflow is illustrated in Figure 1. Our framework contains three parts in total. We first prepare the geospatial context and road network images and use them as input to train the conditional diffusion model. After epochs of training, we obtain and save the model parameters with the best performance. In the sampling phase, we use the pre-trained model as the generator to synthesize the road images from noises regarding the local context. Given the generated road images, we conduct post-processes to convert them to the final road networks.

3.1. Data Preparation

The input data consist of two parts: road networks and the geospatial context. Researchers have explored the relationships between street configurations and geographical conditions such as land use [40,41], elevation [13,42], and slope [13,43,44]. In addition, the effects of human design and interactions were also highlighted [39]. Therefore, we selected land use, elevation, slope, and road intersections as the four context factors to assist in the generation of road networks.
The urban road networks were downloaded from OpenStreetMap (OSM), including six road types: motorway, residential, secondary, primary, tertiary, and trunk. Nodes with a degree greater than 1 were extracted from the road networks as intersections. The land use was collected from ESRI Sentinel-2 10-Meter Land Use/Land Cover data, which contain nine classes: water, trees, flooded vegetation, crops, built area, bare ground, snow/ice, clouds, and rangeland. The DEM was obtained from the Shuttle Radar Topography Mission (SRTM), which has a spatial resolution of 30 m. The slope data were calculated based on the elevations.
Because our model is an image generator, we needed to convert the obtained road networks and geospatial context data into image patches for training. We randomly generated 10,000 squares with a size of 1 km × 1 km within each city. To remove areas where the road density is too low, we only retained the squares with a road length that is larger than 2 km. Then, we clipped the road networks and geospatial contexts with the squares to produce the corresponding road images and context images, each of which is a 1 × 128 × 128 image. Under this condition, the spatial resolution of the pixels is 7.81 m, which is smaller than that of the DEM (30 m) and land use data (10 m). To resolve these resolution mismatches, bilinear interpolation was conducted for the DEM and land use data to adjust their resolutions to 7.81 m. Considering that the land use value was categorical and contained 9 classes in total, we first converted land use to multi-channel images, each of which has a size of 9 × 128 × 128. Then, we generated 9 independent gray images for each multi-channel image. We normalized the values of elevation and slope images into a range of 0 to 255 with the following equation:
$$V_{norm} = 255 \cdot \frac{V - V_{min}}{V_{max} - V_{min}}$$

where $V$ is the raw elevation or slope, $V_{norm}$ is the normalized value, and $V_{min}$ and $V_{max}$ are the minimum and maximum values of a city, respectively. A detailed description of the input images is presented in Table 1.
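The min-max normalization above can be sketched in a few lines (a minimal NumPy illustration; the function name and the sample raster are hypothetical):

```python
import numpy as np

def normalize_to_uint8(raster: np.ndarray, v_min: float, v_max: float) -> np.ndarray:
    """Min-max normalize an elevation or slope raster to the 0-255 range.

    v_min / v_max are the city-wide minimum and maximum, so that all patches
    from the same city share a consistent gray-level encoding.
    """
    norm = 255.0 * (raster - v_min) / (v_max - v_min)
    return np.clip(norm, 0, 255).astype(np.uint8)

# Example: a small synthetic elevation patch
dem = np.array([[10.0, 20.0], [30.0, 40.0]])
img = normalize_to_uint8(dem, v_min=10.0, v_max=40.0)
```

Using the city-wide extrema (rather than per-patch extrema) keeps gray levels comparable across patches of the same city.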

3.2. Conditional Diffusion Model

In this study, the diffusion model was employed as the generator to create road images. Specifically, we used the denoising diffusion probabilistic model (DDPM) proposed by Ho et al. [7] to perform conditional image generation. DDPM can be seen as a parameterized Markov chain with T steps, and it consists of the forward process and the denoising process. In the following paragraphs, we first introduce the details of the original DDPM and then explain how we adapt the diffusion model to conditional image generation.

3.2.1. Forward Process

Given the original image $y_0$ sampled from the data distribution $q(y)$, the forward process gradually adds noise to it over $T$ steps, forming the trajectory $y_0, y_1, y_2, \ldots, y_T$. The conditional probability $q(y_t \mid y_{t-1})$ is a normal distribution:

$$q(y_t \mid y_{t-1}) = \mathcal{N}\!\left(y_t;\ \sqrt{\alpha_t}\, y_{t-1},\ (1-\alpha_t)\,\mathbf{I}\right)$$

where $\alpha_{1:T}$ are hyperparameters lying in $(0, 1)$. Given $y_0$ as the input, the trajectory after $T$ iterations is

$$q(y_{1:T} \mid y_0) = \prod_{t=1}^{T} q(y_t \mid y_{t-1})$$

Furthermore, we can obtain the distribution of $y_t$ conditioned on $y_0$ as follows:

$$q(y_t \mid y_0) = \mathcal{N}\!\left(y_t;\ \sqrt{\bar{\alpha}_t}\, y_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right)$$

where $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$. As $T$ approaches infinity, $y_T$ converges to isotropic Gaussian noise. For a noisy image $\tilde{y}_t \sim q(y_t \mid y_0)$, the reparameterization trick gives

$$\tilde{y}_t = \sqrt{\bar{\alpha}_t}\, y_0 + \sqrt{1-\bar{\alpha}_t}\, \varepsilon, \qquad \varepsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$$

After some algebraic manipulation, we can derive the distribution of $y_{t-1}$ conditioned on $y_t$ and $y_0$, which is helpful for parameterizing the subsequent denoising process:

$$q(y_{t-1} \mid y_t, y_0) = \mathcal{N}\!\left(y_{t-1};\ \mu_q(y_t, y_0),\ \Sigma_q(t)\right)$$
$$\mu_q(y_t, y_0) = \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\, y_t + \frac{\sqrt{\bar{\alpha}_{t-1}}\,(1-\alpha_t)}{1-\bar{\alpha}_t}\, y_0$$
$$\Sigma_q(t) = \frac{(1-\alpha_t)(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\,\mathbf{I}$$
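The closed-form sampling of $\tilde{y}_t$ can be sketched as follows (an illustrative NumPy sketch; the linear noise schedule is an assumption, as the paper does not state its schedule explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule (an assumption)
alphas = 1.0 - betas                  # alpha_t = 1 - beta_t
alpha_bars = np.cumprod(alphas)       # alpha-bar_t = prod_{s<=t} alpha_s

def q_sample(y0: np.ndarray, t: int, eps: np.ndarray) -> np.ndarray:
    """Sample y_t ~ q(y_t | y_0) in closed form via the reparameterization trick."""
    return np.sqrt(alpha_bars[t]) * y0 + np.sqrt(1.0 - alpha_bars[t]) * eps

y0 = rng.standard_normal((1, 128, 128))   # stand-in for a normalized road image
eps = rng.standard_normal(y0.shape)
y_noisy = q_sample(y0, t=999, eps=eps)    # near-pure Gaussian noise at the last step
```

Because $\bar{\alpha}_T$ is close to zero for large $T$, the final $y_T$ is dominated by the Gaussian noise term, matching the limit described above.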

3.2.2. Denoising Process

If we can reverse the forward process, we will be able to recover the original image from Gaussian noise. Denoising is the backward process that learns the reverse distribution $q(y_{t-1} \mid y_t)$, which is estimated using a parameterized model $p_\theta$, where $\theta$ denotes the parameters of the neural network. The conditional probability $p_\theta(y_{t-1} \mid y_t)$ in the reverse process is given by

$$p_\theta(y_{t-1} \mid y_t) = \mathcal{N}\!\left(y_{t-1};\ \mu_\theta(y_t, t),\ \Sigma_\theta(y_t, t)\right)$$

The joint distribution of the reverse process is

$$p_\theta(y_{0:T}) = p_\theta(y_T) \prod_{t=1}^{T} p_\theta(y_{t-1} \mid y_t)$$

3.2.3. Model Training

Our training target is to recover the original image given a noisy image generated by the forward process. Note that the forward processes are modeled as Gaussian distributions with pre-defined parameters; thus, denoising is the only process that we need to learn. In order to optimize the denoising model, researchers have demonstrated that the denoising transition should be matched as closely as possible to the ground-truth posterior distribution [7]. It has also been found that learning a denoised model by predicting the original image is equivalent to predicting the source noise that determines the noisy image. Furthermore, better performance can be achieved through noise prediction [45]. Eventually, the loss function is expressed as
$$\mathbb{E}_{y_0, t, \varepsilon}\left[\left\| \varepsilon - \varepsilon_\theta\big(\underbrace{\sqrt{\bar{\alpha}_t}\, y_0 + \sqrt{1-\bar{\alpha}_t}\,\varepsilon}_{\tilde{y}_t},\ t\big) \right\|^2\right]$$
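The noise-prediction objective can be sketched as follows (a minimal NumPy illustration; `eps_model` is a hypothetical stand-in for the U-net noise predictor $\varepsilon_\theta$):

```python
import numpy as np

def diffusion_loss(y0, t, eps, eps_model, alpha_bars):
    """Simple DDPM objective: MSE between the true and predicted noise.

    eps_model(y_t, t) is a hypothetical stand-in for the U-net noise
    predictor epsilon_theta.
    """
    # Form the noisy image y_t in closed form, then score the prediction.
    y_t = np.sqrt(alpha_bars[t]) * y0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return float(np.mean((eps - eps_model(y_t, t)) ** 2))

alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
rng = np.random.default_rng(1)
y0, eps = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))

# An oracle returning the true noise yields zero loss; a zero predictor does not.
oracle_loss = diffusion_loss(y0, 500, eps, lambda y, t: eps, alpha_bars)
loss_zero = diffusion_loss(y0, 500, eps, lambda y, t: np.zeros_like(y), alpha_bars)
```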

3.2.4. Conditional Image Generation

The dataset we used contains road images and corresponding context images and is denoted as $D = \{(y_i, c_i)\}_{i=1}^{N}$, where $N$ is the number of samples. Following Saharia et al. [46], we adapted the DDPM to conditional image generation.
In the forward process, we add Gaussian noise to an original road image to generate noisy images, as mentioned above. In the denoising process, the main difference from the plain DDPM is that we predict the original road image from both the noisy road images and the context images. In this circumstance, the formula can be replaced by $p_\theta(y_{t-1} \mid y_t, c)$. We start with the Gaussian noise image $y_T$ and concatenate it with the corresponding context images as input to recover $y_{T-1}$. Then, we predict $y_{T-2}$ from $y_{T-1}$ and the context images. We finally recover the original road image through iterative refinement.
Our DDPM employs a U-net model for the denoising process. It is a symmetric architecture with an input and output of the same size. A detailed description of the U-net architecture used in our diffusion model is shown in Figure 2, where the output size of each layer is shown below that layer.
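The iterative refinement described above can be sketched as an ancestral sampling loop (an illustrative NumPy sketch; `eps_model` is a hypothetical stand-in for the trained U-net, and the fixed $\beta_t$ variance follows the original DDPM):

```python
import numpy as np

def sample_conditional(eps_model, context, T, alphas, alpha_bars, rng):
    """Ancestral DDPM sampling conditioned on context images.

    eps_model(y, c, t) stands in for the U-net noise predictor applied to the
    noisy road image concatenated (channel-wise) with the context images.
    """
    y = rng.standard_normal(context.shape[-2:])  # y_T ~ N(0, I), single channel
    for t in range(T - 1, -1, -1):
        eps_hat = eps_model(y, context, t)
        # Posterior mean under the epsilon parameterization
        y = (y - (1 - alphas[t]) / np.sqrt(1 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:  # add noise with variance beta_t except at the final step
            y += np.sqrt(1 - alphas[t]) * rng.standard_normal(y.shape)
    return y

T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas, alpha_bars = 1 - betas, np.cumprod(1 - betas)
rng = np.random.default_rng(2)
context = np.zeros((12, 16, 16))  # e.g., 9 land use channels + DEM + slope + intersections
road = sample_conditional(lambda y, c, t: np.zeros_like(y), context, T, alphas, alpha_bars, rng)
```

The zero-noise predictor here only exercises the loop mechanics; in the actual framework, the trained U-net supplies `eps_hat` at every step.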

3.3. Post-Processing

After obtaining the synthesized road images, we conducted three post-processing steps. First, we converted the grayscale images to binary images by setting pixels greater than 127 to 255 and the rest to 0. Then, we used the thinning algorithm proposed by Zhan [47] to reduce the width of the linear features in the images. Lastly, we vectorized the thinned images to generate road vector lines, from which the road graphs can be reconstructed.
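The first and last steps can be sketched as follows (a toy NumPy illustration; the thinning step is omitted and the vectorization is heavily simplified, and all helper names are hypothetical):

```python
import numpy as np

def binarize(gray: np.ndarray, threshold: int = 127) -> np.ndarray:
    """First post-process: pixels above the threshold become 255, the rest 0."""
    return np.where(gray > threshold, 255, 0).astype(np.uint8)

def road_pixel_segments(binary: np.ndarray):
    """Toy vectorization: connect each road pixel to its right/down road
    neighbors, yielding unit line segments. (The paper additionally thins the
    raster before vectorizing; that step is omitted here for brevity.)"""
    segs = []
    rows, cols = binary.shape
    for r in range(rows):
        for c in range(cols):
            if binary[r, c]:
                if c + 1 < cols and binary[r, c + 1]:
                    segs.append(((r, c), (r, c + 1)))
                if r + 1 < rows and binary[r + 1, c]:
                    segs.append(((r, c), (r + 1, c)))
    return segs

# A tiny 3x3 patch whose middle column is a vertical road
gray = np.array([[0, 200, 10], [0, 201, 30], [0, 199, 0]], dtype=np.uint8)
binary = binarize(gray)
segments = road_pixel_segments(binary)
```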

3.4. Evaluation

In order to comprehensively evaluate the performance of our models, we investigated the similarity in terms of both the generated images and networks. Two types of metrics were used, as described below.

3.4.1. Image-Based Metrics

(1) Fréchet inception distance (FID). The FID is a metric that assesses the quality of images created by generative models. It is calculated by comparing the distribution of generated images with that of the ground truth. For more detail, refer to [48]. A smaller FID means that the two datasets are more similar and the corresponding model is better.
(2) F1 score. The F1 score is defined based on the precision and recall scores, which are denoted as follows [49]:
$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
where TP (true positive) is the number of samples correctly predicted as positive, FP is the number of samples wrongly predicted as positive, and FN is the number of samples wrongly predicted as negative. For each generated road image, we consider its pixels as samples and calculate the F1 score. After obtaining the F1 scores of all images, we compute the average as the final F1 score metric. A larger F1 score implies better generation results.
(3) Intersection over union (IOU). We first calculated the area of overlapping road pixels between the generated image and the ground truth, and then the area of their union. The IOU is then calculated as follows [50]:
$$IOU = \frac{\text{Area of Overlap}}{\text{Area of Union}}$$
Similar to the F1 score, the average IOU of all images is computed as the final IOU. A larger IOU is better.
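Both pixel-wise metrics can be computed together over binary road masks (a minimal NumPy sketch; the helper name is hypothetical):

```python
import numpy as np

def pixel_f1_iou(pred: np.ndarray, truth: np.ndarray):
    """Pixel-wise F1 and IOU for binary road images (road = True)."""
    tp = np.logical_and(pred, truth).sum()      # road pixels correctly predicted
    fp = np.logical_and(pred, ~truth).sum()     # road predicted where none exists
    fn = np.logical_and(~pred, truth).sum()     # road missed by the prediction
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return f1, iou

truth = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
pred  = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
f1, iou = pixel_f1_iou(pred, truth)
```

Averaging these per-image values across the testing set yields the final F1 and IOU metrics described above.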

3.4.2. Network-Based Metrics

(1) Difference in average degree (DAD). As mentioned earlier, a road network is reconstructed for each generated road image after post-processing. We calculate the average degree for both the generated and real road networks in the testing regions. The difference between these two average degrees is used as one metric to evaluate the performance of our models in terms of topology [51]. The closer the DAD is to 0, the better the generation model.
(2) Difference in average road length (DARL). For each generated road network, we first sum the lengths (in kilometers) of the roads in that network and then calculate the average value across all road networks. Similarly, the average road length of the real networks is also calculated. The difference between these two average values is taken as the other evaluation metric from the perspective of geometry [51]. Again, the closer the DARL is to 0, the better the generation model.
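As a small worked illustration of DAD (the helper names and toy graphs are hypothetical; an undirected graph's average degree is $2|E|/|V|$):

```python
def average_degree(edges, num_nodes):
    """Average node degree of an undirected road graph: 2|E| / |V|."""
    return 2 * len(edges) / num_nodes

def dad(gen_graphs, real_graphs):
    """Difference in average degree between generated and real networks,
    each given as (edges, num_nodes) pairs; values closer to 0 are better."""
    gen = sum(average_degree(e, n) for e, n in gen_graphs) / len(gen_graphs)
    real = sum(average_degree(e, n) for e, n in real_graphs) / len(real_graphs)
    return gen - real

# A 4-node path graph (average degree 1.5) vs. a 4-node cycle (average degree 2.0)
path = ([(0, 1), (1, 2), (2, 3)], 4)
cycle = ([(0, 1), (1, 2), (2, 3), (3, 0)], 4)
diff = dad([path], [cycle])
```

DARL follows the same pattern, with per-network summed road length replacing node degree.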

4. Experiments and Results

4.1. Experiment Area

We collected road networks and geospatial data from five cities in the United States: Chicago, Los Angeles, New York, Phoenix, and Washington (Figure 3). The statistics presented in Tables S1–S3 illustrate the differences in road networks and geospatial context data among these cities. These five cities were selected for two reasons. First, they represent diverse road network patterns. New York, Chicago, and Washington have relatively more regular streets, whereas the streets of Los Angeles and Phoenix show more irregular patterns. Second, the five cities differ in their geographical conditions. New York and Los Angeles are coastal cities, Chicago is located near a large lake (Lake Michigan), Washington has rivers running through its downtown area, and Phoenix is an inland and mountainous city. This geographical diversity ensures that the findings from this research can be generalized to a broader range of settings, enhancing the relevance and practical applications of the results.
As shown in Figure 3, for each city, we randomly split the patches into a training set (80%) and a testing set (20%). The exact splitting process is as follows:
(1) Randomly select a square $s$ from the set of non-testing squares and mark it as a testing square.
(2) Mark every neighboring square that intersects with $s$ as a testing square.
(3) Calculate the proportion of the current testing set. If it exceeds 20%, end the process and collect all image patches corresponding to the testing squares to form the testing set. If not, return to (1).
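The splitting procedure above can be sketched as follows (an illustrative Python sketch; the data structures and helper names are hypothetical):

```python
import random

def spatial_split(squares, neighbors, test_fraction=0.2, seed=0):
    """Greedy spatial hold-out: grow the testing set square by square,
    pulling in every intersecting neighbor so train/test never touch.

    squares   : list of square ids
    neighbors : dict mapping a square id to the ids of squares intersecting it
    """
    rng = random.Random(seed)
    testing = set()
    remaining = list(squares)
    # Stop once the testing proportion exceeds the target fraction.
    while len(testing) / len(squares) <= test_fraction and remaining:
        s = rng.choice(remaining)
        testing.add(s)
        testing.update(neighbors.get(s, ()))
        remaining = [q for q in squares if q not in testing]
    return testing, set(squares) - testing

# Tiny illustration: 10 squares on a line, each overlapping its adjacent neighbors
squares = list(range(10))
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in squares}
test_set, train_set = spatial_split(squares, neighbors)
```

Pulling every intersecting neighbor into the testing set is what guarantees the spatial non-overlap between training and testing patches noted below.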
Because the training and testing sets were spatially non-overlapping, our testing could be performed entirely on untouched spatial areas. During the training procedure, we used the road image patches and context image patches as input to generate road images with a size of 128 × 128 pixels. The batch size was set to 32, and the number of diffusion steps was set to 1000. By minimizing the loss function mentioned in Section 3, we obtained the best-performing diffusion model. In the testing phase, we used the model to synthesize road images conditioned on context images. Finally, we evaluated the similarity between the generated results and the real road networks in terms of the metrics described in Section 3.4. All the experiments were carried out on a Linux platform with 2 Intel Xeon Gold 6230 processors and 4 NVIDIA A100 GPUs. Our road network generation model was implemented using PyTorch. For each city, the training and testing tasks took around 2 and 3 h, respectively.

4.2. Generation Results

For each city, we trained diffusion models and chose the one that achieved the lowest loss as the final road network generator. The probability distributions of the pixels in the generated road images for the different cities are displayed in Figure 4. The bar height represents the probability, while the bar width is fixed to 5 pixels. It is noteworthy that most of the pixel values are concentrated at two ends, where 0 represents non-road pixels and 255 signifies road pixels. However, there is an exception for Washington, where the left peak of the distribution is located between 10 and 15. Given that all the distributions exhibit a bimodal pattern, it is reasonable to use a threshold of 127 for binarization to obtain the final road images.
Following previous research [52,53], we roughly classified the road networks into three patterns: regular, irregular, and suburban. The regular pattern is mainly composed of grid-like streets. The irregular pattern includes both main and side roads and has a more complex road network layout. The suburban pattern is located in the suburban regions of cities; it is characterized by a relatively small road density and the presence of curved road lines. Figure 5 shows several examples of generated road networks of each pattern in the testing set. The generated results of regular patterns were the most similar to the real road networks. Our model successfully recovered the road layouts and the majority of road segments in the four cases, except for some minor roads in cases 1 and 4. For the irregular pattern, our model can capture the overall shape of road networks, but the result is less accurate for short and side roads. The discrepancy between our generated results and the ground truth was the largest in suburban regions. A possible reason for this is that there are fewer road intersections in suburban regions, giving our model more freedom to generate road images. Another reason is that curved road courses are harder to predict compared with straight lines, as our training datasets in the five cities contain more straight roads than curved roads.

4.3. Obstructive Effect

The forms and topologies of road networks are constrained by natural obstacles, including water, mountains, and vegetation [54,55,56]. Herein, vegetation refers specifically to tall dense trees, which are represented with a pixel value of 2 in the land use data. We first present several samples in Figure 6 to examine the obstructive effects of the three types of obstacles on road network generation. In Figure 6a, land and water are painted yellow and blue, respectively. It can be found that water significantly restricts the expansion of roads. Compared with cases 3 and 4, cases 1 and 2 show better results in recovering the spatial distribution of roads, which should be due to their regular street forms and larger number of intersections. It is worth noting that our model does not seem to be effective at predicting the bridge in case 4. The existence of water in the middle prevents the bridge line from expanding across the two sides. Figure 6b shows the cases near mountains. In all four cases, the roads generated by our model were mainly located in gently sloping areas, consistent with the real situation. This suggests that mountains are also an obvious obstacle to road expansion in our model. Figure 6c illustrates the cases containing vegetation. Similar to the above two obstacles, the existence of vegetation patches presents obvious obstruction to urban roads.
Next, we focused on individual road network patches within the testing regions that contain water, mountains, or vegetation. We considered each of these patches as an observation and built linear regression models for four different scenarios based on these observations (Figure 7). In these models, the dependent variable is the sum of road length in a given patch, and the explanatory variables are the areas of water, mountains, or vegetation in that patch. To calculate the water area, we utilized land use data to extract water pixels and then summed their individual areas. The vegetation area was also calculated using this method. For the sake of simplicity, the extraction of mountains was performed on DEMs using the mathematical morphological algorithm described in [57].
The regression results for the water scenario are shown in Tables S4 and S5. The coefficients of the water area were −6.8124 and −6.4778 for the generated results (Table S4) and ground truth (Table S5), respectively, with p-values less than 0.01. This suggests that water has a significant negative impact on the length of the road network. However, the R2 values for both scenarios were relatively low, indicating that other factors besides water also influence the development and growth of urban roads. When compared to the real cases, the regression model using generated results decreased the coefficient and improved the R2, indicating that our model enhanced the obstructive effect of water on road network generation.
Tables S6 and S7 present the regression statistics for mountain area versus road length for the two datasets. The R2 values were 0.0648 and 0.0796, respectively, much smaller than those obtained in the water models, suggesting that the impact of mountains on road networks is less pronounced than that of water. For the generated results, the coefficient rose from −2.8406 (ground truth) to −2.5283, with a lower R2, indicating that the obstructive effect of mountains is slightly weaker in our generation model than in reality.
Tables S8 and S9 present the regression results for the cases containing vegetation. The coefficient of vegetation for the generated results (−2.6512) was larger than that for the ground truth (−4.1866), and the R2 values were 0.0773 and 0.2147, respectively. This indicates that the obstructive effect of vegetation on road length in our model was weaker than in the real cases.
Overall, the p-values in all six cases were below the significance level (0.01), and the generated cases exhibited similar negative coefficients to the real-world cases. Therefore, it can be concluded that there exist obstructive effects in watery, mountainous, and vegetated regions, and our model can faithfully reproduce such effects.

4.4. Model Comparison

We built five models with distinct combinations of context factors to further investigate their effects on road generation. As Table 2 shows, Models 1–4 each include three context factors, with a different single factor excluded each time, while Model 5 incorporates all four factors. In contrast, the baseline model is a simple diffusion model that does not take any context factors as input.
To assess the visual and geometric similarity between the generated results and ground truths, we evaluated both the generated images and road networks in the testing set. The evaluation results are summarized in Table 3. We assigned scores to each model based on their rankings in each row of Table 3. The scoring rule awards points from 6 to 1 to models ranked 1 to 6. By summing the scores across all 25 rows, we obtained the final score for each model. As Figure 8 demonstrates, Model 5 achieved the highest score among all models, indicating its superior performance in predicting road networks for the five cities. Furthermore, all our five models outperformed the baseline, confirming that geospatial factors contribute to the formation of realistic street layouts, which aligns with previous research findings [12,13].
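The ranking-based scoring described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; it assumes each row of Table 3 is ranked independently, with lower values better for FID (and for the absolute DAD/DARL deviations) and higher values better for F1 and IOU, and that ties do not occur.

```python
import numpy as np

def rank_scores(metric_rows, lower_is_better):
    """Score six models by per-row ranking: rank 1 earns 6 points, rank 6 earns 1.

    metric_rows: list of rows, each with 6 metric values
                 (Models 1-5 plus the baseline).
    lower_is_better: one bool per row (True for FID and absolute
                     DAD/DARL, False for F1 and IOU).
    """
    totals = np.zeros(6)
    for row, lower in zip(metric_rows, lower_is_better):
        vals = np.asarray(row, dtype=float)
        order = np.argsort(vals if lower else -vals)  # best model first
        for rank, model in enumerate(order):
            totals[model] += 6 - rank  # 6 points for rank 1, ..., 1 for rank 6
    return totals
```

Summing over the 25 rows (5 cities × 5 metrics) yields the final per-model scores plotted in Figure 8.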
In terms of the three image-based metrics (FID, F1, and IOU), it is interesting that Model 1, the only model that did not take land use as context, outperformed the other models in both F1 and IOU for Los Angeles. This may be because built-up area accounts for a large proportion (89%) of all land use in the study area of Los Angeles, so land use data may play a relatively minor role in generating road networks in this city. Moreover, the two models that each omitted a terrain factor (Model 2 without elevation and Model 3 without slope) achieved superior image quality compared to Model 5 for the regions of Washington, suggesting that terrain has less impact on road networks there. For the two network-based metrics (DAD and DARL), Model 5 achieved the smallest absolute values in three cities: Los Angeles, Phoenix, and Washington. This demonstrates the model's superiority in generating road networks with similar geometric and structural characteristics in these cities.
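For reference, the pixel-wise F1 and IOU used to compare a generated binary road image against the ground truth can be computed as below. This is the standard formulation of the two metrics; any thresholding details of the paper's actual evaluation pipeline are not reproduced here.

```python
import numpy as np

def f1_and_iou(pred: np.ndarray, truth: np.ndarray):
    """Pixel-wise F1 and IOU between binary road masks (road pixels truthy)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = np.count_nonzero(pred & truth)    # road predicted and real
    fp = np.count_nonzero(pred & ~truth)   # road predicted, not real
    fn = np.count_nonzero(~pred & truth)   # real road missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return f1, iou
```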
In general, the conditional diffusion model can learn more hidden patterns and laws by taking more geospatial context factors into consideration and thus performs better in yielding reasonable road networks. Model 4 is the worst-performing model except for the baseline, implying that intersections are the most important guide for the road generation task. Lacking this key artificial information, Model 4 learned purely from natural factors (land use, elevation, and slope) and generated more free-form results, suggesting that the development of urban road networks is not only restricted by the natural environment but is more strongly influenced by human design and planning.

5. Discussion

Currently, deep generative models are rapidly growing and attracting interest in fields such as urban planning and street design [58]. This paper proposes a road network generation framework based on conditional diffusion models. We first generated road images using diffusion models and then converted them into network formats through several post-processing steps. To evaluate the performance of our model, we conducted experiments on five US cities (Chicago, Los Angeles, New York, Phoenix, and Washington) with diverse geographical conditions.
The original DDPM was adapted for conditional image generation, enabling the utilization of the geospatial context to control the denoising process. After a comprehensive analysis of previous studies, four context factors were chosen: land use, elevation, slope (as representatives of the underlying terrain), and intersections (reflecting human design). The comparative experiments demonstrate that the model incorporating all four context factors generally outperforms the other four models across the five evaluation metrics. This indicates that all four context factors play a significant role in road network generation. By considering more factors, the model produces more reasonable and realistic results. Notably, the model that does not utilize road intersections as a condition performs the worst in all five cities, emphasizing the critical impact of manual planning in the development of road networks.
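A minimal sketch of how such conditioning can enter the reverse process is given below. It assumes the common design of concatenating the (never-noised) context channels to the noisy road image before each noise-prediction call; the paper's actual U-Net, noise schedule, and variance choice may differ.

```python
import numpy as np

def conditional_denoise_step(x_t, context, t, eps_model, alphas, alpha_bars, rng):
    """One DDPM reverse step conditioned on geospatial context.

    The noise predictor sees the noisy road image stacked with the context
    channels (land use, elevation, slope, intersections), so the context
    steers denoising but is itself never noised. eps_model is a stand-in
    for the trained U-Net: callable (stacked_channels, t) -> predicted noise.
    """
    eps = eps_model(np.concatenate([x_t, context], axis=0), t)
    a_t, ab_t = alphas[t], alpha_bars[t]
    # Posterior mean of the standard DDPM reverse step (Ho et al., 2020).
    mean = (x_t - (1 - a_t) / np.sqrt(1 - ab_t) * eps) / np.sqrt(a_t)
    if t > 0:
        sigma = np.sqrt(1 - a_t)  # simple variance choice, sigma_t^2 = beta_t
        return mean + sigma * rng.standard_normal(x_t.shape)
    return mean  # final step: no noise added
```

Iterating this step from pure noise down to t = 0, with the same context supplied every time, yields a road image consistent with that geospatial context.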
The obstructive effects of water, mountains, and vegetation were analyzed for both synthetic and real patches. The negative coefficients of the explanatory variables (water, mountain, and vegetation areas) in the six regression models confirm that all three types of obstacles constrain road expansion, and our model captures and reproduces these obstructive effects. When comparing water with the other two factors, water has a smaller (more negative) coefficient and a larger R2 value, indicating a more pronounced obstructive effect than mountains and vegetation. This difference can be attributed to the fact that the obstacles posed by long rivers or large seas often require the construction of bridges to be overcome, whereas mountainous or vegetation-covered regions with gentle slopes may still be amenable to road construction. While the obstructive effect is evident in both synthetic and real scenes, there are notable differences in the coefficients. In the synthetic scene, water has a more negative coefficient (−6.8124) than in the real scene, whereas mountains and vegetation have less negative coefficients (−2.5283 and −2.6512, respectively). This suggests that, during generation, our model tends to amplify the influence of water and diminish that of mountains and vegetation.
By combining image-based generative techniques with geospatial context data, our model demonstrates the ability to reconstruct street layouts with high visual and structural qualities. Furthermore, our model offers the advantage of producing road networks that resemble ground truths without having prior knowledge of surrounding road information. By inputting only context data, we can predict the road network distribution of a testing region. When comparing different road patterns, our model performs best in regular patterns, followed by irregular patterns. However, in suburban areas characterized by low road segment density and curved lines, our model’s performance is relatively poor.
Our research currently has two limitations. First, we generate a local road network within a 1 km × 1 km grid at a time. Second, the resolution of the generated image is limited to 128 × 128 pixels: because our U-net architecture contains several self-attention layers and is computationally heavy, road images must be generated at a small resolution. Our future studies will focus on enlarging the generation range of road networks and improving road image resolution by introducing image techniques such as outpainting and super-resolution.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijgi13060203/s1, Table S1: Statistics of road networks in the five cities; Table S2: Rates of land use classes in the five cities; Table S3: Statistics of elevation and slope in the five cities; Table S4: Regression result for generated road networks (water area vs. road length); Table S5: Regression result for real road networks (water area vs. road length); Table S6: Regression result for generated road networks (mountain area vs. road length); Table S7: Regression result for real road networks (mountain area vs. road length); Table S8: Regression result for generated road networks (vegetation area vs. road length); Table S9: Regression result for real road networks (vegetation area vs. road length).

Author Contributions

Xiaoyan Gu designed the research, implemented the code, and wrote the paper; Mengmeng Zhang and Jinxin Lyu helped in data collection and language correction. Quansheng Ge provided research suggestions and funding. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, Grant No. XDA23100040308.

Data Availability Statement

The codes and data that support this work are available at the following Figshare link: https://doi.org/10.6084/m9.figshare.24618942.v1 (accessed on 23 November 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Dong, L.; Li, R.; Zhang, J.; Di, Z. Population-weighted efficiency in transportation networks. Sci. Rep. 2016, 6, 26377. [Google Scholar] [CrossRef] [PubMed]
  2. Ng, C.P.; Law, T.H.; Jakarni, F.M.; Kulanthayan, S. Road infrastructure development and economic growth. IOP Conf. Ser. Mater. Sci. Eng. 2019, 512, 012045. [Google Scholar] [CrossRef]
  3. Strano, E.; Giometto, A.; Shai, S.; Bertuzzo, E.; Mucha, P.J.; Rinaldo, A. The scaling structure of the global road network. R. Soc. Open Sci. 2017, 4, 170590. [Google Scholar] [CrossRef] [PubMed]
  4. Molinero, C.; Thurner, S. How the geometry of cities determines urban scaling laws. J. R. Soc. Interface 2021, 18, 20200705. [Google Scholar] [CrossRef] [PubMed]
  5. Aliaga, D.G.; Vanegas, C.A.; Benes, B. Interactive example-based urban layout synthesis. In Proceedings of the ACM SIGGRAPH Asia 2008 Papers, Singapore, 10–13 December 2008. [Google Scholar] [CrossRef]
  6. Lyu, X.; Han, Q.; Vries, B. Procedural modeling of urban layout: Population, land use, and road network. Transp. Res. Procedia 2017, 25, 3333–3342. [Google Scholar] [CrossRef]
  7. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. arXiv 2020, arXiv:2006.11239. [Google Scholar]
  8. Saharia, C.; Chan, W.; Saxena, S.; Li, L.; Whang, J.; Denton, E.; Kamyar, G.; Karagol Ayan, B.; Mahdavi, S.S.; Gontijo Lopes, R.; et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. arXiv 2022, arXiv:2205.11487. [Google Scholar]
  9. Kumari, N.; Zhang, B.; Zhang, R.; Shechtman, E.; Zhu, J.Y. Multi-Concept Customization of Text-to-Image Diffusion. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar] [CrossRef]
  10. Khachatryan, L.; Movsisyan, A.; Tadevosyan, V.; Henschel, R.; Wang, Z.; Navasardyan, S.; Shi, H. Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators. arXiv 2023, arXiv:2303.13439. [Google Scholar]
  11. Goodchild, M.F. The Quality of Geospatial Context. In Proceedings of the QuaCon 2009, Stuttgart, Germany, 25–26 June 2009. [Google Scholar] [CrossRef]
  12. Boeing, G. Urban spatial order: Street network orientation, configuration, and entropy. Appl. Netw. Sci. 2019, 4. [Google Scholar] [CrossRef]
  13. Fang, Z.; Qi, J.; Fan, L.; Huang, J.; Jin, Y.; Yang, T. A topography-aware approach to the automatic generation of urban road networks. Int. J. Geogr. Inf. Sci. 2022, 36, 2035–2059. [Google Scholar] [CrossRef]
  14. Parish, Y.I.H.; Müller, P. Procedural modeling of cities. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, Los Angeles, CA, USA, 12–17 August 2001. [Google Scholar] [CrossRef]
  15. Galin, E.; Peytavie, A.; Maréchal, N.; Guérin, E. Procedural Generation of Roads. Comput. Graph. Forum 2010, 29, 429–438. [Google Scholar] [CrossRef]
  16. Beneš, J.; Wilkie, A.; Křivánek, J. Procedural Modelling of Urban Road Networks. Comput. Graph. Forum 2014, 33, 132–142. [Google Scholar] [CrossRef]
  17. Teng, E.; Bidarra, R. A semantic approach to patch-based procedural generation of urban road networks. In Proceedings of the 12th International Conference on the Foundations of Digital Games, Hyannis, MA, USA, 14–17 August 2017. [Google Scholar] [CrossRef]
  18. Nishida, G.; Garcia-Dorado, I.; Aliaga, D.G. Example-Driven Procedural Urban Roads. Comput. Graph. Forum 2016, 35, 5–17. [Google Scholar] [CrossRef]
  19. Ding, J.-X.; Qin, R.-K.; Guo, N.; Long, J.-C. Urban road network growth model based on RNG proximity graph and angle restriction. Nonlinear Dyn. 2019, 96, 2281–2292. [Google Scholar] [CrossRef]
  20. Lima, F.T.; Brown, N.C.; Duarte, J.-P. A Grammar-Based Optimization Approach for Designing Urban Fabrics and Locating Amenities for 15-Minute Cities. Buildings 2022, 12, 1157. [Google Scholar] [CrossRef]
  21. You, J.; Ying, R.; Ren, X.; Hamilton, W.L.; Leskovec, J. GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models. arXiv 2018, arXiv:1802.08773. [Google Scholar]
  22. Bojchevski, A.; Shchur, O.; Zügner, D.; Günnemann, S. NetGAN: Generating Graphs via Random Walks. arXiv 2018, arXiv:1803.00816. [Google Scholar]
  23. Liao, R.; Li, Y.; Song, Y.; Wang, S.; Hamilton, W.L.; Duvenaud, D.; Urtasun, R.; Zemel, R. Efficient Graph Generation with Graph Recurrent Attention Networks. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
  24. Owaki, T.; Machida, T. RoadNetGAN: Generating Road Networks in Planar Graph Representation. In Proceedings of the Neural Information Processing, Bangkok, Thailand, 18–22 November 2020. [Google Scholar] [CrossRef]
  25. Chu, H.; Li, D.; Acuna, D.; Kar, A.; Shugrina, M.; Wei, X.; Liu, M.-Y.; Torralba, A.; Fidler, S. Neural turtle graphics for modeling city road layouts. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar] [CrossRef]
  26. Mi, L.; Zhao, H.; Nash, C.; Jin, X.; Gao, J.; Sun, C.; Schmid, C.; Shavit, N.; Chai, Y.; Anguelov, D. HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021. [CrossRef]
  27. Hartmann, S.; Weinmann, M.; Wessel, R.; Klein, R. StreetGAN: Towards Road Network Synthesis with Generative Adversarial Networks. In Proceedings of the 25th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, Plzen, Czech Republic, 29 May–2 June 2017. [Google Scholar]
  28. Fang, Z.; Qi, J.; Fan, L.; Huang, J.; Jin, Y.; Yang, T. A framework for human-computer interactive street network design based on a multi-stage deep learning approach. Comput. Environ. Urban Syst. 2022, 96, 101853. [Google Scholar] [CrossRef]
  29. Birsak, M.; Kelly, T.; Para, W.; Wonka, P. Large-Scale Auto-Regressive Modeling of Street Networks. arXiv 2022, arXiv:2209.00281. [Google Scholar]
  30. Yang, X.; Fan, X.; Su, Y.; Guan, Q.; Tang, L. TR2RM: An urban road network generation model based on multisource big data. Int. J. Digit. Earth 2024, 17, 2344596. [Google Scholar] [CrossRef]
  31. Eliou, N.; Kehagia, F. The interaction between road network and natural landscape type. WIT Trans. Ecol. Environ. 2007, 102, 861–868. [Google Scholar] [CrossRef]
  32. Mohajeri, N.; French, J.R.; Gudmundsson, A. Entropy Measures of Street-Network Dispersion: Analysis of Coastal Cities in Brazil and Britain. Entropy 2013, 15, 3340–3360. [Google Scholar] [CrossRef]
  33. Song, C.; Liu, Q.; Song, J.; Yang, D.; Jiang, Z.; Ma, W.; Niu, F.; Song, J. The Interactive Relationship between Street Centrality and Land Use Intensity—A Case Study of Jinan, China. Int. J. Environ. Res. Public Health 2023, 20, 5127. [Google Scholar] [CrossRef] [PubMed]
  34. Kleinschroth, F.; Laporte, N.; Laurance, W.F.; Goetz, S.J.; Ghazoul, J. Road expansion and persistence in forests of the Congo Basin. Nat. Sustain. 2019, 2, 628–634. [Google Scholar] [CrossRef]
  35. Riitters, K.; Potter, K.; Iannone, B.V., III; Oswalt, C.; Fei, S.; Guo, Q. Landscape correlates of forest plant invasions: A high-resolution analysis across the eastern United States. Divers. Distrib. 2018, 24, 274–284. [Google Scholar] [CrossRef]
  36. Deng, H.; Wen, W.; Zhang, W. Analysis of Road Networks Features of Urban Municipal District Based on Fractal Dimension. ISPRS Int. J. Geo-Inf. 2023, 12, 188. [Google Scholar] [CrossRef]
  37. Lu, Z.; Zhang, H.; Southworth, F.; Crittenden, J. Fractal dimensions of metropolitan area road networks and the impacts on the urban built environment. Ecol. Indic. 2016, 70, 285–296. [Google Scholar] [CrossRef]
  38. Cao, J.; Tu, W.; Cao, R.; Gao, Q.; Chen, G.; Li, Q. Untangling the association between urban mobility and urban elements. Geo-Spat. Inf. Sci. 2023, 1–19. [Google Scholar] [CrossRef]
  39. Barthelemy, M.; Bordin, P.; Berestycki, H.; Gribaudi, M. Self-organization versus top-down planning in the evolution of a city. Sci. Rep. 2013, 3, 2153. [Google Scholar] [CrossRef]
  40. Wang, Z.; Han, Q.; de Vries, B. Land Use/Land Cover and Accessibility: Implications of the Correlations for Land Use and Transport Planning. Appl. Spat. Anal. Policy 2019, 12, 923–940. [Google Scholar] [CrossRef]
  41. Ahmadzai, F. Analyses and modeling of urban land use and road network interactions using spatial-based disaggregate accessibility to land use. J. Urban Manag. 2020, 9, 298–315. [Google Scholar] [CrossRef]
  42. Fang, Z.; Jin, Y.; Yang, T. Incorporating Planning Intelligence into Deep Learning: A Planning Support Tool for Street Network Design. J. Urban Technol. 2022, 29, 99–114. [Google Scholar] [CrossRef]
  43. Kweon, H. Comparisons of Estimated Circuity Factor of Forest Roads with Different Vertical Heights in Mountainous Areas, Republic of Korea. Forests 2019, 10, 1147. [Google Scholar] [CrossRef]
  44. Drazic, S.; Danilovic, M.; Ristic, R.; Stojnic, D.; Antonic, S. Evaluation of Morphometric Terrain Parameters and Their Influence on Determining Optimal Density of Primary Forest Road Network. Croat. J. For. Eng. 2023, 44, 301–312. [Google Scholar] [CrossRef]
  45. Luo, C. Understanding Diffusion Models: A Unified Perspective. arXiv 2022, arXiv:2208.11970. [Google Scholar]
  46. Saharia, C.; Ho, J.; Chan, W.; Salimans, T.; Fleet, D.J.; Norouzi, M. Image Super-Resolution via Iterative Refinement. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 4713–4726. [Google Scholar] [CrossRef] [PubMed]
  47. Zhan, C. A hybrid line thinning approach. In Proceedings of the Auto-Carto 11, Minneapolis, MN, USA, 30 October–1 November 1993. [Google Scholar]
  48. Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  49. Sasaki, Y. The truth of the F-measure. Teach. Tutor Mater 2007, 1, 1–5. [Google Scholar]
  50. Moulton, R.; Jiang, Y. Maximally Consistent Sampling and the Jaccard Index of Probability Distributions. arXiv 2018, arXiv:1809.04052. [Google Scholar]
  51. Kempinska, K.; Murcio, R. Modelling urban networks using Variational Autoencoders. Appl. Netw. Sci. 2019, 4, 114. [Google Scholar] [CrossRef]
  52. Zhou, Q.; Li, Z. Empirical determination of geometric parameters for selective omission in a road network. Int. J. Geogr. Inf. Sci. 2016, 30, 263–299. [Google Scholar] [CrossRef]
  53. Hacar, M.; Gökgöz, T. A New, Score-Based Multi-Stage Matching Approach for Road Network Conflation in Different Road Patterns. ISPRS Int. J. Geo-Inf. 2019, 8, 81. [Google Scholar] [CrossRef]
  54. Strano, E.; Nicosia, V.; Latora, V.; Porta, S.; Barthélemy, M. Elementary processes governing the evolution of road networks. Sci. Rep. 2012, 2, 296. [Google Scholar] [CrossRef] [PubMed]
  55. Wang, C.; Ducruet, C.; Wang, W. Evolution, accessibility and dynamics of road networks in China from 1600 BC to 1900 AD. J. Geogr. Sci. 2015, 25, 451–484. [Google Scholar] [CrossRef]
  56. Koks, E.; Rozenberg, J.; Tariverdi, M.; Dickens, B.; Fox, C.; van Ginkel, K.; Hallegatte, S. A global assessment of national road network vulnerability. Environ. Res. Infrastruct. Sustain. 2023, 3, 025008. [Google Scholar] [CrossRef]
  57. Sathyamoorthy, D. Extraction of mountains from digital elevation models using mathematical morphology. GIS Malays. 2006, 1, 16–19. [Google Scholar]
  58. Jiang, F.; Ma, J.; Webster, C.J.; Chiaradia, A.J.F.; Zhou, Y.; Zhao, Z.; Zhang, X. Generative urban design: A systematic review on problem formulation, design generation, and decision-making. Prog. Plan. 2023, 180, 100795. [Google Scholar] [CrossRef]
Figure 1. The overall workflow.
Figure 2. Description of the U-net architecture used in the diffusion model.
Figure 3. Locations and road networks of the five cities used in the experiment.
Figure 4. Probability distribution of the road image pixels in different cities. (a) Chicago. (b) Los Angeles. (c) New York. (d) Phoenix. (e) Washington.
Figure 5. Generated results and the corresponding ground truth in different cities. (a) Regular pattern. Cases 1 and 2 are from Chicago, cases 3 and 4 are from New York. (b) Irregular pattern. The four cases are from Washington, New York, Los Angeles and Phoenix, respectively. (c) Suburban pattern. Cases 1 to 3 are from Los Angeles, case 4 is from Phoenix.
Figure 6. The impact of water, mountains, and vegetation on road network generation. (a) Cases containing water. Cases 1 to 3 are from New York, case 4 is from Washington. (b) Cases near mountains. Cases are all from Phoenix. (c) Cases containing vegetation. Case 1 is from Los Angeles, cases 2 and 3 are from New York, case 4 is from Washington.
Figure 7. Correlations between water/mountain/vegetation areas and road length for the road networks in the testing regions. (a) Water area vs. road length for generated results. (b) Water area vs. road length for ground truth. (c) Mountain area vs. road length for generated results. (d) Mountain area vs. road length for ground truth. (e) Vegetation area vs. road length for generated results. (f) Vegetation area vs. road length for ground truth.
Figure 8. Scores of six models in the experiments.
Table 1. Size and pixel values of the input images.
| Data | Image Size | Pixel Value | Note |
|---|---|---|---|
| Road network | 1 × 128 × 128 | 0 or 255 | 0 represents non-road; 255 represents road |
| Land use | 1 × 128 × 128 for each class | 0 or 255 | 0 represents the other type; 255 represents the current type |
| Elevation | 1 × 128 × 128 | [0, 255] | Normalized elevation with the range of 0 to 255 |
| Slope | 1 × 128 × 128 | [0, 255] | Normalized slope with the range of 0 to 255 |
| Intersections | 1 × 128 × 128 | 0 or 255 | 0 represents non-intersection; 255 represents intersection pixel |
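The min-max normalization of elevation and slope to the [0, 255] range noted in Table 1 can be sketched as follows. Per-patch scaling is an assumption here, since the table does not state whether normalization is applied globally or per patch.

```python
import numpy as np

def normalize_to_byte(raster: np.ndarray) -> np.ndarray:
    """Min-max normalize a DEM or slope raster to 0-255 (Table 1 encoding)."""
    r = raster.astype(float)
    lo, hi = r.min(), r.max()
    if hi == lo:  # completely flat patch: map everything to 0
        return np.zeros_like(r, dtype=np.uint8)
    return np.round((r - lo) / (hi - lo) * 255).astype(np.uint8)
```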
Table 2. Different models for road network generation.
| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Baseline |
|---|---|---|---|---|---|---|
| Land use | | ✓ | ✓ | ✓ | ✓ | |
| Elevation | ✓ | | ✓ | ✓ | ✓ | |
| Slope | ✓ | ✓ | | ✓ | ✓ | |
| Intersections | ✓ | ✓ | ✓ | | ✓ | |
Table 3. Comparison of six models in terms of five metrics.
| City | Metric | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Baseline |
|---|---|---|---|---|---|---|---|
| Chicago | FID | 44.11 | 44.60 | 51.24 | 146.67 | 34.79 * | 199.82 |
| | F1 | 72.28% | 72.81% | 70.98% | 16.73% | 74.62% * | 18.00% |
| | IOU | 58.96% | 59.27% | 57.15% | 9.22% | 61.53% * | 9.98% |
| | DAD | −0.172 | −0.153 | −0.087 | −0.039 * | −0.129 | 0.173 |
| | DARL | −1.32 | −1.715 | −0.402 * | 1.632 | −1.089 | 4.649 |
| Los Angeles | FID | 80.48 | 74.41 * | 75.61 | 284.36 | 78.11 | 314.86 |
| | F1 | 56.00% * | 54.78% | 53.45% | 17.32% | 54.85% | 8.28% |
| | IOU | 41.07% * | 39.79% | 38.64% | 9.56% | 40.00% | 4.36% |
| | DAD | −0.087 | −0.126 | −0.120 | 0.071 | 0.066 * | −0.473 |
| | DARL | −1.166 | −1.412 | −0.709 | 1.083 | 0.430 * | −4.901 |
| New York | FID | 143.97 | 108.80 | 68.03 * | 299.29 | 76.65 | 361.57 |
| | F1 | 60.63% | 62.63% | 65.62% | 15.68% | 66.90% * | 25.98% |
| | IOU | 45.98% | 48.04% | 51.57% | 8.60% | 52.75% * | 15.22% |
| | DAD | −0.086 | 0.027 * | −0.108 | −0.485 | −0.028 | 0.337 |
| | DARL | −0.585 * | −0.615 | −1.648 | −4.570 | −1.078 | 8.712 |
| Phoenix | FID | 132.27 | 114.80 | 225.14 | 258.79 | 102.54 * | 320.00 |
| | F1 | 44.23% | 48.94% | 38.37% | 17.78% | 49.98% * | 3.70% |
| | IOU | 29.44% | 33.71% | 24.46% | 9.89% | 34.56% * | 1.91% |
| | DAD | −0.160 | −0.178 | −0.574 | 0.303 | −0.097 * | −0.969 |
| | DARL | 0.257 | −0.672 | −3.421 | 5.530 | −0.107 * | −5.908 |
| Washington | FID | 109.78 | 80.65 * | 82.19 | 171.90 | 85.77 | 174.46 |
| | F1 | 50.47% | 51.49% | 52.48% * | 20.51% | 50.52% | 9.84% |
| | IOU | 35.13% | 35.94% | 36.95% * | 11.54% | 35.31% | 5.22% |
| | DAD | −0.224 | −0.185 | −0.105 | 0.256 | 0.009 * | −0.579 |
| | DARL | −2.729 | −2.171 | −1.756 | 4.098 | −0.262 * | −4.231 |

*: the best-performing model.

