Article

Towards Urban Accessibility: Modeling Trip Distribution to Assess the Provision of Social Facilities

by Margarita Mishina 1,*, Sergey Mityagin 2, Alexander Belyi 1, Alexander Khrulkov 1 and Stanislav Sobolevsky 1,3
1 Department of Mathematics and Statistics, Faculty of Science, Masaryk University, 611 37 Brno, Czech Republic
2 Research Center “Strong Artificial Intelligence in Industry”, ITMO University, 199034 Saint Petersburg, Russia
3 Center for Urban Science and Progress, New York University, New York, NY 11201, USA
* Author to whom correspondence should be addressed.
Smart Cities 2024, 7(5), 2741-2762; https://doi.org/10.3390/smartcities7050106
Submission received: 9 June 2024 / Revised: 8 September 2024 / Accepted: 9 September 2024 / Published: 18 September 2024
(This article belongs to the Topic Artificial Intelligence Models, Tools and Applications)


Highlights

Main findings:
  • By comparing traditional gravity-based and optimization-focused approaches used for trip distribution modeling, we identified spatial conditions under which these two mathematical frameworks can yield divergent provision assessments that could lead to different urban planning and policy decisions. To support the choice of an appropriate modeling approach in the absence of actual data on population–facility interactions, we identified each model’s strengths and limitations in different urban contexts.
  • Exploring scenarios where data on facility utilization are available for a part of a city, we tested supervised machine learning models that integrate the available data to capture prevalent patterns of human behavior, thereby enhancing the prediction of trip distribution and associated provision assessments. After a careful examination of the existing benchmarks, we proposed a new modification of the deep learning trip distribution model that incorporates physical constraints related to fixed demand-and-supply conditions, achieving superior performance on synthetic datasets compared to established baselines.
Implications of the main findings:
  • Offer a methodological framework for modeling population–facility interactions that can be adapted to various urban contexts and data availability scenarios.
  • Provide urban planners with enhanced tools for assessing the provision of social facilities, supporting more equitable and sustainable urban development.

Abstract

Assessing the accessibility and provision of social facilities in urban areas presents a significant challenge, particularly when direct data on facility utilization are unavailable or incomplete. To address this challenge, our study investigates the potential of trip distribution models in estimating facility utilization based on the spatial distributions of population demand and facilities’ capacities within a city. We first examine the extent to which traditional gravity-based and optimization-focused models can capture population–facilities interactions and provide a reasonable perspective on facility accessibility and provision. We then explore whether advanced deep learning techniques can produce more robust estimates of facility utilization when data are partially observed (e.g., when some of the district administrations collect and share these data). Our findings suggest that, while traditional models offer valuable insights into facility utilization, especially in the absence of direct data, their effectiveness depends on accurate assumptions about distance-related commute patterns. This limitation is addressed by our proposed novel deep learning model, incorporating supply–demand constraints, which demonstrates the ability to uncover hidden interaction patterns from partly observed data, resulting in accurate estimates of facility utilization and, thereby, more reliable provision assessments. We illustrate these findings through a case study on kindergarten accessibility in Saint Petersburg, Russia, offering urban planners a strategic toolkit for evaluating facility provision in data-limited contexts.

1. Introduction

In modern cities, urban facilities constitute an integral part of urban infrastructure that directly impacts socioeconomic development and the well-being of the population. Among them, social facilities (providing educational, healthcare, recreational, and religious services) hold particular significance as they fulfill fundamental population needs and ensure a basic level of livability. Thus, one of the vital steps in cities’ socioeconomic and spatial development is identifying potential gaps in social facility provision that may arise from mismatches in the spatial distribution of demand and supply across a city. This, in turn, helps determine optimal locations for new facilities, thereby enhancing overall community well-being through improved accessibility and distribution of essential services.
Proximity to residential areas is essential for social facilities, as they usually provide daily or urgent services [1,2]. For many of them, urban regulatory standards and research institutions define a catchment area to specify how far away a facility can be located to deliver a service to a given population [3,4,5]. Thus, assessing social facility provision frequently involves estimating the probability that residents in a given city location will be served within a reasonable catchment area [6,7]. This probability mainly depends on the presence of facilities nearby, their capacities, and how they are allocated among the surrounding population. The latter reflects who exactly uses the facilities’ capacities and to what extent, providing crucial information about existing spatial disparities and their root causes.
Therefore, assessing social facility provision requires comprehensive data, including information on population demand, urban infrastructure (such as places of demand–supply concentration and road networks), and crucially, realized population–facilities interactions. While data on population and urban infrastructure can often be sourced from government data portals or mapping services, information on facilities’ utilization is typically unavailable or limited to a small observation area. National statistical bureaus currently do not release data on realized population interactions with any kind of facility, and big tech companies remain hesitant to share their data on population movements. At the same time, collecting such data across an entire city requires large-scale sociological surveys or collaboration with local authorities and facility management, which is often hampered by bureaucratic nuances. Consequently, urban experts grapple with incomplete data regarding the utilization of facilities, which hinders the precise evaluation of social facility provision within a city [8,9].
In response to this challenge, previous studies have developed various approaches that leverage other available information to assess facility accessibility and provision, such as mapping catchment areas [10,11], computing travel impedance [12,13,14], calculating population-to-provider ratios [15,16,17], and others [18,19]. Among these, the Two-Step Floating Catchment Area (2SFCA) family of methods [20,21,22,23,24] has gained special prominence due to its focus on spatial interactions between the population and facilities. Its latest modifications incorporate the Singly Constrained Gravity Model [25], the fundamental trip distribution model traditionally used in transportation planning, to estimate potential population interactions with facilities based on the spatial distribution of supply and demand in a city. The estimated interactions are then used as a proxy for actual facility utilization data in accessibility assessment, making it possible to apply the approach in data-sparse environments.
However, despite their widespread use, the 2SFCA and similar approaches still face significant limitations in estimating facility utilization. These include insufficient examination of the impact of the distance decay function [26], a lack of consideration for competition among the population for limited facility capacity [27,28], and an inability to capture complex patterns of population behavior in urban settings [29,30,31]. Such limitations may compromise the accuracy of accessibility assessments, potentially leading to suboptimal urban planning decisions. This establishes the need for an in-depth examination of existing trip distribution models and the development of new ones specifically tailored to assessing facility accessibility and provision in data-limited contexts.
To address these gaps and enhance the accuracy of facility provision assessments, our study focuses on two primary research questions:
  • First, we assess the extent to which traditional trip distribution models can accurately capture and explain population–facility interaction patterns using only data on demand–supply distribution and assumptions about mobility behavior. By identifying the specific patterns the models can reveal, we enhance the transparency of these models’ functionality and provide clearer guidance on their applicability, particularly in scenarios where direct data on facility utilization are unavailable.
  • Second, we investigate the potential of machine-learning-based trip distribution models to uncover interaction patterns in observed data and generate more accurate estimates of facility utilization. To improve the accuracy of population–facility interaction predictions, we propose a novel deep learning model that accounts for population competition over limited facility capacity. Our model demonstrates superior performance compared to other recently published methods and proves effective in scenarios where data on facility utilization are only partially available, e.g., when district administrations collect and share such data.
To conduct our analysis, we use the collected data on the demand and supply of kindergarten services in Saint Petersburg, Russia.
Among the different traditional models applied to the trip distribution problem, we consider the Allocation Model [32] and the Doubly Constrained Gravity Model [25], which are the most commonly used in practice. A key aspect of these models is their adherence to natural demand and supply constraints, ensuring that total population outflows and inflows do not exceed the demand in residential buildings and the capacities of facilities, respectively. This property enables accurate provision assessments free of the biases that a spurious inflation effect in demand or supply might introduce, as discussed in [27]. Both models consider population mobility patterns, reflecting residents’ willingness to travel to a facility based on its remoteness. However, they approach the trip distribution problem in fundamentally different ways: the Allocation Model relies on optimization, while the Gravity Model relies on probabilistic inference. Our study demonstrates that applying these models to estimate population–facility interactions offers distinct perspectives on provision within the study area, highlighting the importance of context-specific decisions when selecting a trip distribution model.
The availability of partly observed data on facilities’ utilization in some areas of a city paves the way for the use of machine learning technologies to reconstruct population behavior patterns, thereby enhancing the accuracy of facility provision assessments. In our study, we investigate the effectiveness of Log–Linear Regression, the Deep Gravity model [33], and its modification employing Graph Neural Networks (GNNs) [34,35] to identify different interaction patterns and transfer them to previously unseen data. Based on the in-depth examination of the existing benchmarks, we enhance the performance of the GNN-based Deep Gravity Model through the incorporation of a balancing mechanism to combine the robustness of the traditional constrained Gravity Model and the predictive power of graph representation learning. Due to the absence of actual information on facility utilization in collected data, in this part of the study, we conduct the experiments using synthetic datasets on population–facilities interactions generated by the traditional trip distribution models. The results obtained with synthetic datasets indicate that the enhanced model outperforms the baselines and accurately predicts facilities’ utilization.
By comparing different traditional trip distribution models and adopting novel deep learning techniques to estimate population–facility interactions, our study introduces a comprehensive framework for assessing the provision of social facilities in contexts where data on facility utilization are absent or incomplete. In contrast to previous studies, we treat the estimation of population–facility interactions as a critical, standalone step that significantly influences the accuracy of facility accessibility and provision assessments. Through accurate modeling of facility utilization, our framework uncovers targeted spatial disparities between population demand and facilities’ capacities, supporting informed decision making on the optimal locations for new services. The adaptability of this framework allows it to be applied across different urban contexts, providing urban planners, developers, and civil servants with a holistic tool to guide long-term development strategies and ensure that growth is both sustainable and inclusive.

2. Materials and Methods

This section presents the methodologies employed to assess the provision of social facilities in urban settings by modeling trip distribution when direct facility utilization data are unavailable or incomplete. We begin in Section 2.1 by outlining the approach used to assess facility provision levels across residential buildings and the importance of data on population–facility interactions in this process. In Section 2.2, we describe the collection and processing of the data needed to estimate kindergarten utilization and provision in Saint Petersburg, Russia. Finally, Sections 2.3–2.6 detail the application of both traditional and machine-learning-based trip distribution models to estimate potential population–facility interactions. Among the traditional models, we cover the Linear Allocation Model and the Doubly Constrained Gravity Model, which estimate population–facility flows based on fixed demand–capacity constraints and a distance decay function. Among the machine-learning-based models, we describe the Log–Linear Regression and Deep Gravity family of models, which predict interactions based on patterns identified in partially observed data. Within the Deep Gravity group, we introduce our modification of the Graph-based Deep Gravity Model, which incorporates demand–supply constraints through a balancing mechanism to enhance prediction accuracy.

2.1. Assessing Social Facilities Provision

In this study, we assess the level of provision for each residential building where the demand for the service provided by the considered type of facility is greater than zero. The level of facility provision $p_i$ is evaluated as the probability of being served within a facility’s catchment area, which urban regulatory standards usually define as a walking time or distance. The catchment area limits the territory within which people would typically apply for a service. Depending on urban regulatory standards, the catchment area can either strictly restrict service to the population living outside it [36] or apply only for planning purposes and to ensure public safety and well-being [37]. Generally, a lack of facilities or spare capacity near residential buildings forces people to obtain a service farther away or to look for other ways to satisfy their needs. Nevertheless, in the proposed framework, we consider people unprovided for if they do not obtain the service within the distance that defines the catchment area.
$$ p_i = \frac{P_i}{O_i} = \frac{\sum_{j} f(y_{ij})}{O_i}, \quad \text{if } O_i > 0, \qquad f(y_{ij}) = \begin{cases} y_{ij}, & \text{if } d_{ij} \le r_a, \\ 0, & \text{if } d_{ij} > r_a. \end{cases} \tag{1} $$
In Equation (1), for every building $i$, the denominator represents the total number of people $O_i$ who need a specific service; in the numerator, $P_i$ reflects how many of them can obtain it within the catchment area. Thus, $P_i$ can be represented as the total number of interactions $y_{ij}$ of the population of building $i$ with facilities $j$ located at a distance $d_{ij}$ no greater than the radius $r_a$ that limits the catchment area. In the context of facility utilization, each “interaction” (or, more generally, “trip”) implies commuting to and receiving a service in a specific facility. In some of the literature on human mobility, trips between two locations are referred to as population “flow”, emphasizing their aggregated nature [38].
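Once a flow matrix is available, Equation (1) can be evaluated directly. The NumPy sketch below is a minimal illustration, assuming dense arrays for the flows $y_{ij}$, distances $d_{ij}$, and demand $O_i$. The function name `provision` and the toy numbers are hypothetical and do not come from the study.

```python
import numpy as np


def provision(flows, dist, demand, radius):
    """Provision level p_i from Equation (1) for each residential building.

    flows[i, j]: estimated interactions y_ij between building i and facility j
    dist[i, j]:  walking distance d_ij
    demand[i]:   demand O_i of building i
    Buildings with O_i = 0 get NaN, because p_i is only defined for O_i > 0.
    """
    flows = np.asarray(flows, dtype=float)
    dist = np.asarray(dist, dtype=float)
    demand = np.asarray(demand, dtype=float)
    # f(y_ij): keep interactions inside the catchment radius r_a, zero the rest
    served = np.where(dist <= radius, flows, 0.0).sum(axis=1)  # P_i
    p = np.full(demand.shape, np.nan)
    has_demand = demand > 0
    p[has_demand] = served[has_demand] / demand[has_demand]
    return p


# Toy example: three buildings, two facilities, r_a = 600 m
flows = [[20, 5], [3, 17], [0, 0]]
dist = [[100, 700], [900, 200], [300, 300]]
demand = [30, 20, 0]
p = provision(flows, dist, demand, radius=600)
```

In the toy data, the 5 people of building 0 who travel 700 m still count as served by the flow model. However, they do not count toward $P_0$, so $p_0 = 20/30$.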
We estimate the population flows by employing traditional trip distribution models, such as the Linear Allocation Model and Doubly Constrained Gravity Model, as well as machine learning techniques, as represented by the Log–Linear Regression and Deep Gravity family of models. These are discussed in the following subsections. All models distribute trips based on the prior information about the demand in each residential building i, the capacity of each facility j, and walking distances between i and j.

2.2. Data Collection and Processing

The data collected for the study cover the territory of Saint Petersburg and include information about kindergartens (locations, capacities, and catchment areas), residential buildings (locations and demand), and the distances between them.
Information on 1264 kindergartens was obtained from the mapping services OpenStreetMap (OSM) [39] and Google Maps [40]. The approximate capacity of each kindergarten was determined by analyzing its standardized building type as visible in satellite imagery. While the normative radius of a kindergarten’s catchment area in Russia is set at 300 m, this study adopted an expanded catchment distance of 600 m. This adjustment was based on survey results indicating that the doubled distance better reflects acceptable commuting times for parents. The discrepancy between the normative and empirical catchment areas is attributed to significant deviations from regulatory standards in the placement of kindergartens in Saint Petersburg, which leave a substantial portion of the population outside the 300 m catchment areas.
The target population group was children aged 3–6 years. To determine the size of this group, we retrieved the social and demographic characteristics of the population in each district from Russian census data [41] and then disaggregated this information from the district level to individual buildings. Based on open-access housing data [42], we identified 18,023 residential buildings. During disaggregation, the total population of each district was distributed among its buildings based on factors correlated with population density, particularly the approximate living area of each building. The relationship between a building’s population and its living area was determined using city standards, which set the minimum living area per person at 15 m². The number of children in the target age group within each residential building was then calculated by multiplying the estimated building population by the proportion of children in the district’s population.
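The disaggregation step can be sketched as a pro-rata split. This is a minimal illustration, assuming the district population is divided strictly in proportion to each building’s living area. The function and parameter names are hypothetical, and the study’s exact rule (for example, rounding) may differ.

```python
import numpy as np


def disaggregate(district_population, living_area, child_share, area_per_person=15.0):
    """Split a district's population among its buildings in proportion to living area.

    living_area[k]: approximate living area of building k in m^2
    child_share:    proportion of 3-6-year-olds in the district population
    Returns (residents per building, target-age children per building).
    """
    area = np.asarray(living_area, dtype=float)
    # Normative number of residents per building (15 m^2 per person);
    # because the split is proportional, the constant cancels out of the shares.
    weights = area / area_per_person
    residents = district_population * weights / weights.sum()
    children = residents * child_share
    return residents, children


residents, children = disaggregate(1000, [3000, 1500, 500], child_share=0.05)
```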
To account for the influence of distance on the willingness to attend a kindergarten, we computed a distance matrix containing the walking distance from each building $i$ to each kindergarten $j$ based on the city road network collected from OSM. Next, we applied logical masking to the distance matrix and retained only the elements corresponding to distances less than the radius of the kindergarten’s catchment area.
For the graph-based trip distribution models, we transformed the data into a directed bipartite graph (Figure 1). The distance matrix $DM$ served as the weighted adjacency matrix of the graph, where nodes stood for kindergartens and residential buildings and edges represented the connections between them. The node features $h$ contained information about the node type and its quantitative property: the capacity for nodes representing kindergartens or the size of the target population for nodes representing residential buildings. Edge features included the walking distance $d_{ij}$ and the number of interactions $y_{ij}$ between nodes, in other words, the number of people from building $i$ who obtain a service in kindergarten $j$. The whole graph consisted of 19,287 nodes, 53,441 edges, and 122 connected components, each formed by nodes from the two disjoint node sets (buildings and kindergartens). Each connected component represented a closed system, meaning that the distribution of the population among the kindergartens in one component could not affect the same process in another component.
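The masking and graph-construction steps can be sketched with NumPy and SciPy. This sketch assumes a dense distance matrix (buildings × kindergartens) and numbers the facility nodes after the building nodes. `build_graph` is a hypothetical helper. `scipy.sparse.csgraph.connected_components` with `connection="weak"` ignores edge direction, so it finds the closed subsystems described above.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components


def build_graph(dist, radius):
    """Keep building-facility pairs within the catchment radius and find closed subsystems.

    dist[i, j]: walking distance from building i to facility j.
    Node ids: buildings 0..m-1, facilities m..m+n-1.
    """
    dist = np.asarray(dist, dtype=float)
    m, n = dist.shape
    rows, cols = np.nonzero(dist <= radius)     # logical mask on the distance matrix
    edge_dist = dist[rows, cols]                # edge feature d_ij
    # Directed edges building -> facility in one (m + n) x (m + n) adjacency matrix
    adj = coo_matrix((np.ones(len(rows)), (rows, cols + m)), shape=(m + n, m + n))
    n_comp, labels = connected_components(adj, directed=True, connection="weak")
    return rows, cols, edge_dist, n_comp, labels


dist = [[200, 900], [500, 800], [1000, 300]]
rows, cols, edge_dist, n_comp, labels = build_graph(dist, radius=600)
```

The adjacency uses unit weights, so it records connectivity only. The distances stay in `edge_dist` as edge features.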

2.3. Location–Allocation Model

Within the scope of trip distribution, the Location–Allocation Model has been widely used in transportation planning [43,44], modeling facilities’ utilization [45,46], and developing emergency strategies [47] to find the most suitable location for a new supply point (i.e., the location part) or to efficiently allocate available resources among demand points (i.e., the allocation part). The core principle of the Allocation Model is optimizing an objective function subject to certain constraints. Applying the Allocation Model to estimate facilities’ utilization requires formulating an objective function that maximizes the overall satisfaction of the population’s needs, which, in turn, implies maximizing the total number of interactions $y_{ij}$ while minimizing their distances $d_{ij}$ (Equation (2)).
$$ \text{maximize} \sum_{i=1}^{m} \sum_{\substack{1 \le j \le n \\ d_{ij} \le r_a}} \frac{\hat{y}_{i,j}}{d_{ij}} \quad \text{subject to} \quad \sum_{\substack{1 \le j \le n \\ d_{ij} \le r_a}} \hat{y}_{i,j} \le O_i, \; i = 1, \dots, m, \qquad \sum_{\substack{1 \le i \le m \\ d_{ij} \le r_a}} \hat{y}_{i,j} \le D_j, \; j = 1, \dots, n, \tag{2} $$
where $m$ and $n$ denote the numbers of residential buildings and facilities, respectively; $\hat{y}_{i,j}$ represents the estimated number of interactions between residential building $i$ and facility $j$, separated by a distance $d_{ij}$ of up to $r_a$; $O_i$ stands for the demand in building $i$; and $D_j$ stands for the capacity of facility $j$.
By maximizing the objective function, the model allocates as much of the population as possible across facilities while adhering to natural constraints. These constraints ensure that the total interactions originating from each building $i$ do not exceed its demand $O_i$ (assuming that a resident can engage with only one facility) and that the total interactions with each facility $j$ do not surpass its capacity $D_j$ (preventing facility overloading). The non-strict inequalities accommodate potential capacity shortages or surpluses. In the former case, a portion of the population remains unallocated in buildings; in the latter, some facility capacity goes underutilized.
This mathematical formulation constitutes an integer linear programming problem in which all variables $\hat{y}_{i,j}$ are constrained to be non-negative integers. To solve it, we employ the COIN-OR Branch and Cut (CBC) solver implemented in the open-source Python library PuLP for linear optimization [48].
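Solving a small instance of Equation (2) shows how the allocation behaves. The study uses PuLP with the CBC solver. To keep the sketch dependency-light, it instead uses SciPy’s `scipy.optimize.milp` (HiGHS backend, SciPy 1.9 or later), a swapped-in solver rather than the authors’ setup. Only pairs within the catchment radius become variables. Distances are assumed to be strictly positive so that $1/d_{ij}$ is defined.

```python
import numpy as np
from scipy.optimize import LinearConstraint, milp


def allocate(demand, capacity, dist, radius):
    """Integer allocation maximizing sum(y_ij / d_ij) under demand and capacity limits."""
    demand = np.asarray(demand, dtype=float)
    capacity = np.asarray(capacity, dtype=float)
    dist = np.asarray(dist, dtype=float)
    m, n = dist.shape
    pairs = np.argwhere(dist <= radius)          # eligible (i, j) pairs only
    k = len(pairs)
    c = -1.0 / dist[pairs[:, 0], pairs[:, 1]]    # milp minimizes, so negate the objective
    a_out = np.zeros((m, k))                     # row i: outflow of building i
    a_out[pairs[:, 0], np.arange(k)] = 1.0
    a_in = np.zeros((n, k))                      # row j: inflow to facility j
    a_in[pairs[:, 1], np.arange(k)] = 1.0
    constraints = [LinearConstraint(a_out, 0, demand),     # sum_j y_ij <= O_i
                   LinearConstraint(a_in, 0, capacity)]    # sum_i y_ij <= D_j
    # Default milp bounds keep every variable >= 0; integrality=1 makes them integers
    res = milp(c, constraints=constraints, integrality=np.ones(k))
    if not res.success:
        raise RuntimeError(res.message)
    flows = np.zeros((m, n))
    flows[pairs[:, 0], pairs[:, 1]] = np.round(res.x)
    return flows


flows = allocate([30, 20], [25, 20], [[100, 400], [700, 200]], radius=600)
```

In this toy instance, facility 1 gives all 20 places to the closer building 1. As a result, 5 people from building 0 remain unallocated once facility 0 is full, which illustrates the shortage case described above.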

2.4. Doubly Constrained Gravity Models

Diverging from the optimization-focused approach of the Linear Allocation Model, the Gravity Model offers a probabilistic perspective on spatial interactions rooted in the analogy to Newton’s law of gravity [25]. The Gravity Model (Equation (3)) posits that the interaction between two places, $\hat{y}_{ij}$, is proportional to their respective “masses” $M$ (e.g., area, population count, or number of points of interest) and inversely proportional to some power of the distance $d_{ij}$ separating them. In the distance decay function $f(d_{ij})$, the parameter $\beta$ controls the influence of distance on the number of interactions between locations.
$$ \hat{y}_{ij} \propto M_i M_j f(d_{ij}), \quad \text{where } f(d_{ij}) = d_{ij}^{-\beta} \text{ or } f(d_{ij}) = \exp(-\beta d_{ij}), \tag{3} $$
Singly and doubly constrained versions of the model have been developed to adjust the total population outflows and inflows to fixed demand and/or supply conditions. The Doubly Constrained Gravity Model (Equation (4)), in particular, has been derived as a special case of the Entropy Maximizing Model, which maximizes the disorder or randomness of trips subject to certain constraints [49].
$$ \hat{y}_{i,j} = A_i B_j O_i D_j d_{ij}^{-\beta}, \qquad A_i = \frac{1}{\sum_{j=1}^{n} B_j D_j d_{ij}^{-\beta}}, \qquad B_j = \frac{1}{\sum_{i=1}^{m} A_i O_i d_{ij}^{-\beta}}, \tag{4} $$
where $O_i$ and $D_j$ represent the cumulative population flows originating in residential building $i$ and arriving at facility $j$, respectively; $A_i$ and $B_j$ indicate the origin and destination balancing factors.
In Equation (4), the balancing factors $A_i$ and $B_j$ ensure that the estimated interactions align with the demand and capacity constraints. These factors are typically calculated using the Iterative Proportional Fitting (IPF) procedure, which adjusts the elements of the origin–destination matrix to match specific row and column totals. However, when dealing with large and dense spatial interaction matrices, IPF often encounters significant numerical stability and convergence issues [50]. In particular, the balancing factors can become extremely small, causing most matrix elements to turn to zero. In our study, IPF failed to converge when modeling the population flows between residential buildings and facilities in a large city. To address this issue, we implemented a custom algorithm for balancing the origin–destination matrix, as detailed in Appendix A.
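For reference, the classic IPF (Furness) balancing of Equation (4) is sketched below. It is not the custom algorithm of Appendix A. The sketch assumes a small, dense problem with balanced totals ($\sum_i O_i = \sum_j D_j$) and strictly positive distances, a setting in which plain IPF is known to converge.

```python
import numpy as np


def doubly_constrained_gravity(O, D, dist, beta, max_iter=1000, tol=1e-10):
    """Estimate y_ij = A_i B_j O_i D_j d_ij^-beta with IPF-style balancing factors."""
    O = np.asarray(O, dtype=float)
    D = np.asarray(D, dtype=float)
    f = np.asarray(dist, dtype=float) ** (-beta)   # power-law distance decay
    A = np.ones_like(O)
    B = np.ones_like(D)
    for _ in range(max_iter):
        A = 1.0 / (f @ (B * D))       # A_i = 1 / sum_j B_j D_j d_ij^-beta
        B = 1.0 / (f.T @ (A * O))     # B_j = 1 / sum_i A_i O_i d_ij^-beta
        flows = np.outer(A * O, B * D) * f
        # Column totals match D exactly after the B update; stop once rows match O too
        if np.allclose(flows.sum(axis=1), O, rtol=tol, atol=0.0):
            break
    return flows


flows = doubly_constrained_gravity([30, 20], [25, 25], [[100, 400], [300, 200]], beta=1.0)
```

With large, sparse catchment-restricted matrices, these products can underflow, which is the failure mode described above.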

2.5. Log–Linear Regression

Log–Linear Regression, when employed to predict the number of interactions between two locations, is another form of the unconstrained Gravity Model [51]. In essence, Log–Linear Regression linearizes the multiplicative relationship between a flow and its determinants. It does so by taking the natural logarithm of both sides of Equation (3), which turns the product of variables into a sum of their logarithms (Equation (5)). This transformation makes it easy to estimate the coefficients that describe the influence of the locations’ masses and of distance on population flows.
$$ \ln \hat{y}_{i,j} = a_1 \ln O_i + a_2 \ln D_j - \beta \ln d_{ij} + b, \tag{5} $$
where the coefficients $a_1$ and $a_2$ represent the origin and destination effects; $\beta$ denotes the calibration parameter of the distance decay function; and $b$ is the intercept.
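Because Equation (5) is linear in its parameters, ordinary least squares recovers them. The sketch below applies NumPy’s `lstsq` to noise-free synthetic flows generated with known, made-up coefficients, so the fit should return those coefficients. Real flows must be strictly positive for the logarithm to exist, so zero flows would have to be dropped or otherwise handled.

```python
import numpy as np


def fit_log_linear(O, D, d, y):
    """OLS fit of ln y = a1 ln O + a2 ln D - beta ln d + b; all inputs must be > 0."""
    X = np.column_stack([np.log(O), np.log(D), -np.log(d), np.ones(len(y))])
    (a1, a2, beta, b), *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    return a1, a2, beta, b


rng = np.random.default_rng(0)
O = rng.uniform(5, 50, 200)        # demand per building
D = rng.uniform(20, 300, 200)      # facility capacity
d = rng.uniform(50, 600, 200)      # walking distance, m
y = np.exp(0.3) * O**0.8 * D**0.5 * d**-1.2   # synthetic flows with known coefficients
a1, a2, beta, b = fit_log_linear(O, D, d, y)
```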

2.6. Deep Gravity Model

2.6.1. Original Deep Gravity Model

In their original paper [33], F. Simini et al. described the derivation of the Deep Gravity Model, which adds nonlinearity and hidden layers to the traditional Gravity Model. We adapted this architecture to estimate population–facility interactions as follows: for each building–facility pair, we define an input vector $x_{ij}$ by concatenating the population demand $O_i$, the facility’s capacity $D_j$, and the distance $d_{ij}$. These input vectors are fed in parallel to a feed-forward neural network (FNN) (Equation (6)).
$$ x_{ij} = [O_i, D_j, d_{ij}], \qquad \hat{y}_{i,j} = \sigma(x_{ij} W^{T} + b), \tag{6} $$
where $W$ represents a linear transformation weight matrix; $b$ stands for a bias; and $\sigma$ denotes the nonlinear activation function.
Our implementation of the original Deep Gravity Model includes four hidden layers of 32 dimensions, with a dropout parameter equal to 0.20, the LeakyReLU activation function, and layer normalization. We used fewer hidden layers with lower dimensions than the original architecture [33] to avoid overfitting, given the smaller number of location features. The last layer maps the hidden state into a one-dimensional vector with the predicted number of interactions, restricted by the ReLU activation function to the interval [0, ∞).
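The forward pass of Equation (6), with the layer sizes given above, can be sketched in NumPy. This is an inference-only illustration with random weights, and it rests on several assumptions:

- dropout is omitted, since it is only active during training;
- the layer order (linear, then layer normalization, then LeakyReLU) and the He-style initialization are assumed;
- layer normalization is shown without learnable scale and shift.

In practice, the weights would be trained with a deep learning framework.

```python
import numpy as np


def layer_norm(h, eps=1e-5):
    # Normalize each row to zero mean and unit variance (no learnable gain/bias here)
    return (h - h.mean(axis=-1, keepdims=True)) / np.sqrt(h.var(axis=-1, keepdims=True) + eps)


def leaky_relu(h, slope=0.01):
    return np.where(h > 0, h, slope * h)


def init_params(sizes, rng):
    # One (W, b) pair per layer; W has shape (out_dim, in_dim)
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), (n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def deep_gravity_forward(x, params):
    """x: (pairs, 3) rows [O_i, D_j, d_ij], ideally standardized; returns y_hat >= 0."""
    h = x
    for W, b in params[:-1]:
        h = leaky_relu(layer_norm(h @ W.T + b))
    W, b = params[-1]
    return np.maximum(h @ W.T + b, 0.0).ravel()   # ReLU output keeps y_hat in [0, inf)


rng = np.random.default_rng(0)
params = init_params([3, 32, 32, 32, 32, 1], rng)   # four hidden layers of 32 units
x = rng.normal(size=(4, 3))                          # four standardized building-facility pairs
y_hat = deep_gravity_forward(x, params)
```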

2.6.2. GNN-Based Deep Gravity Model

The integration of Graph Neural Networks (GNNs) marked the next step in the evolution of trip distribution models, owing to their inherent strength in handling graph-structured data. Studies such as [34,35] proposed incorporating Graph Attention Networks (GATs) [52] to learn node embeddings that encapsulate latent geographic context information. GAT embeddings enhanced the original Deep Gravity Model by capturing not only nonlinear relationships but also structural dependencies within the data, which was achieved through the recursive aggregation of features from each node’s neighborhood. This capability of GAT is particularly advantageous for predicting population–facility interactions, where an individual’s choice of a facility is influenced not only by its proximity and inherent characteristics but also by the availability and attractiveness of alternative options.
To encode geospatial dependencies into node embeddings, we include in the model two layers of multi-head GAT operating on a directed bipartite graph representing connections between residential buildings and facilities. Each GAT layer performs two convolutions but with different message-passing directions (from origin to destination and from destination to origin) to learn representations of nodes of each type. The multi-head attention mechanism executes five independent transformations. As recommended in [52], the first layer of GAT concatenates new node embeddings, while the second layer averages them (Equation (7)). Each GAT layer contains 128 neurons, 5 attention heads, the LeakyReLU activation function, a dropout parameter of 0.2, and layer normalization.
$$ h'_i = \Big\Vert_{k=1}^{5} \sigma\Big(\sum_{j \in N(i)} \alpha_{ij}^{k} W^{k} h_j\Big), \qquad h''_i = \sigma\Big(\frac{1}{K} \sum_{k=1}^{K=5} \sum_{j \in N(i)} \alpha'^{k}_{ij} W'^{k} h'_j\Big), \tag{7} $$
where $h_j$ is the input feature vector of node $j$ located in the neighborhood $N(i)$ of node $i$; $W^k$ and $W'^k$ stand for the linear transformation weight matrices; $\alpha_{ij}^{k}$ and $\alpha'^{k}_{ij}$ represent the attention coefficients computed by the $k$-th attention head for nodes $i$ and $j$; $h'_i$ and $h''_i$ denote the node embeddings on the first and second layers, respectively; and $\Vert$ denotes concatenation.
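A single multi-head layer of Equation (7) on the bipartite graph can be sketched densely in NumPy. The sketch rests on several assumptions:

- a dense boolean adjacency `adj` (destinations × sources);
- one weight matrix shared by both node types, since both carry the same two features (type and quantity);
- the standard GAT scoring $e_{ij} = \text{LeakyReLU}(a^\top [W h_i \Vert W h_j])$ from [52];
- LeakyReLU as $\sigma$;
- every destination node having at least one neighbor.

Reversing the message direction amounts to swapping the roles of the two node sets and transposing `adj`.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_head(h_src, h_dst, adj, W, a):
    """One attention head; adj[i, j] is True if source j is in the neighborhood N(i)."""
    z_src, z_dst = h_src @ W.T, h_dst @ W.T          # W h_j and W h_i
    k = W.shape[0]
    # e_ij = LeakyReLU(a^T [W h_i || W h_j]) for all (i, j) at once
    e = leaky_relu((z_dst @ a[:k])[:, None] + (z_src @ a[k:])[None, :])
    e = np.where(adj, e, -np.inf)                     # attend to neighbors only
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)         # softmax over N(i)
    return alpha, alpha @ z_src                       # sum_j alpha_ij W h_j


def gat_layer(h_src, h_dst, adj, heads, concat):
    """Concatenate heads (first layer) or average them (second layer), as in Equation (7)."""
    results = [gat_head(h_src, h_dst, adj, W, a) for W, a in heads]
    messages = [m for _, m in results]
    if concat:
        out = np.concatenate([leaky_relu(m) for m in messages], axis=1)
    else:
        out = leaky_relu(np.mean(messages, axis=0))
    alpha_bar = np.mean([alpha for alpha, _ in results], axis=0)  # head-averaged attention
    return out, alpha_bar


rng = np.random.default_rng(0)
h_fac = np.array([[1.0, 0.9], [1.0, 0.3]])               # kindergartens: [type, capacity]
h_bld = np.array([[0.0, 0.5], [0.0, 0.2], [0.0, 0.7]])   # buildings: [type, demand]
adj = np.array([[True, False], [True, True], [False, True]])   # building i <- facility j
heads = [(rng.normal(size=(4, 2)), rng.normal(size=8)) for _ in range(5)]
h_new, alpha_bar = gat_layer(h_fac, h_bld, adj, heads, concat=True)
```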
The learned node embeddings $h''_i$ and $h''_j$ (the outputs of the second GAT layer) and the averaged attention coefficients $\bar{\alpha}_{ij}$ and $\bar{\alpha}_{ji}$ are included in the input vector $x_{ij}$, which is fed in parallel to the FNN (Equation (8)).
$$ x_{ij} = [d_{ij}, O_i, D_j, h''_i, h''_j, \bar{\alpha}_{ij}, \bar{\alpha}_{ji}], \qquad \hat{y}_{i,j} = \sigma(x_{ij} W^{T} + b), \tag{8} $$
The FNN block includes four hidden layers of 64 dimensions with a dropout parameter equal to 0.2, the LeakyReLU activation function, and layer normalization. The output layer returns a one-dimensional vector with the predicted number of interactions limited by the ReLU activation function.
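Assembling the per-edge input vector of Equation (8) is a gather-and-concatenate step. The sketch assumes edge-list arrays `src` (building index) and `dst` (kindergarten index), node-embedding matrices, and per-edge averaged attention values for both message directions. All names are hypothetical.

```python
import numpy as np


def edge_inputs(src, dst, dist, O, D, H_bld, H_fac, alpha_ij, alpha_ji):
    """Rows x_ij = [d_ij, O_i, D_j, h_i, h_j, alpha_ij, alpha_ji] for edges (src[e], dst[e])."""
    return np.column_stack([
        dist,           # d_ij, one value per edge
        O[src],         # demand of the origin building
        D[dst],         # capacity of the destination kindergarten
        H_bld[src],     # learned embedding of building i
        H_fac[dst],     # learned embedding of kindergarten j
        alpha_ij,       # averaged attention, building -> kindergarten direction
        alpha_ji,       # averaged attention, kindergarten -> building direction
    ])


src = np.array([0, 1, 2])
dst = np.array([0, 0, 1])
x = edge_inputs(src, dst, np.array([200.0, 500.0, 300.0]), np.array([30.0, 20.0, 10.0]),
                np.array([25.0, 25.0]), np.zeros((3, 8)), np.ones((2, 8)),
                np.array([0.6, 0.4, 1.0]), np.array([1.0, 0.5, 0.5]))
```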

2.6.3. Constrained GNN-Based Deep Gravity Model (Our Modification)

Preserving the core architecture of the GNN-based Deep Gravity Model, we additionally incorporate the balancing mechanism inspired by the traditional Doubly Constrained Gravity Model (Figure 2). The idea of a balancing mechanism lies in the iterative adjustment of predicted flows to fixed totals on both sides: origin node’s population and destination node’s capacity. This is performed by incorporating a simple recurrent block that, at each step, t, processes the predicted flow y ^ i j between nodes i and j, the sum of outflows from node i, the sum of inflows to node j, and the natural constraints on the total outflows O i and inflows D j .
In the recurrent block, we first concatenate the flow-related variables into an input vector v i j and then pass it through a shallow FNN with a nonlinear activation function. By analogy with the Doubly Constrained Gravity Model, FNN computes the balancing factor z i j for each flow. The adjusted flow is obtained by the multiplication of its previous value, and the balancing factor computed on the current step t (Equation (9)).
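$$v_{ij}^{t} = \Big[\hat{y}_{ij}^{t},\ \sum_{j \in N(i)} \hat{y}_{ij}^{t},\ \sum_{i \in N(j)} \hat{y}_{ij}^{t},\ O_i,\ D_j\Big], \qquad z_{ij}^{t} = \sigma\left(v_{ij}^{t} W_t + b_t\right), \qquad \hat{y}_{ij}^{t+1} = z_{ij}^{t}\, \hat{y}_{ij}^{t} \tag{9}$$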
The FNN in the recurrent block has one hidden layer of 64 units, a ReLU activation function, and a dropout rate of 0.2. The number of steps in the recurrent block may vary depending on the data; in our experiments, we tested the model with 1 to 10 steps. Apart from the recurrent block, the architecture and parameter settings are the same as for the unconstrained GNN-based Deep Gravity Model.
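The balancing loop of Equation (9) can be sketched in NumPy as follows, under several assumptions. The weights are random rather than learned, the same weights are reused at every step (our assumption), dropout is omitted, and a softplus output is assumed for $\sigma$ so that the balancing factors $z_{ij}^{t}$ stay non-negative. Edges are stored as index arrays `src` (building) and `dst` (kindergarten); these names are hypothetical.

```python
import numpy as np

def balance_flows(y_hat, src, dst, O, D, params, steps=4):
    """Rescale edge flows for a fixed number of steps, following Equation (9).

    y_hat: (E,) initial flows; src, dst: (E,) building / facility index of each edge;
    O: (n_buildings,) demands; D: (n_facilities,) capacities;
    params: (W1, b1, W2, b2) of a one-hidden-layer FNN, shared across steps.
    """
    W1, b1, W2, b2 = params
    for _ in range(steps):
        out_sum = np.bincount(src, weights=y_hat, minlength=len(O))  # outflows of building i
        in_sum = np.bincount(dst, weights=y_hat, minlength=len(D))   # inflows to facility j
        v = np.column_stack([y_hat, out_sum[src], in_sum[dst], O[src], D[dst]])  # v_ij^t
        hidden = np.maximum(v @ W1 + b1, 0.0)            # 64-unit hidden layer with ReLU
        z = np.logaddexp(0.0, hidden @ W2 + b2).ravel()  # softplus, so z_ij^t >= 0
        y_hat = z * y_hat                                # y^{t+1} = z^t * y^t
    return y_hat

rng = np.random.default_rng(1)
src = np.array([0, 0, 1, 1])   # two buildings
dst = np.array([0, 1, 0, 1])   # two kindergartens
O = np.array([30.0, 20.0])
D = np.array([25.0, 25.0])
y0 = np.array([10.0, 15.0, 12.0, 9.0])
params = (0.1 * rng.normal(size=(5, 64)), np.zeros(64), 0.1 * rng.normal(size=(64, 1)), np.zeros(1))
y4 = balance_flows(y0, src, dst, O, D, params, steps=4)
```

With trained weights, the factors would be expected to push the row and column sums toward $O_i$ and $D_j$; with the random weights used here, the output only demonstrates the data flow through the block.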

3. Results

In the following, we evaluate the effectiveness of the previously described trip distribution models in accurately estimating urban accessibility and the provision of social facilities. This evaluation is crucial for ensuring that these models can reliably guide urban planning decisions, particularly in situations where direct data on facility utilization are incomplete or unavailable. In Section 3.1, we begin by examining the outcomes of the Linear Allocation Model and Doubly Constrained Gravity Model in estimating the provision of kindergartens in Saint Petersburg, Russia. Section 3.2 then delves into the performance of the Log–Linear Regression and Deep Gravity models, with a focus on their ability to uncover mobility patterns within synthetic datasets. For each model, we assess the accuracy of facility utilization estimates and analyze how these estimates impact the overall assessment of facility provision.

3.1. Modeling Facilities’ Trip Distribution

When data on the actual utilization of urban facilities are absent, the Linear Allocation Model and the Doubly Constrained Gravity Model become crucial tools for estimating population–facility interactions. We compared both models to show how each of them interprets the engagement between populations and urban infrastructure, highlighting their distinct perspectives on urban dynamics.
Figure 3a depicts an example territory of six adjacent blocks in Saint Petersburg, Russia. Within this territory, 41 kindergartens with capacities ranging from 12 to 300 places are intended to serve 11,260 people. The population-to-provider ratio, calculated as the ratio of the total facility capacity to the population demand within the overlapping catchment area, reveals a capacity shortage in this territory: according to the calculation, 10 percent of the target population are not provided with kindergartens. To identify where exactly the unprovided people live, we applied the Linear Allocation Model and the Doubly Constrained Gravity Model to distribute the target population of residential buildings among the available kindergartens, using catchment areas of 600 m. Figure 3b,c presents the resulting trip distributions along with the provision assessments calculated by Equation (1). In this example, the provision assessments obtained with both trip distribution models indicated problem areas primarily in the central, southeast, and northeast parts of the territory (Figure 3d).
Although both trip distribution models revealed the same problem areas, they yielded different provision assessments for most of the population across the city. As Figure 3e shows, provision assessments based on the trip distribution of the Linear Allocation Model split the urban population roughly into two groups: those provided ($p_i = 1$) and those unprovided ($p_i = 0$) with the service. This follows from the model's optimization function (Equation (2)): while maximizing the overall satisfaction of population needs, it assigns the nearest facility to as many people as possible from each residential building. As a result, the population living closer to facilities reserves all available capacity, leaving none for people from more remote areas. In contrast, provision assessments obtained with the Doubly Constrained Gravity Model with parameter $\beta = 0$ (Figure 3f) are distributed more uniformly over the whole interval [0, 1]. This is explained by the parameter $\beta$, which in this case eliminates the influence of distance within the catchment area on the willingness to travel to a facility. The model thus allocates the capacity of each facility equally among the buildings within its catchment area.
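A toy calculation illustrates why the two models diverge. Assume a single kindergarten with 50 places and two buildings, each with a demand of 40 children, at 100 m and 500 m. With a single facility, filling demand nearest-first yields the minimum-total-distance allocation, so the greedy routine below stands in for the Linear Allocation Model only in this special case. The gravity routine splits capacity in proportion to $d^{-\beta}$ and omits the full doubly constrained balancing. Provision is taken here as the allocated share of each building's demand, a stand-in for the assessment of Equation (1). All names and numbers are hypothetical.

```python
def nearest_first(demands, distances, capacity):
    """Serve buildings in order of distance until the single facility is full."""
    allocated = [0.0] * len(demands)
    left = capacity
    for i in sorted(range(len(demands)), key=lambda k: distances[k]):
        allocated[i] = min(demands[i], left)
        left -= allocated[i]
    return allocated

def gravity_split(demands, distances, capacity, beta):
    """Share the facility's capacity in proportion to d ** (-beta), capped by demand."""
    weights = [d ** -beta for d in distances]
    total = sum(weights)
    return [min(q, capacity * w / total) for q, w in zip(demands, weights)]

demands, distances, capacity = [40, 40], [100, 500], 50
p_lam = [a / q for a, q in zip(nearest_first(demands, distances, capacity), demands)]
p_grav = [a / q for a, q in zip(gravity_split(demands, distances, capacity, beta=0.0), demands)]
print(p_lam, p_grav)  # [1.0, 0.25] [0.625, 0.625]
```

The nearer building is fully provided and the remote one is largely unprovided, whereas $\beta = 0$ gives both buildings the same intermediate provision, qualitatively matching the contrast between the two models described above.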
The difference in provision assessments stems from the difference in the distance distributions of the trips generated by each model. The Linear Allocation Model, which minimizes the overall travel distance across the population, implies that people tend to make shorter trips, as evidenced by the similarity between the distance distributions of trips to the nearest facility and trips to the facility assigned by the model (Figure 3g,h). Such a strong inverse dependence of trip probability on distance effectively prioritizes serving people who live closer to facilities and thus reflects an extreme view of potential inequity in a city. The Doubly Constrained Gravity Model, on the other hand, offers a more adjustable trip distribution. By varying the parameter $\beta$ in the distance decay function, people can be distributed among facilities either equally with $\beta = 0$ (Figure 3i) or with weights that decrease with distance if $\beta > 0$ (see Figure A1). Reducing the distance factor to zero increases the radius of gyration of the trips from each building and skews the distance distribution to the left. This reallocation relieves the load on the facilities from the nearest buildings and allows people from more remote areas to apply for the service within the catchment area.
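The effect of $\beta$ on trip lengths can be seen in a one-building example. Assuming the power-law decay $f(d) = d^{-\beta}$ used for the gravity model in Appendix A and three reachable kindergartens at hypothetical distances of 100, 300, and 600 m, the sketch computes the origin-side trip probabilities and the resulting mean trip distance. Capacity constraints are ignored, so this shows only the distance-decay part of the model.

```python
def trip_probabilities(distances, beta):
    """Origin-side probabilities f(d_j) / sum_k f(d_k) with f(d) = d ** (-beta)."""
    weights = [d ** -beta for d in distances]
    total = sum(weights)
    return [w / total for w in weights]

def mean_trip_distance(distances, beta):
    return sum(p * d for p, d in zip(trip_probabilities(distances, beta), distances))

distances = [100, 300, 600]
for beta in (0.0, 1.0, 2.0):
    print(beta, round(mean_trip_distance(distances, beta), 1))
# 0.0 333.3
# 1.0 200.0
# 2.0 131.7
```

Lowering $\beta$ lengthens the expected trip, which is the mechanism that lets residents of more remote buildings reach facilities within the catchment area.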
The distinct outcomes of the Linear Allocation Model and the Doubly Constrained Gravity Model highlight how different assumptions about population behavior may influence urban planning decisions, such as the strategic placement of new facilities. When applying these models to assess facility provision, it is crucial to consider the specific interaction patterns between the population and the type of facility in question. The Linear Allocation Model, which prioritizes minimizing travel distance, is particularly suitable for scenarios in which individuals strongly prefer, or are required, to use the nearest facility, for example, because they lack personal transportation or because of regulatory requirements. Conversely, the Doubly Constrained Gravity Model is advantageous in contexts where individuals are less constrained by distance and may choose less crowded or more desirable facilities, even if they are farther away. In our experiments, the difference in outcomes between these trip distribution models was most noticeable in areas with a mild capacity deficiency. Under an extreme capacity shortage or, conversely, an abundance of capacity, the difference diminishes, as expected: in these states, most people are either severely unprovided or fully satisfied regardless of how they are distributed among facilities (see Figure A2).
Along with the choice of model, the predefined catchment area also significantly influences the perceived level of facility provision within a city, because it sets strict limits on acceptable commuting time or distance. Smaller catchment areas tend to lower provision estimates by excluding a substantial share of facilities located beyond accessible zones. Conversely, larger catchment areas can increase provision estimates by assuming that people are willing to travel greater distances to access services in less populated or better-equipped areas. While the latter approach may yield more favorable outcomes, it can foster unrealistic expectations about the population's commuting patterns. The size of the catchment area should therefore be carefully assessed and justified to balance the need for accessibility against realistic travel behavior, ensuring equitable and practical urban planning.
In summary, applying traditional trip distribution models, such as the Allocation Model and the Doubly Constrained Gravity Model, offers a straightforward yet flexible approach to replicating population–facility interactions based solely on proximity to facilities. However, these models, while effective in capturing basic spatial relationships, require careful calibration of the distance decay function and the size of a catchment area, reflecting population mobility patterns, to ensure accurate provision assessments. Additionally, their ability to predict facility utilization is limited by the exclusion of other critical factors, such as the type, quality, cost, and reputation of the services offered by the facilities. This limitation suggests that, while traditional models can provide useful insights—particularly in data-sparse environments—they may not fully capture the complexity of real-world population–facility interactions.

3.2. Predicting Facilities’ Trip Distribution

Altering the parameters of traditional trip distribution models enables us to analyze the level of facility provision across a city from multiple perspectives, under various assumptions about residents' behavior. However, to obtain simulation results that accurately reflect real-world facility provision, it is crucial to understand the dominant patterns of interaction between the population and different types of facilities. When data on population–facility interactions are only partially observed, machine learning can help uncover these patterns and provide more accurate insights.
We analyzed the effectiveness of four machine learning models in identifying population mobility patterns in interactions with urban facilities. Among these models, Log–Linear Regression (LLR) and the Deep Gravity Model (DGM) represent classic solutions for the regression task of predicting the number of travelers between two locations. Graph Neural Networks, in turn, have become increasingly popular as state-of-the-art models for representation learning in many domains where data can be represented as graphs. We therefore additionally considered the Deep Gravity Model enhanced with GAT embeddings (GNN-DGM). Lastly, we tailored the GNN-based model to the doubly constrained task by integrating a balancing mechanism and evaluated this adaptation as our fourth model (C-GNN-DGM). To better understand the effect of the balancing mechanism on the model's performance, we tested the model with different numbers of balancing steps.
Because actual data on urban facility utilization were unavailable, the experiments used data on population distribution among kindergartens generated by the Linear Allocation Model and the Doubly Constrained Gravity Model. For the Gravity Model, the datasets were built with the distance decay function under varying values of the parameter $\beta$, which enabled us to examine and compare how well the models can distinguish different mobility patterns. To train and test the models, we used repeated k-fold cross-validation with random splits. Each dataset was shuffled and split into five folds containing an equal number of connected components of the graph representing population–facility interactions. Four folds were used to train the models and the fifth to test them. The whole cross-validation process was repeated 10 times to capture the inherent variability in performance due to different initializations and data sampling.
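The component-level splitting can be sketched in plain Python. The sketch assumes that the interaction graph is given as a list of (building, kindergarten) edges, finds its connected components with union-find, and deals the shuffled components round-robin into five folds, so fold sizes differ by at most one component. The function names and the toy edge list are hypothetical.

```python
import random

def connected_components(edges):
    """Group (building, kindergarten) edges by connected component using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for b, f in edges:
        parent[find(("b", b))] = find(("f", f))  # building and facility ids kept apart
    groups = {}
    for b, f in edges:
        groups.setdefault(find(("b", b)), []).append((b, f))
    return list(groups.values())

def repeated_component_kfold(edges, k=5, repeats=10, seed=0):
    """Yield (repeat, fold, train_edges, test_edges) with whole components per fold."""
    rng = random.Random(seed)
    components = connected_components(edges)
    for r in range(repeats):
        order = components[:]
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]  # round-robin: sizes differ by at most 1
        for test_idx in range(k):
            test = [e for comp in folds[test_idx] for e in comp]
            train = [e for i, fold in enumerate(folds) if i != test_idx
                     for comp in fold for e in comp]
            yield r, test_idx, train, test

edges = [(0, 0), (1, 0), (2, 1), (3, 1), (3, 2), (4, 3), (5, 4), (6, 5), (7, 6), (8, 7)]
splits = list(repeated_component_kfold(edges))
```

Keeping whole components together prevents a building or kindergarten from appearing in both the training and test folds, so the test fold measures generalization to unseen parts of the graph.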
We assumed that the number of interactions between locations $i$ and $j$ follows a Poisson process [53]. Accordingly, all machine learning models were optimized with the Negative Log–Likelihood function (Equation (10)). Additionally, we report the Common Part of Commuters (CPC, Equation (11)), a metric widely used in network analysis to compute the similarity between the observed flows $y_{ij}$ and the predicted flows $\hat{y}_{ij}$ [33]. The CPC takes values in the closed interval [0, 1]: a CPC of 1 indicates a perfect match between the observed and predicted flows, while a CPC of 0 indicates no overlap between them, i.e., poor model performance.
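$$NLL = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( \hat{y}_{ij} - y_{ij} \log \hat{y}_{ij} \right) \tag{10}$$
$$CPC = \frac{2 \sum_{i,j} \min\left(\hat{y}_{ij},\, y_{ij}\right)}{\sum_{i,j} \hat{y}_{ij} + \sum_{i,j} y_{ij}} \tag{11}$$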
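Both metrics are straightforward to compute on flattened lists of flows. In the sketch below, a small constant `eps` (our addition) guards against $\log 0$ when a predicted flow is zero, and, as in Equation (10), the term $\log(y_{ij}!)$ of the full Poisson likelihood is dropped because it does not depend on the predictions. The flow values are made up for illustration.

```python
import math

def poisson_nll(y_pred, y_true, eps=1e-12):
    """Equation (10): sum of (y_hat - y * log y_hat) over all building-facility pairs."""
    return sum(p - t * math.log(p + eps) for p, t in zip(y_pred, y_true))

def cpc(y_pred, y_true):
    """Equation (11): Common Part of Commuters, between 0 and 1."""
    common = sum(min(p, t) for p, t in zip(y_pred, y_true))
    return 2 * common / (sum(y_pred) + sum(y_true))

y_true = [10, 0, 5, 5]
y_pred = [8, 2, 5, 5]
print(cpc(y_pred, y_true))  # 0.9
print(poisson_nll(y_pred, y_true) > poisson_nll(y_true, y_true))  # True
```

Each term of Equation (10) is minimized when $\hat{y}_{ij} = y_{ij}$, so a perfect prediction always attains the lowest loss, while CPC rewards only the overlapping part of the two flow sets.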
Table 1 reports the average values and 95% confidence intervals of the evaluation metrics over 10 independent runs of each model for the two datasets generated by the Linear Allocation Model and by the Doubly Constrained Gravity Model with $\beta = 0$. The evaluation metrics for the datasets with $\beta > 0$ are presented in Table A1. Values in bold indicate the best performance among the models.
The patterns in the models' performance were consistent across the generated datasets. The gradual improvement of the evaluation metrics across the baseline models aligns with the results reported in other studies. The Deep Gravity Model improved predictive accuracy over Log–Linear Regression owing to its nonlinear recombination of location features. Incorporating GNN embeddings into the model led to a significant further improvement, related to its ability to recognize structural dependencies in the data. The model was advanced further by the balancing mechanism proposed in this paper. The clearest boost in predictive accuracy was achieved with four to seven recurrent steps in the balancing mechanism; in our experiments, further increasing the number of steps led to stagnation or a decrease in accuracy (see Figure A3).
The higher accuracy of the predicted population–facility interactions ultimately translated into more accurate provision assessments. In Figure 4a,b, scatter plots show the predicted versus observed number of people in each building provided with kindergarten services, $P_i$, according to the GNN-DGM and C-GNN-DGM models. For each dataset, the predictions of the constrained model were more densely concentrated near the best-fit line and had higher R-squared scores, indicating closer alignment with the observed values. However, all machine learning models, including the constrained one, exhibited some inflation of the cumulative incoming and outgoing flows, as can be seen from the left side of the distributions in Figure 4c–f. In other words, for some locations, the sum of the predicted outgoing or incoming flows exceeded the corresponding demand or supply. This happens because, unlike the mathematical trip distribution models, the machine learning models do not enforce strict double constraints. Nevertheless, the balancing mechanism in the constrained version of the Deep Gravity Model significantly decreased the mean inflation at both supply and demand nodes. To assess the difference in mean inflation error between the models, we conducted 10 independent runs of each model. For the accumulated inflation errors, we employed a one-tailed bootstrap hypothesis test, which is suitable given the non-normal distribution of inflation errors. Details of the test procedure are given in Table A2.
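The sketch below shows one way to quantify inflation and run a one-tailed bootstrap comparison; it does not reproduce the exact procedure of Table A2. Here, inflation is assumed to be the absolute excess of a node's predicted total flow over its demand or capacity (zero otherwise). The test is a shift bootstrap: both samples of per-run mean inflation are recentred on the pooled mean to emulate the null hypothesis of equal means, and the p-value is the share of resampled differences at least as large as the observed one. The per-run values and function names are made up for illustration.

```python
import random

def inflation(pred_totals, limits):
    """Per-node excess of predicted total flow over demand or capacity (0 if within limits)."""
    return [max(0.0, p - c) for p, c in zip(pred_totals, limits)]

def _mean(xs):
    return sum(xs) / len(xs)

def bootstrap_one_tailed(a, b, n_boot=10_000, seed=0):
    """p-value for H1: mean(a) > mean(b), via a shift (null-centred) bootstrap."""
    rng = random.Random(seed)
    mean_a, mean_b = _mean(a), _mean(b)
    observed = mean_a - mean_b
    pooled = _mean(list(a) + list(b))
    a0 = [x - mean_a + pooled for x in a]  # recentre both samples on the pooled mean (H0)
    b0 = [x - mean_b + pooled for x in b]
    hits = 0
    for _ in range(n_boot):
        diff = _mean(rng.choices(a0, k=len(a0))) - _mean(rng.choices(b0, k=len(b0)))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_boot + 1)

# Made-up per-run mean inflation for an unconstrained and a constrained model (10 runs each)
baseline = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.0, 4.7, 5.1, 5.2]
constrained = [2.0, 2.3, 1.9, 2.1, 2.2, 2.0, 1.8, 2.4, 2.1, 2.0]
p_value = bootstrap_one_tailed(baseline, constrained)
```

In practice, `pred_totals` would be the row or column sums of a model's predicted flow matrix; resampling avoids assuming that inflation errors are normally distributed.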
Overall, the experimental results revealed the strong capability of the Constrained Graph-based Deep Gravity Model to reconstruct population mobility patterns and accurately predict trip distribution. This effectiveness underscores the power of constantly evolving deep learning models to capture the intricate dependencies inherent in population interactions with urban facilities. These interactions are shaped not only by population mobility patterns but also by factors such as the type of services offered, their importance, quality, and cost, the facility's reputation, and other factors not covered in this study owing to the lack of more comprehensive and granular datasets.
With access to such datasets, it would be possible to enrich the bipartite graph of population–facility interactions with additional node and link features. This would enable the model to learn more nuanced dependencies and provide deeper insights into urban facility utilization. In this regard, deep learning models are invaluable, offering a more robust framework for accounting for a broad range of influencing factors than traditional trip distribution approaches, which are constrained by simpler assumptions and a limited capacity to incorporate such diverse variables.

4. Discussion and Conclusions

In pursuit of sustainable urban development, our study highlights the crucial role of both traditional and machine-learning-based trip distribution models in enhancing our understanding of social facility accessibility and provision, especially when data on facility utilization are incomplete. By applying the Linear Allocation Model and the Doubly Constrained Gravity Model to estimate population interactions with kindergartens, we demonstrate how differing assumptions about population commuting patterns within these traditional models can yield different assessments of facility provision within the same urban areas. Our results show that, particularly in areas with a mild shortage of facility capacity, these models provide contrasting insights into urban dynamics, emphasizing the need to carefully select the appropriate analytical approach and calibrate its parameters, which aligns with the findings presented in [26]. The comparative analysis and interpretation of the models provided in this study aim to equip urban experts with the knowledge to make informed decisions on model selection and calibration based on prevalent patterns of population behavior.
Revealing the patterns of population interaction with a specific type of facility in order to calibrate a mathematical model often presents an additional challenge. While some facilities, such as kindergartens, primary schools, and other childcare services, may exhibit similar interaction patterns, others, like healthcare centers or recreational areas, may require distinct approaches due to differing usage characteristics and influencing factors. For instance, healthcare facility utilization is often influenced by urgency and specialization, whereas the use of recreational facilities might depend on demographic preferences. In such cases, machine learning becomes invaluable. With access to partially observed data on facility utilization in some districts of a city, one can train a deep learning model to calibrate appropriate parameters and unveil hidden dependencies within the data. In this study, we develop and train a novel Constrained Deep Gravity Model to predict population interactions with kindergartens, accounting for the natural limits on supply and demand inherent to such tasks [46,54]. The experimental results show that the proposed model outperforms the unconstrained version of the Deep Gravity Model, increasing the accuracy of the predicted flows and, thereby, of the provision assessments. The significant improvement over the baselines is achieved by mitigating flow inflation errors, first discussed in [27], through the direct learning of balancing coefficients.
While our work makes important contributions to equitable urban planning and development, it has certain limitations. First, this study concentrates solely on kindergarten services within a single city to ensure a controlled environment and feasible data collection. Therefore, despite the inherent adaptability of the discussed framework, further research is needed to confirm its generalizability across different types of social facilities and varied urban contexts. Second, the machine learning experiments employ synthetic datasets generated by mathematical trip distribution models, as real data on facility utilization were unavailable. While we acknowledge that the simulated data might not perfectly mirror reality, they still provide a valuable foundation for testing the ability of deep learning models to identify different mobility patterns in population–facility interactions. This initial exploration not only demonstrates the potential of these models but also lays the groundwork for future research that could leverage more precise data across a broad range of geographical areas for further validation and enhancement.

Author Contributions

Conceptualization, M.M., S.M. and S.S.; Data curation, A.K.; Formal analysis, M.M.; Investigation, M.M. and A.K.; Methodology, M.M. and S.S.; Project administration, S.M. and S.S.; Software, M.M.; Supervision, S.S.; Validation, A.B. and S.S.; Visualization, M.M.; Writing—original draft, M.M.; Writing—review and editing, M.M., A.B. and A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original data presented in the study are openly available in the Zenodo repository at https://doi.org/10.5281/zenodo.11152093 (accessed on 7 June 2024). The results of the conducted experiments and the corresponding Python code are available in the GitHub repository at https://github.com/RitaMargari/urban_facilities_provision (accessed on 7 June 2024).

Acknowledgments

Margarita Mishina, Alexander Belyi, Alexander Khrulkov and Stanislav Sobolevsky acknowledge Masaryk University for its support in accommodating our research team, and Sergey Mityagin acknowledges support of the Research Center “Strong Artificial Intelligence in Industry” at ITMO University.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
OSM: OpenStreetMap
LAM: Linear Allocation Model
DCGM: Doubly Constrained Gravity Model
GNN: Graph Neural Network
FNN: Feed-Forward Neural Network
GAT: Graph Attention Network
LLR: Log–Linear Regression
DGM: Deep Gravity Model
GNN-DGM: Graph-Neural-Network-Based Deep Gravity Model
C-GNN-DGM: Constrained Graph-Neural-Network-Based Deep Gravity Model

Appendix A

Custom Implementation of Doubly Constrained Gravity Model

Since the classic IPF procedure failed to converge when calculating the balancing factors in Equation (4), we developed a custom algorithm to balance the origin–destination matrix for the Doubly Constrained Gravity Model.
The algorithm operates iteratively. In the first step (Equation (A1)), we calculate, for each building $i$, the probability $p^{o}_{ij}$ of visiting facilities $S = \{s_1, s_2, \dots, s_k, \dots, s_n\}$ with capacities $D = \{c_1, c_2, \dots, c_k, \dots, c_n\}$. Simultaneously, for each facility $j$, we determine the probability $p^{d}_{ij}$ of serving buildings $H = \{h_1, h_2, \dots, h_l, \dots, h_m\}$ with demands $O = \{o_1, o_2, \dots, o_l, \dots, o_m\}$:
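$$p^{o}_{ij} = \begin{cases} f(d_{ij}) \big/ \sum_{k=1}^{n} f(d_{ik}) & \text{if } d_{ij} \le r_a, \\ 0 & \text{otherwise}, \end{cases} \qquad p^{d}_{ij} = \begin{cases} f(d_{ij}) \big/ \sum_{k=1}^{m} f(d_{kj}) & \text{if } d_{ij} \le r_a, \\ 0 & \text{otherwise}, \end{cases} \tag{A1}$$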
where the distance decay function $f(d_{ij})$ takes the form of a power law, $d_{ij}^{-\beta}$, with a free parameter $\beta$ that we set manually.
The number of interactions between each building–facility pair (e.g., $h_i$ with $i = l$ and $s_j$ with $j = k$) is sampled twice, using the probabilities $P_{lj} = \{p_{l1}, p_{l2}, \dots, p_{lk}, \dots, p_{ln}\}$ and $P_{ik} = \{p_{1k}, p_{2k}, \dots, p_{lk}, \dots, p_{mk}\}$. The minimum of the two sampled values is taken as the number of interactions at the current step (Equation (A2)):
$$y_{lk} = \min\Big( \left. O_l\, P_{lj}(S) \right|_{j=k},\ \left. D_k\, P_{ik}(H) \right|_{i=l} \Big) \tag{A2}$$
Applying the minimum function ensures that the entire system of interactions remains within the upper limits of demand and capacity, but it can lead to the under-allocation of resources on one side. To address this flaw, we calculate the remaining resources by subtracting the total allocated demand and capacity from their initial values (Equation (A3)) and then resample the number of interactions (Equation (A2)).
$$O_i := O_i - \sum_{j=1}^{n} y_{ij}, \qquad D_j := D_j - \sum_{i=1}^{m} y_{ij} \tag{A3}$$
The process is repeated until either all demand or all capacity has been allocated. The estimated number of interactions for each building–facility pair is the sum of the values obtained over all iterations.
This algorithm retains the core features of the Doubly Constrained Gravity Model, such as the probabilistic approach and adherence to constraints, while offering faster convergence and improved numerical stability.
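A NumPy sketch of this procedure is given below. It follows Equations (A1)–(A3) with two assumptions that the text does not spell out: probabilities are normalized only over pairs within the catchment radius, and at each iteration only buildings with remaining demand and facilities with remaining capacity take part, so that resampling can keep making progress. The loop stops when no reachable pair with both remaining demand and remaining capacity is left. Integer counts are drawn with `numpy.random.Generator.multinomial`. The function name and toy inputs are hypothetical, and this is not the authors' code.

```python
import numpy as np

def balance_dcgm(dist, demand, capacity, beta=0.0, radius=600.0, seed=0, max_iter=10_000):
    """Iterative sampling balancer for a doubly constrained flow matrix.

    dist: (m, n) building-to-facility distances in metres;
    demand: (m,) integer demands O; capacity: (n,) integer capacities D.
    Returns an (m, n) integer matrix of interactions y_ij.
    """
    rng = np.random.default_rng(seed)
    # f(d) = d^(-beta) inside the catchment radius r_a, 0 outside; distances floored at 1 m
    f = np.where(dist <= radius, np.maximum(dist, 1.0) ** (-beta), 0.0)
    O = np.asarray(demand, dtype=np.int64).copy()    # remaining demand per building
    D = np.asarray(capacity, dtype=np.int64).copy()  # remaining capacity per facility
    flows = np.zeros(dist.shape, dtype=np.int64)
    for _ in range(max_iter):
        # Keep only pairs that can still exchange people (assumed masking)
        w = f * (O[:, None] > 0) * (D[None, :] > 0)
        if not w.any():
            break
        by_building = np.zeros_like(flows)
        by_facility = np.zeros_like(flows)
        for i in np.flatnonzero(w.sum(axis=1) > 0):
            # p^o_ij of Equation (A1), then distribute O_i over facilities
            by_building[i] = rng.multinomial(O[i], w[i] / w[i].sum())
        for j in np.flatnonzero(w.sum(axis=0) > 0):
            # p^d_ij of Equation (A1), then distribute D_j over buildings
            by_facility[:, j] = rng.multinomial(D[j], w[:, j] / w[:, j].sum())
        step = np.minimum(by_building, by_facility)  # Equation (A2)
        flows += step
        O -= step.sum(axis=1)                         # Equation (A3)
        D -= step.sum(axis=0)
    return flows

dist = np.array([[100.0, 400.0, 700.0],
                 [300.0, 200.0, 500.0],
                 [650.0, 550.0, 150.0]])
demand = np.array([40, 30, 50])
capacity = np.array([30, 30, 40])
flows = balance_dcgm(dist, demand, capacity, beta=0.0)
```

Because each row of `by_building` sums to the remaining demand and each column of `by_facility` sums to the remaining capacity, taking their element-wise minimum can never push a building or facility past its limit.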

Appendix B

Figure A1. Distance distributions of trips to facilities obtained with the Doubly Constrained Gravity Model (DCGM). The free parameter β determines the influence of distance (within a catchment area) on the likelihood of people visiting a facility. Low values of β imply a weak distance effect, meaning that residents are more willing to travel longer distances and can potentially reach more facilities. Thus, in scenarios where distance has little impact on the willingness to attend a facility, most people are distributed among facilities that are farther away from their homes. As the value of β increases, the influence of distance becomes stronger, corresponding to situations in which people tend to visit closer facilities.
Figure A2. Relationship between provision assessments and the population-to-provider ratio (PPR). The PPR metric represents the ratio of facilities' capacities to the total population demand within overlapping catchment areas, providing a preliminary estimate of capacity shortages in a zone. Red markers indicate buildings located farther from the nearest kindergarten (up to 600 m), while blue markers represent buildings located closer to it. (a) The provision assessments calculated using the Linear Allocation Model (LAM) tend to split most of the population into two groups, those provided ($p_i = 1$) and those unprovided ($p_i = 0$), depending on the remoteness of the nearest kindergarten. The provided group mostly consists of residents of buildings located closer to the nearest facility, whereas the unprovided group is primarily composed of residents of more remote buildings. The connection to the PPR metric is apparent only at the extremes, where the markers indicate either the unprovided population in zones with severe capacity shortages (PPR = 0) or the provided population in zones with ample capacity (PPR = 1). (b) In contrast, provision assessments calculated from interactions estimated by the Doubly Constrained Gravity Model (DCGM, $\beta = 0$) show a more consistent relationship with the PPR metric across the entire interval: an increase in capacity (and thus in PPR) gradually improves the provision assessment of the population regardless of the distance to the nearest facility. (c) Overall, the absolute difference between the provision assessments $P_i$ obtained by LAM and DCGM is most pronounced in the middle of the PPR interval, which corresponds to zones with mild capacity deficiencies.
Figure A3. The dependence of the out-of-sample performance of the Constrained Deep Gravity Model on the number of balancing steps. The graphs display the average values and 95% confidence intervals for the Negative Log–Likelihood (NLL) and Common Part of Commuters (CPC) metrics, used to evaluate the model’s performance with 1 to 16 balancing steps, based on datasets generated by the Linear Allocation Model (LAM) and the Doubly Constrained Gravity Model (DCGM), respectively. The average values and corresponding confidence intervals at each step count were derived from ten independent experiments. The graphs reveal a clear trend in the relationship between the number of balancing steps and the model’s performance. Initially, as the number of balancing steps increases, there is a noticeable improvement in the average values of both NLL and CPC, indicating better model performance. These metrics reach their optimal levels at certain points (marked by the dashed line), after which the trend reverses. Specifically, for DCGM-generated data, beyond the peak point, NLL begins to rise, while CPC starts to decline, signaling a deterioration in model performance. For LAM-generated data, both NLL and CPC metrics stagnate after passing their optimal levels.
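The CPC metric in Figure A3 is commonly defined in the human-mobility literature as the share of trips two flow matrices have in common, $\mathrm{CPC} = 2\sum_{ij}\min(T_{ij}, \hat{T}_{ij}) / (\sum_{ij} T_{ij} + \sum_{ij} \hat{T}_{ij})$, so 1 means perfect agreement and 0 means no overlap. The sketch below assumes this standard definition; it is not the authors' code, and the value returned for two empty matrices is an illustrative convention.

```python
import numpy as np


def common_part_of_commuters(observed, predicted) -> float:
    """CPC = 2 * sum(min(T_ij, T_hat_ij)) / (sum(T_ij) + sum(T_hat_ij))."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if observed.shape != predicted.shape:
        raise ValueError("flow matrices must have the same shape")
    total = observed.sum() + predicted.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty flow matrices agree perfectly
    return float(2.0 * np.minimum(observed, predicted).sum() / total)


# Toy flows: rows are residential buildings, columns are facilities.
observed = np.array([[10, 0], [5, 5]])
predicted = np.array([[8, 2], [5, 5]])
print(common_part_of_commuters(observed, predicted))  # 0.9
```

Because CPC only counts overlapping trips, it rewards getting the allocation right rather than merely matching totals, which is why it complements a likelihood-based loss.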
Table A1. Models’ in-sample and out-of-sample performance (continuation of Table 1).
Model | In-sample Loss | In-sample R² | Out-of-sample Loss | Out-of-sample R²
Doubly Constrained Gravity Model (b = 0.5)
LLR | −4.702 ± 0.301 | 0.737 ± 0.003 | −5.655 ± 0.819 | 0.733 ± 0.007
DGM | −4.797 ± 0.291 | 0.748 ± 0.002 | −5.754 ± 0.821 | 0.745 ± 0.008
GNN-DGM | −5.119 ± 0.303 | 0.793 ± 0.003 | −6.054 ± 0.832 | 0.784 ± 0.007
C-GNN-DGM (t = 1) | −5.258 ± 0.312 | 0.822 ± 0.006 | −6.232 ± 0.844 | 0.817 ± 0.008
C-GNN-DGM (t = 4) | −5.328 ± 0.304 | 0.841 ± 0.005 | −6.336 ± 0.87 | 0.836 ± 0.01
Doubly Constrained Gravity Model (b = 1)
LLR | −4.852 ± 0.298 | 0.709 ± 0.003 | −5.808 ± 0.817 | 0.706 ± 0.006
DGM | −5.04 ± 0.297 | 0.739 ± 0.003 | −5.992 ± 0.813 | 0.736 ± 0.007
GNN-DGM | −5.343 ± 0.309 | 0.779 ± 0.006 | −6.252 ± 0.825 | 0.768 ± 0.009
C-GNN-DGM (t = 1) | −5.559 ± 0.312 | 0.817 ± 0.005 | −6.529 ± 0.851 | 0.809 ± 0.008
C-GNN-DGM (t = 4) | −5.621 ± 0.313 | 0.832 ± 0.007 | −6.636 ± 0.862 | 0.829 ± 0.011
Doubly Constrained Gravity Model (b = 1.5)
LLR | −4.973 ± 0.279 | 0.674 ± 0.002 | −5.92 ± 0.806 | 0.672 ± 0.004
DGM | −5.309 ± 0.278 | 0.73 ± 0.002 | −6.276 ± 0.828 | 0.729 ± 0.007
GNN-DGM | −5.656 ± 0.295 | 0.772 ± 0.004 | −6.58 ± 0.836 | 0.763 ± 0.006
C-GNN-DGM (t = 1) | −5.902 ± 0.299 | 0.812 ± 0.005 | −6.878 ± 0.86 | 0.805 ± 0.007
C-GNN-DGM (t = 4) | −6.048 ± 0.302 | 0.836 ± 0.004 | −7.057 ± 0.876 | 0.831 ± 0.007
Doubly Constrained Gravity Model (b = 2)
LLR | −5.063 ± 0.275 | 0.639 ± 0.002 | −6.064 ± 0.831 | 0.637 ± 0.004
DGM | −5.535 ± 0.281 | 0.718 ± 0.002 | −6.574 ± 0.863 | 0.715 ± 0.007
GNN-DGM | −5.932 ± 0.27 | 0.765 ± 0.005 | −6.934 ± 0.896 | 0.753 ± 0.009
C-GNN-DGM (t = 1) | −6.267 ± 0.292 | 0.813 ± 0.005 | −7.331 ± 0.918 | 0.806 ± 0.009
C-GNN-DGM (t = 4) | −6.406 ± 0.295 | 0.836 ± 0.005 | −7.525 ± 0.941 | 0.831 ± 0.01
Doubly Constrained Gravity Model (b = 2.5)
LLR | −5.078 ± 0.286 | 0.639 ± 0.002 | −5.9 ± 0.760 | 0.637 ± 0.003
DGM | −5.56 ± 0.292 | 0.718 ± 0.002 | −6.398 ± 0.786 | 0.715 ± 0.005
GNN-DGM | −5.98 ± 0.306 | 0.767 ± 0.004 | −6.776 ± 0.805 | 0.756 ± 0.005
C-GNN-DGM (t = 1) | −6.205 ± 0.308 | 0.802 ± 0.006 | −7.073 ± 0.83 | 0.795 ± 0.008
C-GNN-DGM (t = 4) | −6.391 ± 0.325 | 0.83 ± 0.008 | −7.295 ± 0.842 | 0.828 ± 0.009
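Tables 1 and A1 vary the parameter b of the Doubly Constrained Gravity Model used to generate synthetic data. As a minimal sketch of how such flows can be produced, the code below builds $T_{ij} = A_i O_i B_j D_j f(d_{ij})$ and finds the balancing factors by iterative proportional fitting. The power-law deterrence $f(d) = d^{-b}$, the function name, and the toy inputs are assumptions for illustration, since the functional form is not restated in this section; the doubly constrained form also assumes equal total demand and supply and strictly positive distances.

```python
import numpy as np


def doubly_constrained_gravity(origins, destinations, distance, b, n_iter=500, tol=1e-9):
    """Doubly constrained gravity flows balanced by iterative proportional fitting.

    origins: O_i (demand per origin); destinations: D_j (capacity per destination)
    distance: d_ij > 0; b: exponent of the assumed power-law deterrence d_ij ** (-b)
    """
    origins = np.asarray(origins, dtype=float)
    destinations = np.asarray(destinations, dtype=float)
    if not np.isclose(origins.sum(), destinations.sum()):
        raise ValueError("doubly constrained model needs equal total demand and supply")
    # Initial flows O_i * D_j * f(d_ij); b = 0 removes any distance effect.
    flows = np.outer(origins, destinations) * np.asarray(distance, dtype=float) ** (-b)
    for _ in range(n_iter):
        flows *= (origins / flows.sum(axis=1))[:, None]       # match out-going totals O_i
        flows *= (destinations / flows.sum(axis=0))[None, :]  # match in-going totals D_j
        if np.allclose(flows.sum(axis=1), origins, rtol=0.0, atol=tol):
            break
    return flows


O = np.array([30.0, 20.0])           # toy demand in two buildings
D = np.array([25.0, 25.0])           # toy capacity of two facilities
d = np.array([[1.0, 2.0], [2.0, 1.0]])
for b in (0.0, 1.0, 2.0):
    print(b, doubly_constrained_gravity(O, D, d, b).round(2))
```

Larger values of b concentrate flows on nearby facilities, while b = 0 spreads each building's demand in proportion to facility capacity.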
Table A2. Parameters of the bootstrap hypothesis test.
Data | Groups | Mean | Number of Resamples | Mean Difference | 95% CI | p-Value
In-going flows inflation error
DCGM | GNN-DGM | −54.722 | 10,000 | −35.741 | (−35.754, −35.728) | 0
DCGM | C-GNN-DGM | −18.984 | | | |
LAM | GNN-DGM | −58.052 | 10,000 | −36.956 | (−36.971, −36.941) | 0
LAM | C-GNN-DGM | −21.094 | | | |
Out-going flows inflation error
DCGM | GNN-DGM | −2.939 | 10,000 | −0.661 | (−0.661, −0.660) | 0
DCGM | C-GNN-DGM | −2.277 | | | |
LAM | GNN-DGM | −3.219 | 10,000 | −0.212 | (−0.213, −0.212) | 0
LAM | C-GNN-DGM | −3.006 | | | |

References

  1. Fraser, T.; Feeley, O.; Ridge, A.; Cervini, A.; Rago, V.; Gilmore, K.; Worthington, G.; Berliavsky, I. How far I will go: Social infrastructure accessibility and proximity in urban neighborhoods. Landsc. Urban Plan. 2024, 241, 104922. [Google Scholar] [CrossRef]
  2. Haugen, K. The advantage of ‘near’: Which accessibilities matter to whom? Eur. J. Transp. Infrastruct. Res. 2011, 11, 368–388. [Google Scholar] [CrossRef]
  3. Bernelius, V.; Vilkama, K. Pupils on the move: School catchment area segregation and residential mobility of urban families. Urban Stud. 2019, 56, 3095–3116. [Google Scholar] [CrossRef]
  4. Mazumdar, S.; Feng, X.; Konings, P.; McRae, I.; Girosi, F. A brief report on primary care service area catchment geographies in New South Wales Australia. Int. J. Health Geogr. 2014, 13, 38. [Google Scholar] [CrossRef] [PubMed]
  5. Pan, X.; Kwan, M.P.; Yang, L.; Zhou, S.; Zuo, Z.; Wan, B. Evaluating the accessibility of healthcare facilities using an integrated catchment area approach. Int. J. Environ. Res. Public Health 2018, 15, 2051. [Google Scholar] [CrossRef]
  6. Talen, E. Neighborhoods as service providers: A methodology for evaluating pedestrian access. Environ. Plan. B Plan. Des. 2003, 30, 181–200. [Google Scholar] [CrossRef]
  7. Radke, J.; Mu, L. Spatial decompositions, modeling and mapping service regions to predict access to social programs. Geogr. Inf. Sci. 2000, 6, 105–112. [Google Scholar] [CrossRef]
  8. Orta Ortiz, M.S.; Geneletti, D. Assessing mismatches in the provision of urban ecosystem services to support spatial planning: A case study on recreation and food supply in Havana, Cuba. Sustainability 2018, 10, 2165. [Google Scholar] [CrossRef]
  9. Skiles, M.P.; Burgert, C.R.; Curtis, S.L.; Spencer, J. Geographically linking population and facility surveys: Methodological considerations. Popul. Health Metrics 2013, 11, 14. [Google Scholar] [CrossRef]
  10. Caselli, B.; Carra, M.; Rossetti, S.; Zazzi, M. Exploring the 15-minute neighbourhoods. An evaluation based on the walkability performance to public facilities. Transp. Res. Procedia 2022, 60, 346–353. [Google Scholar] [CrossRef]
  11. Ortega, J.; Tóth, J.; Péter, T. Mapping the catchment area of park and ride facilities within urban environments. ISPRS Int. J. Geo-Inf. 2020, 9, 501. [Google Scholar] [CrossRef]
  12. Burdziej, J. Using hexagonal grids and network analysis for spatial accessibility assessment in urban environments—A case study of public amenities in Toruń. Misc. Geogr. 2019, 23, 99–110. [Google Scholar] [CrossRef]
  13. Rabiei-Dastjerdi, H.; Matthews, S.A. Who gets what, where, and how much? Composite index of spatial inequality for small areas in Tehran. Reg. Sci. Policy Pract. 2021, 13, 191–205. [Google Scholar] [CrossRef]
  14. Zheng, Z.; Xia, H.; Ambinakudige, S.; Qin, Y.; Li, Y.; Xie, Z.; Zhang, L.; Gu, H. Spatial accessibility to hospitals based on web mapping API: An empirical study in Kaifeng, China. Sustainability 2019, 11, 1160. [Google Scholar] [CrossRef]
  15. Dahmann, N.; Wolch, J.; Joassart-Marcelli, P.; Reynolds, K.; Jerrett, M. The active city? Disparities in provision of urban public recreation resources. Health Place 2010, 16, 431–445. [Google Scholar] [CrossRef] [PubMed]
  16. Mansour, S. Spatial analysis of public health facilities in Riyadh Governorate, Saudi Arabia: A GIS-based study to assess geographic variations of service provision and accessibility. Geo-Spat. Inf. Sci. 2016, 19, 26–38. [Google Scholar] [CrossRef]
  17. Zhang, L.; Zhou, T.; Mao, C. Does the difference in urban public facility allocation cause spatial inequality in housing prices? Evidence from Chongqing, China. Sustainability 2019, 11, 6096. [Google Scholar] [CrossRef]
  18. Omer, I. Evaluating accessibility using house-level data: A spatial equity perspective. Comput. Environ. Urban Syst. 2006, 30, 254–274. [Google Scholar] [CrossRef]
  19. Zhang, X.; Lu, H.; Holt, J.B. Modeling spatial accessibility to parks: A national study. Int. J. Health Geogr. 2011, 10, 31. [Google Scholar] [CrossRef]
  20. Luo, W.; Wang, F. Measures of spatial accessibility to health care in a GIS environment: Synthesis and a case study in the Chicago region. Environ. Plan. B Plan. Des. 2003, 30, 865–884. [Google Scholar] [CrossRef]
  21. Luo, W.; Qi, Y. An enhanced two-step floating catchment area (E2SFCA) method for measuring spatial accessibility to primary care physicians. Health Place 2009, 15, 1100–1107. [Google Scholar] [CrossRef] [PubMed]
  22. Wan, N.; Zou, B.; Sternberg, T. A three-step floating catchment area method for analyzing spatial access to health services. Int. J. Geogr. Inf. Sci. 2012, 26, 1073–1089. [Google Scholar] [CrossRef]
  23. Wang, F. Inverted two-step floating catchment area method for measuring facility crowdedness. Prof. Geogr. 2018, 70, 251–260. [Google Scholar] [CrossRef]
  24. Lin, J.; Cromley, G. A narrative analysis of the 2SFCA and i2SFCA methods. Int. J. Geogr. Inf. Sci. 2022, 36, 943–967. [Google Scholar] [CrossRef]
  25. Zipf, G.K. The P 1 P 2/D hypothesis: On the intercity movement of persons. Am. Sociol. Rev. 1946, 11, 677–686. [Google Scholar] [CrossRef]
  26. Chen, X.; Jia, P. A comparative analysis of accessibility measures by the two-step floating catchment area (2SFCA) method. Int. J. Geogr. Inf. Sci. 2019, 33, 1739–1758. [Google Scholar] [CrossRef]
  27. Paez, A.; Higgins, C.D.; Vivona, S.F. Demand and level of service inflation in Floating Catchment Area (FCA) methods. PLoS ONE 2019, 14, e0218773. [Google Scholar] [CrossRef]
  28. Soukhov, A.; Paez, A.; Higgins, C.D.; Mohamed, M. Introducing spatial availability, a singly-constrained measure of competitive accessibility. PLoS ONE 2023, 18, e0278468. [Google Scholar] [CrossRef]
  29. Wei, Y.; Xiu, C.; Gao, R.; Wang, Q. Evaluation of green space accessibility of Shenyang using Gaussian based 2-step floating catchment area method. Prog. Geogr. 2014, 33, 479–487. [Google Scholar]
  30. Zhang, S.; Yu, P.; Chen, Y.; Jing, Y.; Zeng, F. Accessibility of park green space in Wuhan, China: Implications for spatial equity in the post-COVID-19 era. Int. J. Environ. Res. Public Health 2022, 19, 5440. [Google Scholar] [CrossRef]
  31. Brizan-St. Martin, R.; Paul, J. Evaluating the performance of GIS methodologies for quantifying spatial accessibility to healthcare in Multi-Island Micro States (MIMS). Health Policy Plan. 2022, 37, 690–705. [Google Scholar] [CrossRef] [PubMed]
  32. Hitchcock, F.L. The distribution of a product from several sources to numerous localities. J. Math. Phys. 1941, 20, 224–230. [Google Scholar] [CrossRef]
  33. Simini, F.; Barlacchi, G.; Luca, M.; Pappalardo, L. A deep gravity model for mobility flows generation. Nat. Commun. 2021, 12, 6576. [Google Scholar] [CrossRef] [PubMed]
  34. Liu, Z.; Miranda, F.; Xiong, W.; Yang, J.; Wang, Q.; Silva, C. Learning geo-contextual embeddings for commuting flow prediction. AAAI Conf. Artif. Intell. 2020, 34, 808–816. [Google Scholar] [CrossRef]
  35. Rong, C.; Wang, H.; Li, Y. origin–destination Network Generation via Gravity-Guided GAN. arXiv 2023, arXiv:2306.03390. [Google Scholar]
  36. Wong, S.K.; Deng, K.K. School catchment zone mergers and housing wealth redistribution. J. Plan. Educ. Res. 2021, 44, 754–765. [Google Scholar] [CrossRef]
  37. Schuurman, N.; Fiedler, R.S.; Grzybowski, S.C.; Grund, D. Defining rational hospital catchments for non-urban areas based on travel-time. Int. J. Health Geogr. 2006, 5, 43. [Google Scholar] [CrossRef]
  38. Barbosa, H.; Barthelemy, M.; Ghoshal, G.; James, C.R.; Lenormand, M.; Louail, T.; Menezes, R.; Ramasco, J.J.; Simini, F.; Tomasini, M. Human mobility: Models and applications. Phys. Rep. 2018, 734, 1–74. [Google Scholar] [CrossRef]
  39. Map Data from OpenStreetMap. Available online: https://www.openstreetmap.org/copyright (accessed on 7 June 2024).
  40. Google Maps. Satellite Image. Available online: https://www.google.com.sg/maps/ (accessed on 7 June 2024).
  41. Rosstat. Available online: https://rosstat.gov.ru/ (accessed on 7 June 2024).
  42. Territory Development Fund. Available online: https://xn--p1aee.xn--p1ai/ (accessed on 7 June 2024).
  43. Ishfaq, R.; Sox, C.R. Hub location–allocation in intermodal logistic networks. Eur. J. Oper. Res. 2011, 210, 213–230. [Google Scholar] [CrossRef]
  44. Caggiani, L.; Colovic, A.; Ottomanelli, M. An equality-based model for bike-sharing stations location in bicycle-public transport multimodal mobility. Transp. Res. Part A Policy Pract. 2020, 140, 251–265. [Google Scholar] [CrossRef]
  45. Farahani, R.Z.; Fallah, S.; Ruiz, R.; Hosseini, S.; Asgari, N. OR models in urban service facility location: A critical review of applications and future developments. Eur. J. Oper. Res. 2019, 276, 1–27. [Google Scholar] [CrossRef]
  46. Lin, J.; Cromley, G. Using the transportation problem to build a congestion/threshold constrained spatial accessibility model. J. Transp. Geogr. 2023, 112, 103691. [Google Scholar] [CrossRef]
  47. Fan, C.; Jiang, X.; Lee, R.; Mostafavi, A. Equality of access and resilience in urban population-facility networks. Npj Urban Sustain. 2022, 2, 9. [Google Scholar] [CrossRef]
  48. Mitchell, S.; OSullivan, M.; Dunning, I. PuLP: A linear programming toolkit for python. Univ. Auckl. Auckl. N. Z. 2011, 65, 25. [Google Scholar]
  49. Wilson, A. A statistical theory of spatial distribution models. Transp. Res. 1967, 1, 253–269. [Google Scholar] [CrossRef]
  50. Choupani, A.A.; Mamdoohi, A.R. Population synthesis using iterative proportional fitting (IPF): A review and future research. Transp. Res. Procedia 2016, 17, 223–233. [Google Scholar] [CrossRef]
  51. Demissie, M.G.; Phithakkitnukoon, S.; Kattan, L. Trip distribution modeling using mobile phone data: Emphasis on intra-zonal trips. IEEE Trans. Intell. Transp. Syst. 2018, 20, 2605–2617. [Google Scholar] [CrossRef]
  52. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
  53. Flowerdew, R.; Aitkin, M. A method of fitting the gravity model based on the Poisson distribution. J. Reg. Sci. 1982, 22, 191–202. [Google Scholar] [CrossRef]
  54. Jin, G.; Liang, Y.; Fang, Y.; Shao, Z.; Huang, J.; Zhang, J.; Zheng, Y. Spatio-temporal graph neural networks for predictive learning in urban computing: A survey. IEEE Trans. Knowl. Data Eng. 2023. [Google Scholar] [CrossRef]
Figure 1. Data. (a) Graph representation of the data. (b) Connected components of the graph and nodes of disjoint sets.
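Figure 1(b) splits the building–facility graph into connected components. As an illustration only, the sketch below builds a bipartite graph from a distance matrix, linking a building and a facility when their distance is within a hypothetical catchment radius, and labels the components with SciPy's `connected_components`. The edge rule, the radius, and the function name are assumptions; this section does not specify how edges were defined.

```python
import numpy as np
from scipy.sparse import bmat, csr_matrix
from scipy.sparse.csgraph import connected_components


def bipartite_components(distance, max_distance):
    """Connected components of a building-facility bipartite graph.

    Hypothetical edge rule: building i and facility j are linked when
    distance[i, j] <= max_distance.
    """
    distance = np.asarray(distance, dtype=float)
    n_buildings = distance.shape[0]
    biadjacency = csr_matrix((distance <= max_distance).astype(np.int8))
    # Adjacency of the whole graph: building nodes first, then facility nodes.
    adjacency = bmat([[None, biadjacency], [biadjacency.T, None]])
    n_components, labels = connected_components(adjacency, directed=False)
    return n_components, labels[:n_buildings], labels[n_buildings:]


# Three buildings and two facilities; distances in km (toy values).
d = np.array([[0.2, 5.0], [0.4, 5.0], [5.0, 0.3]])
n, building_labels, facility_labels = bipartite_components(d, max_distance=1.0)
print(n, building_labels, facility_labels)
# 2 components: buildings 0 and 1 share facility 0; building 2 uses facility 1
```

Components that share no edges can be processed independently, which keeps allocation or gravity computations small on city-scale data.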
Figure 2. Model architecture. A directed bipartite graph serves as input to a multi-head graph attention network (GAT), which learns attention coefficients and node embeddings. For every edge, an input vector is formed by concatenating the travel distance between nodes $d_{ij}$, their input features $O_i$ and $D_j$, the learned embeddings $h_i$ and $h_j$, and the averaged attention coefficients $\bar{\alpha}_{ij}$ and $\bar{\alpha}_{ji}$. These edge vectors are fed to a feedforward neural network (FNN) that predicts population flows, which are then balanced in the recurrent block (yellow frame). At each step of the recurrent block, the predicted flows are first aggregated over the origin and destination nodes. Next, for each edge, the total predicted inflows and outflows are concatenated with the other edge-related features: distance $d_{ij}$, origin population $O_i$, and destination capacity $D_j$. The resulting vectors are passed to a shallow FNN that computes the balancing factors of the flows. Finally, the updated population flows are obtained by multiplying their previous values by the corresponding balancing factors.
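The recurrent balancing block of Figure 2 can be sketched as a NumPy forward pass. Here the shallow FNN is represented by a hypothetical one-hidden-layer network with ReLU activation, and the exponential output that keeps balancing factors positive is an assumption. The weights are random and untrained, so the example only shows how per-edge features are assembled from aggregated flows and how the factors rescale the flows over t steps; it is not the authors' implementation.

```python
import numpy as np


def balancing_step(flows, distance, origins, capacities, w1, b1, w2, b2):
    """One forward step of a learned flow-balancing block (sketch).

    flows, distance: arrays of shape (n_origins, n_destinations)
    origins: O_i, shape (n_origins,); capacities: D_j, shape (n_destinations,)
    w1, b1, w2, b2: parameters of a hypothetical one-hidden-layer FNN
    """
    n, m = flows.shape
    outflow = flows.sum(axis=1)  # total predicted flow leaving each origin i
    inflow = flows.sum(axis=0)   # total predicted flow arriving at each destination j
    # Per-edge feature vector [d_ij, O_i, D_j, outflow_i, inflow_j] -> shape (n, m, 5).
    features = np.stack(
        [
            distance,
            np.broadcast_to(origins[:, None], (n, m)),
            np.broadcast_to(capacities[None, :], (n, m)),
            np.broadcast_to(outflow[:, None], (n, m)),
            np.broadcast_to(inflow[None, :], (n, m)),
        ],
        axis=-1,
    )
    hidden = np.maximum(features @ w1 + b1, 0.0)  # ReLU hidden layer
    factors = np.exp(hidden @ w2 + b2)[..., 0]    # one positive factor per edge
    return flows * factors


rng = np.random.default_rng(0)
n, m, h = 4, 3, 8
flows = rng.uniform(1.0, 5.0, size=(n, m))
distance = rng.uniform(0.1, 2.0, size=(n, m))
origins = rng.uniform(5.0, 15.0, size=n)
capacities = rng.uniform(5.0, 15.0, size=m)
w1, b1 = rng.normal(0.0, 0.1, size=(5, h)), np.zeros(h)
w2, b2 = rng.normal(0.0, 0.1, size=(h, 1)), np.zeros(1)
for _ in range(4):  # t = 4 steps; the same (untrained) weights are reused at each step
    flows = balancing_step(flows, distance, origins, capacities, w1, b1, w2, b2)
```

Sharing one set of weights across steps is what makes the block recurrent; in training, these weights would be fitted jointly with the GAT and the flow-prediction FNN.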
Figure 3. Comparison of trip distribution models. (a) Overlapping catchment areas of kindergartens located in six adjacent blocks (highlighted in blue). The population-to-provider ratio indicates that, given the available capacities, 10 percent of the target population living in this territory lacks access to kindergartens. (b,c) Provision assessments for each building in the territory obtained with the Linear Allocation Model (LAM) and the Doubly Constrained Gravity Model (DCGM), respectively. On the map, each point represents a residential building and each square marker a kindergarten. Point color indicates the level of facility provision; the sizes of the points and square markers denote the unprovided population and kindergarten capacity, respectively. Lines connecting markers represent population–facility interactions. (d) Identified problem areas where residents lack sufficient kindergarten capacity (highlighted in red). (e,f) Distributions of provision assessments across the city’s entire target population obtained with LAM and DCGM, respectively. (g) Distribution of distances from each building to the nearest kindergarten within its catchment area. (h,i) Distributions of trip distances to kindergartens obtained with LAM and DCGM, respectively.
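Several panels of Figure 3 rely on the Linear Allocation Model. A minimal sketch, assuming a transportation-problem formulation that minimizes total travel distance subject to facility capacities and allows unserved demand at a large penalty, is given below. It solves the LP relaxation with SciPy's HiGHS backend rather than an integer solver; the penalty value, function name, and toy data are illustrative assumptions. The toy example also mirrors the population-to-provider reasoning of panel (a): 90 places for 100 children leave 10 percent without access.

```python
import numpy as np
from scipy.optimize import linprog


def linear_allocation(demand, capacity, distance, penalty=1e6):
    """Allocate building demand to facilities, minimizing total travel distance (LP sketch)."""
    n, m = distance.shape
    n_flows = n * m
    # Costs: distance for each flow T_ij, large penalty per unit of unserved demand u_i.
    c = np.concatenate([distance.ravel(), np.full(n, penalty)])

    # Equality constraints: sum_j T_ij + u_i = O_i for every building i.
    a_eq = np.zeros((n, n_flows + n))
    for i in range(n):
        a_eq[i, i * m:(i + 1) * m] = 1.0
        a_eq[i, n_flows + i] = 1.0
    # Inequality constraints: sum_i T_ij <= D_j for every facility j.
    a_ub = np.zeros((m, n_flows + n))
    for j in range(m):
        a_ub[j, j:n_flows:m] = 1.0

    res = linprog(c, A_ub=a_ub, b_ub=capacity, A_eq=a_eq, b_eq=demand,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    flows = res.x[:n_flows].reshape(n, m)
    provision = flows.sum(axis=1) / demand  # share of each building's demand that is served
    return flows, provision


demand = np.array([40.0, 30.0, 30.0])   # children per building (toy values)
capacity = np.array([50.0, 40.0])       # kindergarten places
distance = np.array([[0.3, 1.2], [0.8, 0.4], [1.5, 0.6]])
flows, provision = linear_allocation(demand, capacity, distance)
shortage = max(0.0, 1.0 - capacity.sum() / demand.sum())  # population-to-provider view
print(flows.round(1), provision.round(2), round(shortage, 2))
```

Unlike a gravity model, this allocation concentrates any shortage in the buildings that are most expensive to serve, which is one reason the two approaches can produce different provision maps.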
Figure 4. Comparison of GNN-DGM and C-GNN-DGM predictions. (a,b) Scatter plots of the predicted versus observed number of people in each building who are provided with kindergarten services. (c,d) Distributions of the differences between facility capacities and in-going flows. (e,f) Distributions of the differences between building demand and out-going flows.
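Panels (c) to (f) of Figure 4 compare predicted flows with fixed supply and demand. The helper below, a hypothetical illustration rather than the authors' code, computes both residuals from a flow matrix using the sign convention of the caption (capacity or demand minus predicted flow), so negative values indicate inflated flows.

```python
import numpy as np


def constraint_residuals(flows, demand, capacity):
    """Residuals of predicted flows with respect to fixed demand and supply.

    flows: predicted T_ij, shape (n_buildings, n_facilities)
    Returns (capacity - in-going flows, demand - out-going flows).
    """
    flows = np.asarray(flows, dtype=float)
    inflow_residual = np.asarray(capacity, dtype=float) - flows.sum(axis=0)
    outflow_residual = np.asarray(demand, dtype=float) - flows.sum(axis=1)
    return inflow_residual, outflow_residual


flows = np.array([[6.0, 2.0], [3.0, 5.0]])
demand = np.array([8.0, 10.0])
capacity = np.array([7.0, 9.0])
inflow_res, outflow_res = constraint_residuals(flows, demand, capacity)
print(inflow_res, outflow_res)  # [-2.  2.] [0. 2.]
```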
Table 1. Models’ in-sample and out-of-sample performance.
Model | In-sample NLL | In-sample CPC | Out-of-sample NLL | Out-of-sample CPC
Integer Linear Programming
LLR | −5.935 ± 0.360 | 0.396 ± 0.002 | −6.955 ± 0.96 | 0.398 ± 0.003
DGM | −6.932 ± 0.389 | 0.557 ± 0.003 | −8.002 ± 1.067 | 0.554 ± 0.005
GNN-DGM | −7.88 ± 0.402 | 0.666 ± 0.013 | −8.926 ± 1.125 | 0.657 ± 0.014
C-GNN-DGM (t = 1) | −8.246 ± 0.411 | 0.687 ± 0.011 | −9.383 ± 1.176 | 0.679 ± 0.013
C-GNN-DGM (t = 7) | −9.08 ± 0.457 | 0.769 ± 0.01 | −10.289 ± 1.272 | 0.761 ± 0.012
Doubly Constrained Gravity Model (b = 0)
LLR | −4.585 ± 0.267 | 0.743 ± 0.002 | −5.438 ± 0.738 | 0.740 ± 0.007
DGM | −4.629 ± 0.251 | 0.745 ± 0.003 | −5.482 ± 0.737 | 0.743 ± 0.008
GNN-DGM | −5.002 ± 0.268 | 0.798 ± 0.003 | −5.832 ± 0.746 | 0.791 ± 0.006
C-GNN-DGM (t = 1) | −5.099 ± 0.257 | 0.822 ± 0.006 | −5.975 ± 0.78 | 0.816 ± 0.011
C-GNN-DGM (t = 4) | −5.189 ± 0.281 | 0.844 ± 0.005 | −6.087 ± 0.764 | 0.843 ± 0.008
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Mishina, M.; Mityagin, S.; Belyi, A.; Khrulkov, A.; Sobolevsky, S. Towards Urban Accessibility: Modeling Trip Distribution to Assess the Provision of Social Facilities. Smart Cities 2024, 7, 2741-2762. https://doi.org/10.3390/smartcities7050106

AMA Style

Mishina M, Mityagin S, Belyi A, Khrulkov A, Sobolevsky S. Towards Urban Accessibility: Modeling Trip Distribution to Assess the Provision of Social Facilities. Smart Cities. 2024; 7(5):2741-2762. https://doi.org/10.3390/smartcities7050106

Chicago/Turabian Style

Mishina, Margarita, Sergey Mityagin, Alexander Belyi, Alexander Khrulkov, and Stanislav Sobolevsky. 2024. "Towards Urban Accessibility: Modeling Trip Distribution to Assess the Provision of Social Facilities" Smart Cities 7, no. 5: 2741-2762. https://doi.org/10.3390/smartcities7050106
