An Advanced Travel Demand Synthesis Process for Creating a MATSim Activity Model: The Case of Ústí nad Labem

Pereira, André Maia; Dingil, Ali Enes; Přibyl, Ondřej; Myška, Vojtěch; Vorel, Jakub; Kříž, Milan

doi:10.3390/app121910032


by André Maia Pereira ¹, Ali Enes Dingil ¹,*, Ondřej Přibyl ¹, Vojtěch Myška ², Jakub Vorel ² and Milan Kříž

¹ Faculty of Transportation Sciences, Czech Technical University, 110 00 Prague, Czech Republic

² Faculty of Architecture, Czech Technical University, 160 00 Prague, Czech Republic

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(19), 10032; https://doi.org/10.3390/app121910032

Submission received: 27 August 2022 / Revised: 25 September 2022 / Accepted: 29 September 2022 / Published: 6 October 2022

(This article belongs to the Special Issue Micro-Mobility and Sustainable Cities)


Abstract

In this study, an activity-based travel demand model of the Ústí nad Labem district (Czech Republic) is created. To do this, an advanced travel demand synthesis process is presented that utilizes the Eqasim framework, a pipeline that processes initial raw data through to the simulation step. The framework is extensively modified and extended with several algorithms in order to utilize multiple data points for increasing realism in mobility for travel demand models. Two major extensions are provided. First, the pipeline framework is improved to estimate inbound and outbound trips of the study area, comprising a main city and 23 surrounding municipalities. The extended framework assigns synthetic gates for the study area as hubs for the inclusion of inbound and outbound trips. Second, the pipeline framework is advanced to provide a more compatible match of travel destination and activity location state. To do this, the extended framework assigns a capacity for each facility identified for the study area, the expected number of visitors to each facility, and the number of residents in each building. The resulting demand model is presented and the generated trips are evaluated based on locational, transport mode, and sociodemographic characteristics with origin–destination (OD) bundling. Additionally, distribution analyses of the present model are conducted to understand the matching results on a detailed level. The results demonstrate that the present model provides a reasonable output for transport researchers when testing different mobility scenarios, and that the provided extensions help them to reduce implausible reflections of the distribution of travel and activity characteristics in household travel surveys while creating demand models, thus increasing realism. Lastly, an open-source playground and code repository are created for the further improvement of travel demand synthesis methods, which supports a deeper understanding of the preparatory and methodological backgrounds required for complex activity-based simulations and is intended to inspire transport planners.

Keywords:

activity-based approach; synthetic population; travel demand; MATSim model; statistical matching

1. Introduction

1.1. Background

In recent decades, transport models have significantly contributed to the decision-making process for the creation of future transportation systems and related infrastructure investments, improvements, and innovations. Macroscopic transport models are the most widely used modeling approach in transport planning. They divide a study area into zones, and the centroids of these zones represent the origins and destinations of trips taken between the defined zones. “Travel demand” is generally illustrated with origin–destination (OD) matrices between the defined zone centroids. Macroscopic modeling approaches rely on aggregate travel forecasts, which compile and summarize the output of simplified trips. The most common trip-based approach is the four-stage model [1], established in the 1950s in order to measure the impact of the residential and commercial development. Nevertheless, the inadequacy of trip-based models for simulating complex policy scenarios has been noticed [1]. In real life, city inhabitants travel to participate in activities; therefore, one of the main limitations of trip-based models is reckoning without the fact that travel demand stems from people’s needs to participate in various activities that are distributed through time and space. Consequently, a new modeling approach has emerged, activity-based modeling [2], in which the behaviors of transport users are simulated based on daily activity patterns. Activity-based models include the sociodemographic characteristics of each inhabitant of a study area and also incorporate non-mobility activities for more precise forecasts. Activity-based models work at a disaggregated level [3], and travel is considered only one of the activity attributes. Each simulated inhabitant is modeled at a personal level, considering household-level attributes, in order to produce detailed information across a broader set of performance metrics. 
Origin–destination (OD) points are usually presented as precise coordinates in the study area instead of generalized centroids in the macroscopic approach. Activity participation is influenced by spatial, temporal, interpersonal, and accessibility interdependencies that harmonize in the activity-based model. Incorporating activities distributed in space and time brings a new, more realistic dimension to transportation demand modeling. These characteristics make this approach a strong tool when modeling the impact of new transport policies and transport investments [4,5] together with the impact of unexpected events on travel behaviors such as the COVID-19 pandemic [6,7], which cannot be modeled using traditional approaches.

1.2. Practice

MATSim (Multi-Agent Transport Simulation) is an agent-based simulator and the leading example of the activity-based modeling approach considered in this study. The program is a new open-source framework for the implementation of large-scale activity-based mobility simulations [8]. MATSim uses a co-evolutionary principle, in which each agent iteratively optimizes its daily activity plan while competing with all other traveling agents for space–time slots. Each cycle starts with an initial demand arising from the daily activity chains of the inhabitants in the study area. This initial demand is optimized individually, based on natural selection, by each agent during iterations. Each plan in a daily activity chain has an associated score. A replaceable econometric utility model [8] is used to calculate these scores, which are the main tool for optimization within the program. The software is flexible and modifications of the program’s source code are possible. The software has been utilized in recent years for many purposes, including actual travel demand management [9], testing planned transport infrastructures and new transportation modes [10], observing travel behavioral changes [11], urban logistics planning [12], evacuation planning for natural disasters such as tsunamis and hurricanes [13,14], and smart city solutions such as testing V2V (vehicle-to-vehicle) and V2I (vehicle-to-infrastructure) technologies [15].

Agent-based models such as MATSim mainly require (1) a simulation environment and (2) synthetic travel demand and transport supply. The simulation environment of MATSim is easily accessible through the open-source code repository of the software, while the second requirement is a specific process dependent on certain spatial–temporal raw datasets. In order to support the reproducibility of agent-based simulation studies, the Eqasim framework was introduced in [16]. Eqasim is a pipeline that processes initial raw data through to the simulation step and utilizes raw datasets such as census data (CD), household travel surveys (HTS), and geographic datasets (GD).

1.3. Reasoning and Contributions

It is unlikely that nationwide census surveys could be conducted to collect the daily activity plans of all of a country’s inhabitants, mainly due to cost pressures and the lag in data collection over time. Therefore, the MATSim activity-based model approach brings with it some limitations and challenges for urban planners and policy-makers when testing urban scenarios, because a large amount of data is required to create such simulations. Additionally, even though the required data types are sometimes captured with household surveys and censuses, some important variables for creating travel demand models are absent or incomplete. Therefore, an algorithmic tool is indispensable for completing the missing pieces of the puzzle in a convenient way when creating a demand model for any study area. MATSim does not have such an algorithmic tool or module for converting all available mobility datasets into a synthetic population; therefore, transport demand and supply data must be processed and prepared in MATSim format before any urban simulation can be attempted. There are some noteworthy MATSim activity models presented in the literature, for Berlin [17], Paris and Ile-de-France [9,18], the Greater São Paulo Metropolitan Region [19], Jakarta [20], Zurich [21], and San Diego and San Francisco [11,22]. Only a few of these studies use a holistic algorithm for producing a more realistic activity model [9,19], in which the Eqasim [16] tool is utilized to create a synthetic population. However, the Eqasim tool also has important gaps that limit the mobility realism of the resulting demand models, which this study aims to fill. The main research problems are identified as the absence of inbound and outbound trips of the study areas and the insufficient match between travel destination and activity location state in the study areas. This paper focuses on a specific Czech example, the district of Ústí nad Labem, to test the planned extensions. The novel contributions of the present work are as follows:

The first-ever (to the best of our knowledge) travel demand synthesis for a city in the Czech Republic is presented and the activity-based travel demand model of the study area is created.
The Eqasim framework is extensively modified and extended with several algorithms in order to utilize multiple data points for increasing realism in mobility for travel demand models. Two major extensions are provided. First, the pipeline framework is improved to estimate inbound and outbound trips of the study area by assigning synthetic gates for the study area as hubs. Second, the pipeline framework is advanced to provide a more compatible match of travel destination and activity location state by assigning a capacity for each facility identified for the study area, the expected number of visitors to each facility, and the number of residents in each building.
An open-source playground and code repository of the applied methodology is created that enhances a deep understanding of the preparatory and methodological backgrounds required for complex activity-based simulations.

This paper is organized as follows. Section 2 identifies the study area and data sources, and then introduces the applied synthesis framework. Section 3 presents and evaluates the resulting activity model for the study area. Section 4 concludes the present work.

2. Methodology

2.1. Study Area and Data

The background of this study stems from a need for activity-based analysis of travel behaviors as part of a Czech national project, “Smart City—Smart Region—Smart Community” [23]. The study focuses on the Ústí nad Labem catchment area, referred to as the district area. In general, a district area in the Czech Republic represents a functional region consisting of a central city and surrounding municipalities that are dependent on the central city in terms of jobs and services. The district of Ústí nad Labem is located in the northwest part of the Czech Republic, as shown in Figure 1. The main city (Ústí nad Labem) and its 23 surrounding municipalities are home to 90,378 and 26,538 inhabitants, respectively, for a total of 116,916 residents [24]. The Czech Land Survey Office provided zonal boundary data for the study area [25]. In recent decades, the city of Ústí nad Labem has undergone a dramatic transition from an industry-based economy to a service-oriented one. The city is a regional center with important administrative, cultural, and educational institutions, including a university. There is a complex railway junction in the Ústí nad Labem city center connecting many national and international cities, making the city an important transportation hub.

Census data (CD) provided by the Czech Statistical Office [26] was used to generate a synthetic population for the study area together with household travel surveys (HTS) to reflect travel demand and geographic datasets (GD) containing useful information about the buildings, facilities, and land use of the study area in order to increase the realism of travel demand assignment. Census data are collected every ten years in the Czech Republic, and the latest available census was conducted in 2011 (CD-2011). The CD-2011 provides demographics, travel attitudes, housing conditions, and work and education locations relative to home locations (e.g., in the same municipality or in another municipality of the same district) for each inhabitant. A demographic transition algorithm was applied to this dataset to reflect a realistic sociodemographic structure for the year 2016, as explained in the next subsection. Demographic transition was implemented by utilizing a dataset provided by the Czech Statistical Office containing birth rates, mortality rates, and the information about the residential mobility of inhabitants [27,28,29].

HTS represents travel data of a sample of the citizens containing transport mode preferences, daily activity purposes, daily activity chains, origin and destination (OD) points of trips, trip purposes, trip distances, and trip durations. Two household travel surveys were utilized here, one at the national level (Czech Republic HTS) and the other at the city level (City HTS). The Czech Republic HTS was conducted by the Transport Research Center (CDV) in 2017, and this dataset is openly available [30]. The City HTS survey was conducted in 2016 by the NMS Research Center [31]. The City HTS provided information about trips starting or ending in the city of Ústí nad Labem, while the Czech Republic HTS was used to simulate travel demand for the remaining 23 municipalities. The Czech Republic HTS and City HTS datasets provided travel behaviors of 21,209 and 2054 respondents, respectively. However, after some data filtering and cleaning (see the next subsection), the sample sizes were reduced to 12,397 and 937 respondents, respectively. The census and travel survey datasets do not contain weights showing how many inhabitants a specific entry represents. The weight of each sample was therefore assigned based on the age and gender recorded in the datasets. Furthermore, the HTS datasets only provided information about inhabitants who made trips on the day of the survey. Therefore, average trip frequencies from the “EU Survey on Issues Related to Transport and Mobility” [32] were used in this study to estimate the distribution of the inhabitants staying at home based on their employment statuses, which were available in the HTS datasets to a large extent; the Czech Republic average was used when employment data were missing.

Both the CD and HTS datasets required some generalization to enable a matching process between them by transforming some attributes into common terms. We refer to this as the harmonization process in the next section. The harmonization process is applied to 19 attributes; we briefly describe two of them (please see the open-code repository in Supplementary Materials for the details). The activity types represented in the daily chains (e.g., home–work–home) of the respondents in the HTS datasets were categorized into six activity purposes: home, work, education, free time, shopping, and errands. In order to harmonize the activity purposes for the City HTS, activities related to sports, culture, and leisure were unified into one “free time” category. The activity purposes called “workplace”, “business meeting”, and “entrepreneurship” were placed in a “work” category, and the activity purposes called “services”, “visiting public institutions”, “doctor”, and “other” were categorized as “errands.” For the Czech Republic HTS, activity purposes related to eating and shopping were united in one “shopping” category, while work and business trip activities were placed in the “work” category, and errands and other activities in the “errands” category. For the transportation modes of each journey (i.e., the mode of the main trip) in the CD and HTS surveys, we reference the transportation mode types from the Czech Republic HTS, which are walk, bicycle, public transport, regional bus, regional train, car driver, car passenger, and other. In this case, for the City HTS, taxi and passenger car were grouped together into “passenger car”, while traveling by motorcycle was placed in the “other” category to match the Czech Republic HTS arrangement. The public transport category was used for “town bus” and “trolleybus” in both HTS datasets, while the Czech Republic HTS also has additional “tram” and “metro” options, which were likewise defined as public transport. For the CD, the harmonization process for the transportation modes was much more complex because it had dozens of possible values; please check Supplementary Materials for more information.
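The City HTS activity-purpose harmonization described above can be sketched as a simple lookup table. This is a minimal illustration, not the authors' implementation: the dictionary name, the function name, and the raw labels for home, education, and shopping are assumptions, and the real survey codes are in the repository referenced in Supplementary Materials.

```python
# Hypothetical raw labels mapped to the six harmonized purposes named in the text.
CITY_HTS_PURPOSE_MAP = {
    "home": "home",                # assumed raw label
    "workplace": "work",
    "business meeting": "work",
    "entrepreneurship": "work",
    "education": "education",      # assumed raw label
    "sports": "free time",
    "culture": "free time",
    "leisure": "free time",
    "shopping": "shopping",        # assumed raw label
    "services": "errands",
    "visiting public institutions": "errands",
    "doctor": "errands",
    "other": "errands",
}

HARMONIZED_PURPOSES = {"home", "work", "education", "free time", "shopping", "errands"}


def harmonize_chain(raw_chain, mapping=CITY_HTS_PURPOSE_MAP):
    """Map each raw activity label of a daily chain to a harmonized purpose."""
    harmonized = []
    for label in raw_chain:
        key = label.strip().lower()
        if key not in mapping:
            # Fail loudly so that unmapped survey codes are not silently dropped.
            raise KeyError(f"No harmonized purpose defined for {label!r}")
        harmonized.append(mapping[key])
    return harmonized


chain = harmonize_chain(["home", "Workplace", "doctor", "sports", "home"])
print(chain)
```

Raising an error on unknown labels is a design choice of this sketch; a production pipeline might instead route such records to the data-cleaning step.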

Geographic datasets (GD) provided detailed information about the location of residential and work-related facilities in the study area. They were extracted from the RCDB (Register of Census Districts and Buildings) database provided by the Czech Statistical Office [33] for a more precise location assignment. The facilities where free time, shopping, and errands activities took place were extracted according to OpenStreetMap (OSM) tags and sourced with the OSM 442324 reference. The data regarding the capacity of each facility (for work) and the expected number of visitors for each facility (for free time, shopping, and errands activities) were estimated based on each building’s area, which is sourced from the RCDB [33] together with the number of residents. Moreover, educational facilities (except for universities) and their capacities were sourced from the Register of Schools and Educational Facilities (RSEF) database [34] provided by the Czech Ministry of Education, Youth and Sports. University facilities and their capacities were manually added, based on information provided in university reports and annual reports of the municipalities.

The synthesis of the population with the data sources presented provided the most holistic approach considered to date in the Czech Republic in terms of presenting disaggregate characteristics and activities at an individual level. Figure 2 shows the entity–relationship (E–R) diagram of the data model used in this study, representing which data are necessary from each data source and how they are related.

Nevertheless, there were some data management limitations in this study due to the extreme dependency on available data, which were not always complete. An important socioeconomic indicator, income, was not available in the CD, for example. The CD did contain primary journey data for each person, including journey transport modes and average journey times, which was crucial for the demand synthesis process. However, there were many undefined answers in the CD; for example, 72.3% of travel characteristics (average journey time, journey transport mode, etc.) for respondents were missing in the CD dataset. As a result, a sampling process was conducted to complete missing values based on the attributes of defined subpopulations, as explained in the following subsection. The City HTS dataset’s primary limitations were the low number of survey respondents and the fact that some questions did not have responses. Because regional HTS data were not available for the study area, national HTS data were used, which generalized activity patterns to a certain extent. Therefore, both HTS datasets had to be considered limited in terms of their ability to illustrate actual behaviors. Data limitations of OpenStreetMap in estimating the capacities of facilities and areas in terms of workers and visitors [11,19,21] were resolved by using building registration datasets, which provided floor plan information and activity purposes for each certified building in the city area. This is important to note, because a solid assignment of facility data is critical for a realistic foundation for activity-based scenarios. However, the building registration datasets were also missing information about the floors and categories of some facilities. Thus, some steps were conducted to complete the dataset, as explained in the following subsection, which presents the applied synthesis framework for generating a synthetic population for the study area.

2.2. Synthesis Framework

In this section, the steps leading to the creation of the synthetic population are presented. The synthetic population, including the travel demand of the study area, was generated by modifying and extending the Eqasim framework used for the São Paulo case [19]. This scenario served as a basis for the application of Eqasim to the district of Ústí nad Labem because the types of data available in both cases were similar. However, one major difference between the two cases is how the different algorithms are bound together. The São Paulo case utilizes the SYNPP Python package [35], which is also originally used in the Eqasim framework, to bind different algorithms together into a large pipeline that generates MATSim input files. This setup brings some difficulties in code debugging and some simplifications in parallelized code that relies on the tqdm package [36] of Python; therefore, we preferred to write our own code that (partially) mimics SYNPP functionalities to avoid these issues. Figure 3 shows a simplified data and process flowchart of the applied framework, which is divided into data handling and population synthesis steps, including the additional data sources for each operation. The operations are described throughout this subsection in the order of the process flow.

First, the Czech zoning dataset was loaded in step Z1, which includes the basic settlement units, cadastral area units, municipalities, and districts. In step C1, a demographic transition was conducted to mimic real population changes for the period from 2011 (when the census was conducted) to 2016 (when the City HTS was conducted). This was done with a stochastic simulation of several demographic transitions coded in Python. The demographic transition process was conducted based on age, gender, and geolocational data. The process utilized rates of birth and mortality and residential mobility data for the population. First, the mortality of citizens was simulated for each year by converting mortality rates to mortality probabilities assigned to each person based on age and gender. Deceased citizens were removed from the synthetic population at the end of each simulated year. Then, births were simulated for each year by converting birth rates to birth probabilities assigned to each female, also considering the ages of the women in the study area. Genders for newborn citizens were probabilistically assigned with the ratio of 1.05 male to 1 female in the study area. Newborn citizens were included in the synthetic population at the end of each simulated year. Relocation of citizens between municipalities was performed based on the demographic database information provided by the Czech Statistical Office. The database provided the number of citizens relocating in each municipality each year, and it also contained information about the inbound and outbound migrations for the municipalities studied, including age and gender for previous and new residents. Citizens relocating outside the studied municipalities were selected so as to match the distribution of age and gender of relocating citizens in the database. Such citizens were consequently removed from the synthetic population. 
The missing values in personal attributes, i.e., family status, work sectors, places of education, travel characteristics, zones of residence, and housing types were completed by a sampling process performed on the original citizen records based on each subpopulation group categorized by gender and age characteristics in the dataset. If a defined subgroup did not have complete indicator values for each person, the missing values were completed by conducting a sampling process considering only three age subgroups: 0–14, 15–64, and 65+ years of age.
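The mortality and birth parts of the demographic transition in step C1 can be sketched as a yearly update loop. This is a minimal sketch under stated assumptions, not the authors' code: `mortality_prob` and `birth_prob` are hypothetical callables standing in for the statistical-office rate tables, births are drawn for every woman alive at the start of the year, ages are incremented by one per simulated year, and relocation and missing-value sampling are left out.

```python
import random

# Probability that a newborn is male, from the 1.05 : 1 ratio stated in the text.
P_MALE = 1.05 / 2.05


def simulate_year(population, mortality_prob, birth_prob, rng):
    """Advance a list of person dicts (age, gender, zone) by one year."""
    deceased = set()
    newborns = []
    for idx, person in enumerate(population):
        # Mortality: age- and gender-specific annual probability.
        if rng.random() < mortality_prob(person["age"], person["gender"]):
            deceased.add(idx)
        # Births: age-specific annual probability for each woman (assumption:
        # drawn for all women alive at the start of the year).
        if person["gender"] == "F" and rng.random() < birth_prob(person["age"]):
            gender = "M" if rng.random() < P_MALE else "F"
            newborns.append({"age": 0, "gender": gender, "zone": person["zone"]})
    # Deceased are removed and newborns added at the end of the simulated year.
    survivors = [
        dict(p, age=p["age"] + 1)
        for i, p in enumerate(population)
        if i not in deceased
    ]
    return survivors + newborns


def simulate_transition(population, years, mortality_prob, birth_prob, seed=0):
    """Apply the yearly update repeatedly, e.g. five times for 2011 to 2016."""
    rng = random.Random(seed)
    for _ in range(years):
        population = simulate_year(population, mortality_prob, birth_prob, rng)
    return population
```

Passing a seeded `random.Random` instance keeps each run reproducible, which matters when the synthetic population feeds later pipeline steps.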

Figure 4 shows population changes in the study area (i.e., the district of Ústí nad Labem) from 2011 to 2016 as a result of the demographic transition procedure. The population of the city of Ústí nad Labem declined, while population growth was observed in its surrounding municipalities. This indicated that there was a suburbanization process taking place during the study period, and changes in the commuting patterns for the study area were to be expected. Step C2 cleaned the census data (CD-2016), extracting only citizens residing in the study area and harmonizing the CD-2016 data with both City HTS and Czech Republic HTS data according to the process described in the previous section.

Step H1 organized both the City HTS and Czech Republic HTS data and also applied the harmonization process. Inconsistent records were removed, notably when some attributes were unknown (such as average trip distances or the district codes of the origin or destination of a trip), when trip purposes were missing, when activity chains did not start and end at home, and when trips were repeated. Step H2 was originally intended to filter out HTS trips carried out entirely outside the study area. This process is important for ensuring more precise travel behavior, and it is also available in the open code in Supplementary Materials; however, it removed too many samples. Therefore, we deactivated this step of the Eqasim framework and focused on the assignment of synthetic gates for inbound and outbound trips of the study area based on the average travel times between the study area and any municipality in the Czech Republic. As the Czech Republic HTS has OD data only between certain municipalities, synthetic gates were assigned by weighted random sampling, with the population of the destination municipality as the weight parameter. This step enabled us to model visitors and residents traveling in and out of the city in the simulation.
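The population-weighted random choice used for synthetic gates can be illustrated with the standard library. This is only a sketch of the sampling idea: the gate identifiers, the pairing of each candidate gate with a municipality population, and the function name are hypothetical, and the travel-time criterion mentioned above is not modeled here.

```python
import random


def assign_gate(candidates, rng):
    """Pick one gate from (gate_id, municipality_population) pairs.

    The chance of picking a gate is proportional to the population
    attached to it, i.e. a weighted random draw.
    """
    gate_ids = [gate_id for gate_id, _ in candidates]
    weights = [population for _, population in candidates]
    return rng.choices(gate_ids, weights=weights, k=1)[0]


# Hypothetical candidates for one external trip.
candidates = [("gate_north", 12000), ("gate_south", 48000), ("gate_east", 0)]
rng = random.Random(42)
picked = [assign_gate(candidates, rng) for _ in range(1000)]
print({gate_id: picked.count(gate_id) for gate_id, _ in candidates})
```

A gate with zero weight is never selected, and the remaining gates are drawn roughly in proportion to their populations.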

Step P1 performed the statistical matching process applied in the São Paulo scenario [19] for the CD-2016, Czech Republic HTS, and City HTS datasets, which utilizes a simplified version of the hot-deck matching algorithm [37]. The principle of this process relies on assigning travel behavior characteristics from an HTS sample to a sample in the CD-2016 based on the sociodemographic attribute values in both the CD and HTS datasets. It requires mandatory attributes, which must have the same value in both the CD-2016 and HTS samples, and preferential attributes, which are not required to be the same. Let a denote an attribute, N the total number of mandatory and preferential attributes, and c a sample from the CD dataset; the vector of attribute values of this sample is then defined as follows:

a_{c} = (a_{c, 1}, \dots, a_{c, N})

(1)

Similarly, for a sample from the HTS dataset, s:

a_{s} = (a_{s, 1}, \dots, a_{s, N})

(2)

First, the algorithm creates a vector with all possible combinations k of mandatory attribute values with preferential attribute values, for instance:

a_{k} = (a_{k, 1}, \dots, a_{k, N})

(3)

Then, for every combination k of attribute values, we define β = {s|a_s = a_k} as the set of HTS samples whose attribute values equal the combination, so that |β| is the number of matching HTS samples; likewise, γ = {c|a_c = a_k} is the set of matching CD samples and |γ| is their number. Given a minimum number M of HTS samples required for a match, if |β| ≥ M and |γ| > 0, the algorithm randomly assigns HTS samples from β to the CD-2016 samples in γ. Afterward, if CD samples remain unmatched, the algorithm repeats the process, dropping the preferential attributes one by one; for example, after dropping the last preferential attribute:

a_{k} = (a_{k, 1}, \dots, a_{k, N - 1})

(4)

We used mostly preferential rather than mandatory attributes, and the minimum number of HTS samples to be matched with a CD-2016 sample is set to M = 3. These settings may lead to overfitting, but the main reason for these settings was the insufficiency of HTS data sample size for many groups of people with specific characteristics. The following attributes were used for preferential matching:

(a): gender;
(b): education;
(c): economic activity (e.g., student, employee, retired);
(d): main journey mode;
(e): primary activity location related to home;
(f): zone (the cadastral area unit for City HTS, and the region code for Czech Republic HTS).

The attributes used for mandatory matching were:

(a): age;
(b): town size (only for the Czech Republic HTS dataset).

This configuration enabled us to match 100% of the CD-2016 persons. However, moving any attribute from preferential to mandatory resulted in the removal of 10–25% of the synthetic population, especially for the City HTS dataset. Therefore, retaining all citizens was preferred.
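The matching loop described in Equations (1) to (4) can be sketched as follows. This is a simplified illustration rather than the Eqasim implementation: the function name, the record layout (lists of dicts), and the choice to draw one donor uniformly at random per CD record are assumptions.

```python
import random
from collections import defaultdict


def hot_deck_match(cd, hts, mandatory, preferential, min_samples, rng):
    """Return {cd_index: hts_index}; unmatched CD records are left out."""
    matches = {}
    unmatched = set(range(len(cd)))
    prefs = list(preferential)
    while True:
        attrs = list(mandatory) + prefs
        # Group HTS donors by their values on the current attribute set (beta).
        donors = defaultdict(list)
        for j, sample in enumerate(hts):
            donors[tuple(sample[a] for a in attrs)].append(j)
        for i in sorted(unmatched):
            pool = donors.get(tuple(cd[i][a] for a in attrs), [])
            # Match only if at least M donors share the combination.
            if len(pool) >= min_samples:
                matches[i] = rng.choice(pool)
        unmatched.difference_update(matches)
        if not unmatched or not prefs:
            return matches
        # Relax the constraint: drop the last preferential attribute and retry.
        prefs.pop()


cd = [{"age": "15-64", "gender": "F"}, {"age": "15-64", "gender": "M"}]
hts = [{"age": "15-64", "gender": "F"} for _ in range(3)]
result = hot_deck_match(cd, hts, ["age"], ["gender"], 3, random.Random(0))
print(result)
```

In this toy input the second CD record has no donor of the same gender, so it is matched only after the preferential attribute is dropped, which mirrors the fallback behavior described in the text.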

In step F1.1, OpenStreetMap (OSM) data were imported by utilizing specific OSM tags [38]. The values of the building and amenity keys were sufficient to import facilities related to almost every activity purpose (i.e., home, free time, shopping, work, education, errands), although some purposes such as free time also required other tag keys, such as “leisure”, “natural”, “tourism”, and “hiking”, especially to capture squares, parks, etc. Step D1.1 calculates the proportions of work- and education-related trips between zones by utilizing the O–D pairs of each trip from the HTS datasets according to the weights of each respondent sample. In step F1.2, facility locations and their characteristics were defined by a complex combination of different datasets. Educational facilities were imported from the Register of Schools and Educational Facilities (RSEF), which contains the student capacity of the facilities and the educational degrees granted by them (e.g., primary schools, high schools, universities). The Register of Census Districts and Buildings (RCDB) database was utilized for the location and shape of residential and commercial buildings, with their characteristics linked with CD data. The building types in the RCDB dataset were harmonized based on their usage (e.g., home, work, home and work) and functionality (e.g., industry, agriculture, hospitality). The secondary facility locations were imported from OSM as a result of the F1.1 operation. The utilization purpose of each facility was carefully assigned; for example, a shopping mall may have shops, a post office, restaurants, doctors’ offices, and pharmacies, among others; thus, multiple activity purposes are considered during the assignment process. To achieve this, OSM data points matching the location of the building areas imported from the RCDB were utilized.

Facility occupation was estimated based on the activity demands related to each facility’s purpose. The activity demand was estimated based on floorspace of facilities and their floorspace productivity (number of workers and visitors per floorspace unit). The RCDB included information about the number of floors in each building, and this information was integrated with building footprints provided in the ZABAGED digital map [39] to estimate the floorspace of each building. The productivity of a floorspace unit was calculated in order to assign the number of workers and visitors based on facility purposes. The productivity of floorspaces was derived from the normative indicators provided by EDIP [40], and by using various sources of information: legal prescriptions, normative guidelines, institutional annual records and sociodemographic statistics. The total number of workers and visitors for each facility was assigned by multiplying the total floorspace of each building with the productivity of a floorspace. It is important to note that open-air facilities were defined to have unlimited visitor capacity, and not to have residents and employees. Lastly, the results of the assignment operation for the residents, visitors, and facility capacities are visualized below by utilizing ArcGIS Pro. Figure 5 demonstrates the geoposition of all assigned locations in the study area based on home, work, educational, and secondary activity categories (free-time, shopping, errands). Figure 6 quantifies the number of residents in each building and the capacities of the work and educational facilities, while Figure 7 visualizes the number of visitors for the secondary activity locations (free-time, shopping, errands) in the study area.
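The floorspace-based estimate described above reduces to a short calculation: floorspace is footprint area times number of floors, and capacity is floorspace times productivity. The sketch below uses purely hypothetical productivity values and category names; the study derives its own indicators from EDIP [40] and other sources.

```python
import math

# Hypothetical productivity values in persons per square meter of floorspace.
PRODUCTIVITY = {
    "retail": {"workers": 0.02, "visitors": 0.10},
    "office": {"workers": 0.05, "visitors": 0.01},
}


def facility_capacity(footprint_m2, floors, purpose, open_air=False):
    """Estimate worker and visitor numbers for one facility."""
    if open_air:
        # Open-air facilities: unlimited visitors, no employees (as in the text).
        return {"workers": 0, "visitors": math.inf}
    floorspace = footprint_m2 * floors
    rates = PRODUCTIVITY[purpose]
    return {kind: round(floorspace * rate) for kind, rate in rates.items()}


print(facility_capacity(500, 2, "retail"))
print(facility_capacity(3000, 1, "retail", open_air=True))
```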

Step D1.2 defines work and educational zones for each person in the synthetic population by using the OD proportions produced by the D1.1 operation, as in the São Paulo scenario [19]. For each origin home zone o, we used a multinomial distribution to draw the number of trips f_od to every destination primary zone d = 1, ..., D (and assigned citizens to these trips), based on the OD proportions p_od and the number of people traveling from the origin zone n_o:

(f_{o,1}, \dots, f_{o,D}) \sim \mathrm{Multinomial}(n_{o};\; p_{o,1}, \dots, p_{o,D}) \qquad (5)
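
A minimal sketch of the draw in Equation (5) is given below. It uses only the Python standard library and relies on the fact that n_o independent categorical draws, counted per destination, follow the same multinomial distribution; the pipeline itself may use a different routine. The function name, zone labels, and proportions are illustrative assumptions.

```python
import random
from collections import Counter


def sample_destination_counts(n_o, proportions, rng):
    """Draw (f_o1, ..., f_oD) ~ Multinomial(n_o; p_o1, ..., p_oD).

    `proportions` maps each destination zone d to p_od for one origin o.
    """
    destinations = list(proportions)
    weights = [proportions[d] for d in destinations]
    # n_o independent categorical draws, then counted per destination.
    draws = rng.choices(destinations, weights=weights, k=n_o)
    counts = Counter(draws)
    return {d: counts.get(d, 0) for d in destinations}


rng = random.Random(42)  # fixed seed so the sketch is reproducible
p_o = {"zone_1": 0.6, "zone_2": 0.3, "zone_3": 0.1}  # hypothetical p_od values
f_o = sample_destination_counts(1000, p_o, rng)
```

The counts always sum to n_o, so every person leaving the origin zone receives exactly one destination zone.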

In step P2, residential and primary activity locations of the citizens are assigned by two different algorithms, as demonstrated in Figure 8. First, the framework uses Algorithm 1 to assign the citizens to the residential buildings located in the zone where they live according to CD-2016 data. Then, once the home location of every person is defined, the framework assigns the primary activity locations (i.e., work and educational) of the citizens by applying Algorithm 1 if the home and primary activity locations are in the same zone, and Algorithm 2 if they are in different zones. The assignment process is conducted in order, one arrangement at a time, where each arrangement corresponds to a portion of the population and a subset of facilities to be assigned. For instance, a citizen working in the retail sector according to the CD-2016 is considered for all possible work locations classified as events, commercial, or multipurpose in the RCDB. As another example, students within the age group 8–14 have their educational locations filtered to schools offering primary school grades. However, citizens over 29 years old were additionally placed in this category, because parents are likely to escort children to primary schools. Algorithm 1 runs several times, once for each zone, selecting only the facilities located inside the zone. Initially, the algorithm checks whether the zone contains any relevant facility j whose attributes match a specific portion of the population (e.g., age group 8–14 for primary schools); if not, random locations with unlimited occupancy O_j are generated in the zone. The occupancy O_j of a facility is defined according to the number of inhabitants (in the case of residential assignment), workplaces (in the case of worker assignment), and study places (in the case of student assignment). 
In the case of residential locations, the assigned facility is simply the one with the maximum available occupancy, and once a person i is assigned to a building j, i.e., F_i = j, one unit of occupancy is dropped from O_j. In the case of the assignment of primary locations, for every person i, the distance d_j between home and each facility is calculated. A cost x_i is set for each facility as the absolute difference between the calculated commute distance and the declared commute distance c_j of the inhabitant (straight-line distance to the primary location). Afterward, facilities with no remaining occupancy are marked with infinite cost. If there are still facilities with finite cost, the algorithm chooses the one with the minimum cost. On the other hand, if all filtered facilities have infinite cost, the assigned facility is the one with the highest occupancy. Algorithm 2 runs only once, but separately for work and education locations. The first step assigns a radius Δd_i to each person p_i based on the cumulative distribution function (cdf) of all trips in the HTS data (selected based on home location), and defines facility j and occupancy O_j as in Algorithm 1. The facilities are ordered by their proximity to the assigned radius Δd_i. If no facility is found within the given radius, the algorithm assigns the nearest available facility and reduces the capacity of the assigned facility. If the nearest facility is unavailable because of full occupancy, the algorithm drops it from the search set and tries to find another one. If no facility with available occupancy remains, the algorithm still assigns the citizen to the nearest facility.
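
The primary-location branch of Algorithm 1 can be sketched as follows. This is a simplified reading of the description above, not the repository code: coordinates are planar, the function and variable names are hypothetical, and two details that the text leaves open are filled with labeled assumptions (the fallback uses the remaining occupancy, and occupancy is also decremented in the fallback case).

```python
import math


def assign_primary_location(home, declared_distance, facilities, occupancy):
    """Pick a facility whose straight-line distance best matches the declared one.

    facilities: dict facility_id -> (x, y), already filtered to the zone.
    occupancy:  dict facility_id -> remaining places (mutated in place).
    """
    costs = {}
    for facility_id, (x, y) in facilities.items():
        if occupancy[facility_id] <= 0:
            costs[facility_id] = math.inf  # full facilities get infinite cost
        else:
            distance = math.hypot(x - home[0], y - home[1])
            costs[facility_id] = abs(distance - declared_distance)

    if any(math.isfinite(cost) for cost in costs.values()):
        chosen = min(costs, key=costs.get)
    else:
        # Assumption: "highest occupancy" is read as highest remaining occupancy.
        chosen = max(facilities, key=lambda facility_id: occupancy[facility_id])
    occupancy[chosen] -= 1  # assumption: decremented in the fallback case too
    return chosen


# Invented toy zone with three workplaces on a line.
workplaces = {"a": (1.0, 0.0), "b": (3.0, 0.0), "c": (10.0, 0.0)}
places_left = {"a": 1, "b": 1, "c": 0}
first = assign_primary_location((0.0, 0.0), 2.8, workplaces, places_left)
second = assign_primary_location((0.0, 0.0), 2.8, workplaces, places_left)
third = assign_primary_location((0.0, 0.0), 2.8, workplaces, places_left)
```

In the toy run, the first person receives the best distance match, the second falls back to the next facility that still has a free place, and the third is assigned even though every facility is full, mirroring the overflow rule in the text.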

In step D2, travel-distance distributions from both HTS datasets were generated for each journey transport mode according to the weight of sample citizens and declared trip time. For transport modes that appear in only a few samples, a bin size of 20 is applied. In step P3, the relaxation–discretization algorithm [41] was used, as in the São Paulo scenario [19], to assign the secondary activity locations for the citizens. We contributed here by removing facilities from the list of possible locations when they have no available capacity (similar to Algorithm 2 of the P2 operation). At first, the algorithm finds the secondary-activity facilities that still have available occupancy (the expected number of visitors for secondary locations is used here). After that, the algorithm selects all activity chains containing secondary trips. Then, travel distances are sampled for each trip based on declared travel time and journey transport mode. Afterward, possible destinations for the secondary activity locations (with a primary activity location as origin) are determined so that the travel distances between the activities match the sampled distances. Following the order of possible destinations, if the assigned location is fully occupied, the algorithm removes it from the list of possible locations and tries to assign the next possible destination. This is repeated until a facility is assigned to the person, and the visitor capacity (i.e., occupancy) of this location is reduced by one after each successful assignment. Lastly, the final synthetic population files were created in MATSim format. The resulting synthetic population for the study area is presented in the next section.
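
The capacity filter added to step P3 can be illustrated in isolation. The sketch below is not the relaxation–discretization algorithm of [41]; it only shows, for a single trip from one origin, how fully occupied candidates are removed and the next candidate is tried. The names, the planar coordinates, and the ordering of candidates by distance mismatch are simplifying assumptions.

```python
import math


def assign_secondary_location(origin, sampled_distance, candidates, capacity):
    """Assign the best-matching candidate that still has visitor capacity.

    candidates: dict facility_id -> (x, y); full facilities are removed from it.
    capacity:   dict facility_id -> remaining visitors (math.inf for open air).
    Returns the chosen facility id, or None if no candidate has capacity left.
    """
    def mismatch(facility_id):
        x, y = candidates[facility_id]
        return abs(math.hypot(x - origin[0], y - origin[1]) - sampled_distance)

    for facility_id in sorted(candidates, key=mismatch):
        if capacity[facility_id] <= 0:
            del candidates[facility_id]  # drop it from the possible locations
            continue
        capacity[facility_id] -= 1       # one visitor place is used up
        return facility_id
    return None


# Invented toy data: two shops and one open-air park.
shops = {"shop_a": (1.0, 0.0), "shop_b": (2.0, 0.0), "park": (5.0, 0.0)}
visitors_left = {"shop_a": 1, "shop_b": 0, "park": math.inf}
first_choice = assign_secondary_location((0.0, 0.0), 2.1, shops, visitors_left)
second_choice = assign_secondary_location((0.0, 0.0), 2.1, shops, visitors_left)
```

In the toy run the closest match (`shop_b`) is already full and is removed, so the first visitor goes to `shop_a`; the second visitor then ends up in the open-air park, whose capacity never runs out.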

3. Results

In this section, the results of the travel demand model are analyzed with the OD bundling technique to evaluate the generated trips based on locational, transport mode, and sociodemographic characteristics. Thereafter, model results are presented together with several distribution analyses to understand the matching results on a detailed level by using frequency charts, histograms, and cumulative distribution plots, as in other MATSim demand synthesis studies [9,19]. ArcGIS Pro was used for the visualization of OD flows (Figures 9–12), while the Matplotlib package for Python was used for plotting the data distributions (Figures 13–17).

Figure 9 shows origin–destination pairs for the primary activity trips in the study area. The OD visualization demonstrates how work- and education-related trips are concentrated based on the density of destination locations and their capacity, which is presented in the last subsection. The figure also indicates how inbound and outbound trips are included by originating and ending journeys at the synthetic gates, which is one of the main contributions of this study. The synthetic gate located in the south of the study area has higher demand than the other gates, which is expected considering the concentration of living areas along the south side of the Elbe River (taking the city center as the base). However, because the framework relies on Dijkstra’s algorithm, which is based on the shortest path, it might assign somewhat more trips to the south than occur in real life. If the data were available, extending the algorithm to use travel speed in the path weights would provide more balanced gate demands. Figure 10 presents origin–destination pairs for the secondary activity trips (shopping, leisure, errands) in the study area. As seen in the figure, secondary trips originating from home are much more homogeneously distributed within the study area than primary trips, an observation that is expected given the locational distribution of home and secondary activities presented above. The figure also shows how secondary trips originating from work and educational locations (in the middle and right panels, respectively) and their destination locations are more concentrated in the center. This situation is anticipated considering the visitor concentration of facilities demonstrated in the previous subsection. Figure 11 demonstrates origin–destination pairs for the primary and secondary activity trips based on main transportation mode (car and public transport). 
The figure indicates that car trips are more frequent for the secondary activities, while public transport trips are much more dominant than car trips for the primary activities, which is reasonably accurate because half of the commuting trips are taken by public transport in the city of Ústí nad Labem [42]. In both cases, long (outbound) trips are taken more by car than by public transport, which is an expected result. Figure 12 presents origin–destination pairs for the primary activity trips based on main transportation mode (car and public transport), age groups (18–24, 25–39, ≥40), and gender. As seen in the figure, the frequency of the primary activity trips increases with age, which is consistent with the greater participation in economic activities expected for the older age groups. Additionally, the figure shows that public transport usage is slightly more dominant in the female population than in the male population, a plausible result considering the transport literature [43].
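
The remark on path weights for the gate assignment can be made concrete with a small Dijkstra sketch in Python. It is a toy illustration under stated assumptions, not the framework's routing code: the graph layout, the helper names, and the link lengths and speeds are invented, and the sketch simply picks the gate with the lowest path cost from one origin.

```python
import heapq


def dijkstra(graph, source, weight):
    """Lowest path cost from `source` to every reachable node.

    graph:  dict node -> list of (neighbor, length_km, speed_kmh).
    weight: function (length_km, speed_kmh) -> non-negative link cost.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, length_km, speed_kmh in graph.get(node, []):
            new_cost = cost + weight(length_km, speed_kmh)
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return best


def by_length(length_km, speed_kmh):
    return length_km  # shortest-distance criterion


def by_travel_time(length_km, speed_kmh):
    return length_km / speed_kmh  # hours, the speed-aware criterion


def nearest_gate(graph, source, gates, weight):
    costs = dijkstra(graph, source, weight)
    reachable = [gate for gate in gates if gate in costs]
    return min(reachable, key=costs.get) if reachable else None


# Invented network: a short but slow link south, a longer but fast link north.
network = {"home": [("gate_south", 10.0, 30.0), ("gate_north", 12.0, 90.0)]}
gates = ["gate_south", "gate_north"]
gate_by_length = nearest_gate(network, "home", gates, by_length)
gate_by_time = nearest_gate(network, "home", gates, by_travel_time)
```

With distance as the weight the toy origin is routed to the southern gate, while the travel-time weight sends it north, which is the kind of rebalancing the text anticipates if speed data were available.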

Figure 13. Socio–demographic characteristics.

Figure 14. Travel characteristics (note that PT = public transport, Com = commuter, P = passenger).

The distribution of the sociodemographic attributes is shown in Figure 13 as a comparison of the data sources used for Ústí nad Labem and its surrounding municipalities. The gender distributions in both HTS datasets are similar to the CD-2016 dataset. Age is one of the most significant sociodemographic attributes affecting transport-related decision-making; therefore, the accuracy of this parameter is important for simulation models. However, there were large differences in age distribution between the HTS data and the actual situation (i.e., the census) for the city study area, while the difference was smaller for the surrounding areas. The city’s population was much larger than that of the surrounding areas, while the City HTS covered only a small fraction of the real population when compared to the sample size of the Czech Republic HTS. Therefore, the Czech Republic HTS reflected a more realistic age distribution because its sample size was better balanced against the real population. We thus recommend that household travel surveys always be supported by census data, especially because, as a population grows in size, it becomes difficult to reflect the real situation with only a limited sample. Figure 13 also demonstrates the matching of the sociodemographic characteristics (i.e., the actual population versus the synthetic population). As seen, the actual sociodemographic structure (CD-2016) was carried over directly into the synthetic population; therefore, an accuracy assessment of the sociodemographic variables is not necessary.

Figure 14 demonstrates the distribution of some travel characteristics (average travel times and main transport modes) by comparing data sources for the city study area and its surrounding area. The figure indicates a mismatch in travel characteristics between both HTS datasets and the census data. As expected, both HTS datasets overestimated or underestimated the distribution of average travel times and journey transport modes in the total population. For instance, there is a large difference between the census and both HTS datasets in the share of citizens who traveled around 90 min or more (≥90). Travel time is a major parameter in transport models, so its accuracy is important. This inconsistency, in our case, was alleviated by involving CD-2016 data: the travel-time characteristics of the synthetic population were directly sampled from the CD-2016 data, as seen in Figure 14; therefore, it is not necessary to discuss the accuracy of this variable. The distribution of the main transport modes was also improved by the algorithm, which provides a balance between HTS data and census data, although no fine adjustment is visible. This remaining mismatch arises because journey transport modes were used as a preferential attribute in the algorithm, a choice made because of the incompleteness of the original data in terms of travel characteristics, as noted in the data section. However, according to a CIVITAS report [42], the commuting modal share of the city area is estimated as 50.6% for public transport, 15.3% for car, 2.7% for car share, 13.4% for walking, 1.5% for train, and 16.5% for other modes. Against this reference, our algorithm appears to approximate the modal share of the city area better than the HTS and census datasets do.

Figure 15. Activity-chain characteristics (note that h = home, w = work, e = education, s = shopping, l = leisure, er = errands).

Figure 16. Activity-chain characteristics for the mean of 20% sampling seeds.

Figure 17. Travel-distance characteristics.

Figure 15 shows the distribution of activity-chain characteristics for both travel surveys and the synthetic population. There was a satisfactory match in the distribution of trip counts between the datasets: we observed a difference lower than 5% for almost all chains, and only the “one trip” count probability for the city area differed by slightly more than 5% in the synthetic population. This difference was mainly caused by the matching process for travel characteristics and the accurate assignment of resident numbers, facility capacities, and visitor numbers. The observed difference between the HTS datasets and the synthetic population for the main activity chain (home–work–home, accounting for 34% of trips) was around 1%, while there was an exact match for activity-chain types representing 16% of the trips, such as h–er–h (home–errands–home), h–w–s–h (home–work–shopping–home), and h–er–s–h (home–errands–shopping–home). These were successful matches. The h–e–h (home–education–home) trips were estimated to be 6% higher than in the HTS surveys because our algorithm treats escorting activities (such as bringing children to school) as education-related trips. Therefore, this result was expected. The counts of h–s–h and h–l–h chains were underestimated in the synthetic population, because the accurate assignment of capacities and visitors changed the chain variety for shopping and leisure activities in the population. The shopping and leisure activities were distributed as a “leg” in the remaining minor activity chains, which represented more than 10% of the trips; therefore, the results were reasonable. To further confirm the rank-order matches of activity chains, Spearman’s correlation between the HTS data and the synthetic data was computed with the SciPy package in Python. 
Considering the leading 20 activity-chain types for the entire area in the datasets, Spearman’s correlation coefficient of chain-type rank order is 0.897 (p value < 0.001), which indicates a strong rank-order match. Additionally, Spearman’s correlation coefficient of trip-count rank order is 0.983 (p value < 0.001) and 0.949 (p value < 0.001) for the city area and the surrounding area, respectively, which confirms the successful rank-order match. Furthermore, we analyzed the synthetic output at a different sampling rate, shown in Figure 16. Population downscaling is applied in almost all simulation studies because simulating the full population requires computational resources beyond the capabilities of existing software such as MATSim [44]. The effect of different population downscaling rates on MATSim simulation results is categorized and reviewed in [44], which reports that sampling rates of 1% to 25% are preferred in almost all MATSim studies. Considering this information and the population size of the study area, we tested a moderate value of 20% here. Two hundred synthetic data configurations (similar to [9]), each equivalent to 20% of the population size, were derived from the synthetic population with random seeds. Figure 16 demonstrates the mean percentages of activity chains and trip counts in the 200 derived populations. Additionally, the range between the minimum and maximum percentages in the derived populations is presented with a black strip attached to the bars. Even when population downsampling is applied, the mean distributions of activity-chain characteristics match at a level similar to the main data, as seen in Figure 16. The mean standard deviation of the distribution of all activity-chain types in the 200 derived populations varies between 0 and 2.8%, which is reasonable. 
The mean standard deviation of the distribution of trip counts in the 200 derived populations for the city area varies between 0 and 1%, except for “one trip” and “four trips”, which are 6.96% and 3.34%, respectively, while the standard deviation is between 0 and 2.2% for the surrounding area, except for “one trip”, which is 7.37%. The variance therefore differs substantially for several trip counts. Higher sampling rates, over 25%, are required, as suggested in [44], for larger confidence margins that preserve the major demand characteristics. Scholars using this data model should take this into account.
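
The two checks described above, the rank-order correlation and the seeded 20% downsampling, can be sketched with the Python standard library. The study used SciPy for the correlation (`scipy.stats.spearmanr` computes the same coefficient and also returns a p value); the dependency-free version below is only an illustration. All function names and the toy chain shares are assumptions and do not reproduce the study's data.

```python
import math
import random
import statistics
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for position in range(start, end + 1):
            ranks[order[position]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    covariance = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    spread = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return covariance / spread


def chain_share_summary(chains, rate, seeds):
    """Mean, minimum and maximum share of each chain over seeded subsamples."""
    size = round(len(chains) * rate)
    shares = {chain: [] for chain in set(chains)}
    for seed in seeds:
        counts = Counter(random.Random(seed).sample(chains, size))
        for chain in shares:
            shares[chain].append(counts[chain] / size)
    return {c: (statistics.fmean(v), min(v), max(v)) for c, v in shares.items()}


# Invented shares for four chain types in a survey and a synthetic population.
rho = spearman([0.34, 0.16, 0.10, 0.05], [0.33, 0.16, 0.08, 0.06])
# Invented population of 1000 activity chains, subsampled at 20% with 200 seeds.
population = ["h-w-h"] * 340 + ["h-e-h"] * 160 + ["h-s-h"] * 500
summary = chain_share_summary(population, 0.20, range(200))
```

The minimum and maximum in each tuple correspond to the range that the text describes as a black strip on the bars, and the spread between them grows as the sampling rate shrinks.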

Figure 17 presents the probability histogram of the average straight-line travel distance for trips in the city area and the surrounding area by comparing the data from the household travel surveys and the synthetic population. The weighted population is used for the visualization in Figure 17, because a weighted sampling process for the distances is performed. The pattern of travel-distance distribution exhibited in both datasets is similar, but there are upward and downward spikes in the travel-distance distributions for both areas. The main reason for this inconsistency is that the variation of distance in the HTS data was quite limited, due to the low sample size, while our contributions resulted in more realistic patterns. Nonetheless, considering the matching results for travel characteristics and locations, the travel-distance match results appear reasonable. As expected, the number of trips was higher in the city area because its population was larger than that of the surrounding area. Figure 17 also shows the average cumulative distribution of travel distance across the entire area for a holistic overview, illustrating differences in travel-distance distribution for the entire study area. The distribution pattern is similar and a moderate level of matching was observed. Furthermore, the cumulative distribution of travel distance for the working population and the education-related population in the city area and the surrounding area was analyzed to understand the travel-distance distribution more deeply. Figure 17 demonstrates that the work trips were longer than the education-related trips in the synthetic data for both areas. There was a significant distribution difference in travel distances for both trip purposes. As expected, the distribution difference was lower in the surrounding areas than in the city, while the travel-distance distribution of the education-related population matched better than that of the working population for both areas. 
The reason for this mismatch, as noted above, was a discrepancy in travel characteristics in the source datasets, which affected the matching of the travel-distance distribution from both HTS surveys to the synthetic population. The mismatch of travel-distance distribution was the highest for the working population in the city area, as seen in Figure 17. This mismatch for the working population indicates that the distance distribution in the HTS datasets was weak and reflects how the applied algorithm contributed to the synthetic data, resulting in more reasonable and lifelike patterns.
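
A weighted cumulative distribution of travel distance, of the kind compared in Figure 17, can be computed as in the following sketch. The function names and the sample distances and weights are invented; the maximum vertical gap between two curves is added here only as one possible summary of a mismatch and is not a measure reported in the study.

```python
def weighted_cdf(distances, weights):
    """Return [(distance, cumulative weighted share)] sorted by distance."""
    total = sum(weights)
    cumulative = 0.0
    points = []
    for distance, weight in sorted(zip(distances, weights)):
        cumulative += weight
        points.append((distance, cumulative / total))
    return points


def cdf_value(cdf, x):
    """Share of weighted trips with distance <= x (step function)."""
    share = 0.0
    for distance, cumulative in cdf:
        if distance > x:
            break
        share = cumulative
    return share


def max_cdf_gap(cdf_a, cdf_b, grid):
    """Largest absolute difference between two CDFs on the given grid."""
    return max(abs(cdf_value(cdf_a, x) - cdf_value(cdf_b, x)) for x in grid)


# Invented distances (km) and weights for a survey and a synthetic population.
survey_cdf = weighted_cdf([1.0, 2.0, 4.0], [1.0, 1.0, 2.0])
synthetic_cdf = weighted_cdf([1.0, 2.0, 4.0, 8.0], [1.0, 1.0, 1.0, 1.0])
gap = max_cdf_gap(survey_cdf, synthetic_cdf, [1.0, 2.0, 4.0, 8.0])
```

Using respondent weights in the survey curve keeps it comparable with the synthetic population, whose distances were drawn by weighted sampling.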

The original CD-2011 dataset was incomplete in terms of travel characteristics, residential locations, and primary activity locations, though the dataset was completed as a result of the matching process for the demographic transition, as noted above. The demographic transition and data completion processes were expected to converge with the actual 2016 conditions. Because the travel and activity attributes of citizens were only marginally reflected in the household travel surveys, and the original census was incomplete, our intent was not to create an exact match between the HTS data and the synthetic data, or between the census data and the synthetic data, as in other studies [9,19]. Rather, we aimed to create reasonable output from the available datasets while eliminating data deficiencies through the main contributions presented. After synthesis, the activities and travel characteristics of citizens were observed to be distributed in a more realistic manner than in the datasets used, because we included the inbound and outbound travel demand of the study area. A more compatible match between travel destination and activity location state was provided as a result of the assignment of the capacity, visitors, and residents in each facility.

4. Conclusions

This paper presented an advanced process of travel demand synthesis for creating a MATSim activity model. As a result of the study, an activity-based travel demand model for the Czech city of Ústí nad Labem and its surroundings was created. The Eqasim framework was extensively modified and extended with several algorithms in order to utilize multiple data points for increasing realism in mobility for travel demand models. Two major extensions are provided. First, the pipeline framework is improved to estimate inbound and outbound trips of the study area by assigning synthetic gates for the study area as hubs. Second, the pipeline framework is improved to provide a more compatible match of travel destination and activity location state by assigning a capacity for each facility identified for the study area, the expected number of visitors to each facility, and the number of residents in each building.

The generated trips of the model are evaluated based on locational, transport mode, and sociodemographic characteristics with OD bundling. Additionally, distribution analyses of the present model are conducted to understand the matching results on a detailed level. The results demonstrate a more reasonable pattern of mobility behavior than that found in the datasets used. The provided extensions helped to reduce implausible reflections of the distribution of travel and activity characteristics in household travel surveys with significant data deficiencies, thus increasing realism in mobility patterns. Freight mobility and transit trips linked to the study area would enrich the present model, making the urban traffic patterns even more realistic, if a sufficient source for these data were available. This extension will be considered in the future.

Policymakers and city administrators must cope with the growing complexity of urban processes; therefore, sustainable transport policies depend on understanding mobility patterns that emerge or change through time and space. This study is an important resource for urban planners and scholars to use in understanding the preparatory and methodological backgrounds necessary for creating complex activity-based models and simulations. The study also contributes to further improvement of travel demand synthesis by providing an open-source playground and code repository for the provided data model. The model presented here opens the gate to more evidence-based policy-making by allowing cycling infrastructure systems, micromobility systems, and other sustainable transport regulations to be tested for the study area, which will be the focus of our ongoing research.

Supplementary Materials

The open-code repository for the framework can be downloaded at: https://github.com/Lab-LAMbDA/Usti-nad-Labem-synthetic (accessed on 10 June 2022).

Author Contributions

Conceptualization, A.E.D., A.M.P. and O.P.; methodology, A.M.P., J.V. and V.M.; software, A.E.D., A.M.P., J.V. and V.M.; literature investigation, A.E.D. and A.M.P.; data curation, A.M.P., J.V. and V.M.; analysis, A.E.D.; writing—manuscript draft preparation, A.E.D.; writing—review and editing, A.E.D., A.M.P., J.V., M.K. and O.P.; visualization, A.E.D., A.M.P. and V.M.; supervision, A.E.D. and O.P.; project administration, A.E.D., A.M.P. and O.P.; funding acquisition, O.P. All authors have read and agreed to the published version of the manuscript.

Funding

This study is supported by the Czech national project: “Smart City–Smart Region–Smart Community” (CZ.02.1.01/0.0/0.0/17_048/0007435) financed by the Czech Operational Program “Research, Development and Education” in the Czech Ministry of Education, Youth and Sports, supported by EU funds. The linguistic revision of this article was prepared by Stephanie Krueger.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The open datasets used in this study are available at: https://github.com/Lab-LAMbDA/Usti-nad-Labem-synthetic/tree/main/data (accessed on 10 June 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

Kane, L.; Del Mistro, R. Changes in transport planning policy: Changes in transport planning methodology? Transportation 2003, 30, 113–131. [Google Scholar] [CrossRef]
Bhat, C.R.; Koppelman, F.S. Activity-Based Modeling of Travel Demand. In Handbook of Transportation Science; Hall, R.W., Ed.; International Series in Operations Research & Management Science; Springer: Boston, MA, USA, 1999; Volume 23. [Google Scholar] [CrossRef]
Delhoum, Y.; Belaroussi, R.; Dupin, F.; Zargayouna, M. Activity-based demand modeling for a future urban district. Sustainability 2020, 12, 5821. [Google Scholar] [CrossRef]
Malayath, M.; Verma, A. Activity based travel demand models as a tool for evaluating sustainable transportation policies. Res. Transp. Econ. 2013, 38, 45–66. [Google Scholar] [CrossRef]
Ortega, J.; Hamadneh, J.; Esztergár-Kiss, D.; Tóth, J. Simulation of the daily activity plans of travelers using the park-and-ride system and autonomous vehicles: Work and shopping trip purposes. Appl. Sci. 2020, 10, 2912. [Google Scholar] [CrossRef]
Dingil, A.E.; Esztergár-Kiss, D. The influence of the Covid-19 pandemic on mobility patterns: The first wave’s results. Transp. Lett. 2021, 13, 434–446. [Google Scholar] [CrossRef]
Padmakumar, A.; Patil, G.R. COVID-19 effects on urban driving, walking, and transit usage trends: Evidence from Indian metropolitan cities. Cities 2022, 126, 103697. [Google Scholar] [CrossRef] [PubMed]
Horni, A.; Nagel, K.; Axhausen, K.W. (Eds.) The Multi-Agent Transport Simulation MATSim; Ubiquity Press: London, UK, 2016. [Google Scholar] [CrossRef]
Hörl, S.; Balac, M. Synthetic population and travel demand for Paris and Île-de-France based on open and publicly available data. Transp. Res. Part C: Emerg. Technol. 2021, 130, 103291. [Google Scholar] [CrossRef]
Chow, J.Y.J.; Ozbay, K.; He, B.Y.; Zhou, J.; Ma, Z.; Lee, M.; Wang, D.; Sha, D. Multi-Agent Simulation-Based Virtual Test Bed Ecosystem: MATSim-NYC. C2SMART Project Report. 2020. Available online: https://rosap.ntl.bts.gov/view/dot/59184 (accessed on 10 May 2022).
Balac, M.; Hörl, S. Simulation of intermodal shared mobility in the San Francisco Bay Area using MATSim. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; pp. 3278–3283. [Google Scholar] [CrossRef]
Martins-Turner, K.; Grahle, A.; Nagel, K.; Göhlich, D. Electrification of Urban Freight Transport—A Case Study of the Food Retailing Industry. Procedia Comput. Sci. 2020, 170, 757–763. [Google Scholar] [CrossRef]
Kim, J.; Lee, S.; Lee, S. An evacuation route choice model based on multi-agent simulation in order to prepare Tsunami disasters. Transp. B: Transp. Dyn. 2017, 5, 385–401. [Google Scholar] [CrossRef]
Zhu, Y.; Xie, K.; Ozbay, K.; Yang, H. Hurricane Evacuation Modeling Using Behavior Models and Scenario-Driven Agent-based Simulations. Procedia Comput. Sci. 2018, 130, 836–843. [Google Scholar] [CrossRef]
Capodieci, N.; Cavicchioli, R.; Muzzini, F.; Montagna, L. Improving emergency response in the era of ADAS vehicles in the Smart City. ICT Express 2021, 7, 481–486. [Google Scholar] [CrossRef]
Hörl, S.; Balac, M. Introducing the eqasim pipeline: From raw data to agent-based transport simulation. Procedia Comput. Sci. 2021, 184, 712–719. [Google Scholar] [CrossRef]
Ziemke, D.; Kaddoura, I.; Nagel, K. The MATSim Open Berlin Scenario: A multimodal agent-based transport simulation scenario based on synthetic demand modeling and open data. Procedia Comput. Sci. 2019, 151, 870–877. [Google Scholar] [CrossRef]
Hörl, S.; Balac, M. Open synthetic travel demand for Paris and Île-de-France: Inputs and output data. Data Brief 2021, 39, 2021. [Google Scholar] [CrossRef]
Sallard, A.; Balać, M.; Hörl, S. An open data-driven approach for travel demand synthesis: An application to São Paulo. Reg. Stud. Reg. Sci. 2021, 8, 371–386. [Google Scholar] [CrossRef]
Ilahi, A.; Balac, M.; Li, A.; Axhausen, K.W. The first agent-based model of greater Jakarta integrated with a mode-choice model. Procedia Comput. Sci. 2019, 151, 272–278. [Google Scholar] [CrossRef]
Hörl, S.; Becker, F.; Axhausen, K.W. Simulation of price, customer behaviour and system impact for a cost-covering automated taxi system in Zurich. Transp. Res. C 2021, 123, 102974. [Google Scholar] [CrossRef]
Balac, M.; Hörl, S. Synthetic population for the state of California based on open-data: Examples of San Francisco Bay area and San Diego County. In Proceedings of the 100th Annual Meeting of the Transportation Research Board (TRB), Washington, DC, USA, 24–28 January 2021. [Google Scholar] [CrossRef]
SMART ITI. Smart City—Smart Region—Smart Community. 2018. Available online: https://smart-mateq.cz/en/smart-iti/ (accessed on 1 February 2022).
CSO. Czech Statistical Office: Population of Municipalities. 2022. Available online: https://www.czso.cz/csu/czso/population-of-municipalities-1-january-2016 (accessed on 1 January 2022).
ČÚZK. Czech Office for Surveying, Mapping and Cadastre. File of Administrative Boundaries and Cadastral Units Boundaries of the CR. 2022. Available online: https://geoportal.cuzk.cz/(S(1ruo3e2hjafc11qqu1qhhzav))/Default.aspx?lng=EN&mode=TextMeta&side=dsady_RUIAN&metadataID=CZ-CUZK-SH-V&mapid=5&menu=252 (accessed on 15 January 2022).
CSO. Czech Statistical Office: Population and Housing Census. 2011. Available online: https://www.czso.cz/csu/czso/population-and-housing-Census (accessed on 1 February 2022).
CSO. Czech Statistical Office: Birth Rate and Fertility 2011–2015. 2015. Available online: https://www.czso.cz/csu/czso/porodnost-a-plodnost-2011-2015 (accessed on 1 February 2022).
CSO. Czech Statistical Office: Life Tables & Methodology. 2015. Available online: https://www.czso.cz/csu/czso/life-tables-methodology (accessed on 1 February 2022).
CSO. Czech Statistical Office: Population Information. 2015. Available online: https://www.czso.cz/csu/czso/cenik-informacnich-sluzeb-a-produktu-bwut?skupina=13 (accessed on 1 February 2022).
Transport Research Centre (CDV). Czech Republic in Motion: The First Nationwide Survey of Traffic Behavior. 2017. Available online: https://www.ceskovpohybu.cz/ (accessed on 10 June 2022).
NMS Research Centre. Sustainable Urban Mobility Plan of the City of Ústí Nad Labem: B1—Social Transport Surveys; NMS Research Centre: Ústí nad Labem, Czech Republic, 2018.
Fiorello, D.; Zani, L. EU Survey on Issues Related to Transport and Mobility; Publications Office of the European Union: Luxembourg, 2015; JRC96151.
CSO. Czech Statistical Office: Register of Census Districts and Buildings. 2020. Available online: https://www.czso.cz/csu/rso/-registr-scitacich-obvodu-a-budov (accessed on 1 February 2022).
MEYS. Ministry of Education, Youth, and Sports in Czech Republic. Register of Schools and Educational Facilities. 2022. Available online: https://rejstriky.msmt.cz/rejskol/ (accessed on 15 March 2022).
Eqasim. Synthetic Population Pipeline Code for Eqasim. 2021. Available online: https://github.com/eqasim-org/synpp/ (accessed on 15 January 2022).
Tqdm. A Fast, Extensible Progress Bar for Python and CLI. 2021. Available online: https://github.com/tqdm/tqdm (accessed on 15 January 2022).
Conti, P.L.; Marella, D.; Scanu, M. Statistical Matching Analysis for Complex Survey Data with Applications. J. Am. Stat. Assoc. 2016, 111, 1715–1725.
OSM. OpenStreetMap. 2020. Available online: https://wiki.openstreetmap.org/wiki/Tags#Keys_and_values (accessed on 15 March 2022).
ČÚZK. The Czech Office for Surveying, Mapping and Cadastre (ČÚZK), Fundamental Base of Geographic Data of the Czech Republic (ZABAGED). 2022. Available online: https://geoportal.cuzk.cz/(S(jn3buos2irlnbjs5cvw53b5x))/Default.aspx?lng=EN&mode=TextMeta&side=zabaged&metadataID=CZ-CUZK-ZABAGED-VP&mapid=8&head_tab=sekce-02-gp&menu=241 (accessed on 1 February 2022).
Šindlerová, V.; Bartoš, L.; Mužík, J.; Martolos, J.; Kreml, J.; Wichsová, M. Metody Prognózy Intenzit Generované Dopravy; EDIP s.r.o.: Prague, Czech Republic, 2013; ISBN 978-80-87394-08-3.
Hörl, S.; Axhausen, K.W. Relaxation–discretization algorithm for spatially constrained secondary location assignment. Transp. A Transp. Sci. 2021.
City-Vitality-Sustainability (CIVITAS). SUTP Development in Ústí nad Labem, Report. 2010. Available online: https://www.usti-nad-labem.cz/files/civitas/R39.1-Study-of-Public-Transport-Users-in-UL.pdf (accessed on 10 June 2022).
Ng, W.-S.; Acker, A. Understanding Urban Travel Behaviour by Gender for Efficient and Equitable Transport Policies. International Transport Forum Discussion Paper. OECD. 2018. Available online: https://www.itf-oecd.org/sites/default/files/docs/urban-travel-behaviour-gender.pdf (accessed on 15 September 2022).
Ben-Dor, G.; Ben-Elia, E.; Benenson, I. Population downscaling in multi-agent transportation simulations: A review and case study. Simul. Model. Pract. Theory 2021, 108, 102233.

Figure 1. The city of Ústí nad Labem and the other municipalities (left), and the location of the study area within the Czech Republic (right).

Figure 2. E–R diagram of the data model used in the modified Eqasim framework.

Figure 3. Data and process flowchart of the synthesis pipeline based on the modified Eqasim framework.

Figure 4. Population of municipalities in the Ústí nad Labem district in 2011 (top image) and percentage of population changes between 2011 and 2016 (bottom image).

Figure 5. Geoposition of all locations.

Figure 6. The number of residents in houses (left) and the capacity of work and educational facilities (middle and right, respectively).

Figure 7. The visitors of the facilities (free-time, shopping, and errands).

Figure 8. Assignment process for primary activity locations.

Figure 9. Origin–destination (OD) pairs of the primary activity trips (from home to work location (left) and from home to education location (right)).

Figure 10. Origin–destination (OD) pairs of the secondary activity trips (from home, from work location, and from educational location).

Figure 11. Origin–destination (OD) pairs of the primary activity trips (left) and the secondary activity trips (right) based on main transport mode (car—blue bundles and public transport—yellow bundles).

Figure 12. Origin–destination (OD) pairs of the primary activity trips based on main transport mode (car—blue bundles and public transport—orange bundles), gender (top images: male—left and female—right), and age groups (bottom images: 18–24, 25–39, ≥40).

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

