Article

Automatic Definition of Traffic Analysis Zones Based on Big Data

Department of Transport, Széchenyi István University, Egyetem tér 1., 9026 Győr, Hungary
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2024, 14(13), 5964; https://doi.org/10.3390/app14135964
Submission received: 5 June 2024 / Revised: 24 June 2024 / Accepted: 4 July 2024 / Published: 8 July 2024

Abstract

The planning process for any transport system can be considered complete only if it is accompanied by a modeling system to evaluate the intervention. For modeling, the study area must always be divided into traffic zones, and correct zoning is the key to any transport system study. The basic principles of zone creation require a thorough understanding of the area and local traffic conditions. However, this understanding is not always available, especially if a universally applicable assessment system is to be developed. This has led to the need for an algorithm that can estimate traffic zones from automatically observable or measurable phenomena or sequences of events. The aim of this research is to identify observable events that are suitable for characterizing an area, so that an automatic zone definition procedure can be built on them. In this paper, automatically generated WAZE congestion data were processed for a selected district of Budapest. During processing, the area was divided into a grid, and time series were developed that show the traffic flow in each grid cell as a function of the congestion level. The grid cells were then clustered using spectral clustering to create spatially distinct districts with similar traffic behavior.

1. Introduction

The foundation of traffic modeling lies in the traffic analysis zone (TAZ), which simplifies the mathematical calculations involved in the transport model [1]. By a common definition, “…a traffic analysis zone or transportation analysis zone (TAZ) is the unit of geography most commonly used in conventional transportation planning models…” [2]. In other words, TAZs are areas with homogeneous traffic behavior, which serve as the foundation for any transportation model.
A TAZ is a single point in the model that represents a specific area, such as a residential or industrial zone, or a significant entity such as a hospital or shopping center.
A TAZ has two parts:
  • Area with boundaries;
  • Centroid.
The “area with boundaries” denotes the specific physical area depicted on the map. Its main purpose is to visually illustrate the designated TAZ, and it can also assist in collecting initial data such as the size of TAZs, which may be useful in estimating the population size, for instance.
The “centroid”, which serves as the center of gravity within a TAZ, acts as the sole point that symbolizes the TAZ in subsequent mathematical computations. It is crucial to accurately delineate the boundaries of the TAZ in order to determine the precise location of the “centroid”, which is determined by the area enclosed by these boundaries.
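To illustrate how a centroid follows from the zone boundary, the sketch below computes the area centroid of a simple polygon with the shoelace formula. It is an illustrative assumption, not part of the described program: the function name `polygon_centroid` and the planar (x, y) vertex format are hypothetical, and practical models may instead weight the centroid by population or activity.

```python
def polygon_centroid(vertices):
    """Area centroid of a simple (non-self-intersecting) polygon.

    vertices: list of (x, y) tuples in planar coordinates, in order
    around the boundary (either orientation works).
    """
    area2 = 0.0  # twice the signed area
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3.0 * area2), cy / (3.0 * area2)
```

For a square zone with corners (0, 0), (2, 0), (2, 2), and (0, 2), the function returns the point (1, 1).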
The definition of TAZs is usually a long and laborious process that requires strong local knowledge of land use and the transport network. As stated above, the TAZ system is the core of a transport model, so model building cannot start without a precise definition of the TAZ boundaries and centroids. This tension (high importance versus lengthy work that depends on local knowledge) calls for an automatic solution that is fast, reliable, and works without local knowledge.
In this paper, we briefly introduce the theory of TAZ definition and earlier methods for automatic TAZ definition for road traffic or public transport. Afterward, a novel method is described. Finally, the use of the novel method is demonstrated on a Hungarian use case.

2. TAZ Definition

As stated above, well-defined TAZs are essential to traffic modeling. Modeling real transport networks with travel demand would be a huge mathematical task; therefore, the task has to be simplified. The tool for this simplification is the system of TAZs. A TAZ represents an area, the size of which depends on the study area. The principles of the definition are well described in the book “Közlekedéstervezés” (Transport planning) [1]; here we give only a short summary to lay a solid foundation for understanding. From the viewpoint of the mathematical calculations, a TAZ is just one point that represents its surroundings in all of the equations. In this sense, the first task in TAZ definition is to define the number of necessary TAZs, and the second is to define their shape.

2.1. Number of TAZs

The classic rule of thumb was 50 TAZs for an area with 100,000 inhabitants and 100 TAZs for 1 million inhabitants. Modern computers have made this approach obsolete, as enough resources are now available for the calculations and data collection. Today the number of TAZs is much higher: in a city with 100,000 inhabitants, the number of TAZs is typically around 100 or even higher. In the Unified Transport Model of Budapest, there are almost 1500 TAZs. The number of TAZs correlates well with the desired level of detail of the model, and there is no direct rule for the level of detail. Here, we follow two guidelines:
  • Cost/benefit;
  • Level of error.
Based on the first, cost and benefit, we can draw the following connection; see Figure 1.
On the other hand, there is the risk of error. In a modeling task, two types of error can occur: model error and data error. Model error means that the simplification of the model imparts some error on the modeled process, while data error means that the collected data may not be precise enough. These two also lead to an optimum (Figure 2).
Based on both approaches, we can state that the desired number of TAZs lies at a middle ground between too coarse and too detailed a zoning.

2.2. Shape of the TAZs

The other question is the shape of a TAZ. By definition, a TAZ covers an area in which the transport behavior of the population (inhabitants, workers, etc.) is homogeneous, as described in Prileszky’s book [1].

3. Earlier Solution to Automatic TAZ Definition

The definition of TAZs requires deep local knowledge, and the need for automatic TAZ definition was recognised as early as the last century. Due to limited computing resources, the early methods were not successful enough, so the number of published solutions is rather limited.
There are some examples for public transport, but for car traffic and general multimodal transport TAZ definitions are very rare.
Nagy [3,4] suggested a comprehensive method to produce TAZs for public transport, based on the daily timeline of the boarding data set. The method is able to produce TAZs fully automatically; the resulting TAZ shapes were almost identical to the shapes of the TAZs defined manually.
In addition, a limited range of methods have been proposed for defining traffic analysis zones (TAZs). Martínez [5] introduced a new algorithm that uses a smoothed density surface of travel demand data to minimize information loss and improve the precision of the estimates of the origin–destination (OD) matrix. Yang [6] combined a traffic volume distribution model with traffic simulation, using Voronoi diagrams for the TAZ partition, to accurately predict macro and micro traffic flow. Li [7] focused on pedestrian TAZs, proposing a division method based on pedestrian-attracting/producing points and a method to estimate the intensity of pedestrian travel. Gang [8] also used Voronoi diagrams for the TAZ partition and applied a multiplicative competitive interaction model to estimate the new OD matrix, demonstrating the validity and precision of the method through a numerical test.
However, some researchers started their activities from data sets provided by the GIS system. One of the early solutions comes from Ding [9]. Ding suggested in his work, in 1998, a method to define TAZs based on a GIS data set.
Similarly to the previous work, You [10] built a method to generate TAZs from Geographic Information System for Transportation (GIS-T) data.
The paper of Xue [11] presents a method for dividing urban areas into small zones for connectivity and accessibility analysis using the topological structure of the road network. The local depth value from space syntax theory was identified as suitable for assisting in zone division, and, when applied in Shanghai, the zones divided by this value aligned well with people’s cognition of regions.
The study of Chen [12] focused on the automated division of traffic zones in urban areas using spatial statistical analysis, with an application to Shanghai’s transport network. The study proposed a method of spatial statistical analysis for the automated division of traffic zones, developed a model of spatial cluster analysis algorithm, and successfully achieved the automated division of traffic zones in Shanghai.
The paper by Dong [13] proposed the use of the semantic concept of traffic to extract the OD information of commuters from mobile phone call detail record (CDR) data. Based on the extracted data, a traffic zone division was proposed. A K-means clustering algorithm was used to classify cell areas and tag them with four relevant attributes.
Another K-means method was suggested by Zheng et al. [14]. Their aim was to provide a method for traffic zone division, the results of which are clear, accurate, and reduce the cost of division. The paper proposed a novel grid-based K-Means cluster method for traffic zone division by using GPS data. The classification method is effective and gives a good reference value for the analysis of the flow and trends of city traffic.
Another approach was presented by Lv [15] and Liao [16]. In both cases, the authors developed a method to predict traffic flow directly from the location of important points, such as shops. In these methods, the TAZ definition is only secondary. Moreover, these methods, like most of the previous ones, focus on public transport, which supports our primary statement that there is a lack of TAZ definition methods for road and multimodal transport.
Based on this, the method developed by us is a real novelty, because it works just with automatically collected data and does not need any manual operation from the planner.

4. Methodology

The proposed algorithm for automated traffic zone determination uses data from the WAZE platform [17], a popular crowd-sourced navigation application. WAZE provides real-time information on traffic congestion, accidents, road closures, and other incidents. The algorithm is based on automatically generated congestion values only. The research is based on the WAZE platform, as this was the data source used by the project that initiated the research. It is possible to use any other data source that can provide data of a similar quality and quantity. From this perspective, the research is intended to demonstrate the feasibility of the theory.
The WAZE-generated data set contains the data generated and collected over a one-year period. The data are stored in a .json file and cover the whole of 2019. Territorially, the data set covers the 11th district of Budapest, Hungary. The raw file also included entries from regions and time periods outside the target scope; these were excluded during data filtering.
The data set consists of two parts: information reported by the users and data automatically generated by WAZE. The analysis used only the latter. The data set is organized as follows (Table 1).
Using these data, the algorithm aims to identify groups of areas with similar behavior and to delineate traffic zones. The program consists of three distinct subprograms: the first is responsible for data processing, the second contains the clustering procedures, and the third helps to visualize the result. The subprograms are divided into modules that perform logically separable tasks. The key steps of the methodology are the following.
Pre-processing: The raw data obtained from WAZE are preprocessed to remove outliers, filter irrelevant information, and aggregate incidents into meaningful clusters of congestion. This preprocessing step helps clean the data and ensures that only relevant information is used for zone determination.
Temporal and spatial analysis: From the preprocessed data, the algorithm creates time series of traffic volumes, based on the congestion data. The data are then subjected to spatial analysis techniques to identify clusters of time series based on their coordinates.
Zone delineation: Based on the results of the spatial analysis, the algorithm delineates the traffic zones by grouping spatially contiguous clusters. The size and shape of these zones are determined based on the area of similar time series next to each other, with the goal of creating homogeneous zones that capture similar traffic patterns.
The program is written in Python 3.9 [18].

5. Process Flow

5.1. Sorting to Coordinates

In the first step, the application loads the following libraries and modules: matplotlib, numpy, scipy, and sklearn. The first module starts by reading the data. The .json file is loaded, the “jams” part is extracted, and the individual congestion signals are recorded as separate lines. The data are then converted into a form that Python can process properly, and corrections are made if necessary. The module then organizes the data by coordinates. Each congestion is associated with a “revisions” category, which contains the precise location of the congestion (based on GPS coordinates). The location essentially represents a segment, whose coordinates are linked sequentially to form the congestion. A congestion can persist for a longer period, as indicated by the “created at” timestamps. As a result, multiple points at different times are associated with one congestion identifier.
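The flattening step described above can be sketched as follows. This is a minimal illustration, not the authors' module: the JSON layout (a top-level `jams` list, `revisions` holding `level`, `created_at`, and a `line` of `x`/`y` points) is an assumption modeled on the field names in Table 1, and the function name `flatten_jams` is hypothetical.

```python
import json
from datetime import datetime


def flatten_jams(raw_json):
    """Turn nested jam records into one flat record per GPS point.

    Assumed (hypothetical) layout:
    {"jams": [{"uuid": ..., "revisions": [
        {"level": ..., "created_at": "<ISO timestamp>",
         "line": [{"x": lon, "y": lat}, ...]}]}]}
    """
    data = json.loads(raw_json)
    records = []
    for jam in data.get("jams", []):
        for revision in jam.get("revisions", []):
            timestamp = datetime.fromisoformat(revision["created_at"])
            for point in revision["line"]:
                records.append({
                    "uuid": jam["uuid"],        # congestion identifier
                    "time": timestamp,          # when this revision was seen
                    "hour": timestamp.hour,
                    "level": revision["level"], # congestion severity
                    "lon": point["x"],
                    "lat": point["y"],
                })
    return records
```

Each output record ties one coordinate of a congestion segment to its identifier, timestamp, and severity, which is the form the later steps operate on.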

5.2. Filtering the Days

In the subsequent part of the algorithm, it determines which days are needed. Tuesday, Wednesday, and Thursday have been designated as equivalent types of days. The separation of days is necessary because the individuals appearing here may exhibit different behavioral patterns on weekdays and weekends. The dominant commuting patterns during weekdays would distort the results drawn from the data set for weekdays. When calculating the days, consideration was given to when a particular day begins in terms of traffic, as traffic immediately after midnight typically still belongs to the previous day for a few hours. To decide this, the hour with the lowest traffic was identified and considered the end of a given day. Thus, instead of midnight, a new day begins at 3 am. During the measurements, timestamps were recorded on the basis of dates, allowing the algorithm to determine the specific days using a function.
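The day filter with the shifted day boundary can be sketched in a few lines. The helper names below are hypothetical; the 3 am boundary and the Tuesday to Thursday selection are taken from the description above.

```python
from datetime import datetime, timedelta

DAY_START_HOUR = 3           # a traffic day runs from 03:00 to 03:00
TARGET_WEEKDAYS = {1, 2, 3}  # Tuesday, Wednesday, Thursday (Monday = 0)


def traffic_date(timestamp):
    """Calendar date of the traffic day a timestamp belongs to."""
    return (timestamp - timedelta(hours=DAY_START_HOUR)).date()


def is_target_day(timestamp):
    """True if the timestamp falls on a Tuesday-Thursday traffic day."""
    return traffic_date(timestamp).weekday() in TARGET_WEEKDAYS
```

With this rule, a measurement at 01:30 on a Friday still counts toward Thursday, while one at 02:00 on a Tuesday is attributed to Monday and dropped.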

5.3. Creating the Sectors

The algorithm operates with sectors, each representing a 30 by 30 m area, which can be adjusted by modifying the values. In the case of raster division, the detection times of coordinates within a specific raster determine the behavioral characteristics of the particular area unit. The first step is to establish the sectors, which proceeds as follows. The algorithm identifies the boundaries of the area, both north–south and east–west, and then, divides the enclosed area according to the desired area size. This results in a scale where the individual x- and y-values can be grouped. Each sector is assigned a unique identifier for future reference. The created sectors are assigned coordinates on the basis of their location. Each coordinate is classified into x- and y-intervals, and based on these, the corresponding sector can be determined. As a result, each record contains the following: the number of the respective sector, the hour of measurement, the level of congestion strength.
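A possible way to map GPS points to 30 m sectors is sketched below. The conversion from degrees to metres uses a simple equirectangular approximation, which is an assumption (the source does not state how metres were derived from GPS coordinates); the names `make_sector_indexer` and `sector_of` are hypothetical.

```python
import math

CELL_M = 30.0               # sector edge length in metres (adjustable)
M_PER_DEG_LAT = 111_320.0   # approximate metres per degree of latitude


def make_sector_indexer(points, cell_m=CELL_M):
    """Build a function mapping (lon, lat) to a unique sector id.

    points: iterable of (lon, lat) pairs used to find the area bounds.
    Returns (sector_of, n_cols).
    """
    points = list(points)
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    min_lon, max_lon = min(lons), max(lons)
    min_lat, max_lat = min(lats), max(lats)
    # Metres per degree of longitude shrink with latitude.
    mid_lat = math.radians((min_lat + max_lat) / 2.0)
    m_per_deg_lon = M_PER_DEG_LAT * math.cos(mid_lat)
    n_cols = int((max_lon - min_lon) * m_per_deg_lon // cell_m) + 1

    def sector_of(lon, lat):
        col = int((lon - min_lon) * m_per_deg_lon // cell_m)
        row = int((lat - min_lat) * M_PER_DEG_LAT // cell_m)
        return row * n_cols + col  # row-major unique identifier

    return sector_of, n_cols
```

Changing `cell_m` adjusts the raster resolution without touching the rest of the pipeline.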

5.4. Creating the Time-Series

The module responsible for generating the time series iterates through the filtered data, taking into account the values of the individual sectors. The sector can be conceptualized as a matrix, where each cell element is assigned a time series (except, of course, if no measurement occurred in a given sector). In addition, the number of measurement points is recorded. The formation of the time series proceeds as follows. If a particular sector contains GPS coordinates where a “jam” value has been recorded, the value calculated based on the congestion strength is added to the appropriate hour value of the time series. The procedure for calculating the curve value for a given hour is as follows:
$$\text{curve value} = 1 - \frac{1}{2^{\text{jam level}}}$$
where:
  • jam level: the automatically recorded jam level in a scale from 1 to 5.
The algorithm assigns appropriate congestion level values to each hour, which are then added to the corresponding hour of the time series. At the onset of congestion, a lower level of congestion results in a higher value being added, and as congestion intensifies, higher levels of congestion are represented by lower values. This attempts to account for the relationship in which each additional vehicle exponentially increases the strength of congestion. Finally, to ensure comparability, the module standardizes individual results on a unified scale (0–1000). This enables the distance measurement module to gauge the behavioral “distance” value between these time series.
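The accumulation and scaling of the hourly curves might look like the sketch below. Two points are assumptions rather than facts from the source: the weighting function `curve_value` uses 1 − 2^(−level), which is one reading of the curve-value formula, and the 0–1000 standardization is implemented as scaling each series so that its peak equals 1000. The weighting is passed in as a parameter so that it can be swapped.

```python
from collections import defaultdict


def curve_value(jam_level):
    """Assumed weighting: grows with jam level but saturates toward 1."""
    return 1.0 - 1.0 / (2 ** jam_level)


def build_time_series(records, weight=curve_value, scale=1000.0):
    """Aggregate (sector, hour, jam_level) records into 24-hour curves.

    Returns (series, counts): series maps sector id to a list of 24
    values scaled so the peak equals `scale`; counts maps sector id
    to the number of measurements.
    """
    raw = defaultdict(lambda: [0.0] * 24)
    counts = defaultdict(int)
    for sector, hour, level in records:
        raw[sector][hour] += weight(level)
        counts[sector] += 1
    series = {}
    for sector, values in raw.items():
        peak = max(values)
        series[sector] = [scale * v / peak if peak > 0 else 0.0
                          for v in values]
    return series, dict(counts)
```

Because every curve is brought to the same scale, the later distance calculation compares the shapes of the daily profiles rather than their absolute magnitudes.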

5.5. Filtering the Records by the Number of Measurements

The following module filters out sectors where the number of measurements is less than 20 due to data reliability concerns. This means that only one or two measurements occurred throughout the year for the given types of day in those sectors, making it impossible to reliably establish a time series. Of course, this value is freely adjustable and does not affect the further operation of the program.
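A minimal sketch of this filter, assuming the time series and measurement counts are held in dictionaries keyed by sector id (the function name and data layout are hypothetical):

```python
MIN_MEASUREMENTS = 20  # threshold from the text; freely adjustable


def filter_sparse_sectors(series, counts, min_count=MIN_MEASUREMENTS):
    """Keep only sectors with at least `min_count` measurements."""
    return {sector: values for sector, values in series.items()
            if counts.get(sector, 0) >= min_count}
```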

5.6. Euclidean Distance Measurement

In this module, the distance measurement between individual time series is performed. The distance does not denote physical distance, but only examines the behavioral differences between the respective sectors, evaluating how similar the two time series are to each other. The calculation method of the distance measurement algorithm is based on the following formula [19]:
$$d_{EUK} = \sqrt{\sum_{i=0}^{23} (a_i - b_i)^2}$$
where:
  • a: first time series;
  • b: second time series;
  • i: hours of the day [0..23].
The algorithm traverses the time series of the individual sectors and compares the curves pairwise. During each comparison, it juxtaposes the corresponding hour values, measures their differences, and accumulates them over the 24 hours. The sector comparisons yield a distance matrix, which serves as the basis for clustering in the subsequent steps.
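The pairwise comparison can be sketched with the standard library alone. The function names and the dictionary input format are hypothetical; the distance itself is the Euclidean distance over the 24 hourly values defined above.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long time series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(series_by_sector):
    """Symmetric matrix of behavioral distances between sectors.

    Returns (ids, dist), where ids lists the sector ids in the row and
    column order of dist.
    """
    ids = sorted(series_by_sector)
    n = len(ids)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean(series_by_sector[ids[i]],
                          series_by_sector[ids[j]])
            dist[i][j] = dist[j][i] = d  # distance is symmetric
    return ids, dist
```

Only the upper triangle is computed, since the distance from sector i to sector j equals the distance from j to i.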

5.7. Clustering

The clustering module reads the results of the previously calculated Euclidean distance measurements and computes which sectors should belong to a cluster based on this information. Clustering needs no pretrained curves, so the curves are assigned to the appropriate groups by the chosen clustering procedure “on their own”. Spectral clustering [20] was chosen from the available clustering procedures due to its speed, efficiency, and accuracy. The clustering can be configured to use a precomputed distance matrix. When the module starts, the program prompts the user to specify the number of clusters into which the distance matrix of the time series should be divided. As a result, the program assigns a cluster value to each sector.
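To make the clustering step concrete, the sketch below implements a basic spectral clustering on a precomputed distance matrix using only numpy. It is not the authors' module, which presumably relies on a library implementation: the Gaussian conversion from distance to affinity, the median-based kernel width, and the small deterministic k-means are all assumptions chosen to keep the example self-contained.

```python
import numpy as np


def spectral_clusters(dist, n_clusters, sigma=None, n_iter=50):
    """Cluster items from a precomputed distance matrix (sketch)."""
    dist = np.asarray(dist, dtype=float)
    if sigma is None:
        sigma = np.median(dist[dist > 0])  # assumed kernel width heuristic
    # Turn distances into similarities with a Gaussian kernel.
    affinity = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^(-1/2) A D^(-1/2).
    lap = (np.eye(len(dist))
           - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :])
    _, vecs = np.linalg.eigh(lap)           # eigenvalues in ascending order
    emb = vecs[:, :n_clusters]              # spectral embedding
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Deterministic k-means with farthest-point initialization.
    centers = [emb[0]]
    for _ in range(1, n_clusters):
        d2 = np.min([((emb - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(emb[int(np.argmax(d2))])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new_centers = np.array([
            emb[labels == k].mean(axis=0) if np.any(labels == k)
            else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

The returned array holds one cluster value per row of the distance matrix, in the same order as the sectors used to build it.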

5.8. Environmental Assessment Algorithm

The task of the environmental assessment algorithm is to expand the representation of sectors from point-like (30 by 30 m) to zones, while also cleaning up measurements that deviate from their surroundings in an island-like manner. The operational principle of the algorithm is as follows. First, it identifies all sectors that do not have cluster values. In Figure 3, these are denoted by the “x” symbols.
The algorithm examines the environment of the given sector within a 300 m radius, indicated by the colored area. In this case, the blue areas do not have cluster numbers, meaning that there were no measurements in the respective sectors. The other colors represent the behavioral properties associated with the cluster ID. The algorithm then assesses all clustered sectors found in the environment of the given point, calculates their distance from the point, and adds the reciprocal of the distance to the value of the respective cluster. Each cluster thus receives a value, and the cluster with the highest value is assigned to the examined point. In Figure 3, it can be seen that in the environment of the examined point, the highest number of sectors belongs to cluster 2. However, due to its proximity, the sector belonging to cluster 3 has the strongest influence on the examined point, so the examined “x” point will belong to cluster 3.
Next, the algorithm examines the island-like characteristics of individual points. It traverses through each point and, similar to the description above, examines the surroundings of each point. The difference is that it now examines areas with cluster IDs instead of points without clusters. The result of the environmental assessment function is illustrated in Figure 4. The expansion of individual point-like clusters occurs during the module’s operation, resulting in contiguous areas with different behavioral properties.
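The inverse-distance voting rule used by the environmental assessment can be sketched as follows. The function name and the input format (sector centers in metres mapped to cluster IDs) are hypothetical; the 300 m radius and the reciprocal-distance weighting follow the description above.

```python
import math
from collections import defaultdict


def assign_by_neighbours(target, labelled, radius=300.0):
    """Pick a cluster for `target` by inverse-distance voting.

    target: (x, y) of the examined sector center, in metres.
    labelled: dict mapping (x, y) sector centers to cluster ids.
    Returns the winning cluster id, or None if no labelled sector
    lies within the radius.
    """
    votes = defaultdict(float)
    tx, ty = target
    for (x, y), cluster in labelled.items():
        d = math.hypot(x - tx, y - ty)
        if 0 < d <= radius:
            votes[cluster] += 1.0 / d  # closer sectors weigh more
    return max(votes, key=votes.get) if votes else None
```

In a configuration like the one in Figure 3, three cluster 2 sectors at 200 m contribute 0.015 in total, while a single cluster 3 sector at 30 m contributes about 0.033, so the point is assigned to cluster 3. The same function could be applied to already labelled points for the island clean-up, although the exact rule used there is not detailed in the text.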

5.9. Creation of the Traffic Analysis Zones

The next step of the program is the creation of zones. Zones are formed from the separate areas established in the previous module. Essentially, the zones correspond to the areas formed in the previous step; however, this step is necessary so that the program can treat each separate area independently. The result is illustrated in Figure 5. As visible in the figure, the previously identified cluster two (red) splits into two separate regions due to physical separation. Consequently, the program does not simply generate the predetermined number of clusters but also considers their spatial distribution. If regions of the same cluster do not overlap, the program displays them as separate zones.
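One way to separate physically disconnected regions of the same cluster is a connected-component search over the sector grid, sketched below. The 4-neighbour adjacency rule and the function name are assumptions; the source does not specify how contiguity is tested.

```python
from collections import deque


def split_into_zones(cluster_grid):
    """Label spatially connected regions of equal cluster id as zones.

    cluster_grid: dict mapping (row, col) sector positions to cluster ids.
    Returns a dict mapping (row, col) to a zone id.
    """
    zones = {}
    next_zone = 0
    for start in sorted(cluster_grid):
        if start in zones:
            continue  # already reached from an earlier sector
        cluster = cluster_grid[start]
        zones[start] = next_zone
        queue = deque([start])
        while queue:  # breadth-first flood fill
            r, c = queue.popleft()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb not in zones and cluster_grid.get(nb) == cluster:
                    zones[nb] = next_zone
                    queue.append(nb)
        next_zone += 1
    return zones
```

Two patches of the same cluster that are separated by sectors of another cluster receive different zone ids, so the number of zones can exceed the number of clusters requested.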

6. Results

The display interface (Figure 6) divides the screen into four distinct sections:
  • Map, districts, and measuring points;
  • Display of the selected measuring point time series;
  • Graph showing the time series of the selected zone(s);
  • Various information for the selected zone.
The proposed algorithm demonstrates promising results in automating the process of determining the traffic zone using real-time data from the WAZE platform. Using crowd-sourced information on traffic conditions, the algorithm effectively identifies congestion groups and efficiently delineates traffic zones.
One of the key advantages of the proposed algorithm is its ability to adapt to dynamic changes in traffic patterns and incidents. Unlike traditional methods that rely on static traffic data or manual surveys, the algorithm can update traffic zone boundaries based on fresh information, without local knowledge of the city.
Furthermore, the algorithm offers scalability and applicability to different urban environments, thanks to its data-driven approach and flexibility in parameter settings. By adjusting parameters such as cluster density thresholds and zone merging criteria, the algorithm can be tailored to specific cities or regions with varying traffic characteristics and infrastructure layouts.
However, several limitations and challenges should be considered when interpreting the results of the algorithm.
The primary limitation of this type of data processing is the quality and quantity of the raw data, which can only be guaranteed if the observed traffic is sufficiently high and the ratio of system users (e.g., WAZE) is adequate. Consequently, the process is primarily applicable in large urban areas.
Data quality: The accuracy and reliability of the algorithm depend on the quality of the data obtained from the WAZE platform. Although WAZE provides valuable information on traffic conditions, the data can be subject to biases and inaccuracies, particularly in areas with low user coverage or sparse reporting.
Computational complexity: The spatial analysis techniques used in the algorithm may involve computationally intensive operations, especially when processing large volumes of traffic data. Efficient implementation and optimization strategies are essential to ensure scalability and performance.
Validation and ground truth: Validating the results of the algorithm requires ground truth data on traffic conditions, which may be challenging to obtain, especially for historical comparisons. Although historical traffic data and manual surveys can provide some validation, they may not fully capture the dynamic nature of traffic patterns.
User privacy and data security: The use of crowd-sourced data from platforms such as WAZE raises concerns about user privacy and data security. Ensuring compliance with privacy regulations and implementing robust data anonymization techniques are essential to address these concerns.
Despite these challenges, the proposed algorithm represents a significant advancement in the field of transport modeling. By harnessing the power of real-time data and advanced analytics, the algorithm has the potential to revolutionize the way traffic zones are determined and managed, leading to improved traffic models and, through that, enhanced overall mobility in urban areas.

7. Discussion and Conclusions

Based on the findings of our study in district XI of Budapest, we can confidently state that automatically gathered traffic flow information, such as the data obtained from WAZE in our study, can be utilized as an input to generate TAZs for transportation modeling needs, without the need for specific local expertise.
Hence, we have indicated that it is feasible to generate a zonal system for macroscopic transport models without manual intervention in the computation procedure. This automated process can create a zonal system using the transportation network (road network) and traffic flow information.
Although the algorithm demonstrates promising results, more research is needed to address the challenges related to data quality, computational complexity, and validation. With continued advancements in data analytics and transportation technology, automated traffic management systems such as the one proposed here have the potential to evolve traffic models.

Author Contributions

Conceptualization, V.N. and B.H.; methodology, V.N.; software, V.N.; writing—original draft preparation, V.N. and B.H.; writing—review and editing, B.H.; visualization, V.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received external funding from the project “Digitálisan összekapcsolt adatforrásokra alapozott, dinamikus, hangolt és adaptív városi forgalomirányítási rendszer szolgáltatások és beavatkozási, értékelési közlekedéspolitikai eszköztár kifejlesztése”, financed by the NKFIH (National Research, Development and Innovation Authority, Hungary).

Data Availability Statement

Data is contained within the article.

Acknowledgments

The authors thank FŐMTERV Inc., and especially Zsolt Berki, for kindly providing the necessary data set.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MDPI    Multidisciplinary Digital Publishing Institute
TAZ     Traffic analysis zone

References

  1. Koren, C.; Prileszky, I.; Horváth, B.; Tóth-Szabó, Z. Közlekedéstervezés; Universitas-Győr Nonprofit Kft.: Győr, Hungary, 2007; p. 196.
  2. Miller, H.; Shaw, S.L. Geographic Information Systems for Transportation (GIS-T): Principles and Applications; Oxford University Press: Oxford, OH, USA, 2001; p. 248.
  3. Nagy, V.; Horváth, B.; Horváth, R. Land-use zone estimation in public transport planning with data mining. Transp. Res. Procedia 2017, 27, 1050–1057.
  4. Nagy, V.; Horváth, B. Hidden content of passenger data in public transport. Procedia Comput. Sci. 2017, 109, 506–512.
  5. Martínez, L.M.; Viegas, J.M.; Silva, E.A. A traffic analysis zone definition: A new methodology and algorithm. Transportation 2009, 36, 581–599.
  6. Zhen Yang, Z.; Wang, L.; Chen, G. Method of Traffic Analysis Zone Partition for Traffic Impact Assessment. J. Highw. Transp. Res. Dev. (Engl. Ed.) 2007, 2, 80–83.
  7. Li, Y.; Zhang, Y. Research on Method of Pedestrian Traffic Analysis Zone Division and Traffic Intensity Estimation. In Proceedings of the CICTP 2015, Beijing, China, 24–27 July 2015; pp. 3440–3448.
  8. Gang, C. Method of Traffic Analysis Zone Partition for Traffic Impact Evaluation. J. Highw. Transp. Res. Dev. 2007, 24, 102–106.
  9. Ding, C. The GIS-Based Human-Interactive TAZ Design Algorithm: Examining the Impacts of Data Aggregation on Transportation-Planning Analysis. Environ. Plan. B Plan. Des. 1998, 25, 601–616.
  10. You, J.; Nedović-Budić, Z.; Kim, T.J. A GIS-based traffic analysis zone design: Technique. Transp. Plan. Technol. 1998, 21, 45–68.
  11. Xue, Y.; Duan, Z. An accessibility oriented traffic analysis zone division method. In Proceedings of the 2010 2nd International Asia Conference on Informatics in Control, Automation and Robotics (CAR 2010), Wuhan, China, 6–7 March 2010; Volume 1, pp. 516–519.
  12. Li, X.D.; Yang, X.G.; Chen, H.J. Study on traffic zone division based on spatial clustering analysis. Comput. Eng. Appl. 2009, 45, 19–22.
  13. Dong, H.; Wu, M.; Ding, X.; Chu, L.; Jia, L.; Qin, Y.; Zhou, X. Traffic zone division based on big data from mobile phone base stations. Transp. Res. Part C Emerg. Technol. 2015, 58, 278–291.
  14. Zheng, Y.; Zhao, G.; Liu, J. A Novel Grid Based K-Means Cluster Method for Traffic Zone Division. In Proceedings of the Cloud Computing and Big Data, Shanghai, China, 4–6 November 2015; Qiang, W., Zheng, X., Hsu, C.H., Eds.; Springer: Cham, Switzerland, 2015; pp. 165–178.
  15. Lv, W.; Lv, Y.; Ouyang, Q.; Ren, Y. A Bus Passenger Flow Prediction Model Fused with Point-of-Interest Data Based on Extreme Gradient Boosting. Appl. Sci. 2022, 12, 940.
  16. Liao, C.; Dai, T.; Zhao, P.; Ding, T. Weighted Centrality and Retail Store Locations in Beijing, China: A Temporal Perspective from Dynamic Public Transport Flow Networks. Appl. Sci. 2021, 11, 9069.
  17. Galeso, M. Waze: An Easy Guide to the Best Features, 1st ed.; CreateSpace Independent Publishing Platform: North Charleston, SC, USA, 2016.
  18. Van Rossum, G.; Drake, F.L., Jr. Python Reference Manual; Network Theory Ltd.: Surrey, UK, 2011.
  19. Tan, P.N.; Steinbach, M.; Kumar, V. Introduction to Data Mining; Pearson Education Limited: London, UK, 2014.
  20. Chen, M.S. Literature Review on Spectral Clustering; University of California: Los Angeles, CA, USA, 2013.
Figure 1. Cost to benefit ratio as a function of the level of detail.
Figure 2. Model and data error as a function of the level of detail.
Figure 3. Theoretical operation of the environmental assessment algorithm.
Figure 4. Results of the environmental assessment algorithm.
Figure 5. Creation of the TAZ.
Figure 6. Visualization in Python—screenshot from the prepared application.
Table 1. Automatically generated data types.
ID      Name            Meaning
01      uuid            Congestion identifier
02      country         Name of the country
03      type            Congestion
04      turn_type       Type of turn (right, left, straight, none)
05      publish_date    Congestion start date
06      last_seen       Last detection of congestion
07      city            Name of the city
08      street          Name of the road
09      start_node      Start of congestion (nearest road/node/-)
10      end_node        End of congestion (nearest road/node/-)
11      road_type       Type of the road
12      alert_uuid      Alert identifier (if any)
13      revisions
13.1    level           Congestion severity (0 = free flow, 5 = completely stopped traffic)
13.2    length          Congestion length [m]
13.3    speed           Average speed [m/s]
13.4    delay           Delay due to congestion (compared to free flow) [s]
13.5    line            GPS coordinates of the congestion section
13.6    created_at      Timestamp
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Nagy, V.; Horváth, B. Automatic Definition of Traffic Analysis Zones Based on Big Data. Appl. Sci. 2024, 14, 5964. https://doi.org/10.3390/app14135964