1. Introduction
Road traffic accidents cause the death of approximately 1.19 million people each year, according to the World Health Organization (WHO) [1]. With the rapid development of urban and commercial sectors, transportation infrastructure has also expanded, leading to an upsurge in road traffic accidents. Several geospatial studies have attributed road traffic accidents to various factors, such as weather conditions that cause slippery roads [2], the absence of proper road authorities and the management of necessary safety procedures [3,4], and low investment in safety precautions [5]. Additionally, human elements, such as the reckless driving behavior of young drivers and pedestrian negligence, were identified as significant contributors to road traffic accidents [6]. Knowing the causes of road traffic accidents can assist in devising preventive measures such as behavior modification and post-injury management [7]. Moreover, safety-deficient regions must be identified so that precautionary measures can be proposed for these deprived areas [8].
Geographic Information Systems (GISs) have become a popular tool for observing and analyzing worldwide road traffic accidents over the past few decades [5,9,10]. With their ability to store, analyze, and manage vast amounts of data, GISs are used to visually explore the relationships between accidents and contributing factors from spatial and non-spatial perspectives [11]. GISs employ various techniques for road traffic data, such as hotspot analysis for identifying high-risk locations, space–time cube analysis to visualize spatio-temporal trends in accident data, the Kernel Density Estimation (KDE) method for spatial density analysis, and various statistical and network analyses. Moreover, geospatial techniques, such as heat maps of events, time-series analysis, buffer analysis, and seasonal mapping, may also provide essential ways to determine road traffic accident causes and establish proper precautions.
The United States of America (USA) is one of the most developed countries, where everyday mobility makes road safety management necessary. According to recent data from the National Highway Traffic Safety Administration (NHTSA), approximately 42,795 people died in traffic accidents in 2022 [12]. Many researchers [13,14,15,16,17] have made an effort to understand and reduce road traffic accidents; however, gaps remain in the comprehensive analysis of such accidents, particularly in areas like California. This state experienced an increase in fatalities of about 7.6% from 2020 to 2021, leading to a Mileage Death Rate of 1.38 [18].
While significant research has explored road traffic accidents [13,14,15,16,17], this study integrates advanced GIS techniques (space–time cube analysis) alongside non-parametric statistical and spatial techniques (Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Kernel Density Estimation (KDE), and the Getis-Ord Gi* method) to conduct a spatio-temporal analysis of traffic accidents across four major Californian cities: Los Angeles, Sacramento, San Diego, and San Jose. Our objectives were as follows: (1) to observe and analyze the spatio-temporal dynamics of road traffic accidents in these developed cities, capturing patterns across different seasons and years; (2) to study the diverse impacts of traffic density, weather conditions, and population density on accident frequency and severity; and (3) to provide detailed spatio-temporal analyses of accident hotspots and locations. Integrating advanced GIS techniques and statistical analysis in this study provides an understanding of traffic accident dynamics and offers valuable spatial insights crucial for effective urban safety planning and management.
The remaining parts of the paper are organized as follows: Section 2 reviews the literature on road accidents. Section 3 explains the study area and dataset alongside the methodology used in this research. Section 4 presents the results and analysis of the GIS techniques on road traffic accidents. Section 5 discusses the study’s findings and importance. Section 6 summarizes the conclusions and proposes a few policy recommendations to reduce the likelihood of future traffic accidents.
2. Related Work
The assessment of traffic accidents through spatial analysis provides analysts with precise information regarding unsafe locations in a region and has been an active research topic for many years. Researchers apply GIS-aided spatial analysis using spatial data to identify safety-deficient locations and take actions to prevent accidents. The first subsection reviews studies on geospatial methods, such as hotspot detection and spatial clustering, to understand the geographical distribution of accidents. The second subsection focuses on studies that used advanced GIS-based techniques that integrate both spatial and temporal dimensions to analyze road traffic accidents. Together, these subsections provide a comprehensive overview of recent advancements and diverse methodologies in GIS applications for analyzing road traffic accidents.
2.1. Road Traffic Accidents and Geospatial Techniques
Using geospatial methods and techniques to analyze road traffic accidents enables us to explore spatial patterns and hotspots. This subsection reviews recent studies implementing various GIS methods, such as hotspot detection and spatial clustering, which offer insights into the geographical distribution of road traffic accidents. A study by Wang et al. (2021) in Harbin, China, showed that the accident density distribution was more spatially scattered during the summer and winter seasons when the road network density was not taken into account [19]. Different density analysis methods and comap technology were combined to help divide accident severities into three levels, enabling a better understanding of risk management. While the study gave valuable insight into risk zones and accident severities, data limitations and unaccounted factors, such as road characteristics and infrastructure, are still to be considered. Furthermore, a study of the Lokoja–Abuja–Kaduna highway in Nigeria by Afolayan et al. (2022) used the KDE method and weighted mean center analysis to show the hotspots of accidents between the Sabon Gida and Yangoji curves [20]. The analyses of the years 2013, 2014, and 2017 showed that significant numbers of accidents occurred at the Abaji Bridge, Gen. hospt. Abaji, and Abaji U-turn, with policy recommendation implications in these regions. Ben-Hamouche, Al-Janahi, and Al-Madani (2011) analyzed traffic accidents in Bahrain from 2002 to 2004 through mapping, spatial, and statistical techniques. They found young males (20 to 25 years) to be the most vulnerable to injuries due to poor driving attitudes, overspeeding, and carelessness [21]. This study can serve as a basis for studies that cover longer periods and larger areas with a stronger geospatial focus.
Another study by Yunus (2021) geospatially analyzed road traffic crashes in the Kano metropolis, Nigeria, and the response of emergency services to them, using quantitative and geospatial datasets obtained through GPS and ambulance surveying and the digitization of high spatial resolution satellite imagery (30 cm) [22]. Closest-neighbor and network analysis showed minimal to no connection between Emergency Healthcare Facilities (EHCF), ambulances, and crash locations. Moreover, ambulances took an average of six minutes to reach the crash site, pointing to an ineffective emergency response system that threatens the lives of victims. In addition to GIS methods, Al-Mistarehi et al. (2022) combined machine learning models with GIS analyses, such as hotspots, to predict future traffic accidents and their severity [23]. They found the random forest model to be the best predictor for injuries and their severity, specifically considering highways, vehicles, and the environment. The study provides a different way of analyzing road crashes, which can further be used to understand how residential, commercial, educational, and industrial factors affect traffic accidents.
2.2. Road Traffic Accidents and Spatio-Temporal Analysis
This subsection includes studies that applied and integrated spatio-temporal dimensions to analyze road traffic accidents. Advanced GIS-based techniques and time-based analytical methods were deployed to comprehensively understand how accident patterns evolve and interact with urban environments over time. Spatio-temporal analysis techniques such as Bayesian spatio-temporal modeling, time-series analysis, and GIS-based temporal mapping were covered to provide valuable insight into the dynamic nature of traffic accidents.
Al Hamami and Matisziw (2021) studied the spatial and temporal relationships among traffic accidents, in which they applied several algorithms, such as K-means and DBSCAN, to partition the incidents into similar groups based on proximity [
24]. Morphological changes were tracked, and prevention strategies were presented. The authors encourage future research on the investigation and evolution of accident clusters over large regional areas that will assist in evaluating morphological changes based on events. Bíl, Andrášik, and Sedoník (2019) analyzed the traffic crash hotspots in the Czech road network from 2010 to 2018 [
25]. They used a 3-year time window incremented with a one-day step. The researchers observed that the hotspots covered about 8% of the road network. The road network was split into road segments with an overall length of 3933 km. The KDE and clustering methods showed that, while certain hotspots evolve over time, others diminish with better security measures.
Another study by Liu and Sharma (2017) on the spatio-temporal trends of traffic crashes used linear Bayesian spatio-temporal models across the 99 counties of Iowa and showed that vehicle-miles traveled (VMT) had a significant effect on fatal crashes [13]. They found that neither the weather nor socio-economic parameters significantly affected these crashes from 2006 to 2015, highlighting the importance of the spatio-temporal analysis of road crashes and further advising model modification. The study can serve as a basis for examining the impact of a smaller time scale on road traffic accidents. New research combining space and time interactions could better evaluate spatio-temporal and within-variable interactions. Feizizadeh et al. (2022) examined the traffic accidents in the urban areas of Tabriz during the COVID-19 pandemic [26]. They used the KDE method to monitor the severity index of these accidents from April 2018 to November 2020. The researchers observed a significant reduction in car accidents during the pandemic lockdown, especially in highly affected areas with a large concentration of children and older people. Hotspot analysis and urban planning management indicated future policies to reduce these car accidents further and avoid as many injuries as possible.
Furthermore, a case study of Rotterdam by Shahi et al. (2023) using Network Kernel Density Estimation (NKDE) analysis observed poor management of safety measures in the area, which resulted in a high number of road accidents [27]. They also demonstrated that, beyond measuring the hotspots and cold spots of a road network, it is necessary to determine the severity and frequency of road accidents, and that more data are required for a better understanding and more accurate results. Mohaymany et al. (2013) also used the KDE method to geospatially detect the road segments with high risks of traffic crashes [28]. Additionally, Zandi et al. (2023) used KDE and hotspot analysis to study the influence of traffic parameters, such as the average traffic speed and traffic volume, on the spatial distribution of road traffic accidents on a 56 km freeway between Qazvin and Abyek in Qazvin Province (Iran), and found fatal accident clustering to occur [29]. Many other researchers have studied risk assessments and crashes using geospatial techniques, which gives us further motivation to treat the safety of drivers and pedestrians as a priority [30,31,32].
Several researchers have focused explicitly on traffic accidents in the four cities of California. González et al. (2019) examined crashes in the Los Angeles region and the Bay Area, noting significant increases in crashes associated with commercial gentrification [14]. Hasani et al. (2019) explored the rise in fatalities among bicyclists and pedestrians, while in San Jose, Kasmalkar and Suckale (2021) identified an increase in accident rates during peak hours [15,16]. Khaghani (2020) reported over 3000 traffic accidents per year in Sacramento [33]. While providing valuable insights into traffic accidents, these studies often face limitations in methodological scope, geographic focus, or temporal coverage. In contrast, our study extends this foundational research by employing advanced GIS techniques such as space–time cube analysis, DBSCAN, Kernel Density Estimation, and the Getis-Ord Gi* method on a large-scale dataset covering five years. This methodological diversity allowed us to conduct cross-city comparisons and uncover shared and unique traffic accident patterns across Los Angeles, Sacramento, San Diego, and San Jose. These comparative insights provide a more comprehensive understanding of the factors that lead to traffic accidents within California’s major cities. In addition, the results of this study will help decision-makers make more informed decisions on enhancing road safety.
3. Materials and Methods
3.1. Study Area and Data Collection
California, a state located in the United States at 36.7783° N and 119.4179° W, was used as the study area for this research. This study focuses on road traffic accidents in four major cities of California: Los Angeles, Sacramento, San Diego, and San Jose. These cities experienced a high rate of traffic collisions in the past years, as stated in recent studies [
15,
16,
33], which makes them essential locations for our study.
Figure 1 shows a graphical representation of the regions under study.
This research used road traffic accident data from 2018 to 2022 for the four Californian cities. The dataset provided by Sobhan [34,35] comprises 7.7 million records documenting road traffic accidents across 49 states with more than 30 attributes; only the accidents for the four cities in this study were extracted. Examples of attributes are accident severity (1 to 4), time of accident, location, description, street, city, county, state, zip code, and meteorological variables (temperature, humidity, visibility, precipitation, and wind speed). Table 1 shows the total number of road accidents per city over the study period.
3.2. Methods
This section describes the methodology, which utilized GIS and statistical techniques to analyze and observe road traffic accidents. The study began with spatio-temporal and statistical analyses to quantify accident occurrences, followed by an accident severity analysis to identify high-risk areas. Then, the study advanced to visualize hotspot locations using space–time cube analysis. Additionally, it employed the DBSCAN algorithm for clustering accident data and concluded with population density correlation and Getis-Ord Gi* techniques to examine the interplay between population density and accident frequencies. Integrating diverse GIS and statistical methods provided a multidimensional perspective to explore the complex pattern of road traffic accidents that will lead to more informed decision-making.
Figure 2 shows the flowchart of the methodology adopted for this study.
3.2.1. Spatio-Temporal and Statistical Analyses
In this study, spatio-temporal and statistical analyses were conducted to investigate the annual and seasonal patterns in accident occurrences. The initial steps involved extracting variables such as the time of occurrence and location coordinates of each accident, which enabled the analysis at a day-by-day granularity. Two principal quantities were calculated for the quantitative assessment: Total Accidents, defined as the aggregated count of accidents for each location per day, and Daily Accidents, the cumulative accidents across the entire region per day.
For this study, the seasons were defined based on the division of months as follows:
1. Winter consists of December, January, and February;
2. Spring consists of March, April, and May;
3. Summer consists of June, July, and August;
4. Fall consists of September, October, and November.
This approach allowed for the graphical representation of both annual trends and seasonal fluctuations in accident frequency.
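The daily and seasonal aggregation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names and the sample timestamps are hypothetical, and grouping seasons by calendar year (so that December is counted with the year in which it falls) is an assumption, since the text does not state how winter is assigned across a year boundary.

```python
from collections import Counter
from datetime import datetime

# Month-to-season mapping exactly as listed in the text.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def season_of(timestamp):
    """Return the season name for an accident timestamp."""
    return SEASON_BY_MONTH[timestamp.month]


def daily_counts(timestamps):
    """Count accidents per calendar day across the whole region."""
    return Counter(ts.date() for ts in timestamps)


def seasonal_counts(timestamps):
    """Count accidents per (calendar year, season) pair (assumed grouping)."""
    return Counter((ts.year, season_of(ts)) for ts in timestamps)


# Hypothetical accident timestamps, for illustration only.
records = [
    datetime(2021, 12, 31, 8, 15),
    datetime(2021, 12, 31, 17, 40),
    datetime(2022, 1, 2, 9, 5),
    datetime(2022, 4, 10, 14, 30),
]

per_day = daily_counts(records)
per_season = seasonal_counts(records)
```

Summing the daily counts by month or by season then yields series of the kind plotted in the annual and seasonal figures.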
3.2.2. Accident Severity Analysis
Our analysis utilized a pre-defined, four-level severity classification present within the dataset to evaluate the severity of road traffic accidents. This classification provided a metric for quantifying the impact of each accident as follows:
1. Level 1 (Least Severe): minor accidents that may cause property damage;
2. Level 2 (Moderate Severity): accidents resulting in slight to moderate injuries;
3. Level 3 (Severe): serious accidents leading to significant injuries;
4. Level 4 (Most Severe): fatal or catastrophic events that may cause death.
By employing this existing severity index, our study conducted a detailed spatio-temporal analysis to map the distribution and identify patterns of accident severity across the studied regions.
3.2.3. Identification of Hotspots
In identifying hotspots, our study employed space–time cube (STC) analysis to visualize and analyze the dynamics of accident occurrences across both spatial and temporal dimensions. An STC consists of cells in three dimensions, portraying both space and time [36,37]. The focus on transitions between areas of high (hotspots) and low (cold spots) accident concentrations reveals changing patterns in accident occurrences. The space–time cube analyses were performed using the scatterplot3d package in RStudio. Many researchers have used this method to geo-visualize the spatial and temporal patterns across different regions [10,36,37,38,39].
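To make the idea of a cube of space and time cells concrete, the following Python sketch bins accidents into (latitude cell, longitude cell, year) cells and reads off the yearly trend of one cell. It is a simplified illustration under stated assumptions: the study itself used R for this step, and the cell size, function names, and sample points here are hypothetical.

```python
import math
from collections import Counter


def space_time_cube(points, cell_size):
    """Bin (lat, lon, year) points into cube cells.

    Returns a Counter keyed by (row, col, year), where row and col are the
    integer indices of the latitude and longitude cell.
    """
    cube = Counter()
    for lat, lon, year in points:
        row = math.floor(lat / cell_size)
        col = math.floor(lon / cell_size)
        cube[(row, col, year)] += 1
    return cube


def cell_trend(cube, row, col, years):
    """Accident counts of one spatial cell over consecutive years."""
    return [cube[(row, col, year)] for year in years]


# Hypothetical accident locations (degrees) and years.
points = [
    (34.052, -118.243, 2021),
    (34.058, -118.247, 2021),
    (34.055, -118.245, 2022),
    (34.101, -118.304, 2022),
]

cube = space_time_cube(points, cell_size=0.01)
trend = cell_trend(cube, 3405, -11825, [2020, 2021, 2022])
```

A cell whose counts stay high in every year corresponds to a persistent hotspot, while a cell whose counts rise only in the later years corresponds to an emerging one.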
3.2.4. Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
Our study utilized the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm to categorize spatial clusters within traffic accident data. This algorithm is renowned for its effectiveness in grouping data points based on density criteria [40,41,42,43,44,45]. Moreover, DBSCAN has a unique capability to identify clusters of arbitrary shapes and to distinguish noise and outliers from significant data points [44,45].
The DBSCAN algorithm needs two parameters: the scan radius ($\varepsilon$) and the minimum number of points (MinPts). These parameters are essential for defining the density threshold for core points within a cluster, with $\varepsilon$ indicating the neighborhood radius for density estimation and MinPts establishing the minimum cluster size [44]. The DBSCAN algorithm measures the distance between two points $p = (x_p, y_p)$ and $q = (x_q, y_q)$ using Equation (1):

$$\operatorname{dist}(p, q) = \sqrt{(x_p - x_q)^2 + (y_p - y_q)^2} \quad (1)$$
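The $\varepsilon$-neighborhood of a randomly selected object $p$ in a dataset $D$ is determined by Equation (2):

$$N_{\varepsilon}(p) = \{\, q \in D \mid \operatorname{dist}(p, q) \le \varepsilon \,\} \quad (2)$$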
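Moreover, the class of a point $p \in D$ is defined as follows by Hahsler, Piekenbrock, and Doran (2019) [42]:
A core point if $p$ has high density, i.e., its $\varepsilon$-neighborhood $N_{\varepsilon}(p)$ satisfies $|N_{\varepsilon}(p)| \ge \text{MinPts}$, where MinPts is a density threshold defined by the user;
A border point if $p$ is not a core point, but is in the neighborhood of a core point $q \in D$, i.e., $p \in N_{\varepsilon}(q)$;
A noise point, otherwise.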
DBSCAN for this research was performed using the Python coding environment.
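The core, border, and noise definitions above translate fairly directly into code. The following is a minimal pure-Python sketch of DBSCAN, assuming Euclidean distance on planar coordinates; it is not the authors' implementation (which is not shown in the paper), and the sample points and parameter values are hypothetical.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (the point itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks noise points."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core point; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Hypothetical coordinates: two dense groups and one isolated point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (50, 50),
]
labels = dbscan(points, eps=1.5, min_pts=3)
```

In practice a library implementation with a spatial index would be preferable for a dataset of this size, since the brute-force neighborhood search here scales quadratically with the number of points.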
3.2.5. Population Density Correlation and Getis-Ord Gi*
The concluding phase of our analysis focused on exploring the relationship between population density and the frequency of road accidents by utilizing Kernel Density Estimation (KDE) and the Getis-Ord Gi* technique. The population data in each region were displayed from the lowest to the highest values, and KDE was implemented on top of the density plot. This gave valuable insight into the relationship between population density and accident frequencies. KDE was calculated using Equation (3) [46]:

$$\text{Density} = \frac{1}{(\text{radius})^2} \sum_{i=1}^{n} \left[ \frac{3}{\pi} \cdot \text{pop}_i \left( 1 - \left( \frac{\text{dist}_i}{\text{radius}} \right)^2 \right)^2 \right], \quad \text{for } \text{dist}_i < \text{radius} \quad (3)$$

where $i = 1, \ldots, n$ are the input points, included in the sum only if they are within the radius distance of the location; $\text{pop}_i$ is the population field value of point $i$; and $\text{dist}_i$ is the distance between point $i$ and the location.
The KDE method was selected due to its robust application in spatial analysis and the ability to represent spatial distributions as stated in several studies [46,47,48,49,50]. As KDE is a non-parametric density estimator to visualize and analyze spatial data, it requires no assumption that the underlying density function is from a parametric family [51,52]. KDE automatically detects the shape of the density according to the data given. Moreover, the cell size and search radius vary by study area and data distribution. For this study, the cell size was set to 0.004, whereas the search radius was set to 0.02 for all cities.
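As a rough illustration of how a kernel density value is obtained at a single location, the sketch below assumes the quartic kernel form used by common GIS kernel density tools: each point within the search radius contributes its population weight scaled by a factor that falls smoothly to zero at the radius. The function name and sample values are hypothetical, and this is not the authors' code.

```python
import math


def kernel_density(location, points, populations, radius):
    """Quartic-kernel density estimate at one location (assumed kernel form).

    Only points strictly closer than `radius` contribute to the sum.
    """
    x0, y0 = location
    total = 0.0
    for (x, y), pop in zip(points, populations):
        dist = math.hypot(x - x0, y - y0)
        if dist < radius:
            total += (3.0 / math.pi) * pop * (1.0 - (dist / radius) ** 2) ** 2
    return total / radius ** 2


# Hypothetical points with unit population weights.
sample_points = [(0.0, 0.0), (0.01, 0.0), (0.05, 0.0)]
sample_pops = [1.0, 1.0, 1.0]

# With a search radius of 0.02, the third point lies outside the kernel.
density_at_origin = kernel_density((0.0, 0.0), sample_points, sample_pops, 0.02)
```

Evaluating this function at the center of every raster cell of the chosen cell size produces a density surface; a larger search radius gives a smoother, more generalized surface.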
Following the KDE analysis on the population density, this study employed the Getis-Ord Gi* technique to identify hotspots. This technique uses z-scores and p-values to indicate where features with high or low values are spatially clustered [53]. This statistical model is commonly used to analyze spatio-temporal hotspots by measuring the local spatial autocorrelation of a dataset, which identifies the spatial clustering patterns of hotspots and cold spots with statistical significance [38]. In this method, positive values indicate spatial clustering of high values, whereas negative values indicate spatial clustering of low values [54]. It is a commonly used hotspot identification tool [5,20,36,49,50,55]. Utilizing KDE alongside the Getis-Ord Gi* technique provides insight into how the population density correlates with accident hotspots. The KDE method highlights areas of high population density as potential accident hotspots, which Getis-Ord Gi* then statistically validates. This integration enhanced our analysis by providing details of critical parts of the roads that require intervention.
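The Gi* statistic can be sketched as follows, using the standard z-score form of the statistic. The sketch assumes binary fixed-distance-band weights (a feature counts as a neighbor of another if it lies within the band, and every feature is its own neighbor); the paper does not state which spatial weights were used, and the grid of sample values is hypothetical.

```python
import math


def getis_ord_gi_star(coords, values, band):
    """Gi* z-score for every feature, using binary distance-band weights."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for ci in coords:
        # Weight 1 for every feature within the band (including i itself).
        w = [1.0 if math.dist(ci, cj) <= band else 0.0 for cj in coords]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * xj for wj, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        z_scores.append(numerator / denominator if denominator > 0 else 0.0)
    return z_scores


# Hypothetical 4 x 4 grid: high accident counts in one corner, low elsewhere.
coords = [(x, y) for x in range(4) for y in range(4)]
values = [10.0 if x <= 1 and y <= 1 else 1.0 for x, y in coords]

z = getis_ord_gi_star(coords, values, band=1.0)
```

A large positive z-score marks a feature surrounded by high values (a hotspot), and a large negative z-score marks a feature surrounded by low values (a cold spot); significance is then judged from the corresponding p-value.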
4. Results and Analysis
4.1. Spatio-Temporal and Statistical Analyses
The number of accidents that occurred in the major cities of California from 2018 to 2022 is given in Figure 3. In this study, data were distributed at a daily temporal resolution that defined the total number of accidents in one day over a whole region, which was summed every month. From the figure, we observe that Los Angeles had the highest number of accidents throughout the years, with the peak occurring in April 2022 at 3858 accidents. Moreover, the fewest accidents were observed in San Diego in July 2020, with a total of 101 accidents.
Figure 4 shows the seasonal distribution of road accidents across the major cities of California. Studies have shown that weather is also an important factor affecting traffic accidents [17,56,57]. The graphs show how accident counts varied by season across the regions. From Figure 4a, we see that Los Angeles had the highest number of accidents among all the regions, consistent with Figure 3. Most accidents occurred during the spring of 2018 and 2022, whereas winter had the most accidents in the remaining years. From Figure 4b, we again see that spring and winter had the highest number of accidents in Sacramento, except in 2019, when a large number was observed during fall. Similarly, San Diego and San Jose also experienced many accidents during the spring and winter seasons (Figure 4c,d), except for 2019. According to Potoglou et al. (2018), the use of motorcycles increases during the spring season, hence the increased number of road traffic accidents [58]. Moreover, Glagolev et al. (2018) stated that ice on the road causes a drastic increase in accident rates during the winter [59].
4.2. Severity Analysis of Road Traffic Accidents
Figure 5, Figure 6, Figure 7, and Figure 8 show the severity maps of California’s major cities from 2018 to 2022. Road accident severity is measured based on various factors, and the assessment can involve a combination of physical damage, injuries sustained, and potential long-term consequences. Severity was divided into four levels, indicating the amount of physical and mental damage done, as described in the Methods section. In this study, we observed that most accidents that occurred across the regions were of level 2 severity. In contrast, level 1 (least severe) and level 4 (most severe) accidents occurred less frequently across all the studied cities. From Figure 5, we can observe that Los Angeles had the highest number of level 2 road accidents, indicating slight to moderate injuries. In Los Angeles, we observed the lowest number of accidents with level 1 severity.
Similarly,
Figure 6,
Figure 7 and
Figure 8 for other cities also show the highest number of accidents to be of a level 2 severity, whereas levels 1 and 4 vary in terms of representing the least number of accidents. Researchers have proposed different thresholds for the severity index of road traffic accidents, dividing them into various levels [
60,
61,
62].
4.3. Hotspot Identification
Hotspot locations were identified using space–time cube (STC) analysis. This cube was plotted between the latitude, longitude, and years to investigate how different spatial and temporal unit sizes affect the outcome [36]. This method effectively visualized the distribution of accidents, highlighting areas with concentrated incidents over time. Figure 9 displays the STC and shows that each city exhibits distinct patterns of accident occurrences, with the number of accidents varying annually and geographically. In Los Angeles, a dense clustering of accidents is apparent across all years, suggesting a persistent presence of high-risk areas that warrant continuous monitoring and intervention (Figure 9a). In Sacramento (Figure 9b), a high number of accidents occurred in 2021 and 2022, indicating emerging hotspot locations.
Conversely, Figure 9c shows that San Diego had a lower accident rate than the two cities mentioned above. The fewest accidents were observed in 2018, after which the number gradually increased, with 2022 having the highest number of accident occurrences. San Jose shows the lowest density of accidents overall, with the fewest accidents in 2018, yet there is an observable spike in 2021, which necessitates further investigation to understand the causes behind this rise (Figure 9d).
4.4. Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
DBSCAN involves the determination of two parameters, as stated in the Methods section: the epsilon ($\varepsilon$) value and MinPts, which are both based on a user-defined neighbor radius and the existing number of points related to the radius [40]. $\varepsilon$ indicates the radius of the neighborhood considered for density estimation and is chosen at an elbow point (a point where the distances start to increase at a slower rate), and MinPts represents the density threshold to identify core points [42]. For this study, different values of epsilon were determined for the four cities, as shown in Figure 10. The epsilon ($\varepsilon$) value was determined to be 0.002 for Los Angeles, 0.004 for Sacramento, and 0.003 for both San Diego and San Jose. The MinPts value was set to 5 for all cities.
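One common way to locate such an elbow is a sorted k-distance curve: for each point, compute the distance to its k-th nearest neighbor, sort these distances, and look for the bend in the resulting curve. The sketch below computes that curve; it is an assumption that the epsilon values in this study were read off a plot of this kind, and the function name, sample points, and choice of k are hypothetical.

```python
import math


def k_distances(points, k):
    """Sorted distances from each point to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


# Hypothetical coordinates: a tight group of four points and one outlier.
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
curve = k_distances(sample, k=2)
```

Plotting `curve` against the point rank shows a long flat stretch for points inside dense areas and a bend where sparse points begin; a value of epsilon near that bend keeps dense groups together while leaving isolated points as noise.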
Figure 11,
Figure 12,
Figure 13 and
Figure 14 show the DBSCAN clustering of all cities from 2018 to 2022. As clusters are defined by grouping similar data, and since we plotted these clusters between the longitude and latitude, we can say that accidents occurring at neighboring coordinates to one another with the same characteristics (such as accident severity) were grouped into clusters.
From Figure 11, we observe that Los Angeles shows high clustering in all years. DBSCAN also corroborated our prior analyses, in which Los Angeles had the highest accident rates in 2021 and 2022, with the data distribution presented in four different clusters. Similarly, Sacramento had the highest clustering in 2021 and 2022 (Figure 12). The number of accidents, and hence the clustering, in Sacramento was far lower than in Los Angeles, which could be attributed to many factors, such as a lower population and less vehicle use. Figure 13 and Figure 14 also show the highest clustering in 2021 and 2022 for San Diego and San Jose. All our DBSCAN results are supported by the time-series and hotspot analyses discussed above. The rise in the population throughout the years caused a rapid increase in the accident rate, which is why denser clusters are observed in the later years [56,63].
4.5. Population Density Correlation and Hotspot Identification
The integrated result of the population density correlation and hotspot identification is given in
Figure 15. From the figure, the population density is defined by the green color, with the darkest green representing the areas with the highest population densities. From our analysis, we observed Los Angeles to have the highest population, whereas San Jose had the lowest (
Table 2).
Moreover, we also observed the highest KDE in Los Angeles and the lowest KDE in San Jose. San Diego also showed hotspot clustering at various locations in the region. Apart from demonstrating a positive relationship between the population density and KDE analysis in a particular region, we also observed high-density estimation hotspots within the highly populated areas of each city. Furthermore, we also determined high-density regions to have high hotspot clustering, which demonstrates a positive relationship between the two. Our results are supported by past research, which also observed a high population to be linked with an increase in accident rates [
49,
64,
65].
5. Discussion
The integration of several GIS techniques and statistical analyses in this study provides insight into the spatial and temporal patterns, accident severity, and population density correlations of road traffic accidents. This combination included statistical analyses, severity mapping, space–time cube analysis, DBSCAN, Kernel Density Estimation (KDE), and the Getis-Ord Gi* method.
The statistical analyses, severity mapping, and the space–time cube method used in this study explained the spatial and temporal distribution of road traffic accidents within the examined cities, with the space–time cube providing a novel perspective for visualizing accident hotspots over time. In Los Angeles, the analysis captured seasonal variations with high concentrations of accidents during spring and winter, a pattern that was also observed in other cities such as Sacramento and San Diego. This seasonal trend may reflect a link between accidents and both human activities associated with different times of the year and environmental factors such as weather conditions. In addition, across all cities, most accidents fell within level 2 severity, indicating slight to moderate injuries. Examining these seasonal patterns and severity levels is essential for preventing and reducing road traffic accidents during these periods of the year, for example through seasonal traffic management strategies or enhanced public awareness.
Furthermore, the persistent and emerging hotspots, particularly in Los Angeles, indicate long-standing issues in some city regions over the five years. The continuing hotspot locations were identified through different GIS techniques, showing the importance of spatio-temporal analysis to point to critical areas that require more safety measures to reduce the number of road traffic accidents.
Similarly, the DBSCAN clustering results support the hotspot analysis by identifying areas with high accident rates. The variation in clustering patterns across years indicates the dynamic nature of road traffic accidents. Furthermore, the population density correlation analysis using KDE and the Getis-Ord Gi* method provided valuable insights into the relationship between population density and accident hotspots. For instance, we observed a higher frequency of accidents in densely populated areas of Los Angeles and San Diego. In addition, the clustering patterns showed that certain zones in Los Angeles and Sacramento were more susceptible to frequent and severe traffic accidents, which indicates the need for enhanced safety measures in densely populated areas. Regarding our methodological choices, this study integrated well-established GIS techniques (space–time cube visualization, DBSCAN clustering, Kernel Density Estimation, and the Getis-Ord Gi* statistic) to analyze spatio-temporal patterns of road traffic accidents in California’s major cities. This multi-method approach allowed for cross-validation, in which insights from one technique, such as hotspots identified through KDE, were verified by another, such as clustering patterns identified by DBSCAN, enhancing the robustness of our conclusions. The combination also enabled a multi-layered analysis of traffic accident data that revealed hotspot locations, seasonal trends, severity variations, and correlations with population density. Finally, the insights from this study can guide policy decisions to reduce and mitigate road traffic accidents.
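The cross-validation idea described above can be sketched as follows: a minimal DBSCAN groups accident points into clusters, and the clustered points are then compared with a set of points flagged by another method. The point coordinates, the `eps` and `min_pts` values, and the set `kde_hotspot_idx` are hypothetical, and the Jaccard overlap is only one possible agreement measure, not necessarily the comparison performed in this study.

```python
import math

NOISE = -1

def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx], including idx itself."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks unclustered points."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, so keep expanding
    return labels

# Hypothetical accident locations: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)

# Hypothetical indices of points lying inside hotspots found by another method.
kde_hotspot_idx = {0, 1, 2, 3, 4, 5}
dbscan_idx = {i for i, label in enumerate(labels) if label != NOISE}

# Jaccard overlap: 1.0 means both methods flag exactly the same points.
jaccard = len(dbscan_idx & kde_hotspot_idx) / len(dbscan_idx | kde_hotspot_idx)
print("DBSCAN labels:", labels)
print("Agreement with KDE hotspots (Jaccard):", round(jaccard, 3))
```

A high overlap suggests that the two methods point to the same areas, whereas a low overlap would be a reason to revisit the parameters of either method.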
6. Conclusions
In conclusion, our study presented a comprehensive analysis of road traffic accidents in major Californian cities using various GIS techniques. We observed spatio-temporal patterns, analyzed accident severity, identified hotspot locations, and integrated a technique that relates population density to KDE results and accident hotspots. Several key findings emerged from this analysis:
Los Angeles was found to have the highest number of accidents throughout the years, highlighting the importance of targeted safety measures in this city. In addition, seasonal analysis showed varying accident patterns across different cities and years, with spring and winter emerging as high-risk seasons for road accidents;
The severity map indicated that most accidents fell within severity level 2, which corresponds to a low-to-medium impact on the injured party’s physical and mental well-being;
Hotspot visualization with space–time analysis showed high clustering in Los Angeles throughout the years. Moreover, the identified hotspots were supported by the time-series and spatial distribution analyses, showing the need for safety measures in the cities;
Our analysis using DBSCAN and KDE techniques provided valuable insights into clustering patterns and population density correlations. We identified high-density regions with a corresponding increase in accident hotspots, emphasizing the need for more measures to mitigate risks in densely populated areas.
However, our study has some limitations that point to directions for future research. (1) Human factors: Our analysis did not explore the critical role of human factors in road traffic accidents. Adding data such as driver attitudes and pedestrian movement patterns may improve understanding of the causes and factors that lead to accidents, and may in turn support prevention strategies that address multiple aspects. (2) External factors: Since we only incorporated space, time, weather, and population density, other factors such as urban infrastructure, policy changes, and socio-economic conditions can be studied to provide more insight into accident patterns. (3) Population movement and temporal aspect: Studying dynamic population movements within each city can help explain how population density affects the accident rate. Furthermore, extending the time frame of the analysis can provide a more comprehensive understanding of the factors leading to road accidents.