5.1. Discussion
In this study, hidden accident road sections were first determined from the number of accidents occurring within a given time period, before any identification or classification of accident black spots was attempted. Model calculations were then performed for these sections, with each step reported in the order of the model construction process. Next, the hidden accident road sections were screened to separate accident black spots from non-accident black spots. Finally, the identified accident black spots were graded (on two-, three-, four- and five-level scales) according to the proposed accident black spot assessment method.
The higher the quality of the collected data, the closer the model results are to the actual situation. The accident location data used in this paper were obtained from Jiangbei District, Ningbo, China. The accident records were collected through the mobile phone application of the Jiangbei District Traffic Accident Collection Platform from 0:00 on 1 March 2020 to 0:00 on 1 January 2021, a period of 10 months. A total of 29,623 accident records were collected through the application. After 348 invalid samples were excluded, 29,275 records with valid GPS coordinates and accident locations remained, giving a collection efficiency of 98.83%. To further verify the accuracy and quality of the collected data, the effective police records of the Jiangbei Traffic Police Brigade Command Center and Accident Squadron were retrieved for comparison. These records contained 32,330 cases for the same period, so the valid sample collected through the mobile application accounted for 90.55% of the actual police cases, which satisfied the accuracy and quality requirements of the accident data analysis.
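The two percentages quoted above follow directly from the reported counts. The short check below reproduces them; the variable names are illustrative and do not come from the paper.

```python
# Counts as reported in the text; variable names are illustrative only.
total_records = 29_623    # records collected through the mobile application
invalid_records = 348     # records without valid GPS coordinates or location
police_cases = 32_330     # effective police cases for the same period

valid_records = total_records - invalid_records

# Share of collected records that were usable
collection_rate = valid_records / total_records

# Share of police cases covered by the valid application records
coverage = valid_records / police_cases

print(valid_records, f"{collection_rate:.2%}", f"{coverage:.2%}")
```

With the reported counts this gives 29,275 valid records, a collection efficiency of 98.83% and a coverage of 90.55%, matching the figures in the text.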
The accuracy of accident prediction directly determines the accuracy of accident black spot identification, and the overall prediction accuracy in this paper was relatively good. From Table 8, the following can be concluded. (a) The average relative errors of the prediction results for the 10 hidden accident road sections were all between 0.01 and 0.05 (class II), and the overall relative error was small, indicating that the prediction results were relatively good. The maximum average relative error was 4.74% (0.049) for Century Avenue 2; this is attributable to the small sample of accident counts compared with other typical prediction targets (population, economy, etc.). (b) The absolute correlations of the prediction results for the 10 hidden accident road sections were all greater than 0.9 (level I), indicating a high degree of correlation between the predicted and observed numbers of accidents. (c) The mean–variance ratios of the prediction results for the 10 hidden accident road sections were all less than 0.35 (level I), indicating that the residuals between the predicted and original values were small; the smaller the mean–variance ratio, the higher the prediction accuracy. Admittedly, the sample size of accident data in this paper was small (with a single month as the statistical time period), and the number of accidents at each hidden accident location was almost always below 100. As a result, the expected numbers of accidents at the hidden accident locations must be reported to two decimal places for the accuracy tests; otherwise, the tests would show large errors. Previous studies of accident counts typically used a year as the statistical period, which avoids the need for two-decimal precision.
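The three accuracy indices discussed above can be sketched as follows. The definitions used here are the ones commonly given in the grey-system literature (mean relative error, posterior variance ratio, and absolute degree of grey incidence) and are assumed to correspond to the paper's average relative error, mean–variance ratio and absolute correlation; the function names and the sample numbers are hypothetical.

```python
from statistics import pstdev


def mean_relative_error(observed, predicted):
    """Average of |observed - predicted| / observed (observed values must be nonzero)."""
    return sum(abs(o - p) / o for o, p in zip(observed, predicted)) / len(observed)


def posterior_variance_ratio(observed, predicted):
    """Standard deviation of the residuals divided by that of the observed series."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    return pstdev(residuals) / pstdev(observed)


def _signed_area(series):
    """Area term of the zero-starting-point image used by the absolute degree of incidence."""
    zeroed = [value - series[0] for value in series]
    return sum(zeroed[1:-1]) + 0.5 * zeroed[-1]


def absolute_incidence(observed, predicted):
    """Absolute degree of grey incidence between the observed and predicted series."""
    s0 = _signed_area(observed)
    s1 = _signed_area(predicted)
    return (1 + abs(s0) + abs(s1)) / (1 + abs(s0) + abs(s1) + abs(s1 - s0))


# Hypothetical monthly accident counts, not data from the paper.
observed = [10, 20, 30, 40]
predicted = [11, 19, 31, 39]
print(mean_relative_error(observed, predicted))
print(posterior_variance_ratio(observed, predicted))
print(absolute_incidence(observed, predicted))
```

Under these assumed definitions, a section would fall in the classes quoted in the text when the first value lies between 0.01 and 0.05, the second is below 0.35 and the third is above 0.9.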
Among the traditional accident black spot identification methods, the quality control method yields black spot rankings that are closer to those of the empirical Bayesian method than the accident rate method does, and this can be explained from the principles of the methods [39,40]. Both the quality control method and the empirical Bayesian method treat the number of traffic accidents occurring at a location as a random variable, which is consistent with reality, whereas the accident rate method ignores the random nature of accident occurrence and therefore differs significantly from the other two. The difference between the empirical Bayesian method and the quality control method is that the former combines two sources of information for black spot identification (i.e., the mean accident value of similar accident objects and site-specific historical accident data), while the latter relies on only one, namely, historical accident data. Because accident occurrence fluctuates randomly, historical accident data alone cannot accurately reflect the accident distribution at a particular location; the empirical Bayesian method reduces this effect and yields more accurate results (see Table 13).
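How the empirical Bayesian method blends the two sources of information can be illustrated with a minimal sketch. It assumes the widely used weighted-average form, in which the weight on the reference mean is 1 / (1 + k · mean) and k is the overdispersion parameter of a negative binomial reference model; the weighting actually used in this paper may differ, and the function name and numbers are hypothetical.

```python
def empirical_bayes_estimate(observed_count, reference_mean, overdispersion):
    """Blend a site's observed count with the mean of similar sites.

    Assumed form: weight = 1 / (1 + overdispersion * reference_mean),
    estimate = weight * reference_mean + (1 - weight) * observed_count.
    """
    if reference_mean <= 0 or overdispersion < 0:
        raise ValueError("reference_mean must be positive and overdispersion non-negative")
    weight = 1.0 / (1.0 + overdispersion * reference_mean)
    return weight * reference_mean + (1.0 - weight) * observed_count


# Hypothetical site: 14 observed accidents, similar sites average 8, k = 0.125.
estimate = empirical_bayes_estimate(14, 8.0, 0.125)
print(estimate)  # 11.0, halfway between 8 and 14 because the weight is 0.5
```

The estimate always lies between the reference mean and the observed count, which is how the method dampens random fluctuation in a single site's history.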
Tian et al. (2019) [4] used the optimized empirical Bayesian accident number and the predicted accident number derived from an accident prediction model to construct a ranking index of accident black spots after identification, with the difference and ratio between the two values reflecting the severity of the black spots. However, that method remains confined to quantifying accident severity and cannot provide managers with a specific grading of accident black spots. In this paper, the SI of accident black spots was discussed in a graded manner within the range (−1, 1). At the macro level, hidden accident locations are divided into accident black spots and non-accident black spots; at the micro level, accident black spots are graded on two-, three-, four- or five-level scales according to the size of the given interval. When the SI is less than 0, the hidden accident location is not an accident black spot according to the discrimination rule; when the SI is greater than 0.6, the PSI value of the accident black spot is extremely large, a situation that generally occurs rarely. It should be noted that, although the accident black spots identified by the model need more attention, accident prevention and management at non-accident black spots should not be neglected.
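The grading rule can be sketched as a simple interval lookup. Only the range (−1, 1) and the rule that a negative SI means a non-accident black spot are taken from the text. The cut points, the treatment of SI = 0 as a non-accident black spot, and the convention that a higher level means a more severe black spot are hypothetical choices made for illustration.

```python
from bisect import bisect_left


def grade_black_spot(si, cut_points=(0.2, 0.4, 0.6, 0.8)):
    """Return 0 for a non-accident black spot, otherwise a level from 1 upward.

    cut_points are hypothetical interval boundaries in ascending order:
    n cut points produce an (n + 1)-level grading of SI values in (0, 1).
    """
    if not -1.0 < si < 1.0:
        raise ValueError("SI is expected to lie in the open interval (-1, 1)")
    if si <= 0:  # SI = 0 is treated as a non-black spot here (assumption)
        return 0
    # Number of cut points strictly below si, shifted so levels start at 1
    return bisect_left(cut_points, si) + 1


print(grade_black_spot(-0.3))               # non-accident black spot
print(grade_black_spot(0.65))               # five-level scale
print(grade_black_spot(0.65, (0.5,)))       # two-level scale
```

Passing a shorter tuple of cut points switches between the two-, three-, four- and five-level schemes without changing the rule itself.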
In this study, the grey Verhulst method was chosen for accident prediction because road traffic accidents in China have followed an S-shaped process with a saturation state in recent years [41]. Grey prediction aims to grasp the development law of a system and to make quantitative predictions of its future development by processing the original data and establishing a grey model. The most commonly applied models in the grey prediction field are the GM(1, 1) model and the grey Markov prediction model, which can be used to predict the number of traffic accidents, the number of deaths, the number of injuries, the amount of property damage and other indicators. However, the GM(1, 1) model is suited to sequences with a strong exponential law and can only describe monotonic change processes. The road traffic system is a dynamic time-varying system, and traffic accidents, as a behavioral characteristic quantity [42], show a certain level of stochastic volatility and a non-smooth stochastic trend, which GM(1, 1) cannot capture. Furthermore, for traffic accident prediction involving S-shaped processes with saturation states, the grey Markov prediction model cannot perform state classification. Thus, in this paper, the grey Verhulst method was chosen for accident occurrence prediction, as it provides a sound basis for accident black spot identification and delineation [43,44].
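To make the S-shaped, saturating behavior concrete, the following is a minimal sketch of a grey Verhulst fit in its standard textbook form. It assumes that the observed series is treated as the saturating sequence x1, that its first differences x0 satisfy x0(k) + a·z(k) = b·z(k)², where z(k) is the mean of consecutive x1 values, and that a and b are obtained by least squares. The paper's own construction may differ in detail, and the function names and test values are hypothetical.

```python
import math


def fit_grey_verhulst(x1):
    """Estimate (a, b) of x0(k) + a*z(k) = b*z(k)**2 by least squares."""
    if len(x1) < 3:
        raise ValueError("at least three observations are needed")
    # First differences (inverse accumulation) and background values
    x0 = [x1[k] - x1[k - 1] for k in range(1, len(x1))]
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    # Normal equations for the regressors (-z, z**2)
    s11 = sum(v * v for v in z)
    s12 = -sum(v ** 3 for v in z)
    s22 = sum(v ** 4 for v in z)
    t1 = -sum(v * y for v, y in zip(z, x0))
    t2 = sum(v * v * y for v, y in zip(z, x0))
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - s12 * t2) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


def predict_grey_verhulst(first_value, a, b, steps):
    """Time response x1(k) = a*x1(0) / (b*x1(0) + (a - b*x1(0)) * exp(a*k))."""
    return [
        a * first_value / (b * first_value + (a - b * first_value) * math.exp(a * k))
        for k in range(steps)
    ]


# Synthetic S-shaped series that saturates near 100 (not data from the paper).
series = predict_grey_verhulst(5.0, -0.5, -0.005, 15)
a, b = fit_grey_verhulst(series)
fitted = predict_grey_verhulst(series[0], a, b, len(series))
print(a, b, a / b)  # a / b approximates the saturation level
```

The ratio a / b gives the level at which the fitted curve saturates, which is the property that a monotonic exponential model such as GM(1, 1) cannot represent.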
5.2. Conclusions
We proposed a method that can quickly identify urban road accident black spots and classify their grades. The model combined the grey Verhulst method and the empirical Bayesian method and took the numbers of accidents at urban road hidden accident locations as the original data. From the combined model, the expected numbers of accidents at the hidden accident locations were obtained, and the safety improvement space and the SI were used as indexes to determine whether each hidden accident location is an accident black spot. Finally, the accident black spots were graded. The main conclusions are as follows:
(a) In terms of accident black spot identification, the accident data of 10 hidden accident sections over 10 months were classified into accident black spots and non-accident black spots. Of the 100 data points, 88 were identified as accident black spots and 12 as non-accident black spots, which shows that the model can be used for accident black spot identification.
(b) In terms of accident black spot classification, the rationality of the proportions of accident black spots assigned to grades I, II and III under the two-, three-, four- and five-grade assessments was discussed. The analysis ranked the schemes as five-grade > four-grade > three-grade > two-grade, so the five-grade assessment best suits the actual situation, which provides a new way of thinking about accident black spot classification.
(c) The combined grey Verhulst–empirical Bayesian method not only suited the current pattern of road traffic accidents in China but also achieved good results on the three accuracy indices tested (the average relative error, the absolute correlation and the mean square error ratio), indicating that the model was highly accurate in terms of accident black spot identification.
However, this paper has some limitations. First, the sample included only 10 months of accident records collected by a mobile phone application, and to expand the sample size, the model was fitted to half-month data. If accident records covering longer time periods were added as initial data, the average relative error, absolute correlation and mean square deviation ratio of the expected number of accidents could be re-evaluated, and the test results might improve. Second, the accident black spot identification approach was based mainly on the number of accidents; factors such as the geometric conditions of hidden accident locations and traffic volumes were not considered because they are difficult to collect through a mobile phone application. If these factors were added, the identification results might improve. Finally, the approach relied on long-term cumulative accident data for classification, so a hidden location could be identified as an accident black spot or not only after a month of accidents had occurred, which limits its value for timely accident prevention. Daily or hourly traffic accident prediction would allow accidents at these black spots to be prevented in a more timely manner.
In future studies, we will investigate accident black spot identification over a longer statistical time period and consider factors such as the geometric conditions of hidden accident locations, so that the model reflects actual traffic conditions more closely. Furthermore, short-term prediction of traffic accidents can also be included once the statistical time period of the accident data is sufficiently long.