Development of a Binary Classification Model to Assess Safety in Transportation Systems Using GMDH-Type Neural Network Algorithm

Guido, Giuseppe; Haghshenas, Sina Shaffiee; Haghshenas, Sami Shaffiee; Vitale, Alessandro; Gallelli, Vincenzo; Astarita, Vittorio

doi:10.3390/su12176735


by Giuseppe Guido *, Sina Shaffiee Haghshenas, Sami Shaffiee Haghshenas, Alessandro Vitale, Vincenzo Gallelli and Vittorio Astarita

Department of Civil Engineering, University of Calabria, Via Bucci, 87036 Rende (CS), Italy

* Author to whom correspondence should be addressed.

Sustainability 2020, 12(17), 6735; https://doi.org/10.3390/su12176735

Submission received: 31 July 2020 / Revised: 13 August 2020 / Accepted: 17 August 2020 / Published: 20 August 2020

(This article belongs to the Special Issue Algorithms, Models and New Technologies for Sustainable Traffic Management and Safety)


Abstract

Evaluating road safety is an enduring research topic in infrastructure and transportation engineering. Predicting crash risk is essential for preventing further crashes and safeguarding road users. Within this task, knowing the number of vehicles involved in an accident contributes greatly to safety analysis, so it is useful to predict it. The main aim of this study is to develop a binary model for predicting the number of vehicles involved in an accident using a Group Method of Data Handling (GMDH)-type neural network. For this purpose, 775 accident cases recorded in the urban and rural areas of Cosenza in southern Italy were evaluated. Daylight, Weekday, Type of accident, Location, Speed limit and Average speed were considered as input data, and the number of vehicles involved in an accident was considered as output. A total of 581 cases were selected randomly from the dataset for training, and the remaining 194 were used to test the developed binary model. A confusion matrix and a Receiver Operating Characteristic (ROC) curve were used to investigate the performance of the proposed model. According to the obtained results, the accuracy of the prediction model was 85.7% for training and 83.5% for testing. It can be concluded that the developed binary model can be applied as a reliable tool for predicting the number of vehicles involved in an accident.

Keywords:

road safety; transportation system; neural network; GMDH; binary model

1. Introduction

During the 20th century, road safety researchers considered accidents to be unexpected and unpredictable events [1]. This fatalist notion was overcome by a scientific approach that tries to detect the factors that affect the likelihood of road accident occurrence [2]. Traffic safety analysis was traditionally based on historic crash data, which have several shortcomings owing to the limited availability, unreliability and poor quality of collision data [3,4]. Many scientists have devoted considerable effort to analyzing the impacts of various risk factors [5,6,7,8,9] and road safety measures [10,11,12]. For this reason, they have developed a great number of statistical methodologies to approach crash prediction problems [13]. Mathematical models have been the most popular technique for analyzing crash data [14]. The most commonly used methods are based on Logistic Regression [15,16,17,18,19,20,21], Ordered Choice Models for severity modeling of crash injury data [22,23,24,25], Bayesian Hierarchical Models [26,27,28,29,30,31,32], Bivariate Models [33], Nested Logit Models [34], Multinomial Logit Models [35,36,37,38,39] and, in order to address the heterogeneity of crash outcomes, Mixed Logit Models [40,41,42,43,44,45] to analyze crash injury severities.

Even though good methodological progress has been made over the years, it remains difficult to use statistical models to investigate efficiently the factors related to injury severity. Numerous impediments related to the statistical analysis of crash data remain [46], such as the need to satisfy some statistical hypotheses [47] or the difficulty in managing several variables with many categories [48,49]. To overcome the deficiencies of these methods, road safety researchers have proposed new Non-Parametric Models [50], such as the Classification and Regression Tree (CART), widely used for the analysis of crash outcomes [51,52,53,54], and the Support Vector Machine (SVM) models, which are normally utilized for the classification of crash injury severity [55,56,57,58,59]. Recently, Artificial Neural Networks (ANNs) have also been used to classify crash severity, and their applications have grown extraordinarily [60,61,62,63,64]. All these models have shown excellent analytical capabilities, leading researchers to several useful conclusions, but the ever-increasing amount of data requires the development of novel, efficient algorithms that are able to handle these traffic crash records. For this reason, Genetic Algorithm techniques can be applied as algorithms that search for optimal factors to improve the performance of the analysis [55,65]. For example, Li et al. [66] used the Nondominated Sorting Genetic Algorithm (NSGA-II), a fast multi-objective genetic algorithm, to explore the identification of significant factors in traffic crashes from a multi-objective optimization (MOP) standpoint. They also defined the index of Factor Significance (Fs) for quantitative evaluation of the significance of each factor and identified the top five significant factors for better identification of fatal injury crashes: (1) Driver Conduct, (2) Vehicle Action, (3) Roadway Surface Condition, (4) Driver Restraint and (5) Driver Age.
Amiri et al. [67] investigated the severity of Run-Off-Road (ROR) crashes in which elderly drivers, aged 65 years or more, hit a fixed object, by applying two types of Artificial Intelligence (AI) techniques: the Intelligent Genetic Algorithm (IGA) and the Artificial Neural Network (ANN). The authors identified, in order of importance, Average Annual Daily Traffic (AADT), number of involved vehicles, age, road surface condition, and gender as the most important variables in the developed ANN. Zeng and Huang [68] instead proposed a convex combination (CC) algorithm to train a neural network (NN) model for crash injury severity prediction quickly and stably, and a modified NN pruning for function approximation (N2PFA) algorithm to optimize the network structure, employing a two-vehicle crash dataset provided by the Florida Department of Highway Safety and Motor Vehicles. Delen et al. [69] used a series of artificial neural networks to model the potentially non-linear relationships between injury severity levels and crash-related factors and to prioritize the importance of these factors. The technique used by Wang et al. [70] is more complex. They implemented a linear regression model and two machine-learning algorithms, a back-propagation neural network (BPNN) and a least squares support vector machine (LSSVM), to explore the distance and time gap between the initial and secondary accidents, using as inputs crash severity, violation category, weather condition, tow away, road surface condition, lighting, parties involved, traffic volume, duration, and the shock wave speed generated by the primary accident. Similarly, Liu et al. [71] studied an automated way of predicting the crash rate levels for each carrier using three different classification models (Artificial Neural Network, Classification and Regression Tree (CART), and Support Vector Machine) and three separate variable selection methods (Empirical Evidence, Multiple Factor Analysis, Garson’s algorithm). Furthermore, there is growing interest in traffic safety analysis techniques. Recently, Formosa et al. [72] presented a centralized digital architecture and employed a Deep Learning methodology to predict traffic conflicts. Traffic conflicts were identified by a Regional–Convolution Neural Network (R-CNN) model, which detected lane markings and tracked vehicles from images captured by a single front-facing camera of an instrumented vehicle. These data were then integrated with traffic variables and calculated safety surrogate measures (SSMs) via the centralized digital architecture to develop a series of Deep Neural Network (DNN) models for predicting these traffic conflicts. With recent developments in data collection techniques, big data infrastructure and machine learning algorithms can be utilized to provide appropriate solutions for the highway traffic safety system [73,74]. For example, Huang et al. [75] explored the feasibility of using deep learning models to detect crash occurrence and predict crash risk. For this purpose, they applied Artificial Intelligence to volume, speed, and sensor occupancy data collected from roadside radar sensors along an Interstate in Iowa. Similarly, Xie et al. [76] utilized rich information generated from connected vehicles to obtain surrogate safety measures (SSMs) for risk identification. In particular, they proposed time to collision with disturbance (TTCD) to capture rear-end conflict risks in various car-following scenarios, even when the leading vehicle has a higher speed.

As the literature review shows, the Artificial Neural Network (ANN) methodology is a robust tool for investigating complex phenomena without assuming any preliminary hypotheses on the model. The main aim of this research is to develop a binary model for predicting the number of vehicles involved in an accident through the use of Neural Networks and the Group Method of Data Handling (GMDH). The authors, applying a multi-scale approach, collected and evaluated 775 accident cases from urban and rural areas in the Province of Cosenza, in southern Italy. Several notable parameters were considered as input data of the model, including Daylight, Weekday, Type of accident, Location, Speed limit and Average speed. The number of vehicles involved in an accident was considered as output. For the training stage, 581 accident cases were selected randomly from the dataset; the rest were used to test the developed binary model. A confusion matrix and a Receiver Operating Characteristic (ROC) curve were used to investigate the performance of the proposed model.

The paper is organized as follows. Section 2 presents the methodology, with the theoretical description of the GMDH type of neural network, the correlation analysis and the functional form of the binary model. Section 3 describes the case study, in which the binary classification models are constructed and the best model is selected. The results of the best model are discussed in Section 4, and Section 5 presents the conclusions together with some recommendations for future studies.

2. Methods

To predict the number of vehicles involved in an accident, a binary model was developed. The model is based on Neural Networks and, in particular, makes use of the Group Method of Data Handling (GMDH) technique. In this study, 775 accident cases were analyzed, employing a portion of the database for the training phase and the rest for testing the binary model. The performance of the proposed model was investigated using a confusion matrix and a Receiver Operating Characteristic (ROC) curve. The flowchart of the steps followed in the research is shown in Figure 1.

2.1. Group Method of Data Handling (GMDH) Type of Neural Network

Artificial intelligence and machine learning methods can be applied as a powerful alternative to classical methods for assessing complex problems and systems. These methods are widely used in a variety of scientific fields and have played a vital role in the development of the sciences [77,78,79,80,81,82]. As one of the most important of these methods, the Group Method of Data Handling (GMDH) type of neural network is a reliable, computer-based mathematical modeling tool for identifying and assessing complex phenomena. GMDH belongs to the family of inductive algorithms and was first introduced by Ivakhnenko [83,84], who proposed that an iterative and incremental algorithm could be used instead of building the estimation model all at once. This approach can tolerate imprecision and uncertainty and deal with the vagueness of complex and unstructured systems to reach a reliable model. In this approach, polynomial neurons are produced as simple structures and added step by step, and a complex system is then formed by combining these simple structures. Its natural-selection patterns, similar to those of evolutionary algorithms, and its gradual model construction give this approach an advantage over classical regression methods in capturing a high-order relationship between input and output [85]. The Polynomial Neural Network (PNN) is known as one of the most basic and important algorithms for building a GMDH model. The general form of GMDH is a self-organized, unidirectional neural network that maps input data to output data; this map is also called the Ivakhnenko polynomial. The basic neural network map is given by Equation (1) [86,87].

y = a + \sum_{i = 1}^{m} b_{i} x_{i} + \sum_{i = 1}^{m} \sum_{j = 1}^{m} c_{i j} x_{i} x_{j} + \sum_{i = 1}^{m} \sum_{j = 1}^{m} \sum_{k = 1}^{m} d_{i j k} x_{i} x_{j} x_{k} + \sum_{i = 1}^{m} \sum_{j = 1}^{m} \sum_{k = 1}^{m} \sum_{l = 1}^{m} e_{i j k l} x_{i} x_{j} x_{k} x_{l} + \cdots

(1)

where m indicates the number of input variables x₁, x₂, x₃, …, x_m for an output y. By combining the quadratic polynomials of all the neurons, an output \hat{y} is obtained through an approximate function \hat{f} for a set of inputs X = (x_i1, x_i2, x_i3, …, x_im), with the least possible error compared to the observed output y, as expressed in Equation (2) [88].

\hat{y} = \hat{f} (x_{i 1}, x_{i 2}, x_{i 3}, \dots, x_{i m}), i = (1, 2, 3, \dots, m)

(2)

GMDH is made up of several layers. The data are first entered in the first layer and, after being processed and combined, are passed to the second layer as new inputs. This process continues, and it stops when layer (n + 1) no longer improves on layer (n), that is, when the algorithm has converged. As shown in Figure 2, the dataset is divided randomly into two parts: a training part and a testing (checking) part [89].

GMDH has been used successfully for complex system modeling, pattern recognition, and knowledge discovery; hence, in this study, GMDH was applied to assess safety in a road transportation system.
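The layer-by-layer construction described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study: it assumes quadratic two-input neurons fitted by ordinary least squares, keeps a fixed number of best neurons per layer instead of applying a selection pressure, and stops when the validation RMSE no longer improves. The function names (`quad_features`, `fit_layer`, `gmdh_fit`, `gmdh_predict`) and the synthetic data are hypothetical.

```python
import itertools
import numpy as np

def quad_features(a, b):
    """Design matrix of a two-input quadratic polynomial neuron."""
    return np.column_stack([np.ones_like(a), a, b, a * b, a**2, b**2])

def fit_layer(X_tr, y_tr, X_va, y_va, max_neurons):
    """Fit one neuron per pair of inputs; keep the best by validation RMSE."""
    neurons = []
    for i, j in itertools.combinations(range(X_tr.shape[1]), 2):
        coef, *_ = np.linalg.lstsq(
            quad_features(X_tr[:, i], X_tr[:, j]), y_tr, rcond=None
        )
        pred = quad_features(X_va[:, i], X_va[:, j]) @ coef
        rmse = float(np.sqrt(np.mean((pred - y_va) ** 2)))
        neurons.append((rmse, i, j, coef))
    neurons.sort(key=lambda n: n[0])  # best (lowest RMSE) first
    return neurons[:max_neurons]

def apply_layer(neurons, X):
    """Outputs of the kept neurons become the inputs of the next layer."""
    return np.column_stack(
        [quad_features(X[:, i], X[:, j]) @ c for _, i, j, c in neurons]
    )

def gmdh_fit(X_tr, y_tr, X_va, y_va, max_layers=5, max_neurons=5):
    X_tr, X_va = np.asarray(X_tr, float), np.asarray(X_va, float)
    layers, best = [], np.inf
    for _ in range(max_layers):
        if X_tr.shape[1] < 2:          # a neuron needs two inputs
            break
        neurons = fit_layer(X_tr, y_tr, X_va, y_va, max_neurons)
        if neurons[0][0] >= best:      # no improvement over previous layer
            break
        best = neurons[0][0]
        layers.append(neurons)
        X_tr, X_va = apply_layer(neurons, X_tr), apply_layer(neurons, X_va)
    return layers, best

def gmdh_predict(layers, X):
    X = np.asarray(X, float)
    for neurons in layers:
        X = apply_layer(neurons, X)
    return X[:, 0]                     # best neuron of the last kept layer

# Synthetic data for illustration only (not the accident dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1 + 2 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] * X[:, 1]
layers, best_rmse = gmdh_fit(X[:150], y[:150], X[150:], y[150:],
                             max_layers=3, max_neurons=3)
y_hat = gmdh_predict(layers, X[150:])
```

For a binary target, the continuous output of such a network would still have to be thresholded to obtain a class label; that step is left out of the sketch.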

2.2. Correlation Analysis

Before the binary classification modeling, the pairwise correlation between the input variables should be calculated and checked. Although the input variables in this study were selected with the contribution of experts and on the basis of the literature review, a correlation analysis is still necessary to prevent misleading results. Hence, the Pearson correlation coefficient, one of the most popular and practical approaches, was used to measure the linear correlation between two variables. It is also called the Pearson product-moment correlation coefficient or the bivariate correlation coefficient. Equations (3)–(6) give the mathematical relations of Pearson’s correlation coefficient [90,91].

\rho = r = \frac{SP_{Dxy}}{\sqrt{SS_{X} \cdot SS_{Y}}}

(3)

SP_{Dxy} = \sum_{i = 1}^{n} x_{i} y_{i} - \frac{(\sum_{i = 1}^{n} x_{i}) (\sum_{i = 1}^{n} y_{i})}{n}

(4)

SS_{X} = \sum_{i = 1}^{n} x_{i}^{2} - \frac{(\sum_{i = 1}^{n} x_{i})^{2}}{n}

(5)

SS_{Y} = \sum_{i = 1}^{n} y_{i}^{2} - \frac{(\sum_{i = 1}^{n} y_{i})^{2}}{n}

(6)

in which X and Y are the two variables being compared, n is the number of observations, SS_X and SS_Y are the sums of squared deviations of X and Y from their means, respectively, and SP_Dxy is the sum of the products of their deviations. Dividing these quantities by n − 1 would give the variances and the covariance; the common factor cancels in Equation (3). ρ (r) is called Pearson’s correlation coefficient and lies in the interval from −1 to +1. The absolute value of ρ (r) measures the strength of the relation, while its positive or negative sign only shows whether the relation between the two variables is direct or inverse, respectively. If the absolute value of the correlation is close to 1, there is a strong relation between the two variables, and if it is close to 0, there is a weak relation between them. A negative correlation means that as one variable increases, the other decreases, and vice versa [92,93].
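As a worked illustration of Equations (3)–(6), the following Python sketch computes Pearson's coefficient directly from the sums and compares it with NumPy's built-in `corrcoef`. The function name `pearson_r` and the sample values are hypothetical and are not taken from the study's dataset.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r computed from the sums in Equations (3)-(6)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sp = np.sum(x * y) - np.sum(x) * np.sum(y) / n   # Eq. (4)
    ss_x = np.sum(x**2) - np.sum(x) ** 2 / n         # Eq. (5)
    ss_y = np.sum(y**2) - np.sum(y) ** 2 / n         # Eq. (6)
    return float(sp / np.sqrt(ss_x * ss_y))          # Eq. (3)

# Hypothetical values for illustration only.
speed_limit = [50, 50, 70, 90, 90, 110]
average_speed = [46, 52, 66, 81, 88, 97]

r = pearson_r(speed_limit, average_speed)
r_numpy = float(np.corrcoef(speed_limit, average_speed)[0, 1])
```

Both routes should agree to within floating-point rounding, which is a convenient check on a hand-coded implementation.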

2.3. Binary Modeling

The main goal of the binary classification model is to recognize a pattern and relation between the input dataset (daylight, type of accident, weekday, location, speed limit and average speed) and the number of vehicles as the dependent variable (output). In constructing an optimum binary model for predicting the number of vehicles, the choice of control parameters and performance indices of the algorithm contributes greatly to its convergence speed. Hence, the confusion matrix is considered first, as one of the practical performance indices for determining the accuracy and reliability of binary classification results in supervised or unsupervised learning. Figure 3 shows the general form of the confusion matrix for a two-class problem, where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives. The accuracy (Acc) and the error are then calculated according to Equations (7) and (8), respectively.

Acc = \frac{TP + TN}{TP + FP + TN + FN}

(7)

Error = \frac{FP + FN}{TP + FP + TN + FN} = 1 - Acc

(8)
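Equations (7) and (8) translate directly into code. The short Python sketch below uses hypothetical counts, chosen only to make the arithmetic easy to follow; the function names are illustrative.

```python
def accuracy(tp, fp, tn, fn):
    """Equation (7): share of correctly classified cases."""
    return (tp + tn) / (tp + fp + tn + fn)

def error_rate(tp, fp, tn, fn):
    """Equation (8): share of wrongly classified cases."""
    return (fp + fn) / (tp + fp + tn + fn)

# Hypothetical confusion-matrix counts (100 cases in total).
counts = {"tp": 40, "fp": 5, "tn": 45, "fn": 10}

acc = accuracy(**counts)     # 85 of 100 cases correct
err = error_rate(**counts)   # 15 of 100 cases wrong
```

Because every case is either correctly or wrongly classified, the two values always sum to 1.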

As mentioned earlier, determining the control parameters is the most important step for increasing the convergence speed of the algorithm. There are no specific equations for this purpose; some of these parameters are taken from previous studies, and others are usually determined from expert experience and by trial and error [94,95,96]. Hence, in the second step, the binary classification models are constructed based on three of the most important control parameters of the algorithm: selection pressure (SP), maximum number of layers (MNL) and maximum number of neurons in a layer (MNNL). The SP, a dimensionless parameter that influences the sensitivity of the modeling error, is set to 0.6 based upon previous studies [85,97], while the MNL and MNNL are selected according to expert experience and trial and error. The MNL takes the values 5, 10, 15, 20 and 30 and the MNNL takes the values 5, 10, 20 and 30, so that a total of 5 × 4 = 20 models were constructed for predicting the number of vehicles. It is worth mentioning that there are some recommendations for the ratio of training to testing data in the whole dataset.
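The 20 parameter combinations can be enumerated with a few lines of Python. This is only a sketch of the grid: the dictionary layout is an assumption, and the iteration order is not claimed to match the model numbering used in the tables.

```python
from itertools import product

SP = 0.6                           # selection pressure, fixed
MNL_VALUES = (5, 10, 15, 20, 30)   # maximum number of layers
MNNL_VALUES = (5, 10, 20, 30)      # maximum number of neurons in a layer

# One configuration per (MNL, MNNL) pair.
configs = [
    {"SP": SP, "MNL": mnl, "MNNL": mnnl}
    for mnl, mnnl in product(MNL_VALUES, MNNL_VALUES)
]

n_models = len(configs)
```

Each entry of `configs` would then be passed to the training routine, and the resulting models compared on the chosen performance indices.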

3. Case Study

3.1. Data Collection and Preparation

The dataset was extracted from the Italian ACI-ISTAT database [98] with reference to the years 2017 and 2018. ISTAT is the Italian National Institute of Statistics, the main supplier of official statistical information in Italy. It collects and produces information on the Italian economy and society and makes it available for study and decision-making purposes. ISTAT works in cooperation with the Automobile Club of Italy (ACI) to standardize the accident data by collecting police reports. ISTAT collects statistical information on accidents by means of a monthly survey covering all road accidents in the entire national territory that have caused death or injury to people; the ACI actively collaborates in this investigation. The survey is carried out by the authority that intervened at the site (traffic police, carabinieri, municipal police), which fills in the ISTAT CTT/INC form, called “Road accidents”, for each road accident involving a vehicle circulating on the road network and causing injuries. Therefore, accidents that resulted in no injuries to people, accidents that did not occur in public traffic areas and accidents in which no vehicles were involved are excluded from the survey.

In order to parameterize the contents of the survey, the following definitions are used:

-: Road accidents: those that occur in a road open to public traffic, as a result of which, one or more people were injured or killed and in which at least one vehicle was involved;
-: Dead: people who died instantly (within 24 h) or between the second and the thirtieth day, counting the day of the accident as the first;
-: Injured: people who suffered injuries as a result of the accident. Given the difficulty of defining objective criteria on the level of severity of the injuries suffered, there is no distinction between serious or light injuries.

A total of 775 accident cases were accurately recorded and evaluated from urban and rural areas of Cosenza in southern Italy (Figure 4). These accidents have been grouped, taking into account several categories (Table 1).

The ISTAT database was matched with traffic surveys carried out on the same rural and urban roads, from which average vehicle speeds and average traffic volumes were derived. The surveys were carried out in October 2019 using Bluetooth radar sensors to acquire vehicle speeds and traffic volumes (Figure 5). Radar sensors were located on the road sections with observed crashes. After an analysis of the statistical trends of traffic volumes and speed values over a ten-year period, and considering the social, economic, demographic and travel demand characteristics of the study area, traffic volumes and vehicle speeds were considered invariant over the last five years. Each radar sensor was positioned in a segment along which homogeneous flow and speed conditions could be assumed for the entire length. For example, when a sensor was positioned on a link with homogeneous geometric characteristics longer than 2 miles, a circular buffer of 2 miles in diameter around the location of the radar sensor (1 mile upstream and 1 mile downstream) was traced [99]. The geometric homogeneity of a road segment was defined by taking into account the number of lanes, lane and shoulder width, speed limit, median type, and median width.

3.2. Correlation Analysis

In this section, after the dataset (daylight, type of accident, weekday, location, speed limit and average speed) had been selected and prepared, a correlation analysis based on Pearson’s correlation coefficient was conducted using the Statistical Package for the Social Sciences (SPSS) software. The obtained results are shown in Table 2.

According to Table 2, the correlations between the input variables are weak, so the variables are suitable for modeling; as is known, if |ρ| > 0.85, the correlation is defined as “strong”, which is inappropriate for modeling. For example, Pearson’s correlation between Daylight and Average speed is 0.01, which means that the two variables are practically uncorrelated; the positive sign indicates a direct relation, so that when one of them increases or decreases, the other tends to increase or decrease, respectively. Additionally, Daylight and Weekday are only weakly correlated, with a coefficient of −0.12, and have an inverse relation. Type of accident is likewise weakly correlated with the other variables and has an inverse relation with them. It is worth mentioning that although the correlation coefficient between Speed limit and Average speed is high, at about 0.85, this value can be accepted given the nature of these two variables. Consequently, it can be concluded that the value of ρ is acceptable for all variables in this study, which shows that they were properly selected.
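The screening rule used here (flag any pair of inputs with |ρ| > 0.85) can be sketched in Python as follows. The study itself used SPSS; the function name `strong_pairs` and the synthetic columns are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def strong_pairs(data, names, threshold=0.85):
    """Return (name_i, name_j, rho) for every pair with |rho| > threshold."""
    corr = np.corrcoef(data, rowvar=False)  # columns are variables
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                flagged.append((names[i], names[j], float(corr[i, j])))
    return flagged

# Synthetic columns for illustration: c is almost a copy of a.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.01 * rng.normal(size=200)

flagged = strong_pairs(np.column_stack([a, b, c]), ["a", "b", "c"])
```

In this synthetic case only the pair (a, c) is flagged; a flagged pair is a prompt for engineering judgment, such as the one applied to Speed limit and Average speed, and does not by itself force a variable to be dropped.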

3.3. Binary Modeling

In this study, 775 accident cases from the urban and rural areas of Cosenza in southern Italy were recorded and evaluated. Following the suggestion in Looney’s study, 75% of the dataset (581 cases) was selected randomly for training, and the remaining 25% (194 cases) was used to test the developed binary model [100]. As mentioned before, three control parameters were considered for constructing the models: the SP was set to 0.6, based upon previous studies [85,97], the MNL took the values 5, 10, 15, 20 and 30, and the MNNL took the values 5, 10, 20 and 30. Hence, a total of 20 models were constructed for forecasting the number of vehicles. The results obtained for the 20 models are shown in Table 3.
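A random 75/25 split of 775 cases can be sketched in Python as below. The seed and the variable names are arbitrary assumptions; the original random selection is not reproducible from the text.

```python
import numpy as np

N_CASES = 775
N_TRAIN = int(0.75 * N_CASES)          # 581 training cases

rng = np.random.default_rng(seed=42)   # arbitrary seed for repeatability
shuffled = rng.permutation(N_CASES)    # random order of case indices

train_idx = shuffled[:N_TRAIN]
test_idx = shuffled[N_TRAIN:]          # remaining 194 cases
```

Indexing the feature matrix and the target vector with `train_idx` and `test_idx` then gives the two disjoint subsets.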

Finally, after constructing the models, the best model was selected using the simple ranking method introduced by Zorlu et al. [101]. The results of this ranking are shown in Table 4.

According to the results in Table 4, the 16th and 19th models have the highest and lowest ranks among the developed models, with SP, MNL and MNNL values of 0.6, 20 and 30 and of 0.6, 30 and 20, respectively.

4. Results and Discussion

As mentioned above, the 16th model shows the best performance among the 20 developed models, with an optimum MNL value of 30. Figure 6 shows the root mean square error (RMSE) in each layer. The difference in RMSE between consecutive layers, from the second layer onward, shows the desired level of precision, and the RMSE remains fixed from the 28th layer to the 30th, which demonstrates the suitable convergence speed and flexibility of the algorithm.

According to Figure 6 and Equations (7) and (8), the confusion matrix for the 16th model was calculated and is shown in Figure 7 for the training data (a), the testing data (b) and all data (c). For the training data, the optimum model classified 106 cases of class “0” (number of vehicles involved in the accident = 1) correctly and 3 wrongly, an accuracy of 97.2%, and it classified 392 cases of class “1” (number of vehicles involved in the accident > 1) correctly and 80 wrongly, an accuracy of 83.1%. The total accuracy for the training data was 85.7%. For the testing data, 27 cases of class “0” were predicted correctly and 1 wrongly, while 135 cases of class “1” were predicted correctly and 31 were wrongly assigned to class “0”. Finally, the confusion matrix of all data shows that class “0” and class “1” were predicted with 97.1% and 82.6% accuracy, respectively, and consequently the total accuracy reached a highly acceptable level of 85.2%.
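The percentages quoted above can be re-derived from the reported counts. The Python sketch below is only an arithmetic check; the data layout and the function name `summarize` are illustrative assumptions.

```python
# (correct, wrong) counts per true class, as quoted in the text.
reported = {
    "training": {"class0": (106, 3), "class1": (392, 80)},
    "testing": {"class0": (27, 1), "class1": (135, 31)},
}

def summarize(split):
    """Per-class and overall accuracy (in %) for one data split."""
    c0_ok, c0_bad = split["class0"]
    c1_ok, c1_bad = split["class1"]
    total = c0_ok + c0_bad + c1_ok + c1_bad
    return {
        "n": total,
        "class0": round(100 * c0_ok / (c0_ok + c0_bad), 1),
        "class1": round(100 * c1_ok / (c1_ok + c1_bad), 1),
        "overall": round(100 * (c0_ok + c1_ok) / total, 1),
    }

# "All data" is the element-wise sum of the training and testing counts.
reported["all"] = {
    cls: tuple(
        a + b for a, b in zip(reported["training"][cls], reported["testing"][cls])
    )
    for cls in ("class0", "class1")
}

summary = {name: summarize(split) for name, split in reported.items()}
```

The training and testing splits contain 581 and 194 cases, matching the 75/25 partition of the 775 records.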

For further evaluation, the results were assessed using three other performance indices, namely Precision, Recall and F1 score [102]. Figure 8 shows the evaluation of the confusion matrix by accuracy in comparison with these other indices. Although the recall is lower than the other performance indices, it should be considered together with the precision; taken together, the results show that the optimum model provides the desired performance in estimating the number of vehicles involved in an accident.
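For readers who want to see how these three indices are obtained from a confusion matrix, the Python sketch below applies the standard definitions to the testing counts quoted earlier. It assumes that class “1” is treated as the positive class, which the text does not state explicitly, and it does not reproduce the values plotted in Figure 8.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions of Precision, Recall and F1 score."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Testing data, assuming class "1" is the positive class:
tp = 135   # class "1" cases predicted as class "1"
fn = 31    # class "1" cases predicted as class "0"
fp = 1     # class "0" case predicted as class "1"

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

Under this assumption the recall comes out lower than the precision, because most of the errors are class “1” cases assigned to class “0”.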

In classification problems, the receiver operating characteristic (ROC) curve, which is a probability-based curve, can play a key role in analyzing the results. Hence, the ROC curve was also used to evaluate the results provided by the 16th model. Figure 9 shows the ROC curves for the training, testing and all data. The threshold was set at 0.5, which is a common value in this case. Consistent with its better performance, the area under the curve (AUC) of the 16th model is higher than that of the other developed models. The AUC used to evaluate the performance of the developed binary classification model ranges between 0 and 1. An AUC equal to or less than 0.5 indicates that the performance of the model is not acceptable, whereas here the value is higher than 0.5 for the training, testing and total ROC curves.
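A ROC curve and its AUC can be computed from continuous model outputs by sweeping the decision threshold. The Python sketch below is a generic illustration with hypothetical scores; the function name `roc_auc` is an assumption and the example does not use the study's data.

```python
import numpy as np

def roc_auc(y_true, scores):
    """Return (fpr, tpr, auc) by sweeping the threshold over the scores."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")     # highest score first
    y_sorted, s_sorted = y_true[order], scores[order]
    tps = np.cumsum(y_sorted == 1)                 # true positives so far
    fps = np.cumsum(y_sorted == 0)                 # false positives so far
    # Keep one point per distinct score (last index of each tied run).
    last = np.r_[np.nonzero(np.diff(s_sorted))[0], len(s_sorted) - 1]
    tpr = np.r_[0.0, tps[last] / tps[-1]]
    fpr = np.r_[0.0, fps[last] / fps[-1]]
    # Trapezoidal rule for the area under the curve.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc

# Hypothetical labels and model outputs.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, auc = roc_auc(y_true, scores)

# Class labels at the 0.5 threshold mentioned in the text.
predicted = (np.asarray(scores) >= 0.5).astype(int)
```

An AUC of 0.5 corresponds to a classifier that ranks cases no better than chance, which is why values above 0.5 are required for an acceptable model.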

Furthermore, based on these analyses, the following remarks and results can be highlighted:

-: The correlation analysis showed that input data including Daylight, Weekday, Type of accident, Location, Speed limit and Average speed were correctly considered for the binary classification;
-: Figure 6, Figure 7, Figure 8 and Figure 9 show that the GMDH algorithm has a high capability to train and develop the model, which correctly predicted 660 cases of the first and second classes out of 775 (total). Additionally, the confusion matrix results were assessed using the three other performance indices, which indicated that the proposed model can provide a high performance capacity in the evaluation of safety in transportation systems;
-: Consequently, it can be concluded that the proposed binary classification model based on the GMDH algorithm is a reliable alternative to classical models, with an acceptable degree of accuracy in predicting the number of vehicles involved in an accident. This may lead transportation engineers toward greater accuracy and robustness in the design and planning of roads by investigating appropriate countermeasures to reduce the safety risk;
-: It is worth mentioning that the binary classification model presented in this study was developed for the road network of the Cosenza area and requires a more in-depth analysis before being transferred to other contexts;
-: Although the developed model was a reliable model for evaluating safety in the transportation systems of this case study, it is not capable of investigating safety in transportation systems with incomplete data.

5. Conclusions

Assessing safety is not an easy task because of the ambiguity and uncertainty in the parameters affecting accidents. Hence, artificial intelligence (AI) and machine learning (ML) are effective methods for evaluating some recurring problems in transportation engineering, especially in road safety assessment. The main aim of this study was to predict the number of vehicles involved in an accident, using the GMDH algorithm, in order to assess safety. This was accomplished using 775 accident cases obtained from the urban and rural areas of Cosenza in southern Italy. Several important parameters, namely Daylight, Weekday, Type of accident, Location, Speed limit and Average speed, were selected as input data, and the number of vehicles involved in an accident was considered as output. In total, 20 models were constructed based on three control parameters of the algorithm: selection pressure, maximum number of layers and maximum number of neurons in a layer. In this modeling, 75% of the whole dataset was selected for training and the rest for testing, and the accuracy of each model was determined according to the confusion matrix. Finally, the 16th model, with 85.7% and 83.5% accuracy for the training and testing datasets, respectively, was selected as the best binary classification model. In future work, the authors intend to compare the results obtained for the analyzed case study with those obtained for other contexts and to provide a robust analysis of the model transferability. More efforts need to be made to investigate other parameters affecting the number of vehicles involved in an accident based on the dataset available, also for other regions or other countries.
It is worth mentioning that road safety depends on the concurrence of three main factors: human behavior, infrastructure and environment. It is therefore necessary to model, as well as possible, the complex relationships existing among latent and real variables by coupling AI and ML techniques with other classic techniques such as Structural Equation Models (SEM). In future works, it is recommended to examine the effectiveness of other types of artificial intelligence and machine-learning methods for binary classification, such as Learning Vector Quantization (LVQ) and the Naive Bayes (NB) algorithm, and then to compare the results with a logit model.

Author Contributions

G.G., S.S.H. (Sina Shaffiee Haghshenas) and S.S.H. (Sami Shaffiee Haghshenas) were responsible for conceptualization and methodology. G.G. and A.V. analyzed the study context and extracted the dataset. A.V., V.A., and V.G. performed supervision, review and editing. S.S.H. (Sina Shaffiee Haghshenas) and S.S.H. (Sami Shaffiee Haghshenas) carried out the statistical analysis. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

We would like to express our deepest thanks to Mahdi Ghaem for his excellent advice.

Conflicts of Interest

The authors declare no conflict of interest.

References

Mannering, F.L.; Bhat, C.R.; Shankar, V.; Abdel-Aty, M. Big data, traditional data and the tradeoffs between prediction and causality in highway-safety analysis. Anal. Methods Accid. Res. 2020, 25, 100113.
Mannering, F.L. Temporal instability and the analysis of highway accident data. Anal. Methods Accid. Res. 2018, 17, 1–13.
Imprialou, M.; Quddus, M. Crash data quality for road safety research: Current state and future directions. Accid. Anal. Prev. 2019, 130, 84–90.
Schlögl, M.; Stütz, R. Methodological considerations with data uncertainty in road safety analysis. Accid. Anal. Prev. 2019, 130, 136–150.
Gomes, S.V. The influence of the infrastructure characteristics in urban road accidents occurrence. Accid. Anal. Prev. 2013, 60, 289–297.
Theofilatos, A.; Yannis, G. A review of the effect of traffic and weather characteristics on road safety. Accid. Anal. Prev. 2014, 72, 244–256.
Papadimitriou, E.; Filtness, A.; Theofilatos, A.; Ziakopoulos, A.; Quigley, C.; Yannis, G. Review and ranking of crash risk factors related to the road infrastructure. Accid. Anal. Prev. 2019, 125, 85–97.
Hossain, M.; Abdel-Aty, M.; Quddus, M.; Muromachi, Y.; Sadeek, S.N. Real-time crash prediction models: State-of-the-art, design pathways and ubiquitous requirements. Accid. Anal. Prev. 2019, 124, 66–84.
Ziakopoulos, A.; Yannis, G. A review of spatial approaches in road safety. Accid. Anal. Prev. 2020, 135, 105323.
Elvik, R.; Vaa, T.; Hoye, A.; Sorensen, M. (Eds.) The Handbook of Road Safety Measures; Emerald Group Publishing: West Yorkshire, UK, 2009.
Vaiana, R.; Iuele, T.; Gallelli, V.; Rogano, D. Demanded versus assumed friction along horizontal curves: An on-the-road experimental investigation. J. Transp. Saf. Secur. 2017, 10, 318–344.
Lee, J.; Abdel-Aty, M.; De Blasiis, M.R.; Wang, X.; Mattei, I. International transferability of macro-level safety performance functions: A case study of the United States and Italy. Transp. Saf. Environ. 2019, 1, 68–78.
Greibe, P. Accident prediction models for urban roads. Accid. Anal. Prev. 2003, 35, 273–285.
Saeed, T.U.; Hall, T.; Baroud, H.; Volovski, M.J. Analyzing road crash frequencies with uncorrelated and correlated random-parameters count models: An empirical assessment of multilane highways. Anal. Methods Accid. Res. 2019, 23, 100101.
Singleton, M.; Qin, H.; Luan, J. Factors Associated with Higher Levels of Injury Severity in Occupants of Motor Vehicles That Were Severely Damaged in Traffic Crashes in Kentucky, 2000–2001. Traffic Inj. Prev. 2004, 5, 144–150.
Dissanayake, S.; Lu, J.J. Factors influential in making an injury severity difference to older drivers involved in fixed object-passenger car crashes. Accid. Anal. Prev. 2002, 34, 609–618.
Hanrahan, R.B.; Layde, P.M.; Zhu, S.; Guse, C.E.; Hargarten, S.W. The Association of Driver Age with Traffic Injury Severity in Wisconsin. Traffic Inj. Prev. 2009, 10, 361–367.
Kwon, O.H.; Rhee, W.; Yoon, Y. Application of classification algorithms for analysis of road safety risk factor dependencies. Accid. Anal. Prev. 2015, 75, 1–15.
Cafiso, S.; D’Agostino, C. Assessing the stochastic variability of the Benefit-Cost ratio in roadway safety management. Accid. Anal. Prev. 2016, 93, 189–197.
Han, C.; Huang, H.; Lee, J.; Wang, J. Investigating varying effect of road-level factors on crash frequency across regions: A Bayesian hierarchical random parameter modeling approach. Anal. Methods Accid. Res. 2018, 20, 81–91.
Briz-Redón, Á.; Martínez, F.; Montes, F. Spatial analysis of traffic accidents near and between road intersections in a directed linear network. Accid. Anal. Prev. 2019, 132, 105252.
Khattak, A.J.; Kantor, P.; Council, F.M. Role of Adverse Weather in Key Crash Types on Limited-Access Roadways: Implications for Advanced Weather Systems. Transp. Res. Rec. J. Transp. Res. Board 1998, 1621, 10–19.
Kockelman, K.M.; Kweon, Y.-J. Driver injury severity: An application of ordered probit models. Accid. Anal. Prev. 2002, 34, 313–321.
Kaplan, S.; Prato, C.G. Risk factors associated with bus accident severity in the United States: A generalized ordered logit model. J. Saf. Res. 2012, 43, 171–180.
Mohamed, M.G.; Saunier, N.; Miranda-Moreno, L.; Ukkusuri, S.V. A clustering regression approach: A comprehensive injury severity analysis of pedestrian–vehicle crashes in New York, US and Montreal, Canada. Saf. Sci. 2013, 54, 27–37.
Rivière, C.; Lauret, P.; Ramsamy, J.M.; Page, Y. A Bayesian Neural Network approach to estimating the Energy Equivalent Speed. Accid. Anal. Prev. 2006, 38, 248–259.
Huang, H.; Chin, H.; Haque, M. Empirical Evaluation of Alternative Approaches in Identifying Crash Hot Spots. Transp. Res. Rec. J. Transp. Res. Board 2009, 2103, 32–41.
De Oña, J.; López, G.; Mujalli, R.; Calvo-Poyo, F. Analysis of traffic accidents on rural highways using Latent Class Clustering and Bayesian Networks. Accid. Anal. Prev. 2013, 51, 1–10.
Zeng, Q.; Huang, H. Bayesian spatial joint modeling of traffic crashes on an urban road network. Accid. Anal. Prev. 2014, 67, 105–112.
Shi, Q.; Abdel-Aty, M.; Lee, J. A Bayesian ridge regression analysis of congestion’s impact on urban expressway safety. Accid. Anal. Prev. 2016, 88, 124–137.
Huang, H.; Chang, F.; Zhou, H.; Lee, J. Modeling unobserved heterogeneity for zonal crash frequencies: A Bayesian multivariate random-parameters model with mixture components for spatially correlated data. Anal. Methods Accid. Res. 2019, 24, 100105.
Oviedo-Trespalacios, O.; Afghari, A.P.; Haque, M. A hierarchical Bayesian multivariate ordered model of distracted drivers’ decision to initiate risk-compensating behaviour. Anal. Methods Accid. Res. 2020, 26, 100121.
Lee, C.; Abdel-Aty, M. Presence of passengers: Does it increase or reduce driver’s crash potential? Accid. Anal. Prev. 2008, 40, 1703–1712.
Shankar, V.; Mannering, F.L.; Barfield, W. Statistical analysis of accident severity on rural freeways. Accid. Anal. Prev. 1996, 28, 391–401.
Shankar, V.; Mannering, F.L. An exploratory multinomial logit analysis of single-vehicle motorcycle accident severity. J. Saf. Res. 1996, 27, 183–194.
Hu, S.-R.; Li, C.-S.; Lee, C.-K. Investigation of key factors for accident severity at railroad grade crossings by using a logit model. Saf. Sci. 2010, 48, 186–194.
Hu, W.; Donnell, E. Severity models of cross-median and rollover crashes on rural divided highways in Pennsylvania. J. Saf. Res. 2011, 42, 375–382.
Dimitriou, L.; Stylianou, K.; Abdel-Aty, M. Assessing rear-end crash potential in urban locations based on vehicle-by-vehicle interactions, geometric characteristics and operational conditions. Accid. Anal. Prev. 2018, 118, 221–235.
Hamed, M.M.; Al-Eideh, B.M. An exploratory analysis of traffic accidents and vehicle ownership decisions using a random parameters logit model with heterogeneity in means. Anal. Methods Accid. Res. 2020, 25, 100116.
Eluru, N.; Bhat, C.R. A joint econometric analysis of seat belt use and crash-related injury severity. Accid. Anal. Prev. 2007, 39, 1037–1049.
Milton, J.C.; Shankar, V.N.; Mannering, F.L. Highway accident severities and the mixed logit model: An exploratory empirical analysis. Accid. Anal. Prev. 2008, 40, 260–266.
Malyshkina, N.V.; Mannering, F.L. Empirical assessment of the impact of highway design exceptions on the frequency and severity of vehicle accidents. Accid. Anal. Prev. 2010, 42, 131–139.
Christoforou, Z.; Cohen, S.; Karlaftis, M.G. Vehicle occupant injury severity on highways: An empirical investigation. Accid. Anal. Prev. 2010, 42, 1606–1620.
Huang, H.; Siddiqui, C.; Abdel-Aty, M. Indexing crash worthiness and crash aggressivity by vehicle type. Accid. Anal. Prev. 2011, 43, 1364–1370.
Ye, F.; Lord, D. Comparing three commonly used crash severity models on sample size requirements: Multinomial logit, ordered probit and mixed logit models. Anal. Methods Accid. Res. 2014, 1, 72–85.
Mannering, F.L.; Shankar, V.N.; Bhat, C.R. Unobserved heterogeneity and the statistical analysis of highway accident data. Anal. Methods Accid. Res. 2016, 11, 1–16.
Harrell, F.E. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis; Springer Series in Statistics; Springer: New York, NY, USA, 2015; pp. 1–11.
Cohen, J.; Cohen, P.; West, S.G.; Aiken, L.S. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 2nd ed.; Lawrence Erlbaum Associates, Inc.: Mahwah, NJ, USA, 2003.
Tabachnick, B.G.; Fidell, L.S. Using Multivariate Statistics, 6th ed.; Pearson: Boston, MA, USA, 2012.
Siddiqui, C.; Abdel-Aty, M.; Huang, H. Aggregate nonparametric safety analysis of traffic zones. Accid. Anal. Prev. 2012, 45, 317–325.
Chang, L.-Y.; Wang, H.-W. Analysis of traffic injury severity: An application of non-parametric classification tree techniques. Accid. Anal. Prev. 2006, 38, 1019–1027.
Yan, X.; Radwan, E. Analyses of Rear-End Crashes Based on Classification Tree Models. Traffic Inj. Prev. 2006, 7, 276–282.
Pande, A.; Abdel-Aty, M. Assessment of freeway traffic parameters leading to lane-change related collisions. Accid. Anal. Prev. 2006, 38, 936–948.
Chen, C.; Zhang, G.; Yang, J.; Milton, J.C.; Alcántara, A. “Dely” An explanatory analysis of driver injury severity in rear-end crashes using a decision table/Naïve Bayes (DTNB) hybrid classifier. Accid. Anal. Prev. 2016, 90, 95–107.
Li, Z.; Liu, P.; Wang, W.; Xu, C. Using support vector machine models for crash injury severity analysis. Accid. Anal. Prev. 2012, 45, 478–486.
Yu, R.; Abdel-Aty, M. Analyzing crash injury severity for a mountainous freeway incorporating real-time traffic and weather data. Saf. Sci. 2014, 63, 50–56.
Chen, C.; Zhang, G.; Qian, Z.; Tarefder, R.A.; Tian, Z. Investigating driver injury severity patterns in rollover crashes using support vector machine models. Accid. Anal. Prev. 2016, 90, 128–139.
Gu, X.; Li, T.; Wang, Y.; Zhang, L.; Wang, Y.; Yao, J. Traffic fatalities prediction using support vector machine with hybrid particle swarm optimization. J. Algorithms Comput. Technol. 2017, 12, 20–29.
Wang, J.; Liu, B.; Fu, T.; Liu, S.; Stipancic, J. Modeling when and where a secondary accident occurs. Accid. Anal. Prev. 2019, 130, 160–166.
Abdelwahab, H.T.; Abdel-Aty, M.A. Development of Artificial Neural Network Models to Predict Driver Injury Severity in Traffic Accidents at Signalized Intersections. Transp. Res. Rec. J. Transp. Res. Board 2001, 1746, 6–13.
Lu, J.; Chen, S.; Wang, W.; Van Zuylen, H.J. A hybrid model of partial least squares and neural network for traffic incident detection. Expert Syst. Appl. 2012, 39, 4775–4784.
Ali, G.A.; Tayfour, A. Characteristics and Prediction of Traffic Accident Casualties In Sudan Using Statistical Modeling and Artificial Neural Networks. Int. J. Transp. Sci. Technol. 2012, 1, 305–317.
Deka, L.; Quddus, M. Network-level accident-mapping: Distance based pattern matching using artificial neural network. Accid. Anal. Prev. 2014, 65, 105–113.
Mussone, L.; Bassani, M.; Masci, P. Analysis of factors affecting the severity of crashes in urban road intersections. Accid. Anal. Prev. 2017, 103, 112–122.
Huang, H.; Han, C.; Xu, G.; Jiang, M.; Wong, S.; Haque, M. Incorporating safety reliability into route choice model: Heterogeneous crash risk aversions. Anal. Methods Accid. Res. 2020, 25, 100112.
Li, Y.; Ma, D.; Zhu, M.; Zeng, Z.; Wang, Y. Identification of significant factors in fatal-injury highway crashes using genetic algorithm and neural network. Accid. Anal. Prev. 2018, 111, 354–363.
Amiri, A.M.; Sadri, A.; Nadimi, N.; Shams, M. A comparison between Artificial Neural Network and Hybrid Intelligent Genetic Algorithm in predicting the severity of fixed object crashes among elderly drivers. Accid. Anal. Prev. 2020, 138, 105468.
Zeng, Q.; Huang, H. A stable and optimized neural network model for crash injury severity prediction. Accid. Anal. Prev. 2014, 73, 351–358.
Delen, D.; Sharda, R.; Bessonov, M. Identifying significant predictors of injury severity in traffic accidents using a series of artificial neural networks. Accid. Anal. Prev. 2006, 38, 434–444.
Wang, J.; Luo, T.; Fu, T. Crash prediction based on traffic platoon characteristics using floating car trajectory data and the machine learning approach. Accid. Anal. Prev. 2019, 133, 105320.
Liu, J.; Boyle, L.N.; Banerjee, A.G. Predicting interstate motor carrier crash rate level using classification models. Accid. Anal. Prev. 2018, 120, 211–218.
Formosa, N.; Quddus, M.; Ison, S.; Abdel-Aty, M.; Yuan, J. Predicting real-time traffic conflicts using deep learning. Accid. Anal. Prev. 2020, 136, 105429.
Iranitalab, A.; Khattak, A.J. Comparison of four statistical and machine learning methods for crash severity prediction. Accid. Anal. Prev. 2017, 108, 27–36.
Tang, J.; Zheng, L.; Han, C.; Yin, W.; Zhang, Y.; Zou, Y.; Huang, H. Statistical and machine-learning methods for clearance time prediction of road incidents: A methodology review. Anal. Methods Accid. Res. 2020, 27, 100123.
Huang, T.; Wang, S.; Sharma, A. Highway crash detection and risk estimation using deep learning. Accid. Anal. Prev. 2020, 135, 105392.
Xie, K.; Yang, D.; Ozbay, K.; Yang, H. Use of real-world connected vehicle data in identifying high-risk locations based on a new surrogate safety measure. Accid. Anal. Prev. 2019, 125, 311–319.
Geem, Z.W.; Chung, S.Y.; Kim, J.-H. Improved Optimization for Wastewater Treatment and Reuse System Using Computational Intelligence. Complexity 2018, 2018, 1–8.
Park, S.H.; Jang, Y.-H.; Geem, Z.W.; Lee, S.-H. CityGML-Based Road Information Model for Route Optimization of Snow-Removal Vehicle. ISPRS Int. J. Geo-Inf. 2019, 8, 588.
Hosseini, S.M.; Ataei, M.; Khalokakaei, R.; Mikaeil, R.; Haghshenas, S.S. Investigating the role of coolant and lubricant fluids on the performance of cutting disks (case study: Hard rocks). Rudarsko-Geološko-Naftni zbornik 2019, 34, 13–25.
Dormishi, A.; Ataei, M.; Mikaeil, R.; Khalokakaei, R.; Haghshenas, S.S. Evaluation of gang saws’ performance in the carbonate rock cutting process using feasibility of intelligent approaches. Eng. Sci. Technol. Int. J. 2019, 22, 990–1000.
Mikaeil, R.; Haghshenas, S.S.; Sedaghati, Z. Geotechnical risk evaluation of tunneling projects using optimization techniques (case study: The second part of Emamzade Hashem tunnel). Nat. Hazards 2019, 97, 1099–1113.
Golafshani, E.M.; Behnood, A.; Arashpour, M. Predicting the compressive strength of normal and High-Performance Concretes using ANN and ANFIS hybridized with Grey Wolf Optimizer. Constr. Build. Mater. 2020, 232, 117266.
Ivakhnenko, A.G. Polynomial Theory of Complex Systems. IEEE Trans. Syst. Man. Cybern. 1971, 1, 364–378.
Ivakhnenko, A.G. Self-Organizing Methods in Modelling and Clustering: GMDH Type Algorithms; Systems Analysis and Simulation I; Springer: New York, NY, USA, 1988; pp. 86–88. ISBN 978-0-387-97091-2.
Fiorini Morosini, A.; Haghshenas, S.S.; Haghshenas, S.S.; Geem, Z.W. Development of a Binary Model for Evaluating Water Distribution Systems by a Pressure Driven Analysis (PDA) Approach. Appl. Sci. 2020, 10, 3029.
Sezavar, R.; Shafabakhsh, G.; Mirabdolazimi, S. New model of moisture susceptibility of nano silica-modified asphalt concrete using GMDH algorithm. Constr. Build. Mater. 2019, 211, 528–538.
Dag, O.; Karabulut, E.; Alpar, R. GMDH2: Binary Classification via GMDH-Type Neural Network Algorithms—R Package and Web-Based Tool. Int. J. Comput. Intell. Syst. 2019, 12, 649.
Dag, O.; Kasikci, M.; Karabulut, E.; Alpar, R. Diverse classifiers ensemble based on GMDH-type neural network algorithm for binary classification. Commun. Stat.-Simul. Comput. 2019, 1–17.
Mikaeil, R.; Haghshenas, S.S.; Ozcelik, Y.; Gharehgheshlagh, H.H. Performance Evaluation of Adaptive Neuro-Fuzzy Inference System and Group Method of Data Handling-Type Neural Network for Estimating Wear Rate of Diamond Wire Saw. Geotech. Geol. Eng. 2018, 36, 3779–3791.
Feng, X.; Li, S.; Yuan, C.; Zeng, P.; Sun, Y. Prediction of Slope Stability using Naive Bayes Classifier. KSCE J. Civ. Eng. 2018, 22, 941–950.
Hosseini, S.M.; Ataei, M.; Khalokakaei, R.; Mikaeil, R.; Haghshenas, S.S. Study of the effect of the cooling and lubricant fluid on the cutting performance of dimension stone through artificial intelligence models. Eng. Sci. Technol. Int. J. 2020, 23, 71–81.
Noori, A.M.; Mikaeil, R.; Mokhtarian, M.; Haghshenas, S.S.; Foroughi, M. Feasibility of Intelligent Models for Prediction of Utilization Factor of TBM. Geotech. Geol. Eng. 2020, 38, 3125–3143.
Pirouz, B.; Haghshenas, S.S.; Haghshenas, S.S.; Piro, P. Investigating a Serious Challenge in the Sustainable Development Process: Analysis of Confirmed cases of COVID-19 (New Type of Coronavirus) Through a Binary Classification Using Artificial Intelligence and Regression Analysis. Sustainability 2020, 12, 2427.
Salemi, A.; Mikaeil, R.; Haghshenas, S.S. Integration of Finite Difference Method and Genetic Algorithm to Seismic analysis of Circular Shallow Tunnels (Case Study: Tabriz Urban Railway Tunnels). KSCE J. Civ. Eng. 2017, 22, 1978–1990.
Aryafar, A.; Mikaeil, R.; Haghshenas, S.S.; Haghshenas, S.S. Application of metaheuristic algorithms to optimal clustering of sawing machine vibration. Measurement 2018, 124, 20–31.
Mikaeil, R.; Haghshenas, S.S.; Hoseinie, S.H. Rock Penetrability Classification Using Artificial Bee Colony (ABC) Algorithm and Self-Organizing Map. Geotech. Geol. Eng. 2017, 36, 1309–1318.
Mohammadi, D.; Mikaeil, R.; Abdollahi-Sharif, J. Implementation of an optimized binary classification by GMDH-type neural network algorithm for predicting the blast produced ground vibration. Expert Syst. 2020, e12563.
ACI-ISTAT. Localizzazione Incidenti Stradali. Available online: http://www.aci.it/laci/studi-e-ricerche/dati-e-statistiche/incidentalita.html (accessed on 10 January 2020).
Dutta, N.; Fontaine, M.D. Improving freeway segment crash prediction models by including disaggregate speed data from different sources. Accid. Anal. Prev. 2019, 132, 105253.
Looney, C.G. Advances in feedforward neural networks: Demystifying knowledge acquiring black boxes. IEEE Trans. Knowl. Data Eng. 1996, 8, 211–226.
Zorlu, K.; Gokceoglu, C.; Ocakoglu, F.; Nefeslioglu, H.; Acikalin, S. Prediction of uniaxial compressive strength of sandstones using petrography-based models. Eng. Geol. 2008, 96, 141–158.
Faradonbeh, R.S.; Haghshenas, S.S.; Taheri, A.; Mikaeil, R. Application of self-organizing map and fuzzy c-mean techniques for rockburst clustering in deep underground projects. Neural Comput. Appl. 2019, 32, 8545–8559.

Figure 1. Flowchart of the study. GMDH: Group Method of Data Handling.

Figure 2. Basic structure of GMDH algorithm [89].

Figure 3. Basic form of confusion matrix for a two-cluster problem.

Figure 4. Accident locations in Cosenza province (years 2017–2018). Source: CRISC Regione Calabria.

Figure 5. Bluetooth radar sensor used for the traffic surveys.

Figure 6. The value of RMSE (root mean square error) in each layer for the 16th developed model using the GMDH algorithm.

Figure 7. Results of the 16th developed model for training dataset (a), testing dataset (b) and total dataset (c).

Figure 8. The obtained results of the 16th developed model.

Figure 9. ROC curve of the 16th developed model for training dataset (a), testing dataset (b) and total dataset (c).

Table 1. Accident database fields considered.

Data Field Type	Data Field	Description
Human characteristic	Driver gender	Male or female
Vehicle characteristic	Vehicle type	Car, motorcycle, truck and other
Road environment	Road type	National rural road, provincial rural road, national and provincial rural road in urban context, urban road
Road environment	Geometric element	Straight, curve, crossroad, signalized intersection, traffic light
Other environment	Date	Date of the accident
	Light conditions	Daylight and nighttime
	Day of the week	Weekday and weekend
Location environment	Macro area location	Urban and rural
Accident characteristic	Number of vehicles	Number of vehicles involved
	Accident nature	Way out, collision with an accidental obstacle, side collision, front-side collision, rear-end collision, head-on collision, pedestrian collision, impact with parked vehicle, impact with stopped vehicle, fall from vehicle, sudden braking
	Accident severity	Injuries and deaths

Table 2. Correlation matrix of the input parameters.

	Daylight	Type of Accident	Weekday	Location	Speed Limit	Average Speed
Daylight	1
Type of accident	−0.03	1
Weekday	−0.12	−0.01	1
Location	0.05	−0.16	0.01	1
Speed Limit	0.01	−0.21	0.01	0.29	1
Average Speed	0.01	−0.15	0.03	0.16	0.85	1
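As an illustration of how a matrix like the one above can be read, the following sketch stores the lower triangle of the table and searches for the pair of inputs with the largest absolute coefficient. The `pearson` helper shows how a single entry could be computed from two data columns; it assumes the coefficients are Pearson correlations, which is an assumption of this example, and the variable names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


names = ["Daylight", "Type of accident", "Weekday",
         "Location", "Speed limit", "Average speed"]

# Lower triangle of the table, one row per input parameter.
lower = [
    [1.0],
    [-0.03, 1.0],
    [-0.12, -0.01, 1.0],
    [0.05, -0.16, 0.01, 1.0],
    [0.01, -0.21, 0.01, 0.29, 1.0],
    [0.01, -0.15, 0.03, 0.16, 0.85, 1.0],
]

# Off-diagonal pairs, keyed by absolute coefficient.
pairs = [
    (abs(lower[i][j]), names[i], names[j])
    for i in range(len(names))
    for j in range(i)
]
strongest = max(pairs)
```

In this table the only strong relationship is between Speed limit and Average speed (0.85); every other pair has an absolute coefficient of 0.29 or less.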

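Table 3. The effect of control parameters on the performance of the GMDH algorithm. SP: selection pressure; MNL: maximum number of layers; MNNL: maximum number of neurons in a layer.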

Model No.	SP	MNL	MNNL	Accuracy of Training (%)	Accuracy of Testing (%)
1	0.6	5	5	81.2	76.2
2	0.6	5	10	80.6	76.8
3	0.6	5	20	81.9	78.9
4	0.6	5	30	81.1	77.8
5	0.6	10	5	81.6	76.8
6	0.6	10	10	81.4	77.5
7	0.6	10	20	82.6	80.9
8	0.6	10	30	82.8	77.8
9	0.6	15	5	81.4	81.2
10	0.6	15	10	82.8	82.0
11	0.6	15	20	82.6	78.5
12	0.6	15	30	81.6	80.9
13	0.6	20	5	80.4	80.2
14	0.6	20	10	82.5	81.9
15	0.6	20	20	81.1	79.9
16	0.6	20	30	85.7	83.5
17	0.6	30	5	81.6	80.9
18	0.6	30	10	79.4	78.8
19	0.6	30	20	81.1	75.3
20	0.6	30	30	83.5	80.9
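The 20 rows above correspond to a full grid over two control parameters with the third held at 0.6. The sketch below rebuilds that grid and picks the configuration with the highest testing accuracy. The accuracy values are copied from the table; the dictionary layout and variable names are assumptions made for the example, and no GMDH training is performed here.

```python
from itertools import product

SP = 0.6                         # selection pressure, fixed in all 20 models
MNL_VALUES = [5, 10, 15, 20, 30]  # maximum number of layers
MNNL_VALUES = [5, 10, 20, 30]     # maximum number of neurons in a layer

# Accuracies (%) copied from the table, in model order 1..20.
train_acc = [81.2, 80.6, 81.9, 81.1, 81.6, 81.4, 82.6, 82.8, 81.4, 82.8,
             82.6, 81.6, 80.4, 82.5, 81.1, 85.7, 81.6, 79.4, 81.1, 83.5]
test_acc = [76.2, 76.8, 78.9, 77.8, 76.8, 77.5, 80.9, 77.8, 81.2, 82.0,
            78.5, 80.9, 80.2, 81.9, 79.9, 83.5, 80.9, 78.8, 75.3, 80.9]

# product() varies MNNL fastest, which matches the row order of the table.
grid = list(product(MNL_VALUES, MNNL_VALUES))
models = [
    {"no": i + 1, "sp": SP, "mnl": mnl, "mnnl": mnnl, "train": tr, "test": te}
    for i, ((mnl, mnnl), tr, te) in enumerate(zip(grid, train_acc, test_acc))
]

best = max(models, key=lambda m: m["test"])
gap = round(best["train"] - best["test"], 1)  # train/test accuracy gap
```

Model 16 (MNL = 20, MNNL = 30) has the highest accuracy on both subsets, with a gap of 2.2 percentage points between training and testing.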

Table 4. Ranking of each developed model.

Model No.	SP	MNL	MNNL	Ranking for Accuracy of Training	Ranking for Accuracy of Testing	Total Rank
1	0.6	5	5	12	10	22
2	0.6	5	10	10	11	21
3	0.6	5	20	15	16	31
4	0.6	5	30	11	13	24
5	0.6	10	5	14	11	25
6	0.6	10	10	13	12	25
7	0.6	10	20	17	18	35
8	0.6	10	30	18	13	31
9	0.6	15	5	13	12	25
10	0.6	15	10	18	13	31
11	0.6	15	20	17	14	31
12	0.6	15	30	14	18	32
13	0.6	20	5	9	17	26
14	0.6	20	10	16	19	35
15	0.6	20	20	11	16	27
16	0.6	20	30	20	20	40
17	0.6	30	5	14	18	32
18	0.6	30	10	8	15	23
19	0.6	30	20	11	9	20
20	0.6	30	30	19	18	37
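The Total Rank column is the sum of the two ranking columns, and the model with the largest total is preferred. A minimal sketch of that arithmetic follows, with the ranks copied from the table above and hypothetical variable names; it does not re-derive the individual ranks from the accuracies.

```python
# Ranks copied from the table, in model order 1..20.
train_rank = [12, 10, 15, 11, 14, 13, 17, 18, 13, 18,
              17, 14, 9, 16, 11, 20, 14, 8, 11, 19]
test_rank = [10, 11, 16, 13, 11, 12, 18, 13, 12, 13,
             14, 18, 17, 19, 16, 20, 18, 15, 9, 18]

# Total rank is the element-wise sum of the two rankings.
total_rank = [a + b for a, b in zip(train_rank, test_rank)]

# Model numbers are 1-based; the highest total identifies the best model.
best_model = total_rank.index(max(total_rank)) + 1
runner_up = sorted(range(1, 21), key=lambda n: total_rank[n - 1])[-2]
```

Model 16 reaches the maximum total of 40, ahead of model 20 with 37.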

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

