1. Introduction
Road safety is a matter of major concern that affects people worldwide. According to the World Health Organization (WHO), road accidents are the 8th leading cause of death across all ages and the leading cause of death for people aged between 5 and 29 years [1]. Approximately 1.3 million lives are lost worldwide each year, with significant consequences for society. As a result, the European Union and the World Health Organization have set a goal of reducing fatal road accidents by 50% over the decade 2021–2030, with special emphasis on the contribution of new technologies to road safety.
Generally, road safety is affected by many different risk factors such as the driver’s state and environmental and traffic conditions [
2]. However, human error remains a major contributor to traffic collisions [
3]. The continuous development of automated vehicles aims to improve road safety by excluding the human element from the driving task [
4]. In addition, the use of intelligent driving behavior monitoring systems for real-time interventions has proved particularly effective in improving road safety [
5].
In recent years, the research community has played a crucial role in the evolution of Intelligent Transportation Systems (ITS), and specifically of Connected and Automated Vehicles (CAVs). Several published studies focus on understanding how different characteristics affect dangerous driving, in order to develop suitable models for recognizing risky driving behavior and to set the framework for in-vehicle interventions. Although a variety of in-vehicle and post-trip interventions have been proposed [
6,
7], there is a lack of intervention personalization and of a direct link between real-time driving behavior and the triggering of interventions. In recent years, driving behavior analysis using machine learning techniques has attracted considerable interest from the research community [
8].
The objective of the European Commission Horizon2020 project i-DREAMS (
https://idreamsproject.eu/) is to define, develop, test, and validate a ‘Safety Tolerance Zone’ (STZ) in order to ensure safe driving behavior [
9]. Through real-time monitoring of risk factors related to task complexity (e.g., traffic characteristics and weather) and coping capacity (e.g., the driver’s mental state, driving behavior, and current vehicle state), i-DREAMS aims to identify the level of the STZ and to develop interventions that keep the driver within acceptable boundaries of safe operation. The STZ is divided into three levels: ‘Normal’, ‘Dangerous’, and ‘Avoidable Accident’. ‘Normal’ refers to the scenario in which a crash is unlikely to occur, while ‘Dangerous’ concerns an increased possibility of crash occurrence in which an accident is nevertheless not inevitable. Lastly, the ‘Avoidable Accident’ level refers to a high possibility of crash occurrence where there is still time for the driver to intervene and avoid a crash. The difference between the ‘Dangerous’ and ‘Avoidable Accident’ levels is that the need for action is more urgent at the ‘Avoidable Accident’ level.
However, linking driving characteristics to the latent concept of risk, or defining risk through different levels of driving behavior, is a demanding task for road safety experts. Furthermore, the imbalance of road safety datasets is a well-documented problem and poses another obstacle to the correct identification of safety levels from driving behavior data. The present research attempts to tackle these challenges efficiently.
Based on the aforementioned gaps in the recent literature on real-time interventions and the prediction of driving behavior, this study aims to apply machine learning techniques to identify the level of the STZ concerning dangerous driving behavior and to predict the duration that each driver spends at each level of risk, based on significant driving behavior indicators. In summary, this research proposes a framework for (a) defining the STZ levels, and (b) developing and evaluating machine learning algorithms to classify driving behavior and predict the duration that each driver spends at each risk level. This framework also exploits the most important features for identifying driving behavior and addresses dataset imbalance, a common problem in road safety analyses [
10]. The paper contributes to the current knowledge in a two-fold manner: first, by identifying the safety level of drivers in real time, which is a real-time classification problem, and second, by predicting the duration of each safety level in real time. In that way, practitioners and OEMs could use driving behavior characteristics, weather, and the driver’s state to trigger the necessary real-time interventions according to the prevalent safety level and its corresponding duration, and thus bring drivers back to safe conditions. Furthermore, the prediction of the duration at each STZ level is a new approach to real-time driving behavior assessment that has not been developed in previous research. Finally, in this research, an extensive comparative analysis of the techniques used to deal with specific challenges of driving behavior analysis studies was performed.
It should be mentioned that, although the authors and the project partners within i-DREAMS have already published papers on the project and on the use of imbalanced learning, the majority of these papers are either literature reviews or concerned with the single task of classifying driving conditions. The present paper is one of the first attempts to exploit a data-driven approach to define the STZ and to predict both the level and the duration of each corresponding level.
The paper is structured as follows: after the introduction, an extensive literature review is conducted on driving behavior analysis using machine learning techniques. This is followed by the description of the research methodology, which includes the theoretical background of the models. Then, the collection and processing of the dataset are described. Finally, the results of the analysis are presented in order to draw conclusions related to road safety.
2. Literature Review
In recent years, the two main approaches that are widely used to analyze dangerous driving behavior are simulator studies and naturalistic driving studies (NDS) [
11]. According to [
12], the severity of dangerous driving is related to certain traffic, driving, vehicle, and environmental factors. Furthermore, recent studies focus on identifying driving behavior and classifying it as dangerous or safe since the real-time prediction of the safety level can trigger interventions and consequently improve road safety [
13]. In a more anthropocentric approach, studies have developed models to evaluate dangerous driving behavior based on the driver’s state [
14] and based on certain characteristics of the driver, such as demographics [
15]. Other studies have developed models for recognizing dangerous driving based on driving behavior parameters such as speed, time to collision, and time headway [
13,
16,
17].
Risky driving behavior prediction models based on machine learning algorithms have become extremely popular due to their high predictive accuracy. In relevant studies, the most commonly utilized high-performing models were Random Forest (RF; [
15]), Multilayer Perceptron (MLP; [
16]), Support Vector Machines (SVMs; [
eXtreme Gradient Boosting (XGBoost; [
17]). For instance, [
16] proposed a methodology to predict and evaluate driver risk in real time, based on four safety levels of driving behavior. The proposed methodology includes feature extraction, clustering techniques, feature importance, and the development and evaluation of four machine learning algorithms (i.e., RF, XGBoost, SVM, and MLP), with accuracies higher than 85%. [
13] applied a methodology to classify and evaluate different risk levels of driving behavior by analyzing a driving simulator dataset, developing clustering techniques to distinguish the different levels, and applying two classification algorithms (i.e., SVM and Decision Tree), with the highest accuracy being 95%. Moreover, [
17] proposed a framework for risk prediction that includes feature selection techniques, risk level labeling, methods to deal with imbalanced datasets, and the evaluation of a classification model (i.e., XGBoost), with an overall accuracy of 89%.
Labeling and distinguishing safety levels has attracted the interest of many researchers, as it is a demanding but important process for the development of Advanced Driver Assistance Systems (ADAS). In previous studies, the determination and evaluation of different risk levels of driving behavior have been based on several safety indicators, such as time to collision [
18]. However, it is difficult to set the right thresholds for the different risk indicators, which makes the process of defining safety levels problematic [
17]. As an alternative, some researchers have proposed a framework for determining the different risk levels by utilizing several clustering techniques, such as k-means and hierarchical clustering [
13,
16,
19].
Furthermore, since the analysis of driving behavior is based on real-world data, all previous studies face a class imbalance problem in the distribution of samples across classes (i.e., safe and dangerous conditions). Specifically, in the relevant studies, dangerous behavior and crash occurrences are much rarer than safe driving behavior and non-crash events, respectively. The class with the most samples is called the majority class, while the one with the fewest samples is called the minority class. In real-time collision analysis problems, the ratio of crash to non-crash events ranges from 1:5 [
20] to 1:20 [
21]. The most common sampling techniques in the literature are the Synthetic Minority Oversampling Technique (SMOTE) [
16,
22,
23,
24] and Adaptive Synthetic (ADASYN) [
24]. In addition, based on the literature in the field of road safety as well as in other scientific areas, additional sampling techniques have proved to be efficient, such as the combination of SMOTE and Edited Nearest Neighbors (SMOTE-ENN) [
10], Random Oversampling, SVM-SMOTE and SMOTE-Tomek [
25].
In general, most previous studies on driving behavior analysis have focused on developing a specific framework for identifying risky driving behavior. An alternative approach is to predict the duration of driving at the different safety levels. In the framework of the research project i-DREAMS, [
9] proposes the prediction of continuous risk indicators, such as the time spent at each safety level, in order to tune the frequency of warnings triggered to the driver in real time. Although, to our knowledge, such an approach has not yet been developed in the literature, a similar methodology is applied to short-term traffic prediction problems [
26,
27,
28].
3. Methodology
3.1. Definition of STZ Level
As the primary aim of this research is to identify the risk level of driving behavior, i.e., the level of the STZ, it is important to establish the best way to define these different safety levels. After a brief literature review, the number of driving safety levels was determined to be three, with the labels ‘Normal’, ‘Dangerous’, and ‘Avoidable Accident’. These three levels are defined using two groups of methods: (i) clustering methods (e.g., K-means, hierarchical clustering) and (ii) threshold-based methods (e.g., thresholds on speed, time to collision, or time headway).
The main constraint is that the distribution of the dataset must comply with the available literature, in which dangerous driving behavior occurs less frequently. Specifically, the ‘Normal’ level must be the majority class with the highest percentage of samples, while the ‘Dangerous’ and ‘Avoidable Accident’ levels must be the minority classes with the lowest percentages of samples.
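As an illustration of the threshold-based approach, the following sketch labels each 30 s interval with one of the three STZ levels using time-headway cut-offs; the column name `headway_s` and the threshold values are illustrative assumptions rather than the values adopted in this study.
```python
import numpy as np
import pandas as pd

def label_stz(headway_s: pd.Series,
              dangerous_thr: float = 2.0,   # illustrative cut-off (s), not the study's value
              avoidable_thr: float = 0.6):  # illustrative cut-off (s), not the study's value
    """Map the time headway of each 30 s interval to an STZ level:
    0 = 'Normal', 1 = 'Dangerous', 2 = 'Avoidable Accident'."""
    conditions = [headway_s <= avoidable_thr,   # most urgent level is checked first
                  headway_s <= dangerous_thr]
    return pd.Series(np.select(conditions, [2, 1], default=0),
                     index=headway_s.index, name="stz_level")

# Hypothetical usage on a DataFrame of 30 s intervals:
# intervals["stz_level"] = label_stz(intervals["headway_s"])
# intervals["stz_level"].value_counts(normalize=True)  # check that 'Normal' is the majority class
```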
3.2. Feature Selection
An important step in the classification process is to perform feature selection. Feature selection refers to the process of reducing the number of input variables in order to reduce computational complexity and prediction errors [
22]. Based on the literature review, two approaches are proposed: (i) correlation-based feature selection [
29], and (ii) permutation importance-based feature selection [
30].
The first approach concerns the determination of the correlation between the independent variables based on the Pearson correlation coefficient r. The values of the coefficient range between −1 and 1, where r = 0 indicates no correlation, r = 1 full positive correlation, and r = −1 full negative correlation. The optimal subset consists of features that are highly correlated with the predicted class but minimally correlated with each other [
29].
The second approach attempts to measure the importance of each input variable in the classification process by permuting the feature and calculating the resulting increase in the model’s prediction error.
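A minimal sketch of the two approaches with scikit-learn is given below; it assumes a feature DataFrame `X` and a target vector `y` (the STZ labels), and the 0.8 correlation cut-off is an illustrative choice.
```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# (i) Correlation-based selection: drop one feature of every highly correlated pair
corr = X.corr(method="pearson").abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.8).any()]

# (ii) Permutation importance: increase in prediction error when a feature is shuffled
X_tr, X_te, y_tr, y_te = train_test_split(X.drop(columns=to_drop), y,
                                          test_size=0.3, stratify=y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
perm = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=0)
importance = pd.Series(perm.importances_mean,
                       index=X_te.columns).sort_values(ascending=False)
```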
3.3. Imbalanced Learning
As indicated in the literature review, dangerous driving behavior is a rarer phenomenon than normal driving behavior. In addition, since classification algorithms generally assume an approximately equal distribution of samples across classes, this imbalance poses a limitation for the present research. In this study, methods for improving the performance of the models are therefore discussed and analyzed in order to deal with the bias of the algorithms towards the majority class.
Following the literature review, many resampling methods were examined, such as SMOTE and SMOTE-ENN. However, the Adaptive Synthetic (ADASYN) technique, an improved version of the Synthetic Minority Oversampling Technique (SMOTE), is considered the most suitable for handling imbalanced datasets and avoiding overfitting [
24,
31,
32]. The main idea behind the ADASYN algorithm is to use the learning difficulty of the different minority examples as a criterion for determining the number of synthetic samples to be generated for each minority example [
33]. In addition, after the individual resampling techniques were examined, ADASYN yielded the highest classification performance compared to the rest (i.e., SMOTE, SMOTE-ENN, SVM-SMOTE, SMOTE-Tomek, and Random Oversampling).
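A sketch of the resampling step using the imbalanced-learn library is shown below; the arrays `X_train` and `y_train` are assumed to be the training split, and only the training data are resampled so that the test set keeps the original class distribution.
```python
from collections import Counter
from imblearn.over_sampling import ADASYN

adasyn = ADASYN(random_state=0)                  # density-based oversampling of the minority classes
X_res, y_res = adasyn.fit_resample(X_train, y_train)

print("before resampling:", Counter(y_train))    # imbalanced STZ levels
print("after resampling: ", Counter(y_res))      # approximately balanced classes
```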
3.4. Multiclass Classification
As the objective of this study is to identify the driving behavior risk level among three classes (i.e., Normal, Dangerous, Avoidable Accident), the problem is a multi-class classification task. The proposed method is based on certain risk-driving indicators as predictor variables and four different machine learning classification algorithms: (i) Support Vector Machines, (ii) Random Forest, (iii) AdaBoost, and (iv) Multilayer Perceptron.
The four classification algorithms were chosen due to their high performance and common use in the literature for dangerous driving behavior identification, real-time crash prediction, and other real-world problems.
To train and evaluate the performance of the classification algorithms, the dataset is divided into a training set and a testing set. The training set has the form $X_{\text{training}} = \{(x_n, y_n),\ n = 1, \ldots, N\}$, where $x_n$ is the vector of predictor variables and $y_n \in \{0, 1, 2\}$ is the target variable. Training gives the model the ability to classify new data correctly. The performance of a classification model is conveniently illustrated through a confusion matrix, where one axis represents the actual class and the other the predicted class. The results presented in this paper were obtained using 10-fold cross-validation.
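A sketch of this evaluation scheme is given below; `clf` stands for any of the classifiers described in Section 3.5, and `X_res`, `y_res` for the (resampled) training data from the sketch in Section 3.3.
```python
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # 10-fold cross-validation
y_pred = cross_val_predict(clf, X_res, y_res, cv=cv)             # out-of-fold predictions
cm = confusion_matrix(y_res, y_pred, labels=[0, 1, 2])           # rows: actual class, columns: predicted class
```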
The classification algorithms are evaluated using accuracy, precision, recall, F1-score, and false alarm rate, defined in Equations (1)–(5).
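In their standard one-vs-rest form for a class i, these metrics can be written as:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (2)$$

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (3)$$

$$\text{F1-score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (4)$$

$$\text{False Alarm Rate} = \frac{FP}{FP + TN} \qquad (5)$$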
where: True Positive (TP) represents the instances which belong to class i and were correctly classified in it; True Negative (TN) represents the instances which do not belong to class i and were not classified in it; False Positive (FP) represents the instances which do not belong to class i but were incorrectly classified in it; and False Negative (FN) represents the instances which belong to class i but were not classified in it.
The accuracy metric calculates the percentage of instances that were correctly classified. In problems with an imbalanced dataset, the ‘Accuracy Paradox’ is observed, where the calculated accuracy is dominated by the majority class without reflecting the actual situation [
34]. The precision metric shows the percentage of data that actually belong to class i out of all the data that the model classified in class i, while recall describes the percentage of data belonging to class i that the algorithm was able to classify correctly in class i. In this study, incorrectly classifying a risk class as less risky or safe would have significant consequences for road safety, making recall a powerful evaluation metric. Lastly, the F1-score represents the harmonic mean of precision and recall, while the false alarm rate represents the probability of false detection.
3.5. Classification Algorithms
The four classification algorithms as described in
Section 3.4 are (i) Support Vector Machines, (ii) Random Forest, (iii) AdaBoost, and (iv) Multilayer Perceptron.
3.5.1. Support Vector Machines (SVM)
SVMs are supervised learning models that can be used for classification and regression problems [
35]. The key idea is that SVM tries to find the maximum margin hyperplane while minimizing the distance between misclassified instances and decision boundaries [
36]. Moreover, using the kernel method, SVMs can handle non-linearly separable data. Based on the literature, the SVM algorithm has been used extensively in road safety studies and has been shown to achieve high performance [
13]. Furthermore, SVMs have the advantage of handling high-dimensional datasets [
37].
The optimal values of the SVM hyperparameters were obtained using the Grid Search hyperparameter tuning technique. The most important SVM hyperparameters, selected through GridSearchCV from the scikit-learn Python library, were: (a) kernel type = ‘rbf’; (b) regularization parameter C = 50; and (c) kernel coefficient gamma = ‘scale’.
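A sketch of this tuning step is shown below; the candidate grid is illustrative, while the selected values correspond to those reported above, and `X_res`, `y_res` are the resampled training data from Section 3.3.
```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

param_grid = {"kernel": ["rbf", "poly"],      # illustrative candidate values
              "C": [1, 10, 50, 100],
              "gamma": ["scale", "auto"]}
search = GridSearchCV(SVC(), param_grid,
                      scoring="f1_macro",      # assumed scoring choice
                      cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0))
search.fit(X_res, y_res)
best_svm = search.best_estimator_              # e.g., SVC(kernel='rbf', C=50, gamma='scale')
```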
3.5.2. Random Forest (RF)
The RF classifier is an ensemble method that trains multiple decision trees in parallel utilizing the bootstrapping and aggregation methods, commonly known as bagging [
38]. Bootstrapping refers to the parallel training of multiple decision trees on different subsets of the dataset, while the final decision results from the aggregation of the decisions of the individual trees. The RF classifier tends to perform efficiently on classification tasks, and more specifically on identifying risky driving behavior. Furthermore, RF benefits from its ability to overcome the overfitting problem of individual decision trees [
16], and thus the RF algorithm is considered a good choice for identifying risky driving behavior.
Grid Search was also used for the RF model, and the optimal hyperparameters obtained were: (a) the number of estimators/trees of the forest = 200 and (b) the function to measure the quality of a split (criterion) = ‘entropy’.
3.5.3. AdaBoost
The AdaBoost model is an ensemble method that trains several decision trees in series. A set of weak classifiers is connected in series, where each weak classifier tries to improve the classification of the samples that were incorrectly classified by the previous one; this method is known as boosting [
38]. The weights of the instances misclassified by the previous tree are boosted so that the subsequent tree classifies them correctly. Based on the literature, AdaBoost is suitable for most types of data and, more specifically, performs well on imbalanced datasets while avoiding overfitting issues. Furthermore, training multiple weak classifiers to form a composite classifier with high efficiency is much easier than training one strong classifier [
39]. Therefore, since the present study concerns an imbalanced dataset, AdaBoost is a good alternative.
Through GridSearchCV, the optimal maximum number of estimators was found to be 500.
3.5.4. Multilayer Perceptron (MLP)
MLPs are neural network models, and more specifically a class of feedforward neural networks. A multilayer perceptron consists of three categories of layers: (i) the input layer, which receives the input data to be processed, (ii) the hidden layers, which provide the computational power of the model, and (iii) the output layer, which performs the prediction of the classification process. The MLP classifier is commonly used for pattern classification, recognition, prediction, and approximation [
40] and, as stated previously, has proven to be an effective algorithm in driving behavior analysis studies [
16].
The optimal hyperparameters that emerged from the Grid Search optimization for the MLP model were: (a) hidden layer sizes = (500, 500, 500), i.e., three hidden layers of 500 neurons each; (b) activation function = ‘relu’; and (c) regularization parameter alpha = 0.0001.
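For reference, the four classifiers with the optimal hyperparameters reported above can be instantiated in scikit-learn as follows (all unspecified arguments are left at their default values):
```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

models = {
    "SVM": SVC(kernel="rbf", C=50, gamma="scale"),
    "RF": RandomForestClassifier(n_estimators=200, criterion="entropy"),
    "AdaBoost": AdaBoostClassifier(n_estimators=500),
    "MLP": MLPClassifier(hidden_layer_sizes=(500, 500, 500),
                         activation="relu", alpha=0.0001),
}
```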
3.6. Multiple Linear Regression
After defining the driver’s behavior risk level for each 30 s time frame, the duration that each driver spends at each risk level was calculated by summing these time frames. In multiple linear regression, the purpose is to estimate the statistical significance and the relationship between a dependent variable ($y$) and multiple independent variables ($x_i$) [
41]. The effect of each independent variable on the dependent variable is expressed through the regression coefficients. In this study, regression models are developed to predict the duration that a driver spends at each safety level, using certain driving behavior factors as the independent variables.
In order to evaluate the regression models, the coefficient of determination $R^2$ (Equation (6)) is used, which expresses the percentage of the variance of the dependent variable ($y$) explained by the independent variables ($x_i$). The coefficient of determination measures the ability of the features to explain the phenomenon, and its values range from 0 to 1:

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_{i} - \hat{y}_{i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i} - \bar{y}\right)^{2}} \qquad (6)$$

where $n$ is the number of samples, $y_i$ are the actual values of the dependent variable $y$, $\bar{y}$ is the mean value of the dependent variable $y$, and $\hat{y}_i$ are the predicted values of the dependent variable $y$.
To evaluate the effect of the independent variables, the logical interpretation of the coefficients as well as the statistical significance of the variables were examined. When the null hypothesis is rejected at a significance level (a), the variable is characterized as statistically significant, suggesting that its influence on the occurrence of the phenomenon is not due to chance. Statistical significance is evaluated using the p-value and the t-value: for a p-value lower than the significance level (a) and for a t-value greater than the critical value of the Student’s t-distribution, the null hypothesis is rejected.
It is also important to note that the selection of independent variables is based on their correlation as well as on their statistical significance during the development of the models.
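A sketch of this check with the statsmodels library is given below; `X_dur` is assumed to hold the candidate driving behavior factors (e.g., maximum speed, distance travelled) and `y_dur` the duration spent at a given STZ level.
```python
import statsmodels.api as sm

X_ols = sm.add_constant(X_dur)        # add the intercept term
ols = sm.OLS(y_dur, X_ols).fit()      # ordinary least squares fit

print(ols.params)                      # estimated coefficients and their signs
print(ols.pvalues)                     # keep variables with p-value < a (e.g., 0.05)
print(ols.tvalues)                     # compare with the critical value of the t-distribution
```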
3.7. Regression Algorithms
This study is based on three regression algorithms: (i) Ridge Regression, (ii) Lasso Regression, and (iii) Elastic Net Regression. These models benefit from their ability to deal with multicollinearity and their ability to perform a type of feature selection. The key idea behind these models is the regularization of least-squares by utilizing a regularization parameter λ [
42]. The choice of these specific algorithms over other machine learning regressors, such as the Support Vector Regressor, was based on the need to investigate the influence of the independent variables on the prediction through their coefficients.
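A sketch of the three regularized regressors in scikit-learn is shown below; the regularization strengths (`alpha`, playing the role of λ) are illustrative, and `X_dur_train`, `y_dur_train`, `X_dur_test`, `y_dur_test` are assumed train/test splits (DataFrames) of the duration data.
```python
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.metrics import r2_score

regressors = {
    "Ridge": Ridge(alpha=1.0),                            # L2 penalty
    "Lasso": Lasso(alpha=0.1),                            # L1 penalty
    "Elastic Net": ElasticNet(alpha=0.1, l1_ratio=0.5),   # combined L1/L2 penalty
}
for name, reg in regressors.items():
    reg.fit(X_dur_train, y_dur_train)
    r2 = r2_score(y_dur_test, reg.predict(X_dur_test))    # coefficient of determination on the test set
    print(name, dict(zip(X_dur_train.columns, reg.coef_)), round(r2, 3))
```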
3.7.1. Ridge Regression
Ridge Regression is a regularization model that can deal with high multicollinearity among the independent variables. As stated previously, a regularization parameter λ is introduced to shrink the regression coefficients (b) towards zero, reducing the variability of the estimates. Through the λ parameter, the Ridge Regression model can reduce the impact of non-important features on the prediction process. The regularization technique that Ridge Regression utilizes is called L2 regularization. The estimated coefficients (b) of Ridge Regression minimize the function in Equation (7) [
43].
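In its standard form, with $p$ independent variables and intercept $b_0$, this objective can be written as

$$\hat{b}^{\text{ridge}} = \underset{b}{\arg\min}\left\{ \sum_{i=1}^{n}\left(y_i - b_0 - \sum_{j=1}^{p} x_{ij} b_j\right)^2 + \lambda \sum_{j=1}^{p} b_j^{2} \right\} \qquad (7)$$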
3.7.2. Lasso Regression
Lasso Regression (Least Absolute Shrinkage and Selection Operator) has many similarities with Ridge Regression, since it also regularizes the cost function using a regularization parameter λ. However, Lasso Regression has the ability to select the most important independent variables, ignoring those with minimal effect on the dependent variable. Using the L1 regularization technique, the coefficients of the least important variables tend towards zero, performing a selection of the most important features and dealing with model overfitting [
44]. The estimated coefficients (b) of Lasso Regression minimize the function in Equation (8) [
43].
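Under the same notation, the standard lasso objective is

$$\hat{b}^{\text{lasso}} = \underset{b}{\arg\min}\left\{ \sum_{i=1}^{n}\left(y_i - b_0 - \sum_{j=1}^{p} x_{ij} b_j\right)^2 + \lambda \sum_{j=1}^{p} \left|b_j\right| \right\} \qquad (8)$$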
3.7.3. Elastic Net Regression
Elastic Net Regression [
45] is the combination of Ridge and Lasso Regression. It is a highly efficient algorithm, as it combines the abilities and benefits of both Ridge and Lasso by utilizing two regularization parameters. The estimated coefficients (b) of Elastic Net Regression minimize the function in Equation (9).
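In its standard (naive) form, with separate regularization parameters $\lambda_1$ and $\lambda_2$ for the L1 and L2 penalties, the elastic net objective is

$$\hat{b}^{\text{enet}} = \underset{b}{\arg\min}\left\{ \sum_{i=1}^{n}\left(y_i - b_0 - \sum_{j=1}^{p} x_{ij} b_j\right)^2 + \lambda_1 \sum_{j=1}^{p} \left|b_j\right| + \lambda_2 \sum_{j=1}^{p} b_j^{2} \right\} \qquad (9)$$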
6. Conclusions
This paper aimed to propose a framework for identifying the risk level of driving behavior and predicting the duration of driving at each safety level. An important step was the definition of the driving behavior risk levels. Among the techniques examined, the definition of levels based on specific thresholds of time headway provides results consistent with the literature regarding the distribution of samples in the classes. To avoid bias in the models, the variables of time headway and time to collision were not taken into account during the classification process, which excluded two very important risk factors. In the future, alternative methods of determining the risk levels of driving behavior should be examined so that more risk factors can be included.
For the identification of the risky driving behavior level, four classification algorithms were developed, of which Random Forest and Multilayer Perceptron outperformed the Support Vector Machines and AdaBoost classifiers. The two models (RF and MLP) were found to have a high capability of identifying all risk levels of driving behavior.
In an effort to improve the performance of the models, feature selection was performed utilizing feature importance as well as feature correlation. Through the process of calculating feature importance, it emerged that distance travelled, speed, and the speed limit are significant in identifying the risk level of driving behavior. In contrast, the variables FatigueEvent and HandsOnEvent were not particularly important during the classification process. However, the driver’s condition and interaction with the steering wheel are directly related to other driving factors such as speed or distance travelled.
In addition to the development of classification models, this research also deals with the unequal distribution of samples in the classes using the ADASYN resampling method. The main advantage of ADASYN is that the algorithm does not copy existing minority samples; instead, more synthetic samples are generated for the examples that are harder to learn. This is the first time that ADASYN has been combined with a variety of machine learning classifiers for the real-time safety assessment of highly disaggregated driving behavior data.
In the second part of the study, three regression algorithms were developed to predict the duration that each driver spends at each safety level. Through the regression process, it was found that, among all the examined variables, maximum speed and total distance travelled provided statistically significant results. Based on the coefficients, maximum speed has the main, negative effect on driving duration at the different safety levels. Ridge, Lasso, and Elastic Net Regression use L1 and/or L2 regularization, reducing the size of the coefficients of non-useful variables and thus performing a form of feature selection. Therefore, maximum speed is particularly important in predicting the driving duration at each level. It should also be mentioned that, to the best of the authors’ knowledge, a combined approach for detecting not only the safety level of a driver but also the duration of each level has not been published yet. This fact forms another novelty of the current study.
Nevertheless, future studies could examine deep learning models (such as Convolutional Neural Networks [
56,
57] and Long Short-Term Memory (LSTM) [
56,
58]), which, based on relevant research, tend to perform better. Furthermore, a larger dataset and naturalistic driving data would also enhance the study results. However, due to processing power and time limitations, these analyses could not be performed at the time of this research.