In-Vehicle Data for Predicting Road Conditions and Driving Style Using Machine Learning

Al-refai, Ghaith; Elmoaqet, Hisham; Ryalat, Mutaz

doi:10.3390/app12188928

by Ghaith Al-refai *, Hisham Elmoaqet * and Mutaz Ryalat *

Department of Mechatronics Engineering, German Jordanian University, Amman 11180, Jordan

* Authors to whom correspondence should be addressed.

Appl. Sci. 2022, 12(18), 8928; https://doi.org/10.3390/app12188928

Submission received: 15 August 2022 / Revised: 1 September 2022 / Accepted: 2 September 2022 / Published: 6 September 2022

(This article belongs to the Special Issue Machine Learning Applications in Transportation Engineering)

Abstract: Many network protocols, such as Controller Area Network (CAN) and Ethernet, are used in the automotive industry to allow vehicle modules to communicate efficiently. These networks carry rich data from the different vehicle systems, such as the engine, transmission and brakes. These in-vehicle data can be used with machine learning algorithms to predict valuable information about the vehicle and roads. In this work, a low-cost machine learning system that uses in-vehicle data is proposed to solve three categorization problems: road surface conditions, road traffic conditions and driving style. Random forests, decision trees and support vector machine algorithms were evaluated to predict road conditions and driving style from labeled CAN data. These algorithms were used to classify the road surface condition as smooth, even or full of holes. They were also used to classify road traffic conditions as low, normal or high, and the driving style as normal or aggressive. Detection results are presented and analyzed. The random forests algorithm showed the highest detection accuracy, with an overall accuracy score between 92% and 95%.

Keywords: machine learning; decision trees; random forests; SVM; supervised machine learning; road conditions prediction; driving style prediction; in-vehicle data; CAN

1. Introduction

Modern vehicles include electronic modules that control the vehicle’s subsystems. These modules are called Electronic Control Units (ECUs). The number of ECUs in some vehicles can reach 70 [1]. Many vehicle networks were developed to allow the ECUs to communicate with each other. The Controller Area Network (CAN) was introduced as an automotive communication network protocol by Robert Bosch LLC in 1994 [2]. FlexRay is another vehicle network protocol that provides more bandwidth than CAN [3]. Ethernet is also used as a network protocol in the automotive industry [4]. The data transferred within the vehicle network are called in-vehicle data.

In recent years, new vehicle communication concepts were introduced, such as Vehicle to Vehicle (V2V) and Vehicle to Infrastructure (V2I). In these approaches, a vehicle can exchange data with other road elements, such as other vehicles, pedestrians and infrastructure cloud systems. The data transferred between the vehicle and other road elements are called connected vehicles data. More information about V2V and V2I communication can be found in [5,6].

Many new features and applications were introduced that use vehicle data (both in-vehicle and connected vehicles data) to improve vehicle and road safety. Ziebinski et al. [7] reviewed the latest Advanced Driver Assistance Systems (ADAS), which use in-vehicle data to provide safety features such as lane detection, road object detection and traffic sign recognition. These systems require dedicated sensors such as cameras, radars and ultrasonic sensors to collect road information. Park et al. [8] proposed a forward collision warning system using a monocular camera. A frontal object detection system based on sensor fusion of radar and a monocular vision camera was proposed by Hsu et al. [9]. A literature review of connected vehicles data and the Internet of Things (IoT) for implementing the smart cities approach can be found in [10].

Connected vehicles data systems require wireless devices to transfer the data. Moreover, the volume of transferred data is large, and the data require advanced storage and processing systems. The ML approaches required to deal with connected vehicles data are also more complex than those required for in-vehicle data. Therefore, in-vehicle data systems are usually less expensive and more readily available than connected vehicles data-based systems.

The main goal of this research is to enhance vehicle and road safety using a low-cost ML system that uses readily available in-vehicle data. Two design considerations were taken into account to reduce the ML system cost. The first consideration is that the ML system requires only basic in-vehicle CAN data. No special sensors, such as cameras and radars, are required by the ML system. Engine rpm, engine coolant temperature, manifold pressure, vehicle acceleration and fuel consumption are examples of the data used by the ML system. These data are already available on the CAN for the main vehicle functionalities, and the proposed ML system reuses them to predict road conditions. This significantly lowers the cost of the data required by the ML system.

The second consideration is to use traditional ML algorithms, such as decision trees, random forests and SVM. These algorithms can achieve acceptable accuracy scores and allow real-time implementation at low cost. Deep learning algorithms may provide more accurate predictions than the traditional ML algorithms, but they also require much more expensive systems for real-time implementation.

The proposed ML system handles three categorization problems: road surface conditions, road traffic conditions and driving style. The road surface is characterized by three classes: full of holes, smooth or even. Road traffic is characterized as high, normal or low, and the driving style is characterized as aggressive or normal.

In this paper, Section 2 explores work related to our research. Section 3 provides an overview of the proposed system architecture. Section 4 explains the dataset used for training and testing the algorithms. Section 5 briefly explains the implementation of the ML algorithms. Section 6 defines the evaluation metrics. Section 7 presents the detection results. Section 8 discusses the system results, system limitations and future enhancements. Finally, conclusions are provided in Section 9.

2. Related Work

Since our proposed system uses in-vehicle data, this section explores related work on in-vehicle data and ML applications. Lattanzi et al. [11] used two ML approaches and in-vehicle sensor data to identify unsafe driving behavior. They used SVM and neural network algorithms for classification. The input features to the ML system were the vehicle speed, engine speed, engine load, throttle position, steering wheel angle and brake pedal pressure. The classification results of this study showed an average accuracy above 90% for both classifiers.

Alvarez-Coello et al. [12] proposed a model for dangerous driving events using in-vehicle data. Random forests and a Recurrent Neural Network were used to classify the data. The authors used features such as acceleration, brake pedal position, acceleration pedal position, engine RPM and torque. The danger level was classified as normal, moderate or aggressive. Wang et al. [13] proposed a k-means clustering-based support vector machine (kMC-SVM) method to classify drivers into two types: aggressive and moderate. Vehicle speed and throttle opening were treated as the feature parameters that reflect the driving style.

Osman et al. [14] introduced a machine learning model for near-crash prediction from observed vehicle kinematics data. Vehicle kinematics data, such as speed, longitudinal acceleration, lateral acceleration, yaw rate and pedal position, were used as input features for multiple ML systems. The authors utilized several machine learning algorithms, such as K nearest neighbor (KNN), random forests, support vector machine (SVM) and adaptive boost (AdaBoost), to predict near-crash situations. The AdaBoost algorithm showed a better recall and F-score than the other algorithms. A system that can identify the driver of a trip using historical trip-based data collected from in-vehicle data was proposed by Moreira-Matias et al. [15]. In that work, decision trees obtained an accuracy between 75% and 100%.

Ghadge et al. [16] proposed a model to detect road potholes using vehicle accelerator information and GPS data. The k-means clustering algorithm was applied to the training data to build the model. A random forests classifier was used to evaluate this model on the test data for better prediction. Dhiman et al. [17] proposed a computer vision approach to detect potholes using a stereo vision camera and a deep learning algorithm. Kim et al. [18] provided a review of pothole detection techniques. The paper summarized the different approaches to pothole detection using vibration sensors, accelerometers, 3D reconstruction and 2D images.

Bernas et al. [19] provided a survey for low-cost techniques to detect road traffic using in-vehicle sensors. The techniques include applications of infrared and visible light sensors, wireless transmission, accelerometers, magnetometers, ultrasonic and microwave radars as well as acoustic sensing.

There are many other applications that use in-vehicle data with ML. For example, a vehicle theft prevention and driver identification system was proposed by Martinelli et al. [20,21]. A system to predict the driver’s drowsiness based on the air quality in the car cabin was proposed by Goh et al. [22]. Bai et al. [23] proposed a system to address the problem of detecting traffic signals from a set of vehicle speed profiles.

The significance of our research is in proposing a low-cost prediction system for road conditions and driving style. In order to reduce the system cost, general CAN data were used as input features to the ML system. No additional cost for special sensors is required by the system. Furthermore, the chosen ML algorithms are inexpensive to implement and do not require a complex computing system.

3. System Overview

This section describes how ML and vehicle network data can be used together to implement a full prediction system. It also explains how the predictions can be used in safety applications. The safety application can be implemented in the vehicle and in the infrastructure system by transferring the prediction results to the infrastructure. The proposed system block diagram is summarized in Figure 1. The proposed system includes the following components:

Vehicle network: The in-vehicle data to be fed to the ML system are collected through a vehicle network.
Data logging system: The data logging system collects data from the vehicle network.
Machine learning system: The machine learning system receives the data from the logging system and then classifies them. A training dataset is required to train the ML algorithms. The training dataset should be labeled correctly to the required classes.
Vehicle to Infrastructure communication (V2I): This network is used to transfer the result of the ML predictions to the infrastructure system.
Vehicle application system: This system uses the ML results to provide in-vehicle safety functions for the driver. For example, if the road traffic is classified as high, then a warning is issued to drive carefully.
Infrastructure application system: This system uses the ML classification result to provide functions at the infrastructure level. For example, a road maintenance request is issued if the road surface is detected as being full of holes.
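
The two application-level examples in the list above can be sketched as simple rules applied to the ML output. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the message strings and the label strings (`"high"`, `"full_of_holes"`) are hypothetical.

```python
from typing import Optional


def vehicle_application(traffic_label: str) -> Optional[str]:
    """In-vehicle safety function: warn the driver when traffic is high."""
    if traffic_label == "high":
        return "Warning: high traffic, drive carefully"
    return None


def infrastructure_application(surface_label: str) -> Optional[str]:
    """Infrastructure-level function: request maintenance for damaged roads."""
    if surface_label == "full_of_holes":
        return "Road maintenance request issued"
    return None


# Hypothetical ML output for one time window (label strings are assumptions).
prediction = {"surface": "full_of_holes", "traffic": "high", "style": "normal"}

driver_message = vehicle_application(prediction["traffic"])
# In the described architecture this message would be sent over the V2I link.
v2i_message = infrastructure_application(prediction["surface"])
```

Keeping the rules separate from the classifier means the same predictions can feed both the in-vehicle and the infrastructure applications.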

Three algorithms were implemented for in-vehicle data classification: decision trees, random forests and Support Vector Machine (SVM). A labeled dataset collected from the CAN network was used to train and test these algorithms. The classification results were analyzed with respect to accuracy, precision, recall and F-score.

4. The Dataset

The dataset used in this work was obtained from the Kaggle website under the title Traffic, Driving Style and Road Surface Condition [24]. Two cars were used to collect the dataset, a Peugeot 207 1.4 HDi and an Opel Corsa 1.3 HDi. The dataset was collected from the vehicles’ On-Board Diagnostics (OBD) port using an OBD device that can be paired with a smartphone. Ruta et al. [25] used this dataset to propose machine learning models for the Internet of Things (IoT).

The dataset includes 14 input features. They are summarized as follows:

Altitude change, calculated over 10 s.
Current speed value, which is the average speed in the last 60 s.
Speed variance in the last 60 s.
Speed variation for every second of detection.
Longitudinal acceleration.
Engine load, expressed as a percentage.
Engine coolant temperature in degrees Celsius.
Manifold Absolute Pressure (MAP), a parameter used by the internal combustion engine to compute the optimal air/fuel ratio.
Revolutions Per Minute (RPM) of the engine.
Mass Air Flow (MAF) Rate measured in g/s. This reading is used by the engine to set fuel delivery and spark timing.
Intake Air Temperature (IAT) at the engine entrance.
Vertical acceleration.
Average fuel consumption, calculated as liters per 100 km.

The dataset was labeled for three sub-problem categories, i.e., road surface conditions, road traffic conditions and driving style. The road surface condition was labeled as smooth, even or full of holes. The road traffic condition was labeled as low, normal or high, and the driving style was labeled as normal or aggressive. The dataset includes 24,957 data points. Table 1 summarizes the input features, the categories and the labels for each category of the dataset.

The number of samples per label differs within each category. Smooth roads, normal traffic conditions and normal driving style represent the majority of the labels in their respective categories. This is due to the nature of the roads used for data collection. Figure 2 shows the distribution of the labels for the classification categories considered in this study.

5. ML Algorithms Implementation

The ML algorithms used in this work are common and widely used in classification problems. A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test and each leaf node (terminal node) holds a class label. A decision tree typically starts with a single node, which branches into possible outcomes. Each of those outcomes leads to additional nodes, which branch off into other possibilities. This gives it a tree-like shape. More information about decision trees can be found in [26].

Random forests is an ML algorithm constructed from many decision trees. It establishes the outcome by combining the predictions of the individual trees: for classification, by majority vote or by averaging the predicted class probabilities; for regression, by averaging the tree outputs. More information about random forests can be found in [27].
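
How a forest combines its trees can be sketched with a hard majority vote. The three one-rule "trees" below, their thresholds and the feature order are made up for illustration; a trained forest would learn its trees from data, and some libraries (scikit-learn among them) average class probabilities instead of counting votes.

```python
from collections import Counter
from typing import Callable, List, Sequence

Tree = Callable[[Sequence[float]], str]


def forest_predict(trees: List[Tree], x: Sequence[float]) -> str:
    """Return the class predicted by most trees (hard majority vote)."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical one-rule trees; x = [vertical_acceleration, speed_variance].
trees = [
    lambda x: "full_of_holes" if x[0] > 1.5 else "smooth",
    lambda x: "full_of_holes" if x[1] > 4.0 else "smooth",
    lambda x: "full_of_holes" if x[0] + x[1] > 6.0 else "smooth",
]

# Only the first tree votes "full_of_holes" here, so the majority is "smooth".
label = forest_predict(trees, [2.0, 3.0])
```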

The support vector machine is an algorithm that tries to find a hyperplane that separates the data by class. SVM finds the boundary that maximizes the margin, i.e., the distance between the boundary and the closest data points (the support vectors) of each class. More information about SVM can be found in [28].

As mentioned in the previous sections, three ML algorithms were implemented to classify the CAN data. The implementation was done in Python using the scikit-learn, pandas, SciPy, NumPy and Matplotlib packages [29,30,31,32]. The packages were used for ML implementation, results analysis and visualization. The dataset was divided into 80% for training and 20% for testing. This yields 19,965 samples for training and 4992 for testing.
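
The split sizes quoted above can be checked with a few lines of arithmetic. The sketch below assumes the test share is rounded up, which reproduces the reported counts; the shuffling seed is an arbitrary choice and the index-based split is only an illustration of the idea.

```python
import math
import random

N_SAMPLES = 24_957     # dataset size reported in the text
TEST_FRACTION = 0.20   # 80% training / 20% testing

# Assumption: round the test share up; the remainder is used for training.
n_test = math.ceil(TEST_FRACTION * N_SAMPLES)
n_train = N_SAMPLES - n_test

# Shuffle row indices once, then cut them into disjoint test and train sets.
indices = list(range(N_SAMPLES))
random.Random(0).shuffle(indices)  # fixed seed chosen arbitrarily
test_idx, train_idx = indices[:n_test], indices[n_test:]
```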

In the decision trees implementation, the minimum number of samples required to split an internal node is set to 2, while the minimum number of samples per leaf is set to 1. In total, 200 trees were used in the random forests implementation.
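
A minimal scikit-learn sketch of these settings is shown below. It is not the authors' script: the data are a synthetic stand-in generated with `make_classification` (14 features, 3 classes), and `random_state` and the stratified split are assumptions added for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 14 input features and a 3-class label (assumption).
X, y = make_classification(n_samples=600, n_features=14, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y)

# Hyperparameters as stated in the text.
tree = DecisionTreeClassifier(min_samples_split=2, min_samples_leaf=1,
                              random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

tree.fit(X_train, y_train)
forest.fit(X_train, y_train)

tree_acc = tree.score(X_test, y_test)      # accuracy on the held-out 20%
forest_acc = forest.score(X_test, y_test)
```

The two decision tree values coincide with the scikit-learn defaults, so the tree is grown until its leaves are pure unless another limit is set.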

SVM was implemented with a radial basis function (RBF) kernel because the dataset is highly nonlinear and a kernel is needed to create accurate decision boundaries. The scaling parameter (γ) and the cost parameter were adjusted to achieve the best classification accuracy.
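
One common way to adjust γ and the cost parameter is a cross-validated grid search. The sketch below is an assumption about how such tuning could look, not the procedure used in the paper: the grid values, the 3-fold setting, the feature scaling step and the synthetic data are all illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 14 input features and a 3-class label (assumption).
X, y = make_classification(n_samples=300, n_features=14, n_informative=6,
                           n_classes=3, random_state=0)

# Feature scaling is an added assumption; the source does not mention it.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Hypothetical search grid for the scaling parameter (gamma) and the cost (C).
param_grid = {
    "svc__gamma": [0.01, 0.1, 1.0],
    "svc__C": [1.0, 10.0, 100.0],
}
search = GridSearchCV(model, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

best_params = search.best_params_   # best (gamma, C) pair found on this grid
best_score = search.best_score_     # mean cross-validated accuracy for that pair
```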

6. Evaluation Metrics

The results of the detection are analyzed by showing the classification confusion matrix for each algorithm. The confusion matrix shows the true positives, false positives, true negatives and false negatives for each of the classification problems in this study. From the confusion matrix, accuracy, recall, precision and F-score are calculated.

Accuracy is the number of correct predictions as a ratio of the total number of predictions. Precision shows how many of the samples predicted as positive are actually positive. Recall shows how many of the actual positive samples are predicted correctly. F-score is the harmonic mean of precision and recall.

The above measures are given as follows:

\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (1)

\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (2)

\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (3)

\mathrm{F\text{-}score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (4)
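
Equations (1) to (4) can be checked numerically with a short function. The counts used below are made up for illustration and are not results from the paper.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute Equations (1)-(4) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


# Hypothetical counts for one class treated as "positive".
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
# accuracy = 85/100, precision = 40/50, recall = 40/45, F-score = 80/95
```

For the three-class problems, the same formulas can be applied per class by treating that class as positive and the remaining classes as negative.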

The permutation feature importance approach is implemented to show how much each feature contributes to the detection accuracy. The algorithm works by shuffling the values of a single feature at a time, which destroys the information it carries while leaving the rest of the features intact. If the quality of prediction drops considerably, the feature is very important for the predictor. Feature ranking helps in understanding how the ML algorithms work and which data are more important to them. Altmann et al. [33] provide more information about permutation importance and its implementation.
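
The shuffling procedure can be sketched in plain Python as follows. The toy model, the data and the number of repeats are assumptions for illustration; scikit-learn offers a ready-made `sklearn.inspection.permutation_importance` for fitted estimators.

```python
import random
from typing import Callable, List

Row = List[float]


def accuracy(predict: Callable[[Row], str], X: List[Row], y: List[str]) -> float:
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict: Callable[[Row], str], X: List[Row],
                           y: List[str], n_repeats: int = 10,
                           seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy setup: the label depends only on feature 0; feature 1 is pure noise.
data_rng = random.Random(1)
X = [[data_rng.uniform(0, 10), data_rng.uniform(0, 10)] for _ in range(200)]
y = ["high" if row[0] > 5 else "low" for row in X]
predict = lambda row: "high" if row[0] > 5 else "low"

scores = permutation_importance(predict, X, y)
```

Shuffling the noise feature leaves the accuracy unchanged, so its importance is zero, while shuffling feature 0 causes a large drop.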

7. Results

7.1. Road Surface Conditions Classification Results

Table 2 shows the accuracy, the precision, the recall and the F-score of the road surface conditions classification and Table 3 shows the confusion matrix of the predictions.

Table 4 shows the top seven important features for road surface detection for the three algorithms. As shown in the results, engine coolant temperature is the most important feature for the three algorithms (decision trees, random forests and SVM). SVM was the only approach to have the longitudinal acceleration as one of the top seven important features for classification.

7.2. Road Traffic Conditions Classification Results

Table 5 shows the accuracy, the precision, the recall and the F-score of road traffic conditions classification, while Table 6 shows the confusion matrix of the predictions.

Table 7 shows the feature importance for road traffic classification using the permutation feature importance. SVM relied on the instantaneous vehicle speed for classification, while decision trees and random forests relied more on the average speed. Fuel consumption ranked as the third most important feature for decision trees and random forests, while it was not among the top seven features for SVM. Manifold absolute pressure was more important to SVM than to the other two algorithms.

7.3. Driving Style Classification Results

Table 8 shows the accuracy, the precision, the recall and the F-score of driving style classification and Table 9 shows the confusion matrix of the predictions.

Table 10 shows the feature importance for driving style classification. Fuel consumption was more important for decision trees and random forests, while manifold absolute pressure was more important for SVM.

8. Discussion

In this work, in-vehicle data were used to make predictions about road conditions and driving style using supervised machine learning algorithms. The dataset was collected from the vehicle CAN network. It includes 14 features, such as vehicle speed, longitudinal acceleration, fuel consumption and engine rpm, as shown in Table 1. The data were labeled for three categories. The first is road surface conditions, which classifies the road as full of holes, smooth or even. The second is road traffic conditions, which classifies the traffic as low, normal or high. The third is driving style, which classifies the driving as aggressive or normal.

A detailed overview for the system architecture is shown in Figure 1. The model includes a data logging system for in-vehicle data. A machine learning algorithm system was used for classification and prediction.

Three ML algorithms were implemented, i.e., decision trees, random forests and SVM. The detection results showed that random forests provided the best performance among the three algorithms. Decision trees came second, while SVM had the lowest performance. Figure 3, Figure 4 and Figure 5 show the detection results as charts for the three classification tasks covered in this work.

Due to the nature of the roads where the dataset was gathered, 61% of the road surface samples are smooth and only 13% are full of holes. Moreover, 75% of the traffic samples are normal and only 12% are high traffic. Normal driving style accounts for 89% of the data, and the rest is aggressive. This imbalanced data distribution can impact the ML models and bias them toward one class more than the others. Although the recall results in this study were good, which means that detection was accurate for the positive classes (low-sample data), it is always better to have balanced data. Future work may address this issue by collecting more training data to obtain a more balanced dataset. Another solution is to use oversampling techniques to increase the number of samples in the positive classes.
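
Random oversampling, the simplest of the oversampling techniques mentioned above, can be sketched as follows. This is an illustration of the idea rather than something done in the paper; the 89/11 toy split mirrors the driving style distribution, and the helper name is hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple


def random_oversample(X: List[list], y: List[str],
                      seed: int = 0) -> Tuple[List[list], List[str]]:
    """Duplicate randomly chosen minority samples until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))  # sample with replacement
            y_out.append(label)
    return X_out, y_out


# Toy data: 89 normal and 11 aggressive samples.
y = ["normal"] * 89 + ["aggressive"] * 11
X = [[float(i)] for i in range(len(y))]

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
```

Oversampling is normally applied to the training split only, so that duplicated samples do not leak into the test set.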

The permutation feature importance technique was used to rank the input features based on their impact on the detection results. Decision trees and random forests produced almost the same feature ranking, while SVM produced a different one. This matters if a voting system is developed to choose among several ML detections: it is important to combine ML algorithms that build different classification models and think differently. Feature ranking also showed that some features had little impact on the detection results. For example, engine load and manifold absolute pressure had a very low impact on driving style detection, and altitude variation and vehicle speed variation had a very low impact on driving style detection with SVM. Therefore, eliminating low-ranked features can improve the ML system performance and help avoid model overfitting; it also helps in the practical implementation of the system.

This research can be extended in several ways. Collecting more data from other vehicle systems, such as the suspension, brakes and gearbox, can help improve the results. Extracting statistics from the data, such as the mean, standard deviation and median, can add more value to the input features. Ranking the features and eliminating the low-impact ones is also good practice for reducing system complexity.
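
The statistical features suggested above could be extracted per time window, for example as in the sketch below. The window length, the non-overlapping windows and the RPM values are assumptions for illustration only.

```python
import statistics
from typing import Dict, List


def window_features(signal: List[float], window: int) -> List[Dict[str, float]]:
    """Mean, standard deviation and median over non-overlapping windows."""
    features = []
    for start in range(0, len(signal) - window + 1, window):
        chunk = signal[start:start + window]
        features.append({
            "mean": statistics.fmean(chunk),
            "std": statistics.pstdev(chunk),   # population standard deviation
            "median": statistics.median(chunk),
        })
    return features


# Hypothetical engine RPM samples: three at idle, three under load.
rpm = [800.0, 820.0, 810.0, 2500.0, 2600.0, 2550.0]
feats = window_features(rpm, window=3)
```

Each window yields one feature row, which could be appended to the existing input features before training.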

Fusing data from many sources provides a better understanding of the area surrounding the vehicle and thus yields a better prediction system. Therefore, fusing in-vehicle data with connected vehicle data should boost the performance of the ML system. Adding data from sensors such as cameras, radar and lidar is also expected to improve the detection results.

A deep learning algorithm, such as a neural network, can be considered as future work. Neural networks may perform better than conventional ML algorithms, such as random forests and SVM. However, deep learning techniques require more computation and therefore a more expensive system. Choosing between deep learning and traditional ML algorithms is thus a trade-off between system accuracy and system cost.

9. Conclusions

In this study, an ML system is proposed to solve three categorization problems: road surface conditions, road traffic conditions and driving style. Decision trees, random forests and SVM were implemented in Python. In-vehicle CAN data were used to train and test the algorithms.

Random forests showed the best accuracy, precision, recall and F-score for all the classifications. The nature of the features and the size of the training dataset are what give one algorithm the advantage over another. From the results, we conclude that, among the three algorithms evaluated on this dataset, random forests is the best at predicting road surface conditions, road traffic conditions and driving style.

The feature importance of the algorithms was analyzed using the permutation feature importance algorithm. Decision trees and random forests produced almost the same feature importance ranking, while SVM produced a different one. For example, SVM ranked longitudinal acceleration high for road surface detection, while decision trees and random forests ranked this feature low. Feature ranking can help eliminate the low-ranked features to reduce system complexity while maintaining the ML system performance.

Finally, this work shows that the vehicle network carries rich information that can be analyzed and classified using ML to provide useful applications. In-vehicle data combined with traditional ML algorithms can provide a system with high accuracy and an inexpensive implementation compared to more complex ML systems.

Author Contributions

G.A.-r. developed the detection system overview and defined the system elements. He also did the ML algorithm implementation in Python. G.A.-r. contributed to the related work and conclusion. H.E. contributed to data analysis, evaluation metrics, features ranking, discussion and general paper review and corrections. M.R. wrote the introduction, contributed to the related work section, results, discussion and conclusion. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset used in this work is available on the Kaggle website under the title Traffic, Driving Style and Road Surface Condition: https://www.kaggle.com/datasets/gloseto/traffic-driving-style-road-surface-condition. The data were accessed on 18 July 2022.

Acknowledgments

We are grateful to the people who prepared the dataset and made it available on Kaggle.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ML	Machine learning
CAN	Controller area network
V2V	Vehicle to Vehicle communication
V2I	Vehicle to Infrastructure communication
ADAS	Advanced Driver Assistance System
OBD	On-Board Diagnostics
FCW	Forward collision warning
AEB	Automatic emergency braking
ACC	Adaptive cruise control
ECU	Electronic control unit
GPS	Global positioning system
SVM	Support vector machine
RBF	Radial basis function
KNN	K nearest neighbor
ANN	Artificial Neural Network
TP	True positive
TN	True negative
FP	False positive
FN	False negative

References

Schmidgall, R. Automotive Embedded Systems Software Reprogramming. Ph.D. Thesis, Brunel University, London, UK, 2012. [Google Scholar]
Farsi, M.; Ratcliff, K.; Barbosa, M. An overview of controller area network. Comput. Control Eng. J. 1999, 10, 113–120. [Google Scholar] [CrossRef]
Makowitz, R.; Temple, C. Flexray-a communication network for automotive control systems. In Proceedings of the 2006 IEEE International Workshop on Factory Communication Systems, Torino, Italy, 27–30 June 2006; pp. 207–212. [Google Scholar]
Matheus, K.; Königseder, T. Automotive Ethernet; Cambridge University Press: Cambridge, UK, 2021. [Google Scholar]
Zeadally, S.; Guerrero, J.; Contreras, J. A tutorial survey on vehicle-to-vehicle communications. Telecommun. Syst. 2020, 73, 469–489. [Google Scholar] [CrossRef]
Dey, K.C.; Rayamajhi, A.; Chowdhury, M.; Bhavsar, P.; Martin, J. Vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication in a heterogeneous wireless network–Performance evaluation. Transp. Res. Part C Emerg. Technol. 2016, 68, 168–184. [Google Scholar] [CrossRef]
Ziebinski, A.; Cupek, R.; Grzechca, D.; Chruszczyk, L. Review of advanced driver assistance systems (ADAS). In AIP Conference Proceedings; AIP Publishing LLC: Melville, NY, USA, 2017; Volume 1906, p. 120002. [Google Scholar]
Park, K.-Y.; Hwang, S. Robust range estimation with a monocular camera for vision-based forward collision warning system. Sci. World J. 2014, 2014, 923632. [Google Scholar] [CrossRef] [PubMed]
Hsu, Y.W.; Lai, Y.H.; Zhong, K.Q.; Yin, T.K.; Perng, J.W. Developing an on-road object detection system using monovision and radar fusion. Energies 2019, 13, 116. [Google Scholar] [CrossRef]
Heidari, A.; Navimipour, N.J.; Unal, M. Applications of ML/DL in the management of smart cities and societies based on new trends in information technologies: A systematic literature review. Sustain. Cities Soc. 2022, 85, 104089. [Google Scholar] [CrossRef]
Lattanzi, E.; Freschi, V. Machine learning techniques to identify unsafe driving behavior by means of in-vehicle sensor data. Expert Syst. Appl. 2021, 176, 114818. [Google Scholar] [CrossRef]
Alvarez-Coello, D.; Klotz, B.; Wilms, D.; Fejji, S.; Gómez, J.M.; Troncy, R. Modeling dangerous driving events based on in-vehicle data using Random Forest and Recurrent Neural Network. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France, 9–12 June 2019. [Google Scholar]
Wang, W.; Xi, J. A rapid pattern-recognition method for driving styles using clustering-based support vector machines. In Proceedings of the 2016 American Control Conference (ACC), Boston, MA, USA, 6–8 July 2016; pp. 5270–5275. [Google Scholar]
Osman, O.A.; Hajij, M.; Bakhit, P.R.; Ishak, S. Prediction of near-crashes from observed vehicle kinematics using machine learning. Transp. Res. Rec. 2019, 2673, 463–473. [Google Scholar] [CrossRef]
Moreira-Matias, L.; Farah, H. On developing a driver identification methodology using in-vehicle data recorders. IEEE Trans. Intell. Transp. Syst. 2017, 18, 2387–2396. [Google Scholar] [CrossRef]
Ghadge, M.; Pandey, D.; Kalbande, D. Machine learning approach for predicting bumps on road. In Proceedings of the 2015 International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT), Davangere, India, 29–31 October 2015; pp. 481–485. [Google Scholar]
Dhiman, A.; Klette, R. Pothole detection using computer vision and learning. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3536–3550. [Google Scholar] [CrossRef]
Kim, T.; Ryu, S.-K. Review and analysis of pothole detection methods. J. Emerg. Trends Comput. Inf. Sci. 2014, 5, 603–608. [Google Scholar]
Bernas, M.; Płaczek, B.; Korski, W.; Loska, P.; Smyła, J.; Szymała, P. A survey and comparison of low-cost sensing technologies for road traffic monitoring. Sensors 2018, 18, 3243. [Google Scholar] [CrossRef] [PubMed]
Martinelli, F.; Mercaldo, F.; Nardone, V.; Orlando, A.; Santone, A. Who’s Driving My Car? A Machine Learning based Approach to Driver Identification. In Proceedings of the 4th International Conference, ICISSP 2018, Funchal, Portugal, 22–24 January 2018; pp. 367–372. [Google Scholar]
Martinelli, F.; Mercaldo, F.; Santone, A. Machine learning for driver detection through CAN bus. In Proceedings of the 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), Antwerp, Belgium, 25–28 May 2020; pp. 1–5. [Google Scholar]
Goh, C.C.; Kamarudin, L.M.; Zakaria, A.; Nishizaki, H.; Ramli, N.; Mao, X.; Syed Zakaria, S.M.; Kanagaraj, E.; Abdull Sukor, A.S.; Elham, M.F. Real-time in-vehicle air quality monitoring system using machine learning prediction algorithm. Sensors 2021, 21, 4956. [Google Scholar] [CrossRef] [PubMed]
Bai, R.; Chen, X.; Chen, Z.L.; Cui, T.; Gong, S.; He, W.; Jiang, X.; Jin, H.; Jin, J.; Kendall, G.; et al. Analytics and machine learning in vehicle routing research. Int. J. Prod. Res. 2021, 1–27. [Google Scholar] [CrossRef]
Kaggle. Available online: https://www.kaggle.com/datasets/gloseto/traffic-driving-style-road-surface-condition (accessed on 18 July 2022).
Ruta, M.; Scioscia, F.; Loseto, G.; Pinto, A.; Di Sciascio, E. Machine learning in the Internet of Things: A semantic-enhanced approach. Semant. Web 2019, 10, 183–204.
Myles, A.J.; Feudale, R.N.; Liu, Y.; Woody, N.A.; Brown, S.D. An introduction to decision tree modeling. J. Chemom. 2004, 18, 275–285.
Biau, G.; Scornet, E. A random forest guided tour. Test 2016, 25, 197–227.
Noble, W.S. What is a support vector machine? Nat. Biotechnol. 2006, 24, 1565–1567.
Scikit-learn. Available online: https://scikit-learn.org/stable/ (accessed on 18 July 2022).
NumPy. Available online: https://numpy.org/ (accessed on 18 July 2022).
Pandas. Available online: https://pandas.pydata.org/ (accessed on 18 July 2022).
Matplotlib. Available online: https://matplotlib.org/ (accessed on 18 July 2022).
Altmann, A.; Toloşi, L.; Sander, O.; Lengauer, T. Permutation importance: A corrected feature importance measure. Bioinformatics 2010, 26, 1340–1347.

Figure 1. System architecture for road conditions and driving style prediction.

Figure 2. The distribution of each class in the dataset: (a) Label distribution as a percentage for the road surface conditions. (b) Label distribution as a percentage for the road traffic conditions. (c) Label distribution as a percentage for the driving style.

Figure 3. Accuracy, precision, recall and F-score for road surface conditions results.

Figure 4. Accuracy, precision, recall and F-score for road traffic conditions results.

Figure 5. Accuracy, precision, recall and F-score for driving style results.

Table 1. The dataset input features and the output classes for the road conditions, traffic conditions and the driving style.

Features	Output 1: Road Surface Conditions	Output 2: Road Traffic Conditions	Output 3: Driving Style
AltitudeVariation	Smooth	Low traffic	Normal style
VehicleSpeedInstantaneous	Full of holes	Normal traffic	Aggressive style
VehicleSpeedAverage	Even condition	High traffic
VehicleSpeedVariance
VehicleSpeedVariation
LongitudinalAcceleration
EngineLoad
EngineCoolantTemperature
ManifoldAbsolutePressure
EngineRPM
MassAirFlow
IntakeAirTemperature
VerticalAcceleration
FuelConsumptionAverage

Table 2. Road surface conditions classification results.

	Decision Trees	Random Forests	SVM
Accuracy	0.954	0.983	0.94
Precision	0.974	0.99	0.942
Recall	0.975	0.984	0.941
F-score	0.975	0.989	0.94

Table 3. Confusion matrix for road surface conditions classification.

		Decision Trees			Random Forests			SVM
True/Predicted	Full of holes	Smooth	Even	Full of holes	Smooth	Even	Full of holes	Smooth	Even
Full of holes	649	2	1	649	1	0	553	15	41
Smooth	57	2995	1	23	3045	4	42	2932	85
Even	52	43	1198	22	21	1248	39	77	1208
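
As a minimal sketch of how a confusion matrix like the one in Table 3 relates to accuracy, precision, recall and F-score, the snippet below derives those metrics from the Decision Trees block, assuming rows are true classes and columns are predicted classes (as the "True/Predicted" header suggests). The function names are illustrative and not taken from the paper, and the computed values need not reproduce Table 2 exactly, for example if the reported scores were averaged differently or obtained over different splits.

```python
from typing import Dict, List


def accuracy(cm: List[List[int]]) -> float:
    """Overall accuracy: correctly classified samples (diagonal) over all samples."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_metrics(cm: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-class precision, recall and F-score.

    Assumes cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(labels)
    metrics = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        predicted_as_i = sum(cm[r][i] for r in range(n))  # column sum
        actually_i = sum(cm[i])                           # row sum
        precision = tp / predicted_as_i if predicted_as_i else 0.0
        recall = tp / actually_i if actually_i else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f_score": f_score}
    return metrics


# Decision Trees block of Table 3 (order: Full of holes, Smooth, Even).
labels = ["Full of holes", "Smooth", "Even"]
dt_cm = [
    [649, 2, 1],
    [57, 2995, 1],
    [52, 43, 1198],
]

if __name__ == "__main__":
    print(f"accuracy: {accuracy(dt_cm):.3f}")
    for name, m in per_class_metrics(dt_cm, labels).items():
        print(name, {k: round(v, 3) for k, v in m.items()})
```

The same two functions apply unchanged to the Random Forests and SVM blocks, and to the matrices in Tables 6 and 9.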

Table 4. Road surface conditions feature importance using the permutation feature importance approach.

Feature Importance Rank	Decision Trees	Random Forests	SVM
1	Engine Coolant Temperature	Engine Coolant Temperature	Engine Coolant Temperature
2	Fuel Consumption Average	Intake Air Temperature	Intake Air Temperature
3	Vehicle Speed Average	Fuel Consumption Average	Vehicle Speed Average
4	Intake Air Temperature	Vehicle Speed Average	Vehicle Speed Instantaneous
5	Engine RPM	Engine RPM	Engine RPM
6	Vehicle Speed Variance	Manifold Absolute Pressure	Longitudinal Acceleration
7	Manifold Absolute Pressure	Vehicle Speed Variance	Manifold Absolute Pressure
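
The rankings in Table 4 come from permutation feature importance. As a rough illustration of the idea only, and not the authors' code, the sketch below shuffles one feature column at a time and records the average drop in accuracy. The toy model, the data and all names are hypothetical; in practice scikit-learn's `sklearn.inspection.permutation_importance` provides the same kind of measurement for fitted estimators.

```python
import random
from typing import Callable, List


def accuracy(model: Callable[[List[float]], int], X: List[List[float]], y: List[int]) -> float:
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(row) == target for row, target in zip(X, y)) / len(y)


def permutation_importance(
    model: Callable[[List[float]], int],
    X: List[List[float]],
    y: List[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: column 0 stands in for an average speed, column 1 is noise.
X = [[float(i), float((i * 7) % 13)] for i in range(100)]


def toy_model(row: List[float]) -> int:
    """Hypothetical classifier that only looks at the first feature."""
    return int(row[0] > 50)


y = [toy_model(row) for row in X]
importances = permutation_importance(toy_model, X, y)

if __name__ == "__main__":
    for name, score in zip(["feature 0", "feature 1"], importances):
        print(f"{name}: {score:.3f}")
```

Sorting features by these scores in descending order yields a ranking of the kind reported in Tables 4, 7 and 10.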

Table 5. Road traffic conditions classification results.

	Decision Trees	Random Forests	SVM
Accuracy	0.951	0.979	0.938
Precision	0.973	0.989	0.938
Recall	0.973	0.981	0.938
F-score	0.973	0.985	0.938

Table 6. Confusion matrix for road traffic conditions classification.

		Decision Trees			Random Forests			SVM
True/Predicted	High traffic	Low traffic	Normal traffic	High traffic	Low traffic	Normal traffic	High traffic	Low traffic	Normal traffic
High traffic	599	2	1	599	6	0	504	84	11
Low traffic	80	3694	2	11	3744	2	47	3693	35
Normal traffic	39	43	549	25	35	549	13	119	486

Table 7. Road traffic conditions feature importance using the permutation feature importance approach.

Feature Importance Rank	Decision Trees	Random Forests	SVM
1	Vehicle Speed Average	Engine Coolant Temperature	Engine Coolant Temperature
2	Engine Coolant Temperature	Vehicle Speed Average	Vehicle Speed Instantaneous
3	Fuel Consumption Average	Fuel Consumption Average	Engine RPM
4	Intake Air Temperature	Intake Air Temperature	Intake Air Temperature
5	Vehicle Speed Variance	Vehicle Speed Variance	Longitudinal Acceleration
6	Engine RPM	Longitudinal Acceleration	Vehicle Speed Average
7	Longitudinal Acceleration	Engine RPM	Manifold Absolute Pressure

Table 8. Driving style classification results.

	Decision Trees	Random Forests	SVM
Accuracy	0.92	0.95	0.91
Precision	0.92	0.95	0.91
Recall	0.92	0.95	0.91
F-score	0.92	0.95	0.91

Table 9. Confusion matrix for driving style classification.

	Decision Trees		Random Forests		SVM
True/Predicted	Aggressive style	Normal style	Aggressive style	Normal style	Aggressive style	Normal style
Aggressive style	354	209	345	230	140	409
Normal style	184	4243	37	4393	55	4393

Table 10. Driving style feature importance using the permutation feature importance approach.

Feature Importance Rank	Decision Trees	Random Forests	SVM
1	Vehicle Speed Average	Vehicle Speed Average	Vehicle Speed Instantaneous
2	Vehicle Speed Instantaneous	Longitudinal Acceleration	Vehicle Speed Average
3	Longitudinal Acceleration	Fuel Consumption Average	Engine RPM
4	Fuel Consumption Average	Vehicle Speed Instantaneous	Vertical Acceleration
5	Vehicle Speed Variance	Vehicle Speed Variance	Vertical Acceleration
6	Engine RPM	Vertical Acceleration	Manifold Absolute Pressure
7	Vertical Acceleration	Engine RPM	Vehicle Speed Variance

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
