1. Introduction
Today, driving assistance and road safety are critical issues in countries around the world. According to the Global Status Report on Road Safety 2015 by the World Health Organization (WHO), road accidents rank among the top-ten causes of death worldwide, killing more than 1.2 million people per year. Road traffic fatality rates are especially high in low- and middle-income countries [
1]. Numerous factors can cause road accidents; nonetheless, assisting drivers and providing them with safety awareness during their trips is an effective way to prevent such accidents.
Recently, researchers have devoted considerable attention to various methods of providing assistance and safety awareness to drivers. Such works primarily fall into the following categories: recognizing vehicle modes (car, bus, train, bike, walking ...) [
2,
3,
4,
5,
6,
7,
8,
9], identifying driving styles (normal, aggressive, drunk, fatigued, drowsy, inattentive ...) [
10,
11,
12,
13,
14,
15,
16], detecting normal/abnormal driving events (moving, stopping, turning left, turning right, weaving, sudden braking, fast u-turn...) [
17,
18,
19,
20,
21,
22], detecting accidents [
23,
24], estimating energy consumption and pollution [
25], and monitoring road and traffic conditions [
26,
27,
28,
29,
30].
There are several approaches to accessing driver and vehicle information. In the first approach, a set of sensors and additional hardware is pre-deployed in vehicles, for instance telematic boxes (e.g., black boxes provided by car insurance companies) or on-board diagnostics (OBD-II) adapters plugged into the vehicle’s controller area network (CAN) [
24,
31]. The information recorded by these devices can then be retrieved or sent over the Internet. However, this strategy requires vehicles to be fitted with extra devices, which incurs additional cost. Moreover, it is not feasible to implement these techniques in certain types of vehicles such as bikes and motorbikes. To overcome these drawbacks, an alternative approach is to use smartphones to collect data through their embedded sensors, such as inertial sensors (accelerometers and gyroscopes), global positioning systems (GPS), magnetometers, microphones, image sensors (cameras), light sensors, proximity sensors, and direction sensors (compasses). Technological advances and the rapid growth in smartphone usage have made the latter approach widely adopted in recent studies.
Furthermore, the WHO Global Status Report on Road Safety 2015 also shows that approximately a quarter of all road traffic deaths involve motorcyclists. However, very few existing works provide driving assistance and safety awareness for motorcyclists [
4,
20,
21,
22,
32]. Nonetheless, such works have certain limitations. The method proposed in [
32] is constrained by restrictive conditions, such as fixing the position of the smartphone and using predefined thresholds to distinguish between normal and abnormal driving patterns. Such thresholds may be unreliable because sensor quality varies across smartphone models, as do road conditions. The work proposed in [
4] relies on the combination of GPS and accelerometer data to predict eight travelling modes. Nonetheless, its prediction accuracy is quite low, with an average precision of 76.38% and an average recall of 75.88%.
In this work, we develop a real-time, flexible, combined system, namely the Vehicle mode-driving Activity Detection System (VADS), that is capable of detecting not only the vehicle mode currently used by a traveler (i.e., walking, a bike, a motorbike, a car, or a bus) but also several basic driving activities (i.e., stopping, going straight, turning left, turning right). The strategy of combining these two separate modules allows our system to improve the accuracy of recognizing driving events once the current vehicle mode of a traveler is known. The vehicle mode detection module relies only on accelerometer data in order to minimize the energy consumption of smartphones. The driving activity detection module, in contrast, must account for turning left and turning right, which involve changes in the vehicle’s direction; hence, it requires data from the accelerometer, gyroscope, and magnetometer. This work focuses on finding a solution that can collect sensor data and provide real-time prediction results in a smartphone application. Our system is thus designed to meet several main goals: it must require low computational resources and low energy consumption, and it must respond quickly to changes in travelers’ vehicle modes and driving activities.
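The two-stage design described above can be sketched in a few lines of code. The stand-in classifiers below are purely hypothetical placeholders (the actual models are trained as described later in the paper); the sketch only illustrates how the mode prediction gates the choice of activity model:

```python
# Two-stage pipeline sketch: stage 1 detects the vehicle mode from
# accelerometer features only; stage 2 dispatches to a per-mode
# activity model that also uses gyroscope and magnetometer features.
# Both "models" here are hypothetical stand-ins, not the paper's classifiers.

def detect_vehicle_mode(accel_features):
    # Placeholder: a trained classifier (e.g., Random Forest) would be used here.
    return "motorbike"

def detect_activity(mode, imu_features):
    # Placeholder: one activity model per known vehicle mode.
    per_mode_models = {
        "motorbike": lambda f: "turning_left" if f.get("gyro_z", 0.0) > 0.5
                               else "going_straight",
    }
    model = per_mode_models.get(mode, lambda f: "going_straight")
    return model(imu_features)

def vads_predict(accel_features, imu_features):
    mode = detect_vehicle_mode(accel_features)       # accelerometer only
    activity = detect_activity(mode, imu_features)   # accel + gyro + magnetometer
    return mode, activity
```

Conditioning the activity model on the detected mode is what lets each per-mode classifier specialize, which is the source of the accuracy gain claimed for the combined system.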
Data segmentation has been widely applied in activity recognition: sensor data are split into a number of overlapping segments (also called data windows) of a predefined size. Most existing studies use the same window size and the same overlapping ratio for predicting all vehicle modes and all driving activities; such parameter values are chosen arbitrarily or taken from previous studies. However, each vehicle mode and each driving activity has its own cyclic characteristics, so fixing these parameter values for all vehicle modes or all driving activities is unrealistic. To date, few works have considered inferring the optimal data window size and overlapping ratio from training datasets [
8,
33,
34]. The authors of [
33,
34] show that a window size of 1–2 s yields the best accuracy and processing speed in predicting various human activities. The authors of [
8] show that a window size of 60 s leads to the highest overall recall rate in detecting vehicle modes from their dataset. However, such a long window causes slow response and long processing times due to the resulting long feature vectors, so this framework is not suitable for real-time applications. In this work, we thus propose an algorithm that computes the optimal window size and overlapping ratio for each vehicle mode and each driving event from the training datasets. The obtained optimal window sizes fall in the range of 4–6 s, which is reasonable for real-time prediction. The inferred optimal parameters allow the vehicle mode detection module to improve its prediction accuracy by 2.73%, 3.04%, 6.45%, 7.37%, and 5.72% when using the Random Forest, J48, Naïve Bayes, KNN, and SVM classifiers, respectively, on a feature set combining time domain features, frequency domain features, and Hjorth features, compared with the strategy of using the same window size of 5 s and the same overlapping ratio of 50% for all modes. Similar improvements are observed in predicting the driving activities of motorcyclists.
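To make the two segmentation parameters concrete, sliding-window segmentation with a configurable window size and overlapping ratio can be sketched as follows. The 50 Hz sampling rate used in the example is an assumption for illustration, not a value taken from the paper:

```python
import numpy as np

def segment(signal, fs, window_s, overlap):
    """Split a 1-D signal into overlapping windows.

    fs       -- sampling rate in Hz (assumed 50 Hz in the example below)
    window_s -- window size in seconds
    overlap  -- overlapping ratio in [0, 1), e.g. 0.5 for 50%
    """
    win = int(window_s * fs)                    # samples per window
    step = max(1, int(win * (1.0 - overlap)))   # hop between window starts
    n = (len(signal) - win) // step + 1         # number of full windows
    return np.stack([signal[i * step : i * step + win] for i in range(n)])

# Example: 60 s of data at 50 Hz, 5 s windows, 50% overlap.
x = np.random.randn(60 * 50)
windows = segment(x, fs=50, window_s=5, overlap=0.5)
# Each window holds 250 samples; consecutive windows share 125 samples.
```

A larger window lengthens both the wait before a prediction can be issued and the feature vector to be processed, which is why the 4–6 s optima matter for real-time use.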
The rest of the paper is organized as follows: the related work is summarized in
Section 2. In
Section 3, the detailed framework of our proposed VADS is described. Then,
Section 4 provides the description of data processing and feature extraction processes carried out in this system. Next, we present the experimental settings and the evaluation of the system’s performance in detecting vehicle mode and driving activities in
Section 5 and
Section 6, respectively.
Section 7 describes the performance comparison between our proposed framework and several recent works on a public dataset. Finally, we provide concluding remarks in
Section 8.
3. The Proposed Framework of Vehicle Mode-Driving Activity Detection System
Our proposed system, VADS, consists of two main modules: the first, the Vehicle mode Detection Module (VDM), focuses on detecting the vehicle mode currently used by a user (i.e., walking, a bike, a motorbike, a car, or a bus), relying solely on accelerometer data. The second, the Activity Detection Module (ADM), concentrates on detecting a set of primitive driving activities based on data collected from the accelerometer, gyroscope, and magnetometer sensors of smartphones once the user’s vehicle mode is known (
Figure 1). This set contains the following activities: {stopping (S), going straight (G), turning left (L), turning right (R)}.
3.1. The Vehicle Mode Detection Module (VDM)
In detail, VDM is divided into two phases: the training phase and the monitoring phase (
Figure 2). In the training phase, time series data are first collected from the accelerometer and manually labeled with the corresponding vehicle type, i.e., walking, a bike, a bus, a car, or a motorbike. Then, several preprocessing techniques, such as noise filtering and windowing, are applied to calibrate the acceleration data. Next, representative information is extracted by exploring various categories of popular features, such as time domain and frequency domain features. The formulas for computing these features are presented in
Section 4.2. The resulting feature vectors are then used to train the vehicle detection model. Finally, a number of popular machine learning classification algorithms, such as Naïve Bayes, J48, Random Forest, SVM, and KNN, are tested on the training dataset to select the most suitable classifier for the monitoring phase.
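Among the feature categories mentioned, the three classical Hjorth parameters (activity, mobility, and complexity) admit a compact implementation. The sketch below follows their standard textbook definitions and is given for illustration; the paper's own formulas appear in Section 4.2:

```python
import numpy as np

def hjorth(x):
    """Classical Hjorth parameters of a 1-D signal window."""
    dx = np.diff(x)                     # first derivative (finite difference)
    ddx = np.diff(dx)                   # second derivative
    activity = np.var(x)                # signal power
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

# For a pure sinusoid the complexity is close to 1, because the
# derivative of a sinusoid is a sinusoid at the same frequency.
t = np.arange(0, 5, 1 / 50.0)           # 5 s at an assumed 50 Hz
a, m, c = hjorth(np.sin(2 * np.pi * 2 * t))
```

Activity, mobility, and complexity summarize a window's power, dominant frequency, and bandwidth respectively, which is why they complement the plain time and frequency domain features.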
In the monitoring phase, real-time accelerometer data are captured, preprocessed, and converted into a set of relevant features as described in the training phase. Finally, the type of vehicle currently used by a traveler is identified using the best vehicle detection model built in the training phase and the computed feature vectors.
3.2. The Activity Detection Module (ADM)
As described above, ADM focuses on recognizing a set of basic driving activities for each vehicle mode, i.e., {stopping, going straight, turning left, turning right}. The structure of ADM is similar to that of VDM, with two phases: the training phase and the monitoring phase (
Figure 3).
Turning left and turning right involve changes in the vehicle’s direction. Thus, in order to adequately capture the necessary information, ADM collects input data from the accelerometer, gyroscope, and magnetometer. In addition, the preprocessing and feature extraction processes in the ADM framework differ in several respects from those of VDM. In preprocessing, the raw accelerometer data are reoriented from the smartphone’s coordinate system into the vehicle’s coordinate system in order to accurately capture information about the vehicle’s directional changes. In feature extraction, a number of additional features representing changes in the vehicle’s heading angle are introduced.
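One common way to perform such a reorientation, sketched below under the assumption that a gravity estimate and the magnetic field vector are available in device coordinates, is to build an East-North-Up rotation matrix in the manner popularized by Android's SensorManager.getRotationMatrix. This is an illustration of the general technique, not necessarily the paper's exact procedure:

```python
import numpy as np

def rotation_matrix(gravity, magnetic):
    """Build a device-to-world (East, North, Up) rotation matrix from a
    gravity estimate and the magnetic field vector, both expressed in
    device coordinates (the scheme used by Android's getRotationMatrix)."""
    up = gravity / np.linalg.norm(gravity)
    east = np.cross(magnetic, gravity)       # magnetic x gravity points East
    east = east / np.linalg.norm(east)
    north = np.cross(up, east)
    return np.vstack([east, north, up])      # rows: E, N, Up

# Phone lying flat, top edge pointing (magnetic) north:
# gravity reads along +z; the magnetic field points north and downward.
g = np.array([0.0, 0.0, 9.81])
m = np.array([0.0, 22.0, -44.0])
R = rotation_matrix(g, m)
# R is then applied to each raw accelerometer sample:
accel_device = np.array([0.2, 0.1, 9.81])
accel_world = R @ accel_device
```

For this flat, north-facing pose the matrix reduces to the identity; once the world-frame signal is known, a further fixed rotation aligning the travel direction would yield the vehicle's coordinate frame.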
7. Performance Comparison with the Recent Works
In this section, we compare our proposed framework with several existing works on a dataset recently collected by the HTC company [
46]. To date, this is the only publicly available dataset covering such a wide variety of transportation modes (still, walk, run, bike, bus, car, metro, train, tram, HSR). Yet, only a few works have validated their proposed methods on this dataset [
6,
8,
46]. Nonetheless, the authors of [
47] concentrate on differentiating between non-motorized modes (still, walk, run, and bike) and being on a motorized vehicle. The two remaining frameworks detect either non-motorized modes (still, walk) or motorized modes (bus, car, metro, train, tram, and HSR), relying on data collected from the accelerometer, magnetometer, and gyroscope. It has been shown that among these three sensors, the accelerometer consumes the least power [
46]. Therefore, our method, which is based only on accelerometer data, requires less power than those of [
6,
8]. As previously mentioned, their frameworks require long window sizes, i.e., 17.06 s and 60 s, which lead to longer response times and higher computational cost compared with our framework. Moreover, the approach proposed in [
8] relies on a very large feature set containing 348 features, making it infeasible for real-time prediction applications. Note that our vehicle mode detection module requires only 27 features. In addition,
Table 13 shows that on the HTC dataset, our method achieves an overall prediction accuracy of 97.33%, which is significantly higher than the best method of the two recent works. Moreover, our model requires less computational time than the one proposed in [
6].
8. Conclusions
In this work, we propose a flexible combined system composed of two modules: one to detect the vehicle mode of users and one to detect instantaneous driving events, regardless of the orientation and position of the smartphone. Our system achieves an average accuracy of 98.33% in detecting vehicle modes and an average accuracy of 98.95% in recognizing the driving events of motorcyclists when using the Random Forest classifier and a feature set consisting of time domain features, frequency domain features, and Hjorth features. Moreover, the experimental results indicate that the optimal parameters (window size and overlapping ratio) lead to a considerable improvement in system performance compared with the approach using the same window size of 5 s and the same overlapping ratio of 50% for all modes. In detail, the vehicle mode detection module improves its prediction accuracy by 2.73%, 3.04%, 6.45%, 7.37%, and 5.72% when using the Random Forest, J48, Naïve Bayes, KNN, and SVM classifiers, respectively. Similarly, the activity detection module improves its prediction accuracy by 7.98%, 9.06%, 8.60%, 9.33%, and 8.48% for the Random Forest, J48, Naïve Bayes, KNN, and SVM classifiers, respectively. Note that the optimal window sizes inferred by Algorithm 1 range from 4 to 6 s, which is feasible for real-time applications. Furthermore, the Naïve Bayes, KNN, and SVM classifiers are shown to be quite sensitive to the correlation of features, as the driving event prediction accuracy decreases when more features are added. By contrast, the Random Forest and J48 classifiers do not suffer from this effect.