1. Introduction
Mobility is essential to daily life. Cars are among the most widely used means of transportation because they are more convenient than alternatives such as public transport.
Statistics show that estimated worldwide car sales have risen steadily over the past 10 years, with the exception of the period between 2020 and 2021 that corresponded to the COVID-19 pandemic lockdowns [1]. Car sales started to rise again in 2021 after the end of most lockdown-related restrictions.
This increased usage of cars has led to a need for further exploration of car-related research areas, including driver behavior analysis [2] in relation to traffic, safety [3] and environmental issues [4,5,6].
Driving a car is an activity that involves many neural circuits in the brain that are related to visual-motor coordination, episodic and procedural memory, visual search and executive functions such as the ability to plan, change the strategy of conduct or initiate and inhibit reactions [7]. Most of the information that reaches the driver while driving arrives through the visual system, because the driver must perceive signs, objects, people or events [8]. Therefore, proper visual and motor coordination of the driver is necessary [9].
There are undeniable differences in road behavior between a driver who is just learning to drive and one who is already experienced [10]. An experienced driver performs the basic driving activities (e.g., turning the steering wheel, shifting gears or pressing the pedals) automatically without paying special attention to them. Analyzing the driving styles and behaviors of both new and experienced drivers could lead to insightful findings for the future development of the automotive industry by taking advantage of advances in driving, which could be made more economical and safer [11,12]. Research on driving style analysis in particular has possible applications in logistics, the transport industry, car insurance, government regulation, the controlled development of infrastructure and public transport [13].
One specific field of interest of driving style analysis is accident detection. According to a WHO report of 9 February 2004, road accidents are a major but neglected public health challenge that requires a concerted effort toward effective and sustainable prevention. Of all the systems people deal with on a daily basis, road traffic systems are the most complex and dangerous. It is estimated that 1.2 million people worldwide are killed in road accidents and as many as 50 million are injured each year. These numbers are projected to increase by around 65% over the next 20 years unless a new prevention commitment is made [14].
Studies on the topic of accident detection can be divided into two main categories. The first category focuses on accident prevention based either on the recognition of cognitive activities using wearable sensors during driving [15], detection of the road type using either vision-based or car internal sensors [16] or the monitoring of specific dangerous events during driving (e.g., wrong-way driving or a drop in vigilance) using either wearable or vision-based sensors [17,18,19]. The second category investigates the topic of post-crash detection for either the dispatching of emergency services [20,21] or as an inspection tool for car-renting companies [22,23,24].
In this paper, we investigate the first category of work through the topic of road type detection. To the best of our knowledge, the number of past studies regarding this topic is very limited, and they exclusively use either vision-based or car internal sensors. For instance, Jo et al. [25] proposed a vehicle-tracking and behavior-reasoning algorithm to provide advanced driver assistance using LIDAR, radar and RGB camera data to obtain insight into the surroundings of the vehicle. Ramanishka et al. [26] introduced the Honda Research Institute Driving Dataset, a dataset combining RGB and car modalities (GPS, LIDAR, CAN-Bus and IMU sensors) in suburban, urban and highway environments to help researchers investigate the topic of automated driving scene analysis. Finally, many other studies have automatically analyzed the surroundings of a vehicle without directly classifying road types. Examples include the monitoring of cars in neighboring lanes in a highway environment [27] or of weather or traffic conditions in highway and urban environments [28]. It can be noted from the aforementioned work that the topic of road type detection using wearable devices is still unexplored.
We therefore propose a system for the automatic detection of road types based on applying state-of-the-art machine learning techniques to physiological data acquired from a pair of smart glasses. In our study, we use the JINS MEME glasses (Jins Inc., Tokyo, Japan), which record electrooculography (EOG) signals, characterizing eye movements, and the linear acceleration and angular velocity, characterizing head movements. The data acquired by such a device have been shown to be useful for recognizing cognitive activities in a driving context in past studies [15]. We aim to verify whether road type detection is possible using wearable modalities alone, without any assistance from external sensors such as car internal sensors. To the best of our knowledge, this study is the first to tackle this specific classification problem exclusively using physiological sensor modalities.
To abstract the problem of road type detection from a machine learning point of view, we translate it into a classification problem and solve it using standard supervised learning techniques, following the standard pattern recognition chain that comprises the following steps [29]:
Data acquisition: the choice and set-up of sensors and design of an experimental set-up;
Data processing: operations to clean the data such as noise removal, filtering, synchronization and segmentation;
Feature extraction: computations of specific values in the data that carry a specific relevance for the classification problem to be solved;
Classification: training and evaluation of a classifier operating on vectors of the features previously extracted.
In particular, we focus on four types of areas to recognize: a city, a highway, a housing estate and an undeveloped area [30]. We manually extracted statistical features from the data, following our past observations that more complex feature-learning methods using deep neural networks do not necessarily produce better results for problems where physiological time series data are involved and data are relatively scarce [22,31]. Using these features, we perform a comparative study of the most widely used state-of-the-art classifiers and a feature analysis based on the computation of feature importance scores using ANOVA.
To summarize, the main contributions of our paper are as follows:
We perform a machine learning study for the classification of four different road types (city road, highway, housing estate and undeveloped area) using manually crafted features extracted from physiological signals (EOG, acceleration and angular velocity) acquired from smart glasses worn by the driver;
We perform a comparative experiment involving several state-of-the-art classifiers to find the best configuration for the classification problem to be solved;
We perform feature selection with ANOVA to determine what the top-performing features are for the classification problem at stake.
The rest of our paper is structured as follows. The materials and methods used in our study are first described in Section 2. The results of our experiments are presented in Section 3 and discussed in Section 4. Finally, a conclusion and comments about future outlooks are provided in Section 5.
2. Materials and Methods
2.1. Technology Used
For the data acquisition, we used JINS MEME smart glasses, a device furnished with three-point EOG and a six-axis inertial measurement unit (IMU) with an accelerometer and a gyroscope [15,32]. The sampling frequency of the acquired signals was 100 Hz. The data were transmitted to a computer via Bluetooth or USB and could be exported to a CSV file.
For the data preparation and classification, we used MATLAB R2022a. The models were trained using a PC with an Intel(R) Core(TM) i5-9300H CPU processor and 16 GB of RAM.
2.2. Data Acquisition
A dataset of physiological data obtained by the JINS MEME smart glasses (i.e., EOG, IMU linear acceleration and angular velocity) was acquired for this and previous related studies [8,33]. The study was conducted in accordance with Chapter 4 of the Act on Vehicle Drivers of the Republic of Poland and with the permit issued by the Provincial Police Department in Katowice. Before the study, the volunteers presented all the necessary documents confirming that they were allowed to participate in road traffic. The participants voluntarily gave their informed consent to participate in the study.
Data were acquired under real road conditions from 30 healthy subjects, including 20 experienced drivers and 10 students attending a driving school [34,35,36]. The participants were 16 males and 14 females with an average age of 38 ± 17 years. The complete dataset is available at the IEEE DataPort [37].
Each participant performed a driving test on a route of 28.7 km, which took approximately 75 min. The route was located in the Silesian Voivodeship (southern Poland) in the cities of Tarnowskie Góry, Radzionków, Bytom, and Piekary Śląskie. The course of the route was determined in consultation with the driving instructor and on the basis of the rules of practical driving tests in Poland. While performing a test, the driver wore the smart glasses. The set-up of the experiment is presented in Figure 1.
All data were labeled during the drive and then divided into four groups according to the type of road (1: highway; 2: city road; 3: undeveloped area; 4: housing estate). The labeling was performed manually by placing a marker where each type of road started and ended.
2.3. Data Preparation
The data acquired from the smart glasses include signals from the three axes of the accelerometer, signals from the three axes of the gyroscope and the four channels of the EOG signal.
Table 1 contains general statistics for all the acquired signals.
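The magnitude of the acceleration was computed using the Euclidean norm of the 3D acceleration, as shown in Equation (1) [38]:
$$|a| = \sqrt{a_x^2 + a_y^2 + a_z^2} \qquad (1)$$
where $a_x$, $a_y$ and $a_z$ denote the accelerations measured along the three axes of the accelerometer.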
All signals were then filtered using a third-order median filter and rescaled to the range [0, 1] using min-max normalization, as specified in Equation (2):
$$x_{\mathrm{norm}} = \frac{x - \min(x)}{\max(x) - \min(x)} \qquad (2)$$
where the min and max values were determined for each signal. All normalized sensor signals were then segmented with a sliding time window approach with a window length of 1 s (100 samples) and a 50% stride (50 samples). We applied the original label to each window. If the signal length was not divisible by 50, then the last samples were discarded.
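The preprocessing steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the MATLAB code used in the study: the function names are hypothetical, the median filter leaves the two edge samples unchanged (an assumption about boundary handling), and the normalization assumes the common [0, 1] form of min-max scaling.

```python
import math


def acceleration_magnitude(ax, ay, az):
    """Euclidean norm of the 3D acceleration, computed sample by sample."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]


def median_filter3(signal):
    """Three-sample median filter; edge samples are kept as-is (assumption)."""
    out = list(signal)
    for i in range(1, len(signal) - 1):
        out[i] = sorted(signal[i - 1:i + 2])[1]
    return out


def min_max_normalize(signal):
    """Rescale one signal to [0, 1] using its own minimum and maximum."""
    lo, hi = min(signal), max(signal)
    if hi == lo:  # constant signal: avoid division by zero
        return [0.0 for _ in signal]
    return [(v - lo) / (hi - lo) for v in signal]


def sliding_windows(signal, length=100, stride=50):
    """Cut a signal into windows of `length` samples, advancing by `stride`.

    Trailing samples that do not fill a complete window are discarded.
    """
    return [signal[s:s + length]
            for s in range(0, len(signal) - length + 1, stride)]
```

With the 100 Hz sampling rate reported above, `length=100` and `stride=50` correspond to 1 s windows with 50% overlap.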
2.4. Feature Extraction
For this problem, we decided to apply traditional feature engineering. For each window, the following features were calculated:
The feature extraction process resulted in 36,669 feature vectors of 48 dimensions (6 features for each of the 8 sensor channels) that were used to train, validate and test the classifiers.
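As an illustration of how a 48-dimensional vector arises from 6 features on each of 8 channels, the sketch below computes six per-window statistics. The specific statistics (mean, standard deviation, minimum, maximum, median and range) are placeholders chosen for this example, not necessarily the features used in the study, and the function names are hypothetical.

```python
import statistics


def window_features(window):
    """Six illustrative statistics for one window of one channel (assumed set)."""
    lo, hi = min(window), max(window)
    return [
        statistics.fmean(window),   # mean
        statistics.pstdev(window),  # population standard deviation
        lo,                         # minimum
        hi,                         # maximum
        statistics.median(window),  # median
        hi - lo,                    # range
    ]


def feature_vector(channel_windows):
    """Concatenate the per-channel features of time-aligned windows.

    `channel_windows` holds one window per sensor channel, so 8 channels
    yield a vector of 8 * 6 = 48 values.
    """
    vector = []
    for window in channel_windows:
        vector.extend(window_features(window))
    return vector
```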
Finally, feature selection was also performed using ANOVA on each feature separately to determine which ones maximized the distance between the four classes we used in our problem.
Figure 2 and Figure 3 present the F scores and associated p-values, respectively, for each feature on which ANOVA was applied.
By trial and error, we selected the 30 features with the highest ANOVA F scores, as this configuration produced the best results.
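A minimal sketch of this per-feature ranking is given below. It computes the one-way ANOVA F statistic (between-class variance divided by within-class variance) for each feature column and keeps the indices of the k highest-scoring features. The function names are hypothetical, p-values are not computed, and the code assumes at least two classes and more samples than classes.

```python
def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2
                     for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2
                    for y, g in groups.items() for v in g)
    if ss_within == 0:  # no variation inside any class
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_features(X, y, k):
    """Return the indices of the k features with the highest F scores."""
    n_features = len(X[0])
    scores = [anova_f_score([row[j] for row in X], y)
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores
```

In the setting described above, `k` would be 30 and `X` would hold the 48-dimensional feature vectors.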
2.5. Classification
For this study, we used a tool available in MATLAB called Classification Learner [40]. It enables the quick training, validation and testing of many classifiers, as well as parameter tuning and the comparison of results. The tool provides several widely used types of supervised machine learning classifiers, which are listed below:
Classification trees;
Naive Bayes models with Gaussian, multinomial or kernel predictors;
K-nearest neighbors;
Support vector machines;
Ensembles for multiclass learning (boosting, random forest, bagging, random subspace and ECOC).
Data were randomly separated into training, validation and test datasets in proportions of 70%, 20% and 10%, respectively, and the validation scheme was holdout validation. We performed a series of tests on different classifiers and hyperparameters, compared them and chose the one that best handled the problem.
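The random 70%/20%/10% holdout split can be sketched as follows. The function name and the fixed seed are assumptions made for reproducibility of the example; the study itself relied on the MATLAB tooling described above.

```python
import random


def holdout_split(n_samples, fractions=(0.7, 0.2, 0.1), seed=0):
    """Randomly partition sample indices into training, validation and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test
```

A purely random split does not guarantee that each class keeps the same proportion in every subset; a stratified variant would split the indices of each class separately.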
The preliminary experiments showed that most misclassifications concerned examples from classes 1 and 2. To reduce their number and improve the overall performance of our models, we introduced misclassification costs so that examples belonging to specific classes count more in the computation of the loss during training. The matrix of these costs is provided in Table 2.
Table 3 presents the comparison of the overall performance parameters of the four best classifiers from the initial tests. A detailed comparison of results obtained with these classifiers can be found in Appendix A.
The initial tests showed that the problem was solved significantly better by an ensemble classifier. For this reason, we decided to continue working on improving only this model.
To obtain the best possible classification accuracy, we conducted a hyperparameter optimization process. The method was Bayesian optimization, and the acquisition function was the expected improvement per second. Figure 4 shows the process of tuning the hyperparameters.
In this process, different sets of hyperparameters were tested, and the set with the smallest classification error was selected. The hyperparameters tested include the following:
Preset: specifies the type of the classifier to be used. The types available were boosted trees, bagged trees, subspace discriminant, subspace KNNs and RUSBoost trees;
Ensemble method: the method used to combine the weak learners into a high-quality model. A different choice of ensemble methods was available for each preset;
Number of learners: defines the number of weak learners to use in the ensemble;
Learning rate: regulates the speed of the learning process. Using a smaller learning rate helped to make sure that the model was not overfitted;
Maximum number of splits: limits the number of branch points and thereby controls the depth of the tree learners.
The best classification model that we found was ensemble classification with random undersampling boosted trees (RUSBoosted Trees) [41], a boosting ensemble in which each weak learner (e.g., a decision tree) is trained on a random subset of the whole training set in which the dominant class is undersampled. More specifically, RUSBoosted Trees iteratively trains a chosen number of weak learners, with each learner being trained on a subset of the training set that underwent two modifications: a random subsampling of the dominant class and a normalized weighting of the examples of the training subset, which is taken into account when computing the loss of the learner. The weights are initialized to follow a uniform distribution for the first iteration and then iteratively updated using an update parameter computed from the loss at the previous iteration. The model hyperparameters that were determined to be optimal in the end were as follows:
Preset: RUSBoosted Trees;
Ensemble method: AdaBoost;
Learner type: decision tree;
Maximum number of splits: 2078;
Number of learners: 451;
Learning rate: 0.81048.
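To make the resampling step of RUSBoost more concrete, the sketch below builds the training subset for one boosting round. It is a simplified illustration, not the MATLAB implementation used in the study: the function name is hypothetical, every class is reduced to the size of the smallest class (one possible balancing target), and the weak learner and the weight update between rounds are omitted.

```python
import random
from collections import defaultdict


def random_undersample(labels, weights, seed=0):
    """Pick a class-balanced subset and renormalize its example weights.

    Returns the kept indices and their weights, rescaled to sum to 1.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    target = min(len(idx) for idx in by_class.values())  # assumed target size
    kept = []
    for idx in by_class.values():
        kept.extend(rng.sample(idx, target))  # sampling without replacement
    kept.sort()
    total = sum(weights[i] for i in kept)
    return kept, [weights[i] / total for i in kept]


# First round: uniform weights over the whole training set.
labels = [1] * 10 + [2] * 3 + [3] * 5
weights = [1.0 / len(labels)] * len(labels)
subset, subset_weights = random_undersample(labels, weights)
```

In later rounds, the weights of the full training set would first be updated from the previous learner's loss, and a fresh subset would then be drawn in the same way.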
3. Results
To evaluate our trained classifier, we calculated standard evaluation metrics from its confusion matrix (shown in Figure 5), which presents the number of examples from the test dataset classified into a specific group (predicted label) compared with their real class label (true label).
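We also computed the accuracy, precision, recall and average F1 score, whose expressions are provided in Equations (3)–(6):
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3)$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (4)$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (5)$$
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (6)$$
where $TP$, $TN$, $FP$ and $FN$ denote the numbers of true positives, true negatives, false positives and false negatives, respectively.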
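The four metrics named above can be derived per class from a multi-class confusion matrix by treating one class as positive and all others as negative. The sketch below assumes that rows hold true labels and columns hold predicted labels; the function names and the toy matrix are illustrative and do not reproduce the results of the study.

```python
def one_vs_all_metrics(cm, k):
    """Accuracy, precision, recall and F1 for class k of confusion matrix cm."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                 # class k examples predicted as another class
    fp = sum(row[k] for row in cm) - tp  # other classes predicted as class k
    tn = total - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1}


def overall_accuracy(cm):
    """Fraction of all examples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


toy_cm = [[8, 2],
          [1, 9]]  # made-up counts for illustration only
metrics = one_vs_all_metrics(toy_cm, 0)
```

Averaging the per-class F1 values gives a macro-averaged F1 score, which is one common reading of an "average F1 score".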
Finally, we used the receiver operating characteristic (ROC) curve to assess the correctness of the classifier. It provides a joint description of the sensitivity and specificity; in other words, it plots the true positive rate against the false positive rate [42,43]. In a multi-class model, N ROC curves, each with its own area under the curve (AUC), can be plotted for the N classes using the one-vs-all methodology. In our case, the AUC for class 1 was around 0.9, which means that there was a 90% chance that the model would be able to distinguish a positive example from a negative one [44]. For comparison, related work using the ROC AUC reported values of 0.904, 0.863 and 0.805 for driver drowsiness models [45] and 0.7914–0.8635 for driver anger evaluation [46].
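The probabilistic reading of the AUC can be made explicit with a small sketch: the AUC equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function names are hypothetical, and the pairwise loop is quadratic in the number of examples, so it is meant for illustration rather than for large test sets.

```python
def roc_auc(scores, is_positive):
    """AUC as the probability that a positive example outranks a negative one."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("both positive and negative examples are required")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def one_vs_all_auc(score_rows, labels, classes):
    """One AUC per class; column j of score_rows holds the score of classes[j]."""
    return {c: roc_auc([row[j] for row in score_rows],
                       [y == c for y in labels])
            for j, c in enumerate(classes)}
```

For a four-class problem, `one_vs_all_auc` would return four values, one for each ROC curve.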
Figure 6, Figure 7, Figure 8 and Figure 9 present the ROC curves for all four classes.
The values obtained for all the aforementioned evaluation metrics (accuracy, precision, recall, average F1 score and AUC) are provided in Table 4.
The proposed approach yielded an overall accuracy of 87.64%. Among the four specific classes, housing estates were the best recognized. The highway was the least well-recognized class, although its recognition accuracy remained relatively high for a four-class classification problem at 83.77%.
4. Discussion
The topic of this paper was the development of a classification algorithm based on machine learning techniques for road type classification. The problem faced by this work was to determine whether physiological data acquired with smart glasses (EOG, ACC and GYRO signals) are suitable for classifying the type of road traveled by the person driving the car.
The best classifier that was found (RUSBoosted Trees) yielded a promising accuracy of 87.64% for a four-class classification problem. This relatively high accuracy indicates that physiological data acquired from JINS MEME smart glasses (EOG, acceleration and angular velocity) are sufficient to determine the type of road being traveled. It should be noted that the ANOVA analysis we performed on our features showed that the most relevant information for our classification problem resided in the IMU data, as shown in Figure 2, and more specifically in the angular velocity measured by the gyroscope. This might indicate that head movements (rather than eye movements) are one of the main factors that could lead to the distinction of a road type.
A possible axis of development is the investigation of additional physiological modalities that could be set up in an unobtrusive way for the driver to provide insights regarding the environment they are driving in. For instance, Leicht et al. [47] investigated the monitoring of the heart rate (HR) and respiration rate (RR) of drivers in both urban and rural scenarios. More specifically, the efficiency of unobtrusive sensors for estimating both the HR and RR was evaluated in real and simulated conditions by comparing their estimations to the readings of reference HR and RR sensors. Under laboratory conditions, magnetic induction and photoplethysmography, both integrated into the seat belt, and hybrid imaging, combining visual and thermal imaging, were evaluated for RR sensing. In real driving, the sensing of the RR by hybrid imaging and the sensing of the HR by a seat-integrated capacitive electrocardiograph (ECG) were evaluated. Under laboratory conditions, reliable RR detection was possible using all three sensor technologies. In real-world driving, reliable HR and RR detection was possible during the rural scenario only. In the urban scenario, only RR detection was feasible, because motion artifacts disturbed the capacitive ECG and impaired the HR detection. The evaluated unobtrusive measurement systems can therefore monitor physiological parameters during, for example, long highway drives, but not reliably during agile inner-city driving.
The literature on transport infrastructure cannot be separated from the aspects of artificial intelligence that support the management and organization of modern transport and logistics. This field of computer science, which deals with the practical application of data analysis algorithms based mainly on machine learning, aims to create automatic systems that use accumulated experience and knowledge to detect driving style patterns in the processed data, predict future events and react to them, for example during real travel [17]. Machine learning systems have many practical applications. Research on recognizing elements in images may also inspire the analysis of single- or multi-modal signals such as EOG [48,49]. Other applications include speech recognition using wearable devices placed around the head, despite the interference caused by such devices [50], as well as written text recognition, navigation in unknown areas, recommendation systems, guidance and the forecasting of financial and economic trends; more pioneering research can open up new areas [51].
Finally, possible improvements regarding the machine learning aspects of the study could be tested. The investigation of several methods of signal analysis and their impact on the results might be advantageous [52,53]. More advanced feature extraction methods such as feature learning with deep neural networks (e.g., convolutional neural networks) [29] could be investigated once the dataset size is increased to a point where enough data are available to properly train such models. Time-series transfer learning techniques to refine the performances of these models could also be investigated [54].
5. Conclusions
In this paper, a machine learning method using physiological data acquired from smart glasses for the detection of road types while driving a car was presented. A pair of JINS MEME smart glasses collecting the EOG, linear acceleration and angular velocity was worn by 30 subjects driving a car in real-life conditions. Statistical features were manually extracted from the data and used to train a classifier for the recognition of four different road types (city road, highway, housing estate and undeveloped area). A comparative study of various state-of-the-art classifiers was carried out and led to a best overall accuracy of 87.64% using boosted trees. Additionally, a feature importance calculation based on ANOVA showed that the most important features came from head movements.
Despite the very promising results obtained by our proposed approach, our study still has some major limitations. The most important one is that we used a single dataset for our analysis due to the lack of publicly available data that deal with a similar problem. Moreover, the dataset was limited in size because of sanitary restrictions caused by the COVID-19 pandemic, which might limit the generalization capacities of our method. Finally, it is currently not possible to compare our results to other studies due to the lack of past research working on a similar problem.
Future work will focus on increasing the amount of data used for such a study. This will first be accomplished by resuming and extending the data acquisition campaign in real conditions that was interrupted by the COVID-19 pandemic. Alternative approaches based on acquiring synthetic data using simulators will also be tested, as they provide a relatively cost-effective way to acquire additional data that could be used to boost the generalization capacity of our trained classifiers [55,56,57]. Finally, additional unobtrusive sensors monitoring physiological modalities such as the HR or RR [47] will be considered in future studies.