1. Introduction
Currently, car manufacturers devote much of their attention to equipping cars with new integrated safety features. These features include collision avoidance, pedestrian detection, lane-change warning, driver feedback and even semi-autonomous driving, among others.
With advances in technology, cars can now infer dangerous behaviors such as drowsiness by using advanced sensors integrated in the vehicle (for example, night cameras, radars, ultrasonic sensors) [1]. In addition, some high-end cars can activate automatic steering when the car drifts into another lane without warning (for example, when the driver did not use the turn signal to indicate a lane change), and can even brake before dangerously approaching the car ahead by means of automatic braking assistance. However, only a small fraction of automobiles offer these warning systems and, even with all these safety features, one of the biggest unsolved problems is the “distracted” driver, which continues to have significant repercussions throughout the world because of the accidents it causes.
According to the National Highway Traffic Safety Administration (NHTSA) [
2], around 3450 people were killed and 391,000 were injured in motor vehicle accidents with distracted drivers in 2016 and approximately 481,000 drivers were using their cell phones while they were driving, which is a potential danger to drivers and passengers, as it can cause deaths or injuries on the U.S. roads.
The popularity of mobile devices has therefore had some unplanned and even dangerous consequences: distracted drivers accounted for 8.5% of total fatalities in 2017 [3], and mobile communications are now linked to a significant increase in distracted driving, a serious and growing threat to road safety that causes injuries and loss of life [4].
Based on this, detecting driver distraction, specifically cell phone use while driving, can help increase drivers’ awareness of their driving habits and the associated risks, and thus help reduce careless driving and promote safe driving practices.
A large number of studies have focused on measuring the effects of distracted driving across the four types of driver distraction: visual (when drivers do not look at the road for a period of time); cognitive (reflecting on a topic of conversation as a result of talking on the phone, instead of analyzing the situation on the road); physical (when the driver holds or operates a device instead of driving with both hands, dials on a mobile phone, or leans over to tune a radio, which can affect steering); and auditory (responding to a ringing mobile phone, or a device playing so loudly that it masks other sounds, such as ambulance sirens) [5].
“Texting and driving”, a behavior that has been studied with great interest because of its negative effects, is defined as the act of holding or using the cell phone in front of the body near the chest, and is likely to cause accidents. It can be differentiated from talking on the cell phone while driving because the focal point of the driver’s eyes is not the same, so it is possible to identify this action from the movement of the eyes.
On this basis, and to help prevent accidents related to texting and driving, a methodology to detect such distracted behavior is proposed. This research presents a ubiquitous-oriented methodology based on deep learning, specifically convolutional neural networks (CNNs), to detect drivers who use the cell phone while driving. We intend to use a wide-angle camera mounted on the roof to send images of the driver to the CNN and detect distraction. Once implemented, manufacturers can use this methodology to detect and help avoid accidents caused by “texting and driving” behavior, thus reducing deaths.
In the next sections, the details of the proposed methodology will be presented. In
Section 2, related work is shown, followed by
Section 3, where the proposed methodology is presented.
Section 4 presents the results obtained and, finally,
Section 5 provides the discussion, while the final section presents the conclusions.
2. Related Work
According to the literature, several works have addressed the problem of “texting and driving”.
Li et al. [6] investigated drivers’ eye-movement patterns during rear-end collision avoidance under cell-phone-induced distraction. They used the driving simulator of the Beijing Jiaotong University (BJTU), which projected a 300-degree front/peripheral field of view at a resolution of 1400 × 1050 pixels, and a head-mounted eye-tracking system, the SensoMotoric Instruments Eye Tracking Glasses (SMI ETG), to collect eye movement data. The study showed that distraction from tasks without visual demand could also interrupt the drivers’ visual attention to the source of danger, because of the overlap between the manual operation (adjusting the force exerted on the brake pedal) and the cognitive activities (solving arithmetic problems).
Atiquzzaman et al. [7] developed an algorithm to detect driving behavior during two distracting tasks: texting and eating or drinking. Subjects performed a series of tasks while driving in a simulator, and 10 features related to vehicle acceleration, steering angle, speed, etc. were used as input data. Three data mining techniques were then applied: linear discriminant analysis (LDA), logistic regression (LR) and support vector machine (SVM). Results showed that the SVM outperformed LDA and LR in detecting distractions related to texting and eating/drinking, with accuracies of 84.33% and 79.53%, respectively. The false alarm rates for these SVM models were 15.77% and 23.54%, respectively.
On the other hand, telephone conversations while driving are another type of distraction. Iqbal et al. [8] focused on proactive alerting and mediation of voice communication. The authors studied the effect of announcing upcoming critical road conditions and placing calls on hold while driving; quantitative data, such as the number of turning errors and collisions, were collected as measures of driving performance. Results show that context-sensitive mediation systems could play a valuable role in focusing the driver’s attention on the road during telephone conversations.
Klaner et al. [9] studied the relationship between the performance of secondary tasks, including the use of cell phones, and the risk of collisions and near collisions, measured through different sensors such as accelerometers, cameras and global positioning systems, among others. The results show that, among novice drivers, the risk of a collision or near collision was highest when dialing a number on a cell phone (odds ratio = 8.32; 95% confidence interval (CI), 2.83 to 24.42) and when reaching for an object other than a cell phone (odds ratio = 8.00; 95% CI, 3.67 to 17.50). Among experienced drivers, dialing a mobile phone was associated with a significantly greater risk of a collision or near collision (odds ratio = 2.49; 95% CI, 1.38 to 4.54).
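For readers unfamiliar with the statistic, the following minimal sketch shows how an odds ratio and a Wald-type 95% confidence interval can be computed from a simple 2 × 2 table. The counts and the function name `odds_ratio_ci` are hypothetical and chosen only for illustration; the cited study may have used a different estimation procedure.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2 x 2 table.

    a: events while performing the secondary task
    b: non-events while performing the secondary task
    c: events without the secondary task
    d: non-events without the secondary task
    z: normal quantile (1.96 corresponds to a 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from any cited study.
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

An interval that excludes 1, as in the values reported above, indicates that the association is statistically significant at the corresponding level.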
Gliklich et al. [10] described the frequency of cell-phone-related distracted driving behaviors. Respondents reported cell phone reading or writing activities within the previous 30 days, most frequently reading texts (48%), writing texts (33%) and viewing maps (43%). Only 4.9% of respondents had enrolled in a program aimed at reducing cell-phone-related distracted driving.
Based on this, some technology approaches intend to reduce the dangers of distracted driving, working to dissuade people from texting while driving.
An example of the technological advances that try to reduce the rate of texting-related accidents is the Driver’s Distraction Prevention Dock (DDD Dock). The DDD Dock is a device that prevents drivers from texting while driving by putting the driver’s phone out of sight and out of reach. The car will not start unless the phone is on the dock. The device works by linking to the phone and, if the phone is removed from the dock, a notification is sent to the device administrator [11].
In addition, “SmartSense” is a sensor that uses advanced computer-vision algorithms and combines them with engine data, telematics, accelerometer data and analytics based on SmartDrive cameras to help address the epidemic of distracted and inattentive driving at the root of the problem. The system interprets driver cues that indicate distraction, such as movements of the head and eyes, and triggers a video, which is prioritized and offloaded for immediate verification and intervention [
12].
Among the application-based solutions, Wahlström et al. [13] present a comprehensive 10-year study of smartphones and their use in the automotive industry; among the risk factors, having a telephone conversation can increase the crash risk by a factor of two, while sending text messages is even more distracting [13,14], because texting is a cognitive task. The cognitive effect of using the cell phone while driving is shown in the research by He et al. [15], who studied the mutual influences of driving and texting. Subjects were asked to perform a lane-change task in a simulator while driving-performance measures were taken, such as the mean and the standard deviation of the lane deviation; the results showed greater lane deviation when texting and driving.
Several investigations have tried to detect and prevent accidents caused by the cognitive distraction of texting and driving. Among them, Watkins et al. [16] present an application-based methodology to detect texting and driving; the authors focused on changes in keystroke patterns, that is, on how the act of texting is affected by driving rather than on how driving is affected by texting. They developed an application that recorded each key event and then compared the resulting keystroke distribution with that of texting without driving. This showed greater entropy in the distribution of key presses when texting and driving, demonstrating the cognitive complexity of the task. Bo et al. [17] presented a methodology for detecting texting and driving through non-intrusive sensing. TEXIVE is a system based on the inertial sensors and magnetometers integrated in common smartphones that distinguishes drivers from passengers by recognizing rich micro-movements of smartphone users. The system first detects when a subject is driving and, once it detects such behavior, it is able to distinguish when the user is texting by means of the speed and accuracy of typing. The reported precision is 87.18%. On the other hand, Chu et al. [
18] used smart phone sensors coupled to a support vector machine algorithm to detect if the subject was a passenger or the driver; the authors reported an accuracy of
using cross-validation. However, the authors did not focus on text messages and the driver, but on detecting the act of driving (pushing the accelerator pedal). Later, Mantouka et al. [
19] presented a methodology to detect unsafe driving profiles, including driving aggressively, being distracted from the task of driving (mobile phone use) and risky behavior (speeding) while driving. The methodology was based on an application that the user must run while driving. The application collects data (accelerations/km, brakes/km, smoothness indicator, acceleration standard deviation, percentage of mobile use, percentage of speeding), and the K-means algorithm then groups trips as aggressive and non-aggressive. Nevertheless, no metrics regarding “texting and driving” were presented.
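The keystroke-entropy comparison attributed above to Watkins et al. [16] can be illustrated with a minimal sketch. It assumes that the recorded events are inter-key intervals in milliseconds and that the distribution is estimated with a fixed-width histogram; the bin width, the sample values and the function name `shannon_entropy` are hypothetical and are not taken from the cited work.

```python
import math
from collections import Counter


def shannon_entropy(intervals_ms, bin_width_ms=100):
    """Shannon entropy (in bits) of a histogram of inter-key intervals."""
    # Assign each interval to a fixed-width bin and count occurrences.
    bins = Counter(int(t // bin_width_ms) for t in intervals_ms)
    total = len(intervals_ms)
    return -sum((n / total) * math.log2(n / total) for n in bins.values())


# Hypothetical samples: regular typing versus irregular typing.
steady_typing = [200] * 8
irregular_typing = [100, 200, 300, 400, 100, 200, 300, 400]

print(shannon_entropy(steady_typing))
print(shannon_entropy(irregular_typing))
```

Under these assumptions, a more irregular typing rhythm spreads the intervals over more bins and yields a higher entropy, which is the kind of difference the cited study reports between texting alone and texting while driving.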
Related to accident prevention, Dai et al. [
20] developed a system to detect drunk driving using only the smartphone’s accelerometer by characterizing the drunk driver’s behavior. Händel et al. [
21] discussed the technological aspects of vehicle insurance telematics and the use of smartphones as a platform for vehicle insurance telematics [
22].
Finally, on the commercial side, several approaches have been presented. TrueMotion [23] presented a smartphone app that monitors driving behavior using the sensors in the smartphone. The app then gives an overall safety score based on the driver’s habits behind the wheel. Later, TrueMotion developed an app called “Mojo” that provides feedback and incentives to help users reduce distracted driving. The app provides an overall score that characterizes how distracted a user is while driving. It runs in the background on a driver’s phone and uses sensor-based algorithms to capture and break down distracted driving into three categories: typing, handheld calls and hands-free calls [24]. DriveWell is an app developed by Cambridge Mobile Telematics (Cambridge, MA, USA) that uses phone sensors to record trip data and uses telematics and machine learning to infer key metrics about vehicle mileage, road types, speed, acceleration patterns, phone distraction and collisions. Unlike TrueMotion, DriveWell can aggregate the data and provide a driver/passenger classification. Moreover, this app showed a
reduction in phone usage once the app is used [
25]. Similar to DriveWell, Floow Score is an app developed by The Floow Limited (Yorkshire, UK) that uses phone sensors and telematics data to score speed, smoothness of driving, distraction and time for the driver. The app is mainly used to help insurance companies price their policies and predict the risk for each driver more accurately; it also provides tips, education and coaching to improve the score and encourage safer driving. A software development kit (SDK) is also available to provide data collection, storage, scoring, etc. and to build custom solutions [26].
4. Results
The accuracy obtained by the CNN is shown in
Figure 5, where the blue line represents the behavior of the training data, obtaining a final accuracy of 0.93, while the orange line represents the behavior of the testing data, obtaining a final accuracy of 0.98.
In contrast,
Figure 6 shows the loss function, where the blue line represents the behavior of the training data, obtaining a final value of 0.12, while the orange line represents the behavior of the testing data, obtaining a final value of 0.12.
The ROC curves obtained are shown in
Figure 7, where the training subset presented an area under the curve (AUC) value of 0.89 (red line), the testing subset presented an AUC value of 0.88 (green line), and the blind subset presented an AUC value of 0.86 (blue line).
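As a reminder of what the AUC measures, the following minimal sketch computes it as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties count as one half). The labels, scores and the function name `roc_auc` are hypothetical; they are not the outputs of the CNN evaluated in this work.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 ground-truth values (1 = texting and driving)
    scores: iterable of classifier scores, higher meaning more likely positive
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and scores for six frames.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(round(roc_auc(labels, scores), 3))
```

The pairwise form is quadratic in the number of samples, so it is intended only to make the definition concrete; library implementations use a sort-based computation.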
In
Figure 8, the confusion matrix of the CNN for the training dataset is shown, obtaining a sensitivity of 0.81, specificity of 0.96, PPV of 0.87 and kappa value of 0.80.
For the testing dataset, the confusion matrix of the CNN is shown in
Figure 9, obtaining a sensitivity of 0.81, specificity of 0.95, PPV of 0.86 and kappa value of 0.79.
Finally, for the blind dataset, the confusion matrix of the CNN is shown in
Figure 10, obtaining a sensitivity of 0.97, specificity of 0.75, PPV of
and kappa value of 0.57.
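The metrics reported for the three confusion matrices all follow from the four cell counts of a binary confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical (the actual matrices are those in Figures 8 to 10 and are not reproduced here), and the function name `binary_metrics` is illustrative.

```python
def binary_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, accuracy and Cohen's kappa."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value
    accuracy = (tp + tn) / total   # observed agreement
    # Agreement expected by chance, from the row and column marginals.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    expected = p_yes + p_no
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts, for illustration only.
for name, value in binary_metrics(tp=40, fp=10, fn=10, tn=40).items():
    print(f"{name}: {value:.2f}")
```

Because kappa discounts the agreement expected by chance, it can drop considerably when one class is predicted far more often than the other, even if sensitivity remains high.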
5. Discussion
The proposed methodology demonstrates the effectiveness of deep learning in smart connected cars. The CNN of this work is able to detect distracted drivers performing dangerous tasks such as texting and driving.
According to the results obtained, the CNN generalizes well: the evaluation on the test dataset yields values very similar to those obtained on the training dataset, even though the test images were not seen by the CNN during training, suggesting a high degree of detection once the system is fully trained.
On the other hand, in order to validate the CNN performance in a more realistic scenario, a blind dataset was used, containing only unseen samples from different subjects, vehicles and cellphones. On this dataset, the CNN achieved an accuracy of 0.80, sensitivity and specificity of 0.97 and 0.75, respectively, and a kappa value of 0.57. As is evident, the CNN performance decreases because of the complete change in environmental conditions; however, the CNN is still able to detect the texting and driving action with statistically significant performance.
From the confusion matrices for training and testing presented in Figure 8 and Figure 9, it is possible to observe that the performance of the CNN in the training and testing stages is very similar, with a difference of less than 1% in the false positives (FP). When the CNN is evaluated on the blind dataset presented in Figure 10, an increase of 15.5% in the false negatives (FN) with respect to the original testing dataset is obtained; nevertheless, the FP, with a value of 0.06% in the blind test, remained within the original training values. Therefore, even under different conditions, the CNN is able to precisely detect the “texting and driving” condition with a low rate of FN and FP.
From the validation stage, Figure 7 shows that the performance on training and testing is also very similar, with AUC values of 0.89 and 0.88, respectively, both being statistically significant. The same holds for the blind dataset, with an AUC of 0.86, demonstrating again that the CNN is capable of detecting the “texting and driving” task under different conditions and suggesting the robustness of the CNN model.
In Table 2, a comparison of the presented research against similar approaches is shown. Two of these approaches are “SmartSense” and “TEXIVE”. No performance is disclosed for SmartSense, and the TEXIVE authors report an accuracy of 87.18%; however, even though TEXIVE presents statistically significant performance, it is based on the assumption that the subject always carries the phone in the same position (e.g., the left pocket) before entering the car, and that it is placed or held in the same place inside the car. Chu et al. [18] focused instead on the detection of driving and did not detect distracted drivers. Lastly, Mantouka et al. [19] detected aggressive driving; however, the authors did not disclose any performance regarding distracted driving. From this table, it is possible to observe that the methodology presented in this work outperformed the other similar approaches; in addition, it does not force the driver to carry the phone in any particular way or place.
The present study limits the system to the region of interest (ROI) delimited by a green box, shown in Figure 2b. This restriction was selected because it is the most common place to hold the phone while texting: in order to write a text, the subject needs to see the screen of the phone. Holding the phone outside of the ROI while texting therefore represents only a small fraction of the total cases; nevertheless, future research will focus on addressing such scenarios. As mentioned in the related work in
Section 2, several approaches have been proposed to solve the problem of distracted drivers [
13,
15,
16,
17,
18,
19]. Most of these approaches are application-based and therefore do not require any hardware modification, whereas our approach uses fixed in-vehicle sensors. Although this implementation might be expected to have a higher cost, the CNN used in this investigation can be exported to low-cost computers such as the Raspberry Pi 3 and can even be incorporated into an Android application, reducing the cost gap to only the cost of the wide-angle camera. In addition, our approach does not analyze “writing” patterns such as those proposed by He et al. [
15] nor does it send any telematics outside the vehicle such as the SmartSense [
12] approach, thus maintaining privacy. Whereas He et al. [15] use changes in texting patterns to detect driving, our approach does not require any texting patterns. That is, we can detect when the user is distracted even when the user scrolls through an application, watches a video, plays a game, etc. Furthermore, we do not force the user to enter through a specific door or carry the cell phone in a particular place (e.g., the right pocket), as in the research proposed by Chu et al. [
18]. Another aspect is that the proposed approach is not based on cloud-based telematics such as [
12,
21,
23,
25,
26]; all the processing is done on-site, using the fixed sensors in the vehicle. Fixed sensors have significant advantages over application-based approaches; however, we do not believe that the two approaches compete with each other, but rather that each can benefit from the other. For example, our approach can work in reverse (i.e., detect driving instead of texting) and communicate with the mobile phone to report when the driver is driving; the mobile phone, in combination with other application-based approaches, can then use that information to block notifications, calls and other distracting events, making the proposed methodology more complete for the user.
The industry has driven some approaches based on applications such as TrueMotion, DriveWell, Floow Score, among others [
23,
25,
26]. However, these approaches focus on scoring the driver so that insurance companies can evaluate risk. Some of them provide feedback to the driver and, in some cases, the use of this feedback reduced the use of the cell phone, as in DriveWell [25]. Nevertheless, this change is achieved through higher insurance premiums or workplace penalties for those who use the cell phone. In contrast, our approach does not require data to be collected, transmitted and analyzed to “score” a driver; it is based on detecting the behavior in real time. With this approach, we can send feedback in real time to discourage such use (e.g., steering wheel vibration, dashboard warning, etc.). With this warning, we could reduce the use of the cell phone while driving and, unlike industry approaches, help prevent distraction-related accidents rather than only calculate a “risk” score. In addition, as indicated above, our approach can be used with other applications as a complement to assess driver behavior and calculate individual risk.
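One way the real-time feedback described above could be triggered is to smooth the per-frame CNN outputs over a short sliding window, so that a single misclassified frame does not activate a warning. The following is a minimal sketch under stated assumptions: the class name `DistractionAlert`, the window length, the frame-level probability threshold and the required number of positive frames are all hypothetical and are not part of the methodology evaluated in this work.

```python
from collections import deque


class DistractionAlert:
    """Raise an alert when most recent frames are classified as texting."""

    def __init__(self, window=10, min_positive=7, threshold=0.5):
        self.recent = deque(maxlen=window)  # last `window` frame decisions
        self.min_positive = min_positive    # positives needed to alert
        self.threshold = threshold          # per-frame probability cut-off

    def update(self, probability):
        """Add one per-frame probability and return True if alerting."""
        self.recent.append(probability >= self.threshold)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and sum(self.recent) >= self.min_positive


# Hypothetical stream of per-frame probabilities from a classifier.
alert = DistractionAlert()
stream = [0.9] * 10 + [0.1] * 4
decisions = [alert.update(p) for p in stream]
print(decisions)
```

In a vehicle, a True decision could be mapped to one of the feedback channels mentioned above, such as a dashboard warning; the window parameters would need to be tuned against the frame rate and the acceptable warning delay.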