1. Introduction
The U.S. National Highway Traffic Safety Administration (https://www.nhtsa.gov (accessed on 4 August 2021)) estimates that 100,000 reported accidents each year are mainly due to drowsy driving, resulting in more than 1550 deaths, 71,000 injuries, and 12.5 billion dollars of property damage. According to the National Safety Council (https://www.nsc.org (accessed on 4 August 2021)), 13% of drivers admitted to falling asleep behind the wheel at least once a month, and 4% of them had an accident as a result. Morgenthaler et al. reported that drowsiness is one of the main causes of traffic accidents [1]. It is estimated that about 10–15% of car accidents are related to lack of sleep. A sleep questionnaire completed by professional drivers [2] showed that more than 10.8% of drivers were drowsy while driving at least once a month, 7% had caused a traffic accident, and 18% had near-miss accidents due to drowsiness. These alarming statistics point to the need for systems capable of monitoring drivers for drowsiness to prevent traffic accidents.
In recent years, building intelligent systems for drowsy driver detection has become a necessity to prevent road accidents, and designing robust alert methods that recognize the level of sleepiness while driving requires considerable research. Many studies have focused on constructing smart alert techniques for intelligent vehicles that can automatically avoid traffic accidents caused by falling asleep, as illustrated in Figure 1. Rateb et al. [3] introduced real-time driver drowsiness detection for an Android application using deep neural networks. A minimal network structure based on facial landmarks was proposed to identify drowsy drivers. The model was lightweight and achieved an accuracy of more than 80%. However, this study focused only on eye landmarks and did not detect yawning. Moreover, the method was based on a multilayer perceptron classifier with three hidden layers, which limits its accuracy. Akalya et al. [4] provided fatigue detection on a Raspberry Pi 3 by processing images of the driver’s face and eyes. A Haar cascade classifier was applied to detect the blink duration of the driver, and the eye aspect ratio (EAR) was computed from Euclidean distances between eye landmarks.
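As a minimal sketch of how an EAR can be computed, the following assumes the widely used six-landmark formulation, in which two vertical eyelid distances are divided by twice the horizontal eye width. The landmark ordering, the function name `eye_aspect_ratio`, and the sample coordinates are illustrative assumptions and are not taken from the cited works.

```python
from math import dist


def eye_aspect_ratio(eye):
    """Return the EAR for six (x, y) eye landmarks ordered p1..p6.

    Assumed ordering: p1 and p4 are the horizontal eye corners;
    (p2, p6) and (p3, p5) are the two upper/lower eyelid pairs.
    """
    p1, p2, p3, p4, p5, p6 = eye
    horizontal = dist(p1, p4)
    if horizontal == 0:
        raise ValueError("eye corners coincide; cannot compute EAR")
    vertical = dist(p2, p6) + dist(p3, p5)
    return vertical / (2.0 * horizontal)


# Hypothetical landmark coordinates for an open and a nearly closed eye.
open_eye = [(0, 0), (2, 2), (4, 2), (6, 0), (4, -2), (2, -2)]
closed_eye = [(0, 0), (2, 0.3), (4, 0.3), (6, 0), (4, -0.3), (2, -0.3)]

ear_open = eye_aspect_ratio(open_eye)      # (4 + 4) / (2 * 6), about 0.667
ear_closed = eye_aspect_ratio(closed_eye)  # (0.6 + 0.6) / (2 * 6), about 0.1
```

The ratio drops sharply as the eyelids approach each other, which is why a threshold on the EAR can separate open from closed eyes.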
Mohana and Sheela [5] presented a method of drowsiness detection based on eye closure and yawning, recognizing the eyes and mouth on the face to detect both. The limitation of this work is that the eye blink threshold and the boundary yawn value (10) are both fixed. Jie and Lau [6] proposed a vision-based real-time driver alert system for monitoring drivers’ drowsiness and distraction conditions. However, this study focused only on eye characteristics, ignoring yawns, and the eye-opening threshold was also fixed, which limits recognition for people with small eyes. In addition, if the drivers did not continuously stare at the camera for 5 s, the alert threshold was set to 0; thus, it was not possible to detect sleepy drivers. Ramos et al. [7] presented a method based on eye movement and yawning using facial landmarks to accurately detect driver drowsiness. However, this study is also limited by its fixed eye-opening threshold, which is not suitable for people whose eyes are smaller than the threshold. Sukrit et al. [8] provided a driver drowsiness detection system using a random forest classifier based on eye aspect ratio and eye closure ratio. The method achieved an accuracy of 84%. Shivani et al. [9] used the Haar cascade algorithm for driver drowsiness detection by calculating the eye aspect ratio and a blink counter variable. The driver was determined to be drowsy when the counter reached a threshold value. The Haar cascade classifier is a classical approach to computing eye opening–closure ratios for drowsiness detection, and it usually requires parameter tuning when applied to this task.
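The counter-based decision rule described above can be sketched as follows. This is an illustration under stated assumptions, not the cited implementation: the function name, the EAR threshold of 0.25, the limit of three frames, and the choice to reset the counter whenever the eye reopens are all hypothetical.

```python
def first_drowsy_frame(ear_values, ear_threshold=0.25, min_closed_frames=3):
    """Return the index of the frame at which drowsiness is flagged, or None.

    A frame counts as "closed" when its EAR is below ear_threshold.
    Assumption: the counter resets as soon as the eye reopens, so only
    consecutive closed frames accumulate.
    """
    closed = 0
    for index, ear in enumerate(ear_values):
        if ear < ear_threshold:
            closed += 1
            if closed >= min_closed_frames:
                return index
        else:
            closed = 0
    return None


# Hypothetical per-frame EAR values: a short blink, then a long closure.
frames = [0.30, 0.20, 0.20, 0.30, 0.20, 0.20, 0.20]
flagged_at = first_drowsy_frame(frames)  # the third consecutive closed frame is index 6
```

Both parameters are fixed here, which mirrors the tuning burden noted above: a value suited to one driver or camera setup may not suit another.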
Many studies based on the deep learning approach using convolutional neural networks (CNNs) have been introduced to detect drowsy drivers. A drowsiness detection method using CNN-based machine learning for Android applications was introduced by Rateb et al. [10]. The method detected facial landmarks from camera images and passed them through a CNN model to detect drowsy driving, with an average accuracy of 83.33%. Zuopeng et al. [
11] introduced a driver fatigue detection method using a proposed EM-CNN to detect the states of the eyes and mouth from the region of interest images. The proposed algorithm, EM-CNN, showed an accuracy of 93.62%. Biswal et al. [
12] proposed an IoT-based smart alert system for drowsy driver detection using Raspberry Pi3 and Pi camera modules to make a persistent recording of face landmarks for eye detection. This method was based on the idea of determining blinks through an eye aspect ratio (EAR) and Euclidean distance of the eye. Although this method reached a high accuracy of 97.1% for the experiment, it had many limitations for practical systems because it only focused on eye detection, and an EAR threshold was pre-defined and unchanged for all drivers (
) to detect drowsiness. Ajinkya et al. [
13] introduced a driver drowsiness detection method using deep learning with an average accuracy of 96%. This method comprised two stages: pre-processing and drowsiness detection. In pre-processing, Haar feature-based cascade classifiers, a machine learning-based approach, were used to detect the mouth and eye regions of the drivers. To identify drowsiness, the frames of the mouth and eye regions were then forwarded to the proposed CNN models, basic CNNs with four conv2d layers, four max-pooling layers, and two dense layers, which identified the state of blinks and yawns within a given time threshold. Only two features, the eyes and the mouth, were trained on these two CNN models for drowsiness detection, without considering other physiological factors. Moreover, experiments were conducted on a small dataset of 2423 subjects, including 1192 people with closed eyes and 1231 people with open eyes. In addition, Madhav et al. [14] presented a deep learning approach to detect driver drowsiness. This method extracted the characteristics of the driver’s eyes in each frame using Dlib’s API and passed them through a classification model to predict the state of drowsiness. Adam was used as the optimizer, and the average accuracy reached 94%.
Unfortunately, most of these studies focused on analyzing the mouth and eye regions to detect blinks and yawns without considering other regions of the head and face. They are therefore not accurate enough and have revealed many limitations for drowsiness detection in actual systems, because dozing is a natural state of the human body that appears alongside other behaviors. Moreover, physiological measures pose a problem in that they may not be feasible in practice: they are hard to apply to actual systems, since the measuring devices are not available on vehicles and are often uncomfortable for drivers. Drowsiness is a natural phenomenon in the human body that arises from different factors causing distraction; it is not determined merely by recording the number of blinks or yawns. Meanwhile, the deep learning approach is able to learn all features automatically for drowsiness detection. In this paper, we propose two approaches for detecting drowsiness using the techniques of facial landmark identification and deep learning. We make the following contributions: (1) the collection of a dataset of drowsiness and non-drowsiness from images and videos monitored through a camera for experiments; (2) the proposal of two methods for drowsiness detection and prediction using facial landmarks and deep learning. In the facial landmark-based method, we improve drowsiness detection by determining the appropriate blink and yawn thresholds for each driver. In the deep learning-based method, we propose the use of two adaptive deep neural networks with the transfer learning approach. We designed and refined these networks based on MobileNet-V2 and ResNet-50V2, which are efficient in terms of memory and complexity. The proposed networks are effective feature extractors, since they can capture and learn relevant features of drowsiness automatically; (3) the use of a transfer learning approach to speed up training, cope with a small training dataset, and improve accuracy; (4) the comparison and discussion of the performance, accuracy, and advantages of the proposed methods relative to other methods. The experimental results show that the proposed methods achieve an accuracy of 97%. The proposed deep learning method has efficiency similar to that of other effective methods that combine behavioral and physiological features, but it is more feasible in practice.
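To illustrate why a per-driver threshold matters in the facial landmark-based method, the sketch below contrasts a fixed EAR threshold with one calibrated from a driver's own open-eye samples. The calibration rule (a fixed fraction of the median baseline EAR), the ratio of 0.75, the fixed value of 0.25, and all sample values are assumptions made for illustration; they are not the procedure detailed later in this paper.

```python
from statistics import median


def calibrate_ear_threshold(baseline_ears, ratio=0.75):
    """Derive a per-driver EAR threshold from open-eye calibration samples.

    Assumption: the threshold is a fixed fraction (ratio) of the median
    EAR observed while the driver is known to be alert.
    """
    if not baseline_ears:
        raise ValueError("need at least one calibration sample")
    return ratio * median(baseline_ears)


def is_eye_closed(ear, threshold):
    """Treat the eye as closed when the EAR falls below the threshold."""
    return ear < threshold


FIXED_THRESHOLD = 0.25  # hypothetical one-size-fits-all value

# Hypothetical alert-state samples from a driver with narrow eyes.
narrow_eyed_baseline = [0.22, 0.24, 0.23, 0.23]
adaptive_threshold = calibrate_ear_threshold(narrow_eyed_baseline)

# With the fixed threshold, this driver's normal open eye is misread as closed.
misread_by_fixed = is_eye_closed(0.23, FIXED_THRESHOLD)
misread_by_adaptive = is_eye_closed(0.23, adaptive_threshold)
```

In this example the fixed threshold flags the alert driver on every frame, while the calibrated threshold flags only a genuine drop in the EAR.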
The rest of the paper is organized as follows: Section 2 presents the background on identifying facial features and deep learning models. Details of the proposed methods are presented in Section 3. We provide experimental results and evaluation in Section 4. Finally, we give conclusions in Section 5.
5. Conclusions
Most traditional methods for drowsiness detection are based on behavioral factors, while some require expensive sensors and devices to measure sleepiness and may even interfere with the driving process, distracting drivers. Therefore, in this paper, we propose two methods with three scenarios for driver drowsiness detection systems. Method 1 (scenario 1) uses facial landmarks to detect drowsiness. It analyzes the videos and detects drivers’ faces in every frame using image processing techniques. Facial landmarks are determined in order to compute the eye aspect ratio (EAR) and the mouth-opening value () to detect drowsiness based on adaptive thresholds. We improve on preceding studies by determining the eye-opening threshold for each driver rather than using one pre-defined threshold for everyone. Detecting eye blinking and yawning in each video frame helps to properly compute the drowsiness level. The driver is alerted when the blink and yawn counts reach the adaptive maximum thresholds. However, drowsiness detection based on blinking and yawning alone is not accurate enough, since drowsiness depends on other factors as well. Therefore, we propose method 2, with two scenarios, for a drowsiness alert system using deep learning techniques with the transfer learning approach. We design two adaptive deep neural networks developed from MobileNet-V2 and ResNet-50V2 for scenarios 2 and 3, respectively. Method 2 analyzes the videos and detects the driver’s activities in every frame to automatically learn all features for drowsiness detection. It takes advantage of deep neural networks to extract all features and movements of the head and face. Unlike method 1, this method does not require input thresholds to be defined, in particular the eye-opening and yawning thresholds. Additionally, method 2 combines many typical signs of drowsiness, such as eye opening, head movements, eyebrows, and mouth, to give accurate results. Moreover, we leverage transfer learning by pre-training the proposed networks on datasets from the Bing Search API, Kaggle, and RMFD. We then re-train the pre-trained weights on our training dataset to fine-tune the parameters of these networks. This helps to address the problem of small training datasets and shortens the training time.
Experiments were conducted to test the efficacy of the proposed approaches. The results show that the proposed methods can achieve a high accuracy of 97% using deep learning techniques. Method 2 with scenario 3 provides more accurate results than scenario 2 and method 1 because it detects most cases of drowsiness in various experimental contexts. Experimental comparisons between the proposed methods and preceding methods are made to discuss the advantages and limitations of the proposed methods. We also highlight the limitations of the preceding studies that affect their effectiveness in actual applications. Method 2 improves the accuracy of drowsiness detection compared with eyelid and mouth movement-based methods.
From the above results, the proposed method using deep learning techniques can be useful for monitoring driver fatigue and giving early warning of falling asleep, thereby helping to avoid traffic accidents. The experimental results show that the method is feasible and adaptable to the development of doze alert applications, especially mobile applications. This study can thus help to prevent automobile accidents caused by falling asleep behind the wheel. However, a key requirement for drowsiness detection is that the solutions work in real time or near real time. In future work, we will therefore use big data analysis to build a real-time system for drowsiness detection.