#### **1. Introduction**

Driver drowsiness detection is an automotive safety technology that helps prevent accidents when a driver is about to fall asleep. According to several studies, fatigue may play a role in about 20% of all traffic accidents and in up to 50% of accidents on certain roads. Driver drowsiness is therefore a significant factor in many road accidents: according to recent figures, collisions caused by drowsy driving result in an estimated 1200 fatalities and 76,000 injuries per year [1]. Detecting or preventing drowsiness is a major challenge in the development of accident-avoidance systems. Drowsiness, or sleepiness, is a biological state that occurs as the body transitions from alertness to sleep [2]. In this state, a driver may become distracted and be unable to make timely decisions, such as braking or steering to avoid a collision. Clear indicators that a driver may be drowsy include frequent blinking or difficulty keeping the eyes open, the head nodding forward, and frequent yawning. Drowsiness detection methods can be divided into three categories: vehicle-based, behavioral-based, and physiological-based [3].

**Citation:** Parvez M., M.; Allanki, S.; Sudhagar, G.; R. S., E.R.; Santosh, C.; Mohammed, A.B.; Muqeet, M.A. Advanced Driver Fatigue Detection by Integration of OpenCV DNN Module and Deep Learning. *Eng. Proc.* **2023**, *34*, 15. https://doi.org/10.3390/HMAM2-14158

Academic Editor: Vijayakumar Anand

Published: 13 March 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Vehicle-based measures: These include lane deviation, steering-wheel movement, and abrupt pressure on the accelerator pedal [4]. The parameters are tracked continuously, and if they fluctuate or cross a predetermined threshold, the driver is considered drowsy.

Behaviour-based measures: These metrics are based on the driver's actions, such as yawning, eye closure, blinking, and head posture.

Physiologically based measures: These assess the driver's condition by attaching sensors to the skin. Examples include the electroencephalogram (EEG), electrocardiogram (ECG), and electrooculogram (EOG).
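The vehicle-based idea of tracking a signal continuously and flagging drowsiness once it fluctuates beyond a predetermined threshold can be illustrated with a small sketch. It assumes a stream of lateral lane-offset samples in metres and measures fluctuation as the standard deviation over a sliding window; the class name, window length, and threshold are hypothetical choices, not values from this work.

```python
from collections import deque
from statistics import pstdev


class LaneDeviationMonitor:
    """Flags possible drowsiness when lane-offset variability crosses a threshold.

    Hypothetical sketch: the signal (lateral offset in metres), window length,
    and threshold are illustrative assumptions, not values from this work.
    """

    def __init__(self, window: int = 50, std_threshold_m: float = 0.3) -> None:
        self.samples = deque(maxlen=window)  # keeps only the most recent offsets
        self.std_threshold_m = std_threshold_m

    def update(self, lateral_offset_m: float) -> bool:
        """Add one sample; return True if the windowed spread exceeds the threshold."""
        self.samples.append(lateral_offset_m)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history to judge yet
        return pstdev(self.samples) > self.std_threshold_m


if __name__ == "__main__":
    monitor = LaneDeviationMonitor(window=4, std_threshold_m=0.3)
    steady = [monitor.update(x) for x in (0.0, 0.05, -0.05, 0.0)]
    weaving = [monitor.update(x) for x in (0.8, -0.8, 0.8, -0.8)]
    print(steady[-1], weaving[-1])  # False True
```

A real system would combine several such signals (steering, pedal pressure) and tune thresholds per vehicle and driver.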

A three-level cascade of carefully designed convolutional networks has been proposed for estimating the positions of facial key points [5]. This approach has two advantages. First, texture context from the entire face is used to locate each key point. Second, because the networks predict all key points simultaneously, the geometric relationships between them are implicitly embedded. Its drawback is that, although locally sharing neuron weights on a single feature map improves performance, globally shared weights do not perform well on images such as faces, or on faces with unusual poses or expressions.

Milla et al. [6] developed a system intended to be insensitive to lighting conditions. They used a Haar-feature-based algorithm for object detection and the face classifier from the OpenCV library for face classification. Anthropometric parameters are then used to locate the eye regions within the facial region, and the degree of eye closure is determined by detecting the eyelid. The disadvantage is that the model still performs poorly in dim lighting.
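The anthropometric step, locating the eyes from the face region, can be sketched with fixed proportions of the face box. The ratios below (eyes in a band between 25% and 50% of the face height, with 10% side margins) are hypothetical placeholders, not the parameters used by Milla et al. [6], and the `Box` type is introduced only for this example.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


def eye_regions(face: Box, top: float = 0.25, bottom: float = 0.50,
                side_margin: float = 0.10) -> Tuple[Box, Box]:
    """Estimate the two eye regions from a face box using fixed proportions.

    The default ratios are hypothetical anthropometric values for illustration.
    Returns (image-left eye region, image-right eye region).
    """
    y0 = face.y + int(face.h * top)            # top of the eye band
    eye_h = int(face.h * (bottom - top))       # height of the eye band
    margin = int(face.w * side_margin)         # skip the face borders
    half_w = (face.w - 2 * margin) // 2        # one half of the band per eye
    left = Box(face.x + margin, y0, half_w, eye_h)
    right = Box(face.x + margin + half_w, y0, half_w, eye_h)
    return left, right


if __name__ == "__main__":
    print(eye_regions(Box(100, 50, 200, 200)))
```

Fixed ratios are cheap but assume a roughly frontal, upright face; tilted heads shift the eyes out of the band.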

Omidyeganeh et al. [7] used a camera to record the driver's facial appearance. Their approach has four steps: face extraction from the image, eye detection, mouth detection, and generation of an alert when the driver is falling asleep. Because the method is non-contact, its shortcoming is that it depends on external factors such as lighting and the camera.

Warwick et al. [8] demonstrated a wireless wearable sensor for drowsy-driver detection. Their system is developed in two stages: first, physiological data are collected with a biosensor; second, the data are analysed to identify the essential indicators of drowsiness, and an algorithm detects the driver's tiredness and alerts the driver through a mobile app. A driver's heart and respiration rates are reliable indicators of fatigue.

This paper describes a non-contact method for determining a driver's drowsiness that uses the OpenCV DNN module to detect the face and a convolutional neural network model to classify the driver's state.

#### **2. Methodology**

The drowsiness detection system is built on image processing, that is, on operations applied to images (here, video frames). The drowsiness detection system is depicted in Figure 1.

The system flow consists of five steps: video acquisition, face detection, eye and mouth detection, state assessment, and finally classification as drowsy or non-drowsy. These steps can be grouped into three fundamental stages: video is captured with a camera and split into frames, faces are detected in each frame, and deep learning is used to detect drowsiness. Compared with contact-based methods, this approach is non-contact and inexpensive.
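The five steps can be read as a per-frame pipeline. The sketch below wires them together with the detection and classification stages passed in as callables, so any of the face detectors compared in Section 3 could be plugged in. The function name and label strings are hypothetical, and raising an alert only after several consecutive drowsy frames is an added assumption rather than a rule stated in this work.

```python
from typing import Any, Callable, Iterable, Iterator, Optional


def drowsiness_pipeline(
    frames: Iterable[Any],                          # step 1: frames from the video source
    detect_face: Callable[[Any], Optional[Any]],    # step 2: frame -> face crop or None
    classify_state: Callable[[Any], str],           # steps 3-5: face -> "drowsy" / "non-drowsy"
    consecutive: int = 3,
) -> Iterator[bool]:
    """Yield one alert flag per frame (hypothetical sketch)."""
    drowsy_run = 0
    for frame in frames:
        face = detect_face(frame)
        if face is None:
            drowsy_run = 0  # no face found: reset the running count
            yield False
            continue
        state = classify_state(face)
        drowsy_run = drowsy_run + 1 if state == "drowsy" else 0
        yield drowsy_run >= consecutive  # alert only after a sustained drowsy run


if __name__ == "__main__":
    fake_frames = ["d", "d", "d", "n", "d"]  # stand-ins for video frames
    flags = drowsiness_pipeline(
        fake_frames,
        detect_face=lambda f: f,
        classify_state=lambda f: "drowsy" if f == "d" else "non-drowsy",
    )
    print(list(flags))
```

Requiring a run of drowsy frames trades a short detection delay for fewer alerts caused by ordinary blinks.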

#### *2.1. Acquiring Video*

This work uses the Kaggle Drowsiness dataset, a publicly accessible dataset intended to support research. Its images were captured with a camera and are divided into four classes that describe the state of the individual. The images have a resolution of 640 × 480 pixels.

**Figure 1.** Proposed approach.

#### *2.2. Detection of Face*

Due to its simplicity, the Viola–Jones algorithm is frequently used for face detection. It extracts Haar-like features, which can be used to identify facial, bodily, skeletal, and other markers [9]. As illustrated in Figure 2, the three types of Haar-like features are edge, line, and four-rectangle features. Because faces contain characteristic patterns of darker and brighter regions, a Haar-like feature that matches such a pattern yields a high response. Computing a Haar-like feature amounts to taking the difference between the sums of pixel intensities in adjacent rectangular regions, where the white regions are weighted positively and the black regions negatively.

**Figure 2.** Haar-like features: (**a**) edge; (**b**) line; (**c**) four-rectangle; (**d**) face detection [9].

Computing these sums directly would be laborious, so the integral image is used: it stores, at each position, the sum of all pixel values above and to the left, so the sum over any rectangular region can be obtained with only four array lookups. Even with the integral image, each image sub-window contains more than 180,000 rectangle features. Although each feature can be computed very efficiently on its own, evaluating the entire set is expensive.
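The integral-image trick and a two-rectangle (edge) Haar-like feature can be sketched in plain Python as follows. The zero-padded table layout and the sign convention (left rectangle minus right rectangle) are illustrative choices; optimised implementations such as OpenCV's cascade classifier rely on the same principle.

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]


def integral_image(img: Image) -> List[List[int]]:
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]  # zero row/column simplifies the lookups
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def edge_feature(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle (edge) feature: left-half sum minus right-half sum."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


if __name__ == "__main__":
    img = [[9, 9, 1, 1],
           [9, 9, 1, 1]]  # bright left half, dark right half
    ii = integral_image(img)
    print(edge_feature(ii, 0, 0, 4, 2))  # 32
```

Once the table is built, every feature costs a constant number of lookups regardless of its size, which is what makes scanning many features per sub-window feasible.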

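To select the most useful features, the AdaBoost method is employed [10]. Each selected feature acts as a weak classifier with a simple threshold rule, and AdaBoost combines these weak classifiers into a weighted strong classifier that decides whether a sub-window contains a face.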
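Feature selection with AdaBoost can be illustrated with decision stumps, where each weak classifier thresholds a single feature; this is the same form of weak classifier used in the Viola–Jones detector. The sketch below is standard discrete AdaBoost with labels in {-1, +1}, a simplification of the exact weighting scheme used by Viola–Jones, and its exhaustive threshold search is only practical for small examples.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[float, int, float, int]  # (alpha, feature index, threshold, polarity)


def _stump(value: float, thr: float, polarity: int) -> int:
    """Weak classifier: +1 if polarity * value < polarity * thr, else -1."""
    return 1 if polarity * value < polarity * thr else -1


def train_adaboost(X: Sequence[Sequence[float]], y: Sequence[int], rounds: int) -> List[Stump]:
    """Select one stump (one feature) per round; y must contain -1 or +1."""
    n, d = len(X), len(X[0])
    weights = [1.0 / n] * n
    ensemble: List[Stump] = []
    for _ in range(rounds):
        best = None
        for j in range(d):                          # try every feature ...
            for thr in sorted({row[j] for row in X}):  # ... every observed threshold ...
                for polarity in (1, -1):            # ... and both directions
                    err = sum(w for w, row, label in zip(weights, X, y)
                              if _stump(row[j], thr, polarity) != label)
                    if best is None or err < best[0]:
                        best = (err, j, thr, polarity)
        err, j, thr, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)       # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)     # more accurate stumps get larger votes
        ensemble.append((alpha, j, thr, polarity))
        # Misclassified samples gain weight so the next round focuses on them.
        weights = [w * math.exp(-alpha * label * _stump(row[j], thr, polarity))
                   for w, row, label in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble: List[Stump], row: Sequence[float]) -> int:
    """Strong classifier: sign of the alpha-weighted vote of the selected stumps."""
    score = sum(alpha * _stump(row[j], thr, p) for alpha, j, thr, p in ensemble)
    return 1 if score >= 0 else -1
```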

Another option is the Dlib library, an open-source toolkit for machine learning applications. Dlib provides a face detector and a pre-trained facial landmark model that estimates 68 (x, y) coordinates mapping the facial locations on a person's face. This landmark model was trained on the iBUG 300-W dataset.
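Once Dlib has produced the 68 landmarks, eye openness can be quantified directly from them. The sketch below computes the eye aspect ratio (EAR) of Soukupová and Čech, a widely used landmark-based measure that is not part of this paper's method (which classifies with a CNN); it only illustrates how the landmark coordinates can be used. The index ranges follow the 0-indexed iBUG 300-W 68-point scheme, and the landmarks are assumed to be already converted to (x, y) tuples, for example from `shape.part(i).x` and `shape.part(i).y` of Dlib's `shape_predictor` output.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

RIGHT_EYE = range(36, 42)  # subject's right eye, 0-indexed 68-point scheme
LEFT_EYE = range(42, 48)   # subject's left eye


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|) for six eye landmarks."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)  # eyelid openings
    horizontal = math.dist(p1, p4)                    # eye-corner distance
    return vertical / (2.0 * horizontal)


def mean_ear(landmarks: Sequence[Point]) -> float:
    """Average EAR of both eyes from a sequence of 68 (x, y) landmarks."""
    right = eye_aspect_ratio([landmarks[i] for i in RIGHT_EYE])
    left = eye_aspect_ratio([landmarks[i] for i in LEFT_EYE])
    return (left + right) / 2.0
```

The ratio falls towards zero as the eyelids close; any threshold separating open from closed eyes would have to be tuned on data.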

The multi-task cascaded convolutional network (MTCNN) [11] is another face-detection technique; it maintains real-time speed while outperforming competing methods on several face-detection benchmarks. It consists of three convolutional networks: P-Net, R-Net, and O-Net. Using a deep neural network makes high accuracy possible, and cascading three multi-layer networks improves precision further because each network refines the results of the one before it. The model also uses an image pyramid to locate both large and small faces. Although this produces a large number of candidate windows, non-maximum suppression (NMS), R-Net, and O-Net together reject most of the inaccurate bounding boxes.
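Greedy non-maximum suppression, which MTCNN uses to prune overlapping candidate boxes, can be sketched as follows. The IoU threshold of 0.5 is an illustrative default; MTCNN implementations use their own thresholds at each stage.

```python
from typing import List, Sequence, Tuple

BoxXYXY = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: BoxXYXY, b: BoxXYXY) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[BoxXYXY], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Keep the best-scoring box, drop boxes overlapping it too much, and repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    candidates = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    print(nms(candidates, [0.9, 0.8, 0.7]))  # [0, 2]
```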

Recent versions of OpenCV include a deep neural network (DNN) module that can run an accurate pre-trained convolutional neural network (CNN) for face detection. This model detects faces better than earlier approaches such as Haar cascades. It is a Caffe model based on the Single Shot MultiBox Detector (SSD) with a ResNet-10 backbone.
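Using this detector from Python follows the usual `cv2.dnn` pattern: load the Caffe network, build a 300 × 300 input blob, run a forward pass, and keep detections above a confidence threshold. In this sketch the model file names are assumptions (the names commonly used for OpenCV's ResNet-10 SSD face detector), the mean values (104, 177, 123) are those commonly used with this model, and the confidence threshold of 0.5 is illustrative. OpenCV is imported lazily so that the output-parsing step can be exercised without it.

```python
from typing import Iterable, List, Sequence, Tuple

Detection = Tuple[float, Tuple[int, int, int, int]]  # (confidence, (x1, y1, x2, y2))


def parse_detections(rows: Iterable[Sequence[float]], width: int, height: int,
                     conf_threshold: float = 0.5) -> List[Detection]:
    """Convert SSD rows [image_id, label, confidence, x1, y1, x2, y2] with
    coordinates normalised to [0, 1] into clipped pixel boxes."""
    faces: List[Detection] = []
    for row in rows:
        confidence = float(row[2])
        if confidence < conf_threshold:
            continue  # discard weak detections
        x1 = max(0, int(row[3] * width))
        y1 = max(0, int(row[4] * height))
        x2 = min(width - 1, int(row[5] * width))
        y2 = min(height - 1, int(row[6] * height))
        faces.append((confidence, (x1, y1, x2, y2)))
    return faces


def load_face_net(prototxt: str = "deploy.prototxt",
                  weights: str = "res10_300x300_ssd_iter_140000.caffemodel"):
    """Load the Caffe face detector (file paths are assumptions)."""
    import cv2  # lazy import: parse_detections() works without OpenCV installed
    return cv2.dnn.readNetFromCaffe(prototxt, weights)


def detect_faces(net, image_bgr, conf_threshold: float = 0.5) -> List[Detection]:
    """Run one forward pass on a BGR image and return detected face boxes."""
    import cv2
    height, width = image_bgr.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(image_bgr, (300, 300)), 1.0,
                                 (300, 300), (104.0, 177.0, 123.0))
    net.setInput(blob)
    detections = net.forward()  # shape (1, 1, N, 7)
    return parse_detections(detections[0, 0], width, height, conf_threshold)
```

Because the boxes are normalised, the same parsing works for any input resolution, such as the 640 × 360 frames used in Section 3.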

#### *2.3. Drowsiness Detection*

Drowsiness detection is the classification stage of the system. Many machine-learning techniques have previously been developed for classification, and two classification strategies are possible: (i) analyse the eye and mouth regions of interest to determine whether the eyes or mouth are open or closed; or (ii) analyse the full face region of interest. Here, a novel training approach is developed using a modified convolutional neural network [12]. CNNs are central tools in deep learning and are particularly well suited to analysing image data.
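Strategy (i) yields per-region labels that still have to be combined into a single decision. The sketch below shows one simple way to do that, assuming the four dataset classes correspond to open eyes, closed eyes, yawning, and not yawning; the label strings and the OR rule are hypothetical, and this work itself classifies with its modified CNN rather than with this rule.

```python
# Hypothetical label names: the four classes are assumed to be eye "open"/"closed"
# and mouth "yawn"/"no_yawn"; adapt them to the actual dataset folders.
EYE_STATES = {"open", "closed"}
MOUTH_STATES = {"yawn", "no_yawn"}


def is_drowsy(eye_state: str, mouth_state: str) -> bool:
    """Strategy (i): combine per-region predictions into one decision.

    Illustrative rule: closed eyes OR yawning means drowsy.
    """
    eye, mouth = eye_state.lower(), mouth_state.lower()
    if eye not in EYE_STATES or mouth not in MOUTH_STATES:
        raise ValueError(f"unexpected labels: {eye_state!r}, {mouth_state!r}")
    return eye == "closed" or mouth == "yawn"
```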

A convolutional network consists of an input layer, an output layer, and one or more hidden layers, as shown in Figure 3. Unlike a regular neural network, a convolutional network arranges its neurons in three dimensions (width, height, and depth), so each layer transforms a three-dimensional input volume into a three-dimensional output volume. The hidden layers consist of convolution layers, pooling layers, normalization layers, and fully connected layers.
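How each hidden layer turns one three-dimensional volume into another can be made concrete by tracing shapes through a small stack of layers, using the standard output-size relation floor((n - k + 2p) / s) + 1 for input size n, kernel size k, padding p, and stride s. The layer stack and the 64 × 64 × 3 input below are hypothetical; the actual configuration of the modified CNN is the one shown in Figure 3.

```python
from typing import List, Sequence, Tuple

Volume = Tuple[int, int, int]  # (height, width, depth)


def conv_output(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling window."""
    return (size - kernel + 2 * padding) // stride + 1


def trace_volumes(input_shape: Volume, layers: Sequence[tuple]) -> List[Volume]:
    """Propagate a volume through ("conv", filters, k, s, p) / ("pool", k, s) layers."""
    h, w, d = input_shape
    shapes = [(h, w, d)]
    for layer in layers:
        if layer[0] == "conv":
            _, filters, k, s, p = layer
            h, w, d = conv_output(h, k, s, p), conv_output(w, k, s, p), filters
        elif layer[0] == "pool":
            _, k, s = layer
            h, w = conv_output(h, k, s), conv_output(w, k, s)  # depth is unchanged
        else:
            raise ValueError(f"unknown layer type: {layer[0]}")
        shapes.append((h, w, d))
    return shapes


if __name__ == "__main__":
    stack = [("conv", 32, 3, 1, 1), ("pool", 2, 2), ("conv", 64, 3, 1, 0), ("pool", 2, 2)]
    for shape in trace_volumes((64, 64, 3), stack):
        print(shape)
```

The final 15 × 15 × 64 volume would be flattened into 14,400 values before the fully connected layers.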

**Figure 3.** Modified CNN Model.

#### **3. Results**

This work combines several deep learning techniques. Its first goal is face detection under occlusion. The four face-detection algorithms were applied to the test image in Figure 4, and red squares mark the faces detected by each algorithm.

**Figure 4.** Test image [13].

Although the Viola–Jones algorithm executes relatively quickly, it cannot recognize side faces and produces more false positives, as shown in Figure 5.

**Figure 5.** Face detection using Viola–Jones [13].

Dlib cannot detect small faces: its face detector fails on faces smaller than 80 × 80 pixels. Because the test images were small, the faces in them were smaller still, as illustrated in Figure 6. To work around this, the image was upscaled by a factor of 2 for testing. However, upsampling lengthens processing time, so this limitation remains a major issue when using Dlib.
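The trade-off described for Dlib can be quantified with a little arithmetic: each 2× upsampling doubles the image width and height, so it halves the smallest detectable face size while quadrupling the number of pixels to scan. The sketch below assumes the 80 × 80 pixel minimum mentioned above and treats processing cost as proportional to pixel count, which is only an approximation. In Dlib's Python API, the upsampling count is the second argument of the frontal face detector call, for example `detector(img, 1)`.

```python
import math


def upsamples_needed(face_px: float, min_face_px: float = 80.0) -> int:
    """Number of 2x upsamplings needed for a face of side face_px to reach min_face_px."""
    if face_px <= 0:
        raise ValueError("face size must be positive")
    if face_px >= min_face_px:
        return 0
    return math.ceil(math.log2(min_face_px / face_px))


def relative_pixel_cost(upsamples: int) -> int:
    """Each 2x upsampling multiplies the pixel count (roughly the scan cost) by 4."""
    return 4 ** upsamples


if __name__ == "__main__":
    for size in (100, 40, 30):
        n = upsamples_needed(size)
        print(size, n, relative_pixel_cost(n))
```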

**Figure 6.** Face detection using Dlib [13].

MTCNN produced slightly better results than Dlib, which cannot recognise very small faces. MTCNN gives its best results when the images are sufficiently large, the illumination is adequate, occlusion is minimal, and the faces are mainly frontal, as shown in Figure 7.

The Caffe model used by OpenCV's DNN module performed best of the four detectors, which makes it a useful component of the proposed drowsiness detection approach. As shown in Figure 8, it copes well with occlusion and fast head movements and can recognize side faces.

**Figure 7.** Face detection using MTCNN [13].

**Figure 8.** Face detection using DNN [13].

The results were obtained on an AMD Ryzen 7 5800H processor with an Nvidia RTX 3050 GPU. All input images are 640 × 360 pixels, except for the DNN module, which receives its usual 300 × 300 pixel input. Table 1 shows the frame rate [14] of each method.
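Frame rates such as those in Table 1 are typically measured by timing a detector over a sequence of frames. This sketch assumes `detect` is any per-frame detection function; the untimed warm-up iterations, which exclude one-off initialisation costs, are an illustrative choice and not necessarily the procedure used for Table 1.

```python
import time
from typing import Any, Callable, Sequence


def measure_fps(detect: Callable[[Any], Any], frames: Sequence[Any], warmup: int = 5) -> float:
    """Average frames per second of detect() over frames, after untimed warm-up runs."""
    for frame in frames[:warmup]:
        detect(frame)  # warm-up: lazy initialisation, caches, GPU start-up
    timed = frames[warmup:]
    if not timed:
        raise ValueError("need more frames than warm-up iterations")
    start = time.perf_counter()
    for frame in timed:
        detect(frame)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed if elapsed > 0 else float("inf")
```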



Figure 9 shows the model performance during training. The red line indicates the validation accuracy per epoch, and the blue line indicates the training accuracy per epoch.

Table 2 shows that the proposed system performs better than current drowsiness detection techniques [15,16].

**Figure 9.** Model performance after 50 epochs of training.



#### **4. Conclusions**

This paper presented a method for determining a driver's level of drowsiness. For face detection, the OpenCV DNN module was shown to outperform Viola–Jones, Dlib, and MTCNN. Classification is performed with a modified CNN, and the system achieves an accuracy of 96.8%. The model's accuracy was verified by monitoring its accuracy on the validation dataset during training. In future work, detection could be improved by using an infrared camera in low-light conditions. Counting the frequency of yawns over a given period of time could also help identify drowsiness. Furthermore, a multimodal machine-learning approach could add other modalities, such as the audio channel, to the video frames to enhance performance.
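The yawn-frequency idea mentioned as future work could be prototyped by counting yawn onsets in a sliding time window. In this hypothetical sketch, a yawn onset is a transition from a non-yawning to a yawning frame prediction, and the 60-second window and limit of three yawns are illustrative values, not results from this study.

```python
from collections import deque


class YawnCounter:
    """Counts yawn onsets within a sliding time window (hypothetical sketch)."""

    def __init__(self, window_s: float = 60.0, max_yawns: int = 3) -> None:
        self.window_s = window_s
        self.max_yawns = max_yawns
        self.onsets = deque()       # timestamps of yawn onsets, oldest first
        self.prev_yawning = False

    def update(self, timestamp_s: float, yawning: bool) -> bool:
        """Feed one frame prediction; return True if yawns are too frequent."""
        if yawning and not self.prev_yawning:
            self.onsets.append(timestamp_s)  # a new yawn starts on a False -> True transition
        self.prev_yawning = yawning
        while self.onsets and timestamp_s - self.onsets[0] > self.window_s:
            self.onsets.popleft()            # forget yawns older than the window
        return len(self.onsets) >= self.max_yawns
```

Counting onsets rather than yawning frames keeps one long yawn from being counted many times.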

**Author Contributions:** All authors contributed to the study conception and design. Data collection and analysis were performed by M.P.M., S.A., G.S., E.R.R.S., C.S., A.B.M. and M.A.M. The first draft of the manuscript was written by S.A. and M.P.M. and all authors commented on previous versions of the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The dataset used in this study is available at https://www.kaggle.com/datasets/dheerajperumandla/drowsiness-dataset (accessed on 20 November 2022).

**Conflicts of Interest:** The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
