Article

Gaussian Weighted Eye State Determination for Driving Fatigue Detection

1 Fujian Provincial Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, Fuzhou 350118, China
2 Fujian Provincial Key Laboratory of Big Data Mining and Applications, School of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, China
3 School of Transportation, Fujian University of Technology, Fuzhou 350118, China
4 Intelligent Transportation System Research Center, Fujian University of Technology, Fuzhou 350118, China
5 State Grid Tibet Electric Power Research Institute, Lhasa 850000, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(9), 2101; https://doi.org/10.3390/math11092101
Submission received: 20 March 2023 / Revised: 17 April 2023 / Accepted: 20 April 2023 / Published: 28 April 2023

Abstract

Fatigue is a significant cause of traffic accidents. Determining a driver’s fatigue level from the state of the eyes is a problem that still requires a solution, especially when the driver is wearing a mask. Based on previous work, this paper proposes an improved DeepLabv3+ network architecture (IDLN) for eye segmentation. A Gaussian-weighted Eye State Fatigue Determination method (GESFD) was designed based on the eye pixel distribution. An Eye-based Fatigue State Dataset (EFSD) was constructed to verify the effectiveness of the algorithm. The experimental results show that the method detects the fatigue state at 33.5 frames per second (FPS) with an accuracy of 94.4%. When this method is compared to other state-of-the-art methods on the YawDD dataset, the accuracy is improved from 93% to 97.5%. We also performed separate validations on natural light and infrared face image datasets; these validations revealed the superior performance of our method under both day and night conditions.

1. Introduction

Statistically, driving while fatigued has a close relationship with collisions, personal injuries, and other traffic accidents. Surveys have shown that up to 37% of vehicle fatalities in the United States are associated with fatigued or drowsy driving [1]. According to a NHTSA (National Highway Traffic Safety Administration) survey, more than 70% of interviewed drivers feel fatigued [2,3]. Some studies have found that driving while fatigued is associated with 2% to 23% of crash incidents [4].
There are generally two main types of detection methods for driving while fatigued. One type of detection is in the form of research questionnaires; this is a subjective method. The other type is to extract fatigue features by measuring the signals of the driver or the driving car. Fatigue-driving detection is accomplished using these features. This method is referred to as the objective method. Objective methods broadly include three types of solutions: (1) detection based on the physiological features of the driver; (2) detection based on vehicle driving features; and (3) detection based on the behavior of the driver [5].
Due to rapid developments in machine learning and machine vision, these techniques have been successfully applied to fatigue-driving detection; they balance the requirements of accuracy and real-time operation, and the technology is simple and inexpensive to deploy. Researchers have proposed several effective detection methods based on drivers’ facial features [6,7]. During the COVID-19 pandemic, people’s lives were seriously affected; mask use while driving has increased, especially among cab and bus drivers. In this masked-driving environment, only part of the facial features can be extracted for detecting driver fatigue, which in turn affects facial feature marking. As a result of mask wearing, detection technologies may not be able to accurately locate the key points of the face. In such cases, detection algorithms that rely on fatigue features of the mouth are no longer usable, and new algorithms are required to capture fatigue features from the eyes alone.
In this paper, we first construct the Eye-based Fatigue State Dataset. We then use an improved DeepLabv3+ network (IDLN) architecture to segment the eye region. The natural light face dataset (Eye Image Detection and Segmentation Dataset, EIMDSD) [8] and the infrared light image dataset (CASIA-Iris-Distance infrared face image dataset) [9] are employed to verify the effectiveness of our method during both day and night conditions. The results show that our method has a good segmentation effect in both conditions. Finally, a method based on Gaussian-weighted eye opening and closing frequency to define driver fatigue states is proposed; it overcomes the interference of face occlusion, can directly extract driver fatigue features from the eye region, and is used to determine the fatigue states of different drivers.
The rest of this paper is organized as follows: The related work is presented in Section 2. Section 3 describes the method proposed in this paper by introducing an improved DeepLabv3+ network (IDLN) architecture and Gaussian-weighted Eye State Fatigue Determination method (GESFD) and showing how a fatigue state is determined. Section 4 first presents how the Eye-based Fatigue State Dataset (EFSD) is constructed, then conducts experiments and discusses the results. Section 5 concludes and gives an outlook.

2. Related Work

Current fatigue-driving detection methods are mainly divided into subjective and objective ones. As is shown in Figure 1, the subjective method is mainly conducted in the form of a questionnaire, which is evaluated by the subjective feelings of drivers. The objective method focuses on the determination of a fatigue state by detecting the driver’s physiological parameters, the vehicle’s motion characteristics, or the driver’s facial features.
Some of the well-known questionnaires in the subjective method include the Stanford Sleep Scale, the Pearson Fatigue Scale, the Driver Record Form, and the Cooper Harper Evaluation Questionnaire. They mainly use subjective records in the form of rest time, driving time, and driving habits given by the driver on his own initiative. Researchers then conduct a statistical analysis based on the data. Due to its subjective and arbitrary nature, it is difficult to quantify and conduct theoretical studies [10,11].
Research in the objective method is divided into three main categories: (1) fatigue detection based on the drivers’ physiological features; (2) fatigue detection based on vehicle driving features; and (3) fatigue detection based on drivers’ behavior features.
(1)
Fatigue detection based on drivers’ physiological features
When the driver’s body is in a state of fatigue, the relevant physiological parameters are also significantly altered. Therefore, by observing these physiological parameters, it is possible to diagnose whether a driver is driving under fatigue or not. Fatigue monitoring methods based on biological parameters mainly acquire physiological signals such as the electroencephalogram (EEG), electrocardiogram (ECG), electromyogram (EMG), and electrooculogram (EOG) of drivers using wearable devices [12,13,14,15,16]. After the physiological parameters are analyzed and compared, the fatigue state can be assessed. Since the sensors acquire the driver’s physiological parameters through contact with the body, they can directly reflect the driver’s fatigue level and thus achieve higher accuracy.
(2)
Fatigue detection based on vehicle driving features
This type of method records changes in the vehicle’s state during driving, mainly by installing sensors in specific parts of the vehicle to capture features such as trajectory, gear changes, steering amplitude changes, etc. These data are then used to assess whether the vehicle is being driven reasonably and whether the driver is in a fatigued driving state [17,18]. Generally, when the driver feels tired during driving, his attention, thinking agility, ability to understand the road conditions, and reaction to accidents will all be reduced, and vehicle control will become unstable. The vehicle is in an abnormal driving state at this time, and the driver’s fatigue level can be determined from these data.
(3)
Fatigue detection based on driver’s behavior features
This method requires placing a camera and lights at designated locations in the cockpit for imaging. The images inside the cabin during driving are collected. Then, after real-time analysis of the images collected by the camera, the driver’s face is detected and the fatigue features corresponding to the facial area (for example, eye, mouth, nose, and head posture features) are captured. Thus, the driver’s fatigue state is determined [19,20,21].
When the driver’s face is obstructed, the eye state can effectively reflect the driver’s fatigue state. Researchers have already extracted fatigue features directly from the eye area for determining driver’s fatigue state. For example, Pandey et al. [22] proposed a real-time fatigue detection method based on eye state analysis. They first use the Dlib library to detect and locate 68 key points on the face, then calculate the eye aspect ratio using the key points in the eye region, and finally determine the driver’s fatigue state by setting a threshold value. Kaur et al. [23] proposed a support vector machine-based eye-based fatigue detection system, which first locates key points on the face and extracts their coordinates, then calculates the features of the eye, mouth, head and other regions based on the coordinates, and finally classifies the fatigue state by a trained support vector machine. Miah et al. [24] proposed a fatigue detection method based on the driver’s blink pattern and the driver’s vertical eye distance feature, which first uses a Haar cascade classifier for face and eye detection, then uses a cascade regression method to obtain 68 key points in the face region, and finally determines the fatigue state by blink and vertical eye distance features.
The methods described above need to combine features such as the eyes, mouth, and head for fatigue detection; thus, new methods are required to determine fatigue only using the state of eyes. In this paper, we propose a fatigue determination and fatigue-driving detection method based on a Gaussian-weighted eye state determination, which only uses the state of the eyes to detect a driver’s fatigue state and can achieve very good results.

3. Methods

There are two main challenges in eye state detection: eye location and illumination conditions. The eye locating system has two tasks: eye location detection and eye state detection. In this paper, we first perform eye detection based on the YOLOv4-tiny network, as is shown in Figure 2a. Secondly, we use the IDLN model to implement the semantic segmentation of the eyes, as is seen in Figure 2b. Finally, based on the semantic pixels of the eyes, we propose a Gaussian-weighted eye-based fatigue determination method to determine the fatigue state, as is shown in Figure 2c. The whole fatigue-driving detection system framework is shown in Figure 2. For the illumination problem, we use natural light and infrared images to simulate day and night conditions, respectively, to verify and test the validity of our model. In the following, we introduce our method from these three aspects.

3.1. Eye Detection Based on the YOLOv4-Tiny Model

In the fatigue detection system, accurate and fast eye detection is the basic requirement of the algorithm, and the accuracy of eye detection is crucial to the fatigue-driving detection task.
Existing human eye detection methods first build a model to manually select features of the eye, and then use the model to classify and localize the eye. In the early 1990s, Yuille [25] proposed a detection method based on eye geometry. In 2005, Hamouz [26] used appearance features to localize eyes. In 2011, Yang [27] proposed a texture-based eye detection algorithm. In recent years, some systems locate the eyes by the face (face detection followed by eye detection), which can obtain good performance. However, algorithms based on geometry, appearance, and texture features can perform poorly if the image is occluded or light-reflective. When there is a mask obscuration, it may be impossible to locate the face, let alone the eyes.
With the development of information technology, deep convolutional neural networks have dominated many tasks in machine vision. Region-based convolutional neural networks have excellent performance and have become the dominant approach in the domain of target detection. This method is generally divided into two categories: the two-stage approach based on regional recommendations and the one-stage approach based on regression. The former contains region-based convolutional neural network methods [28], Fast-RCNN [29], Faster-RCNN methods [30], region-based fully convolutional network methods [31], and other improved methods based on convolutional neural networks [32,33]. The latter contains the YOLO (You only look once) method proposed by Redmon et al., which is more suitable for the situations where real-time performance is required [34]. Compared with traditional methods, convolutional neural-network-based eye detection avoids manual extraction of features. With the support of the dataset, the performances of eye detection are greatly improved.
All the methods of the YOLO series contain very deep networks with a large number of parameters [35,36,37]. Therefore, these methods require powerful GPU (Graphics Processing Unit) computing power to achieve real-time target detection. However, fatigue-driving detection needs to run in real time on mobile or embedded devices (autonomous driving devices, augmented reality devices, and other smart devices), which have limited computing power and memory for complex calculations; a lightweight version of the YOLO series is therefore required.
Fortunately, researchers have proposed lightweight YOLO variants [38,39,40]. YOLOv4-tiny is one of them; it is based on the YOLOv4 model [41], simplifies the network structure, cuts down the parameters, and increases the detection speed at the expense of some accuracy. The YOLOv4-tiny model uses the CSPDarknet53-tiny network [42] to extract features, and a feature fusion structure is then added to produce multi-scale feature maps. These strategies preserve the detection accuracy of the model, giving YOLOv4-tiny both good detection accuracy and a faster detection speed and improving the feasibility of deploying the model in embedded systems or mobile devices. In this paper, an optimized YOLOv4-tiny model is adopted to implement eye detection.
The network structure of YOLOv4-tiny is shown in Figure 3; it is composed of three parts: (1) the CSPDarknet53-tiny backbone feature extraction network, (2) the feature pyramid network (FPN), and (3) the YOLO-head prediction structure. The backbone network CSPDarknet53-tiny performs feature extraction and consists of three DarknetConvD_BN_Leaky modules and three stacked Resblock_body blocks. The DarknetConvD_BN_Leaky module consists of a two-dimensional convolutional layer, a two-dimensional batch normalization layer (BatchNorm2d), and a LeakyReLU activation function.
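To make this building block concrete, the following is a minimal PyTorch sketch of the DarknetConvD_BN_Leaky unit (Conv2d, BatchNorm2d, LeakyReLU); the module name, channel sizes, and the 0.1 negative slope are illustrative assumptions rather than the exact configuration used in this paper.

```python
import torch
import torch.nn as nn

class DarknetConvBNLeaky(nn.Module):
    """Conv2d -> BatchNorm2d -> LeakyReLU, the basic unit of CSPDarknet53-tiny."""
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              stride=stride, padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

# Example: a 416x416 RGB frame passed through one downsampling convolution.
x = torch.randn(1, 3, 416, 416)
y = DarknetConvBNLeaky(3, 32, stride=2)(x)   # shape: (1, 32, 208, 208)
```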

3.2. Eye Semantic Segmentation Model

After the eye area is detected, the driver’s eye area needs to be segmented. The DeepLabv3+ model [43,44] is a variant of a typical fully convolutional neural network that has achieved good performance in using contextual information for semantic segmentation. In this paper, we propose an improved DeepLabv3+ network architecture, called IDLN [8], which is shown in Figure 4. The IDLN uses the Atrous Spatial Pyramid Pooling (ASPP) module [45] to capture contextual semantic features at different scales by using parallel hole convolution techniques with different expansion rates and retains the DeepLabv3+ model encoding–decoding structure. The encoder–decoder architecture can freely adjust the resolution of the encoder module feature map by balancing accuracy and time consumption through the null convolution. With its effective decoding module, the model can recover detailed information about the boundaries. To reduce the number of IDLN parameters, the feature extraction part of the IDLN network uses the more lightweight network MobileNetv2 [46]. Meanwhile, the attention mechanism is adopted to improve the accuracy of edge feature extraction.
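As a reference for how the ASPP module gathers multi-scale context with parallel atrous (dilated) convolutions, below is a simplified PyTorch sketch; the dilation rates (6, 12, 18) and the 256-channel projection are common DeepLabv3+ defaults used here as assumptions, not values reported in this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleASPP(nn.Module):
    """Parallel atrous convolutions with different dilation rates plus global pooling."""
    def __init__(self, in_ch, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)] +
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False) for r in rates]
        )
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, out_ch, 1, bias=False))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.pool(x), size=x.shape[2:],
                               mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))
```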
As is shown in Figure 4, we use MobileNetv2, a lightweight feature extraction network with a much smaller number of parameters. Then, we introduce the Convolutional Block Attention Module (CBAM) to enhance the extraction of eye edge features [47]. This CBAM contains two sequential sub-modules: the channel attention module and the spatial attention module, which are applied to sequential alignment to obtain better results. Enhancement of high-level features and low-level features extracted from MobileNetv2 network is implemented by CBAM.
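A condensed PyTorch sketch of a CBAM block, with the channel attention and spatial attention sub-modules applied sequentially as described above, is given below; the reduction ratio of 16 and the 7x7 spatial kernel follow the original CBAM paper [47] and are assumptions about this implementation.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, applied sequentially."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction),
                                 nn.ReLU(),
                                 nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: shared MLP over average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention: 7x7 convolution over channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```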

3.3. A Gaussian-Weighted Eye State for Fatigue Determination

After calculating the pixel values of the driver’s eye region to obtain the semantic pixel segmentation of the eye image, we propose a Gaussian weighting-based eye state determination method for driver fatigue detection. The pseudo code of our proposed method is shown in Algorithm 1. The process of calculating the Gaussian weight score $S_T$ and the state judgment threshold $T$ for one of the videos will be described in detail later.
Algorithm 1 A Gaussian-weighted Eye State for Fatigue Determination
Input: Gaussian weight score for a video segment $S_T$, state determination threshold $T$
Output: Fatigue state
1: if $S_T \geq T$ then
2:   judged as fatigue state;
3: else
4:   judged as normal state;
5: end if

3.3.1. Normalization of the Eye State Degree

After obtaining the semantic pixel segmentation of the eye image, the number of pixel points of the driver’s eye in each frame of the video is first calculated. Then, the eye state degree is worked out for each frame of the video based on the number of pixels, to effectively represent the state of the driver. When the driver is drowsy, their eyes cannot be opened as wide as they would be in a normal state. When the driver is severely fatigued, their eyes may be closed for some time, and the eye state degree is very small or even 0. When the driver is in a normal state, their eyes open normally and the eye state degree takes its normal value, which is assumed to be 1.
Eliminating the influence of anomalous data and improving the accuracy of the model are important. In this paper, we adopt the maximum–minimum normalization method, as is shown in Equation (1), to convert the number of eye pixel points to between [0, 1] and define this value as the eye state degree. When the eyes are open to the maximum, a value of 1 should be obtained, which indicates that the eyes are open to the maximum at this time. When the eyes are closed to their minimum, it is 0.
$P = \frac{p - \min p}{\max p - \min p}$ (1)
where $P$ indicates the eye state degree obtained after the maximum–minimum normalization of the number of eye pixels, $p$ indicates the number of eye pixels at this time, $\min p$ indicates the least number of eye pixels in the video frame images, and $\max p$ indicates the maximum number of eye pixels in the video frame images.
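A minimal NumPy sketch of Equation (1), assuming the per-frame eye pixel counts of one video are already available as an array:

```python
import numpy as np

def eye_state_degree(pixel_counts):
    """Max-min normalize per-frame eye pixel counts to the eye state degree P in [0, 1]."""
    p = np.asarray(pixel_counts, dtype=float)
    p_min, p_max = p.min(), p.max()
    if p_max == p_min:          # degenerate video with constant eye size
        return np.zeros_like(p)
    return (p - p_min) / (p_max - p_min)

# Example: a fully closed eye maps to 0, the widest opening in the video maps to 1.
print(eye_state_degree([120, 300, 450, 80, 0, 430]))
```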
The eye state curves of drivers in normal and fatigue states are different. As is shown in Figure 5, the following scenarios are selected to visualize the eye state curves of a simulated driving experimenter in this study: bare face, masked, glasses, and mask and glasses, respectively. In this study, 400 consecutive frames of video of one driver in different states are selected for visualization. Among them, Figure 5a,c,e,g show the driver in the normal condition, and Figure 5b,d,f,h show the driver in the fatigue condition. Figure 5a,b correspond to the bare-face scene, Figure 5c,d to the mask-covered scene, Figure 5e,f to the glasses-wearing scene, and Figure 5g,h to the scene with both mask and glasses. Comparing the two states in the same scenario, we can see that the driver’s eye state degree is very small or even 0 when they are fatigued, whereas in a normal state it is stable at a certain value and only decreases during normal blinking. The visualization in Figure 5 shows that the eye state curves in the normal and fatigue states differ regardless of the scenario. In this study, we analyze and extract fatigue features based on the changes in the eye state curve.

3.3.2. A Gaussian-Weighted Approach to the Definition of Fatigue Degree

The eye state degrees obtained above cannot be applied directly to characterize the fatigue level of the driver, for two reasons. On the one hand, characterizing fatigue directly by the eye state degree assumes a linear relationship, which does not respond sensitively to the true level of fatigue. On the other hand, each person’s eye size is different, so the same eye state value does not represent the same level of fatigue for different people. To overcome these issues, this paper proposes a Gaussian weighting method to convert the relationship between eye opening and the fatigue state into a nonlinear one, as is shown in Equation (2). The eye-opening Gaussian score is the sum of the per-frame eye-opening Gaussian weights. At the same time, we introduce a baseline amount for the driver’s eye state, as is shown in Equation (3). Each video’s Gaussian score is then divided by the baseline Gaussian score to obtain a fatigue status score free of individual differences, as is shown in Equation (4). This better circumvents the problem that people with different eye sizes would otherwise obtain different fatigue scores, and the fatigue scale is the same regardless of eye size and the number of video frames. The driver’s state (normal or fatigue) causes the eye opening to change; different eye openings are assigned different weights, and the states are differentiated according to the final score obtained.
$G_n = e^{-\frac{P_n^2}{2\sigma^2}}$ (2)
where $G_n$ denotes the Gaussian weight obtained after assigning a weight to the eye openness in the $n$-th image frame, $P_n$ denotes the eye openness value of the driver in the $n$-th image frame, and $\sigma$ is the Gaussian weight factor set in this study; different weight factors will affect the final scores calculated.
To simplify the calculation as well as to facilitate the analysis of the calculation results, the baseline amount is set in this study, and the formula for the baseline amount is shown in (3).
$R_n = e^{-\frac{(n/N)^2}{2\sigma^2}}$ (3)
where $N$ denotes the total number of video frames, $n$ denotes the $n$-th image frame, $R_n$ denotes the baseline amount of the $n$-th frame, and $\sigma$ is the weighting factor; setting different weighting factors will have an impact on the baseline amount.
The formula for calculating the weighting score obtained by the weight assignment formula for all frames of the video is shown in (4).
$S = \frac{\sum_{n=1}^{N} G_n}{\sum_{n=1}^{N} R_n}$ (4)
where $S$ denotes the total weighting score obtained by accumulating the weight of each frame of the video and dividing it by the accumulated baseline amount, $n$ denotes the $n$-th frame of the video, and $N$ denotes the total number of frames of the video.
The Gaussian weight factor $\sigma$ is a hyperparameter, and the eye-opening Gaussian weight curves for different $\sigma$ values (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1) are shown in Figure 6, where the horizontal coordinate indicates the eye state degree and the vertical coordinate indicates the corresponding weight value. The overall idea of the weight assignment proposed in this study is that a smaller eye state degree receives a larger weight and a larger eye state degree receives a smaller weight. Figure 6 shows that all the curves are consistent with this idea, i.e., giving a larger weight to smaller eye openings and a smaller weight to larger eye openings. The optimal value of $\sigma$ is obtained by experiments.
The effect of different $\sigma$ values on the Gaussian-weight-based fatigue score will be analyzed and compared for the normal and fatigue states. Thus, the Gaussian weight factor $\sigma$ that produces the greatest difference between the two state scores is finally selected.
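Assuming Equations (2)–(4) take the Gaussian forms given above, a short Python sketch of the per-video weighting score is shown below; the function and variable names are illustrative.

```python
import numpy as np

def gaussian_weight_score(P, sigma=0.1):
    """Gaussian-weighted score S of one video segment (Equations (2)-(4)).

    P     : per-frame eye state degrees in [0, 1] from Equation (1).
    sigma : Gaussian weight factor; small eye openings receive large weights.
    """
    P = np.asarray(P, dtype=float)
    N = len(P)
    n = np.arange(1, N + 1)
    G = np.exp(-P ** 2 / (2 * sigma ** 2))          # Equation (2): per-frame weights
    R = np.exp(-(n / N) ** 2 / (2 * sigma ** 2))    # Equation (3): baseline amounts
    return G.sum() / R.sum()                        # Equation (4): weighting score S
```

With this weighting, frames with nearly closed eyes (P close to 0) contribute weights near 1, while wide-open frames contribute almost nothing, so fatigued segments accumulate noticeably larger scores.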

3.3.3. Eye State Determination

Driver’s fatigue is a state description, and the corresponding fatigue degree is a dynamic process. The main traditional way of determining fatigue state is to use the Percentage of Eye Closure (PERCLOS) metric for differentiation. This type of method calculates a threshold value from the PERCLOS metric. Then, the fatigue state is determined by calculating the state of each frame of the video based on the threshold value [20]. However, the fatigue state of a driver is dynamically transformed. The fatigue determination algorithm that simply sets up a threshold value for each frame of the video is not flexible enough, and it is difficult to extract the dynamic transformation characteristics of the driver’s fatigue state. The fatigue determination method based on Gaussian weighting proposed in this study can analyze the relationship of the driver’s eye state degree with time. Therefore, the Gaussian weighted eye-based fatigue determination method is used to calculate the Gaussian weighting score of a video (200 frames in this paper). The Gaussian weighting score is used to represent the fatigue state during a specific video time.
The Gaussian weighting scores of drivers in normal and fatigue states are different. In this study, we experimentally calculate a threshold value that can classify the state of the video, i.e., it can distinguish the fatigue state of the driver when he is in different video segments. The calculation details of the threshold are shown below.
First, we divide the total number of videos into the training and test sets and accumulate the Gaussian weighting scores of different video periods in the training set when the driver is in the normal state. After that, we work out the average Gaussian weighting score when the driver is in the normal state, as is shown in Equation (5).
$S_N = \frac{\sum_{i=1}^{I} S_i}{I}$ (5)
where $S_N$ denotes the average Gaussian weighting score when the driver is in a normal state, $S_i$ denotes the Gaussian weighting score corresponding to the $i$-th video, and $I$ denotes the total number of training-set videos in which the driver is in the normal state ($I = 40$ in this paper).
Then, we accumulate the Gaussian weighting scores of different video periods in the training set when the driver is in the fatigue state. After that, we work out the average of Gaussian weighting scores when the driver is in the fatigue state, as is shown in Equation (6).
$S_F = \frac{\sum_{j=1}^{J} S_j}{J}$ (6)
where $S_F$ denotes the average Gaussian weighting score when the driver is in the fatigue state, $S_j$ denotes the Gaussian weighting score corresponding to the $j$-th video, and $J$ denotes the total number of videos in which the driver is in the fatigue state ($J = 40$ in this paper).
Finally, we calculate the state determination threshold T . This threshold determines the driver’s fatigue state during a period of video time, which is shown in Equation (7).
$T = \frac{S_N + S_F}{2}$ (7)
According to the state determination threshold $T$, we can effectively determine the fatigue state in the test set. When the Gaussian weighting score of a test video (200 frames in this paper) satisfies $S_T \geq T$, the driver is judged to be in a fatigue state. When $S_T < T$, the driver is judged to be in a normal state.
In summary, the state determination threshold $T$ and the Gaussian weighting score $S_T$ corresponding to each video are first calculated; we can then effectively determine the driver’s state by comparing these two values.
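Combining Equations (5)–(7) with Algorithm 1, a minimal sketch of the training-time threshold computation and the test-time decision might look as follows; it assumes the per-video scores have already been computed, e.g., with the gaussian_weight_score sketch from Section 3.3.2.

```python
import numpy as np

def fit_threshold(normal_scores, fatigue_scores):
    """Equations (5)-(7): average the per-video scores of each class, then take the midpoint."""
    s_n = np.mean(normal_scores)    # S_N over the I normal training videos
    s_f = np.mean(fatigue_scores)   # S_F over the J fatigue training videos
    return (s_n + s_f) / 2.0        # state determination threshold T

def classify(score, threshold):
    """Algorithm 1: a video segment is judged fatigued when its score reaches the threshold."""
    return "fatigue" if score >= threshold else "normal"

# Illustration with the class averages reported in Section 4.5.
T = fit_threshold([2.939], [8.707])            # -> 5.823
print(T, classify(6.4, T), classify(3.1, T))   # -> 5.823 fatigue normal
```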

4. Experimental Results and Discussion

4.1. Eye-Based Fatigue State Dataset

Since there is no publicly available dataset combining fatigue-driving detection with mask wearing and similar occlusions, the Eye-based Fatigue State Dataset (EFSD) was constructed in this study. The process and specifications for fatigue detection data acquisition were explained in detail to the experimental staff prior to the acquisition. The video acquisition was performed in the data acquisition vehicle inside the laboratory. The scene inside the data acquisition vehicle is shown in Figure 7. The data acquisition test vehicle is equipped with professional video acquisition devices and can simulate actual driving scenarios. There were eight experimenters in total, each captured in eight videos covering normal and fatigued states under bare face, masked, glasses, and masked-and-glasses conditions. The length of each video was about 700 frames. After the video acquisition, a manual cross-check was performed, and data that did not meet the requirements were reacquired to ensure the reliability of the dataset. A schematic diagram of some sample data of the self-built fatigue-driving detection dataset is shown in Figure 8.

4.2. Evaluative Criteria

As is shown in Figure 2, our proposed method consists of three parts: (1) eye detection, (2) eye semantic segmentation, and (3) a Gaussian-weighting-based method for determining the eye fatigue status. The effectiveness of the eye detection model and the eye semantic segmentation model directly affects the accuracy of the eye fatigue determination method. To describe the effectiveness of the eye detection model and the eye semantic segmentation model objectively, some evaluation metrics are used. As $mAP$ (mean Average Precision) is widely used to evaluate the effectiveness of target detection models, we also use it to quantify the effectiveness of the eye detection model in this paper. In the semantic segmentation task, $mIoU$ (mean Intersection over Union) and $mPA$ (mean Pixel Accuracy) are often used to measure the effectiveness of image segmentation models. Therefore, we choose them to quantify our eye semantic segmentation model.

4.2.1. Evaluation Criteria for Eye Detection

To obtain the $mAP$ value, we first need to obtain the precision–recall curve and calculate the average precision ($AP$). The formulas for precision and recall are shown in Equations (8) and (9).
Precision indicates the number of correctly detected fatigue samples as a percentage of all detected fatigue samples, and Recall indicates the number of correctly detected fatigue samples as a percentage of all fatigue samples.
$Precision = \frac{T_p}{T_p + F_p}$ (8)

$Recall = \frac{T_p}{T_p + F_n}$ (9)

In Equations (8) and (9), $T_p$ (true positives) indicates the number of fatigue samples correctly identified as fatigue samples, $F_p$ (false positives) indicates the number of normal samples misidentified as fatigue samples, and $F_n$ (false negatives) indicates the number of fatigue samples incorrectly identified as normal samples.
$AP$ measures how well the trained model does on each category. $AP$ is calculated as shown in Equation (11). $mAP$ is the average value of $AP$, as is shown in Equation (12), and reflects the global performance.
$P_{MaxPrecision}(R) = \max_{\check{R} \geq R} P(\check{R})$ (10)

$AP = \frac{1}{11} \sum_{R \in \{0,\, 0.1,\, \ldots,\, 1\}} P_{MaxPrecision}(R)$ (11)

$mAP = \frac{1}{Q_R} \sum_{q \in Q_R} AP_q$ (12)

In Equation (10), $P_{MaxPrecision}(R)$ is the maximum precision over recall values satisfying $\check{R} \geq R$. $Q_R$ is the number of categories. In this paper, as both the left eye and the right eye are detected, $Q_R = 2$.
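A small Python sketch of the 11-point interpolated AP of Equations (10)–(12), assuming precision–recall pairs have already been computed for one category:

```python
import numpy as np

def ap_11_point(recalls, precisions):
    """11-point interpolated AP: mean of the max precision at recall >= r, r = 0, 0.1, ..., 1."""
    recalls = np.asarray(recalls, dtype=float)
    precisions = np.asarray(precisions, dtype=float)
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recalls >= r
        p_max = precisions[mask].max() if mask.any() else 0.0   # Equation (10)
        ap += p_max / 11.0                                      # Equation (11)
    return ap

def mean_ap(ap_per_class):
    """Equation (12): mAP is the mean AP over all categories (left eye and right eye here)."""
    return float(np.mean(ap_per_class))
```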

4.2.2. Evaluation Criteria for Eye Semantic Segmentation

The main evaluation metrics used include $mIoU$ and $mPA$.
$mIoU$ represents the average $IoU$ (Intersection over Union) over all categories, which is an essential evaluation metric in the field of image segmentation and is widely used to evaluate the performance of image segmentation algorithms. To obtain the actual quantified value of $mIoU$, the $IoU$ of each category needs to be calculated first.
$IoU$ is defined as the intersection area between the predicted segmentation map and the real label map, divided by the union area between the predicted segmentation map and the real label map, as is shown in Equation (13).

$IoU = \frac{A \cap B}{A \cup B}$ (13)

The value of $IoU$ ranges between 0 and 1. The closer to 0, the larger the gap between the predicted segmentation map and the real labeled map, and the worse the segmentation quality. The closer to 1, the closer the predicted segmentation map is to the true labeled map, and the better the segmentation effect.
The $IoU$ of each class is calculated and then summed; the average is taken to obtain the $mIoU$, which is formulated in Equation (14).

$mIoU = \frac{\sum_{i=1}^{K} IoU_i}{K}$ (14)

where $K$ denotes the total number of categories and $IoU_i$ denotes the $IoU$ value of the $i$-th category.
The $mPA$ indicates the proportion of correctly classified pixels on a per-class basis, averaged over the total number of classes. $mPA$ is an extension of $PA$ (Pixel Accuracy), which is defined as the number of correctly classified pixels divided by the total number of pixels. For $K + 1$ classes ($K$ foreground classes and 1 background class), $PA$ and $mPA$ are defined by the following equations.

$PA = \frac{\sum_{i=0}^{K} p_{ii}}{\sum_{i=0}^{K} \sum_{j=0}^{K} p_{ij}}$ (15)

$mPA = \frac{1}{K+1} \sum_{i=0}^{K} \frac{p_{ii}}{\sum_{j=0}^{K} p_{ij}}$ (16)

where $p_{ij}$ denotes the number of pixels of category $i$ predicted as category $j$, and $K + 1$ represents the total number of categories.
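The segmentation metrics above can all be computed from a confusion matrix over the $K + 1$ classes; a brief NumPy sketch follows, with the matrix layout (rows are ground-truth classes, columns are predicted classes) taken as an assumption.

```python
import numpy as np

def segmentation_metrics(conf):
    """IoU per class, mIoU, PA, and mPA from a confusion matrix.

    conf[i, j] = number of pixels of ground-truth class i predicted as class j.
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    iou = tp / (conf.sum(axis=1) + conf.sum(axis=0) - tp)   # Equation (13) per class
    miou = iou.mean()                # mean over all classes (cf. Equation (14))
    pa = tp.sum() / conf.sum()       # Equation (15)
    mpa = (tp / conf.sum(axis=1)).mean()                    # Equation (16)
    return iou, miou, pa, mpa

# Example with two classes (background, eye).
print(segmentation_metrics([[900, 20], [15, 65]]))
```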

4.3. Eye Detection

In this paper, we use EIMDSD to train the model. The trained eye detection model is then used to extract the driver’s eye region, laying the foundation for the subsequent extraction of eye-based fatigue features and the completion of the fatigue determination task. Therefore, the performance of the eye detection model is related to that of the overall fatigue detection algorithm.
The deep learning framework used is PaddlePaddle version 2.2.2. The computer’s CPU has four cores, the RAM size is 32 GB, the GPU is a Tesla V100, and the video memory size is 32 GB.
The change of the loss value during training is shown in Figure 9. The loss value decreases rapidly at the beginning of training and eventually stabilizes, indicating that the training process of the model has converged.
We used the proposed eye detection model for testing on the EIMDSD, and the results are shown in Figure 10. For images under both unobscured and face-obscured conditions, our model can accurately detect the eye region and can distinguish between the left eye and right eye categories.

4.4. Eye Semantic Segmentation

After detecting the eye region, we use the proposed eye segmentation model to segment the driver’s eye region. The performance of the eye segmentation model affects the effectiveness of our overall face occlusion automatic fatigue-driving detection method. We use the IDLN model to segment the eye region, which is trained on EIMDSD [8]. The experimental results are shown in Table 1.
As is seen in Table 1, we validate the performance of the eye segmentation model using a test dataset containing 600 images. Analyzing the mIoU and mPA metrics, our eye segmentation model obtained the largest mIoU (0.883) and mPA (0.962). Compared with PSPNet, the mIoU of our eye semantic segmentation model is improved by 0.081 and mPA by 0.023. Analyzing the FPS metrics, our model obtained the largest FPS (34.1). This means that our segmentation model can segment eye regions in real-time.

4.5. Fatigue State Determination under Face Masking Conditions

In this study, different Gaussian weighting factors (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1) were set for the experiments, which were performed on EFSD. Each experiment used 200 frames of video, and 58 experiments were conducted for the normal state and 58 for the fatigue state. Therefore, 58 weighting scores in the normal state and 58 in the fatigue state were obtained for each Gaussian weighting factor. Table 2 lists the weighting scores of 10 videos in which the driver is in a normal state under different Gaussian weighting factors. Table 3 shows the corresponding weighting scores for videos in which the driver is in a fatigue state. In the experiments, the videos were randomly shuffled; for example, the serial numbers of the videos in the normal state do not correspond to the serial numbers of the videos in the fatigue state, and they were not obtained from the same experimenter or the same occlusion scene. The purpose of shuffling the dataset in this way is to increase the robustness of the training.
Table 4 gives the thresholds calculated by Equation (7) for the data in Table 2 and Table 3 and the number of misclassified samples under different Gaussian weight factors. From Table 4, it can be seen that when $\sigma = 0.1$, the number of misclassified samples is 0. As the Gaussian weight factor increases, the number of misclassified samples increases and the state determination threshold has difficulty accurately distinguishing the driver’s fatigue state. Therefore, we set $\sigma = 0.1$.
Figure 11 visualizes the contents of Table 2 and Table 3, where (a–j) denote the Gaussian weighting scores at $\sigma$ = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0, respectively. The weight score curves for the normal and fatigue states change differently for different weighting factors. For example, in Figure 11a, where the Gaussian weight factor $\sigma = 0.1$, the distinction between the weight score curves of the normal state and the fatigue state is more obvious, which indicates that the two states can be well distinguished under this Gaussian weight factor. As the Gaussian weight factor gradually increases, the difference between the Gaussian weight score curves in the normal state and the fatigue state becomes smaller and smaller. When $\sigma = 1$, the two curves are approximately blended. As the Gaussian weighting factor becomes larger, the overall trends of the two state curves become mixed, and it is difficult to effectively distinguish between the two states of the driver in that case.
Comparing the analysis of Table 2, Table 3 and Table 4 and Figure 11, we found that when σ = 0.1 , the Gaussian weight score curves of the normal state and the fatigue state are more clearly distinguished. The difference between the two curves was most significant at this point. Therefore, based on these results, we use 0.1 as the final weighting factor.
In the following experiments, we first randomly select 40 videos in a normal state and 40 videos in a fatigue state, respectively, as the training set. Thus, I = 40, J = 40, and our EFSD training set consists of a total of 80 videos. The remaining 18 videos in normal condition and 18 videos in fatigue condition are used as the test set.
From Equations (5) and (6), we can calculate the average Gaussian weight score $S_N$ in the normal state and $S_F$ in the fatigue state, obtaining $S_N = \frac{S_1 + S_2 + \cdots + S_{40}}{40} = 2.939$ and $S_F = \frac{S_1 + S_2 + \cdots + S_{40}}{40} = 8.707$.
The state determination threshold $T$ is calculated from Equation (7):

$T = \frac{S_N + S_F}{2} = \frac{2.939 + 8.707}{2} = 5.823$
Then, the Gaussian weighting score is calculated for each video in the test set. If the Gaussian weighting score of a video is greater than or equal to T , it is determined as fatigue. Otherwise, it will be regarded as normal. The results of the fatigue determination are shown in Table 5.
A summary of the experimental results is shown in Table 6, in which the accuracy of fatigue-driving detection, the number of floating-point operations (FLOPs), the number of parameters (Params), and the frames per second (FPS) are listed. We can see from this table that our method can detect fatigue accurately in real time and meets the needs of practical applications.

4.6. Day and Night Fatigue Detection Verification

At night, lighting is not allowed in the driver’s cab. In extremely low light conditions at night, infrared cameras are often used to capture infrared images to cope with the problem of nighttime fatigue-driving detection. Infrared images and natural light images differ in terms of spatial resolution, texture and edge features, and correlation between pixels, as infrared images are taken based on the difference in the emissivity of the object to infrared light, and natural light images are taken based on the difference in the reflectivity of the visible light by the object. Therefore, to evaluate the effectiveness of the proposed method for face occlusion fatigue detection in both day and night driving environments, experiments are conducted on the natural light and the infrared light image datasets using an eye detection model and an eye segmentation model in this paper.
We use the proposed optimized YOLOv4-tiny model for eye detection and conduct experiments on the natural light image dataset EIMDSD. The experimental results are shown in Table 7.
To more intuitively demonstrate the eye detection effect of the YOLOv4-tiny model, some of the detection results are shown in Figure 10. As can be seen from Table 7 and Figure 10, our model can accurately detect the left and right eyes in a face-obscured environment. In summary, the experimental results on EIMDSD show that our proposed method can effectively perform the eye detection task in a daytime driving environment.
The experiments on the infrared image dataset use the CASIA-Iris-Distance infrared face image dataset, a subset of CASIA-IrisV4, which contains only infrared images of the part of the face around the eyes, as is shown in Figure 12. The eye detection model was trained on 2000 images from the CASIA-Iris-Distance infrared face image dataset and tested on a test set of 300 images. The test results are shown in Table 8, and some of the detection results are shown in Figure 13.
We can see from Table 8 that the model detects the left and right eyes separately with a mean average precision (mAP) of 98.57% at 33.2 FPS, which shows that our proposed algorithm has high accuracy and good applicability for human eye detection in infrared face images. Meanwhile, the experimental results on the CASIA-Iris-Distance infrared face image dataset also verify that our proposed eye detection method can effectively accomplish detection of the eye region in a night driving environment.
We also conducted experiments on EIMDSD and CASIA-Iris-Distance infrared face image dataset using the proposed eye segmentation model. The experimental results of the eye segmentation model on EIMDSD are shown in Table 9. We can see from this table that our eye segmentation model is effective and lightweight and can perform segmentation in real-time.
To verify that our method has a stable segmentation effect even at night, we use infrared face images from the CASIA-Iris-Distance infrared face image dataset for experimental validation. In total, 300 infrared images of the eye region were cropped; the infrared eye images are shown in Figure 14, some segmentation results are shown in Figure 15, and the test results are shown in Table 10.
Figure 14 shows three samples of the left eye region and the right eye region. The test results are shown in Figure 15, where the blue area indicates the segmented eye region and the gray area is the background. It can be seen that the blue area can segment the eye region completely, which indicates that the model trained on top of our own eye segmentation dataset can complete the segmentation task of infrared images well. The segmentation results are given in Table 10, which indicates our model is effective and lightweight on infrared images and can run in real-time.

4.7. Comparisons with State-of-the-Art Methods

To evaluate the detection effectiveness of the algorithm proposed in this paper on other datasets, experiments were conducted using the YawDD dataset [48]. The YawDD dataset is an accepted dataset for fatigue-driving detection that contains videos taken under real and different lighting conditions. It contains two datasets of driver videos with various facial features. In the first dataset, the camera is mounted under the front view mirror of the car, which provides 322 videos. In the second dataset, the camera is mounted on the driver’s dashboard, which provides 29 videos. We compared our results with those from other fatigue-driving detection algorithms. The results are shown in Table 11.
One of the algorithms compared was proposed in [22], which is a real-time fatigue detection method based on eye state analysis. The best detection accuracy achieved by this method was 92.5%. Another algorithm proposed in [23] is an eye fatigue detection system that uses support vector machines. The final system was evaluated and obtained a fatigue detection accuracy of 91%. The study proposed in [24] is a fatigue detection method based on the driver’s blink pattern and the driver’s vertical eye distance characteristics, which achieved a fatigue-driving detection accuracy of 93%. The fatigue detection algorithm proposed in this paper achieved an accuracy of 97.5%. The results indicate that the algorithm proposed in this paper performs well on the YawDD dataset and can effectively and accurately detect fatigue states.
Overall, the results demonstrate that the algorithm proposed in this paper outperforms the other compared algorithms in terms of detection accuracy, which contributes to the development of more precise and reliable fatigue detection systems.

5. Conclusions

For the problem of fatigue detection for face-obscured drivers, this paper proposed the Gaussian-weighted Eye State Fatigue Determination (GESFD) method for determining the eye fatigue status. We also adopted an optimized YOLOv4-tiny model to detect eye regions accurately in real time, and the improved DeepLabv3+ network architecture (IDLN) was proposed to segment the detected eye regions. The results show that the technique detects fatigue states with 94.4% accuracy at 33.5 FPS. The experimental results showed that our method can accurately detect the fatigue state in real time. The experiments also verified the effectiveness of our method for face occlusion fatigue detection in both day and night driving environments. On the YawDD dataset, the accuracy increased from 93% to 97.5% compared with other state-of-the-art methods. The results showed that the algorithm proposed in this paper outperformed other models on the YawDD dataset and could accurately complete fatigue detection.
The method proposed in this paper is highly accurate and reliable, but there are still some limitations: (1) The eye detection model proposed in this paper has been shown to outperform the YOLOv3 model and the Faster-RCNN model. However, we did not compare it with other target detection models, which means that the accuracy of other types of models needs to be further investigated. (2) In this paper, the fatigue features of the eye region were extracted without the fusion of multiple pieces of information. The combined analysis of different types of fatigue data is also an urgent problem to be solved in the future.

Author Contributions

Conceptualization, R.H. and Y.X. (Yong Xu); methodology, R.H., Y.X. (Yong Xu) and C.-Y.H.; software, Y.X. (Yunjie Xiang); validation, Y.X. (Yunjie Xiang); formal analysis, R.H. and Y.X. (Yong Xu); investigation, Y.X. (Yunjie Xiang), C.-Y.H. and C.D.; resources, Y.X. (Yong Xu); data curation, C.-Y.H. and C.D.; writing—original draft preparation, Y.X. (Yunjie Xiang); writing—review and editing, R.H. and Y.X. (Yong Xu); visualization, R.H. and Y.X. (Yong Xu); supervision, C.-Y.H.; project administration, R.H. and Y.X. (Yong Xu); funding acquisition, R.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Available upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Amodio, A.; Ermidoro, M.; Maggi, D.; Formentin, S.; Savaresi, S.M. Automatic detection of driver impairment based on pupillary light reflex. IEEE Trans. Intell. Transp. Syst. 2018, 20, 3038–3048. [Google Scholar] [CrossRef]
  2. Zhou, Z.; Zhou, Y.; Pu, Z.; Xu, Y. Simulation of pedestrian behavior during the flashing green signal using a modified social force model. Transp. A Transp. Sci. 2019, 15, 1019–1040. [Google Scholar] [CrossRef]
  3. Zhou, Z.; Cai, Y.; Ke, R.; Yang, J. A collision avoidance model for two-pedestrian groups: Considering random avoidance patterns. Phys. A Stat. Mech. Its Appl. 2017, 475, 142–154. [Google Scholar] [CrossRef]
  4. Fernandes, R.; Hatfield, J.; Job, R.S. A systematic investigation of the differential predictors for speeding, drink-driving, driving while fatigued, and not wearing a seat belt, among young drivers. Transp. Res. Part F Traffic Psychol. Behav. 2010, 13, 179–196. [Google Scholar] [CrossRef]
  5. Li, K.; Gong, Y.; Ren, Z. A fatigue driving detection algorithm based on facial multi-feature fusion. IEEE Access 2020, 8, 101244–101259. [Google Scholar] [CrossRef]
  6. Du, G.; Zhang, L.; Su, K.; Wang, X.; Teng, S.; Liu, P.X. A multimodal fusion fatigue driving detection method based on heart rate and PERCLOS. IEEE Trans. Intell. Transp. Syst. 2022, 23, 21810–21820. [Google Scholar] [CrossRef]
  7. Zhao, G.; He, Y.; Yang, H.; Tao, Y. Research on fatigue detection based on visual features. IET Image Process. 2022, 16, 1044–1053. [Google Scholar] [CrossRef]
  8. Hsu, C.Y.; Hu, R.; Xiang, Y.; Long, X.; Li, Z. Improving the Deeplabv3+ Model with Attention Mechanisms Applied to Eye Detection and Segmentation. Mathematics 2022, 10, 2597. [Google Scholar] [CrossRef]
  9. Liu, Y.; Shen, W.; Wu, D.; Shao, J. IrisST-Net for iris segmentation and contour parameters extraction. Appl. Intell. 2022, 1–15. [Google Scholar] [CrossRef]
  10. Zou, J.; Yan, P. Research on Method of Fatigued Driving Detection. In Proceedings of CICTP 2018: Intelligence, Connectivity, and Mobility, Proceedings of the 18th COTA International Conference of Transportation Professionals, Beijing, China, 5–8 July 2018; American Society of Civil Engineers: Reston, VA, USA, 2018; pp. 1967–1974. [Google Scholar]
  11. Kundinger, T.; Riener, A.; Sofra, N.; Weigl, K. Drowsiness detection and warning in manual and automated driving: Results from subjective evaluation. In Proceedings of the 10th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, Seoul, Republic of Korea, 18–22 September 2018; pp. 229–236. [Google Scholar]
  12. Wang, H.; Zhang, C.; Shi, T.; Wang, F.; Ma, S. Real-time EEG-based detection of fatigue driving danger for accident prediction. Int. J. Neural Syst. 2015, 25, 1550002. [Google Scholar] [CrossRef]
  13. Gao, Z.; Li, S.; Cai, Q.; Dang, W.; Yang, Y.; Mu, C.; Hui, P. Relative wavelet entropy complex network for improving EEG-based fatigue driving classification. IEEE Trans. Instrum. Meas. 2018, 68, 2491–2497. [Google Scholar] [CrossRef]
  14. Boon-Leng, L.; Dae-Seok, L.; Boon-Giin, L. Mobile-based wearable-type of driver fatigue detection by GSR and EMG. In Proceedings of the TENCON 2015–2015 IEEE Region 10 Conference, Macao, China, 1–4 November 2015; pp. 1–4. [Google Scholar]
  15. Jing, D.; Liu, D.; Zhang, S.; Guo, Z. Fatigue driving detection method based on EEG analysis in low-voltage and hypoxia plateau environment. Int. J. Transp. Sci. Technol. 2020, 9, 366–376. [Google Scholar] [CrossRef]
  16. Luo, H.; Qiu, T.; Liu, C.; Huang, P. Research on fatigue driving detection using forehead EEG based on adaptive multi-scale entropy. Biomed. Signal Process. Control. 2019, 51, 50–58. [Google Scholar] [CrossRef]
  17. Li, X.; Hong, L.; Wang, J.C.; Liu, X. Fatigue driving detection model based on multi-feature fusion and semi-supervised active learning. IET Intell. Transp. Syst. 2019, 13, 1401–1409. [Google Scholar] [CrossRef]
  18. Ma, J.; Zhang, J.; Gong, Z.; Du, Y. Study on fatigue driving detection model based on steering operation features and eye movement features. In Proceedings of the 2018 IEEE 4th International Conference on Control Science and Systems Engineering (ICCSSE), Wuhan, China, 21–23 August 2018; pp. 472–475. [Google Scholar]
19. Akrout, B.; Mahdi, W. A novel approach for driver fatigue detection based on visual characteristics analysis. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 527–552.
20. Liu, S.; Wu, Y.; Liu, Q.; Zhu, Q. Design of Fatigue Driving Detection Algorithm Based on Image Processing. In Proceedings of 2020 Chinese Intelligent Systems Conference: Volume II, Proceedings of the CISC 2020, Monterey, CA, USA, 19–21 February 2020; Springer: Singapore, 2021; pp. 602–610.
21. Bin, F.; Shuo, X.; Xiaofeng, F. A fatigue driving detection method based on multi facial features fusion. In Proceedings of the 2019 11th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), Qiqihar, China, 28–29 April 2019; pp. 225–229.
22. Pandey, N.N.; Muppalaneni, N.B. Real-time drowsiness identification based on eye state analysis. In Proceedings of the 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), Pichanur, India, 25–27 March 2021; pp. 1182–1187.
23. Kaur, R.; Guleria, A. Digital eye strain detection system based on svm. In Proceedings of the 2021 5th International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 3–5 June 2021; pp. 1114–1121.
24. Miah, A.A.; Ahmad, M.; Mim, K.Z. Drowsiness detection using eye-blink pattern and mean eye landmarks’ distance. In Proceedings of the International Joint Conference on Computational Intelligence, Proceedings of the IJCCI 2018, Seville, Spain, 18–20 September 2018; Springer: Singapore, 2020; pp. 111–121.
25. Yuille, A.L.; Hallinan, P.W.; Cohen, D.S. Feature extraction from faces using deformable templates. Int. J. Comput. Vis. 1992, 8, 99–111.
26. Hamouz, M.; Kittler, J.; Kamarainen, J.K.; Paalanen, P.; Kalviainen, H.; Matas, J. Feature-based affine-invariant localization of faces. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1490–1495.
27. Yang, F.; Huang, J.; Yang, P.; Metaxas, D. Eye localization through multiscale sparse dictionaries. In Proceedings of the 2011 IEEE International Conference on Automatic Face & Gesture Recognition (FG), Santa Barbara, CA, USA, 21–23 March 2011; pp. 514–518.
28. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
29. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; pp. 1440–1448.
30. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99.
31. Dai, J.; Li, Y.; He, K.; Sun, J. R-fcn: Object detection via region-based fully convolutional networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 379–387.
32. Park, S.H.; Yoon, H.S.; Park, K.R. Faster R-CNN and geometric transformation-based detection of driver’s eyes using multiple near-infrared camera sensors. Sensors 2019, 19, 197.
33. Prasad, K.S.V.; D’souza, K.B.; Bhargava, V.K. A downscaled faster-RCNN framework for signal detection and time-frequency localization in wideband RF systems. IEEE Trans. Wirel. Commun. 2020, 19, 4847–4862.
34. Zhou, L.; Min, W.; Lin, D.; Han, Q.; Liu, R. Detecting motion blurred vehicle logo in IoV using filter-DeblurGAN and VL-YOLO. IEEE Trans. Veh. Technol. 2020, 69, 3604–3614.
35. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
36. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
37. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
38. Zhao, H.; Zhou, Y.; Zhang, L.; Peng, Y.; Hu, X.; Peng, H.; Cai, X. Mixed YOLOv3-LITE: A lightweight real-time object detection method. Sensors 2020, 20, 1861.
39. Jiang, J.; Fu, X.; Qin, R.; Wang, X.; Ma, Z. High-speed lightweight ship detection algorithm based on YOLO-v4 for three-channels RGB SAR image. Remote Sens. 2021, 13, 1909.
40. Hui, T.; Xu, Y.; Jarhinbek, R. Detail texture detection based on Yolov4-tiny combined with attention mechanism and bicubic interpolation. IET Image Process. 2021, 15, 2736–2748.
41. Guo, C.; Lv, X.L.; Zhang, Y.; Zhang, M.L. Improved YOLOv4-tiny network for real-time electronic component detection. Sci. Rep. 2021, 11, 22744.
42. Yu, K.; Cheng, Y.; Tian, Z.; Zhang, K. High Speed and Precision Underwater Biological Detection Based on the Improved YOLOV4-Tiny Algorithm. J. Mar. Sci. Eng. 2022, 10, 1821.
43. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
44. Li, Z.; Dong, J. A framework integrating deeplabV3+, transfer learning, active learning, and incremental learning for mapping building footprints. Remote Sens. 2022, 14, 4738.
45. Xi, D.; Qin, Y.; Wang, Z. Attention Deeplabv3 model and its application into gear pitting measurement. J. Intell. Fuzzy Syst. 2022, 42, 3107–3120.
46. Shahi, T.B.; Sitaula, C.; Neupane, A.; Guo, W. Fruit classification using attention-based MobileNetV2 for industrial applications. PLoS ONE 2022, 17, e0264586.
47. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
48. Rajkar, A.; Kulkarni, N.; Raut, A. Driver drowsiness detection using deep learning. In Applied Information Processing Systems, Proceedings of the ICCET 2021, online, 25–27 February 2021; Springer: Singapore, 2022; pp. 73–82.
Figure 1. Fatigue-driving detection methods.
Figure 2. The framework of the fatigue-driving detection system.
Figure 3. The network structure of YOLOv4-tiny.
Figure 4. The network structure of IDLN.
Figure 5. Variation of the eye state degree across video frames for drivers in different states and scenes. (a,c,e,g) show drivers in the normal state; (b,d,f,h) show drivers in the fatigue state. (a,b) bare face; (c,d) wearing a mask; (e,f) wearing glasses; (g,h) wearing both a mask and glasses.
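For readers who wish to reproduce curves of the kind shown in Figure 5, the sketch below illustrates one plausible way to derive a per-frame eye state degree from a binary eye segmentation mask, namely the fraction of pixels labeled as eye within the detected eye region. This normalization and the toy masks are assumptions made purely for illustration; the exact definition used by GESFD is given in the main text and may differ.

```python
import numpy as np

def eye_state_degree(mask: np.ndarray) -> float:
    """Fraction of pixels labeled as 'eye' inside the detected eye region.

    'mask' is a binary H x W array from the segmentation network
    (1 = eye pixel, 0 = background). This normalization is an assumption
    made for illustration; the paper's exact definition may differ.
    """
    if mask.size == 0:
        return 0.0
    return float(mask.sum()) / mask.size

# Hypothetical masks: a wide-open eye fills more of the region than a
# nearly closed one, so its eye state degree is larger.
open_eye = np.zeros((32, 64), dtype=np.uint8)
open_eye[8:24, 8:56] = 1           # open eye, simplified as a filled box
closed_eye = np.zeros((32, 64), dtype=np.uint8)
closed_eye[15:17, 8:56] = 1        # thin sliver left when the eyelid drops

print(eye_state_degree(open_eye))    # 0.375
print(eye_state_degree(closed_eye))  # ~0.047
```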
Figure 6. Variation of Gaussian weighting scores with eye state degree for different Gaussian weighting factors.
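Figure 6 and Tables 2–4 refer to a Gaussian weighting of the eye state degree with factor σ. As a hedged illustration only, the sketch below assumes a weight of the form w(d) = exp(−d²/(2σ²)), so that frames with nearly closed eyes (small d) contribute close to 1 while wide-open eyes are suppressed, and it sums the weights over a hypothetical window of frames; the authors' exact score definition and normalization are given in the main text and may differ.

```python
import numpy as np

def gaussian_weight(degree: np.ndarray, sigma: float) -> np.ndarray:
    """Assumed Gaussian weighting of the per-frame eye state degree d:
    w(d) = exp(-d**2 / (2 * sigma**2)).
    Frames with small d (eyes nearly closed) get weights close to 1,
    wide-open eyes are suppressed, and a smaller sigma suppresses them harder.
    """
    degree = np.asarray(degree, dtype=float)
    return np.exp(-(degree ** 2) / (2.0 * sigma ** 2))

# Hypothetical per-frame eye state degrees over a short window.
alert  = np.array([0.35, 0.34, 0.36, 0.05, 0.33, 0.35])   # one quick blink
drowsy = np.array([0.10, 0.06, 0.04, 0.05, 0.08, 0.30])   # eyes mostly shut

for sigma in (0.1, 0.3, 1.0):
    s_alert  = gaussian_weight(alert, sigma).sum()
    s_drowsy = gaussian_weight(drowsy, sigma).sum()
    print(f"sigma={sigma}: alert={s_alert:.3f}, drowsy={s_drowsy:.3f}")
```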
Figure 7. Internal scene of the data collection vehicle.
Figure 8. Some of the EFSD samples.
Figure 9. The curve of loss value of the YOLOv4-tiny model.
Figure 10. Detection results of the YOLOv4-tiny model on EIMDSD.
Figure 11. Comparison of Gaussian weighting scores for normal and fatigue states with different Gaussian weighting factors. (a–j) correspond to Gaussian weighting factors of 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0, respectively.
Figure 12. Selected infrared face images from the CASIA-Iris-Distance infrared face image dataset.
Figure 13. Eye detection model results on the CASIA-Iris-Distance infrared face image dataset.
Figure 14. Sample data of some infrared eye images.
Figure 15. Results of eye semantic segmentation of infrared images.
Table 1. Segmentation results of IDLN in EIMDSD.

Model        Images   mIoU    mPA     FPS
ANN          600      0.863   0.957   15.15
BiSeNetv2    600      0.840   0.950   14.93
DANet        600      0.818   0.944   19.61
DeepLabv3    600      0.866   0.959   18.52
DeepLabv3+   600      0.868   0.961   28.50
Fast-SCNN    600      0.859   0.955   16.95
FCN          600      0.858   0.955   5.85
ISANet       600      0.858   0.955   16.13
OCRNet       600      0.862   0.957   6.25
PSPNet       600      0.802   0.939   14.29
UNet         600      0.866   0.957   27.78
Ours         600      0.883   0.962   34.10
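Table 1 reports mIoU and mPA for the compared segmentation models. As a reference for readers reproducing these numbers, the sketch below shows the standard per-image computation of both metrics from a confusion matrix for the two classes used here (background and eye); how the authors average over the 600 test images is not reproduced and is left as an assumption.

```python
import numpy as np

def miou_mpa(pred: np.ndarray, gt: np.ndarray, num_classes: int = 2):
    """Mean IoU and mean pixel accuracy from a per-image confusion matrix.

    pred and gt are integer label maps of the same shape
    (here 0 = background, 1 = eye).
    """
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for c_gt in range(num_classes):
        for c_pr in range(num_classes):
            conf[c_gt, c_pr] = np.sum((gt == c_gt) & (pred == c_pr))
    ious, accs = [], []
    for c in range(num_classes):
        tp = conf[c, c]
        fn = conf[c].sum() - tp                 # missed pixels of class c
        fp = conf[:, c].sum() - tp              # pixels wrongly assigned to c
        ious.append(tp / (tp + fp + fn + 1e-12))
        accs.append(tp / (conf[c].sum() + 1e-12))
    return float(np.mean(ious)), float(np.mean(accs))

# Tiny worked example.
pred = np.array([[0, 1], [1, 1]])
gt   = np.array([[0, 1], [0, 1]])
print(miou_mpa(pred, gt))  # (~0.583, 0.75)
```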
Table 2. Weighting scores corresponding to different Gaussian weighting factors when the driver is in a normal state.

Video   σ = 0.1   σ = 0.2   σ = 0.3   σ = 0.4   σ = 0.5   σ = 0.6   σ = 0.7   σ = 0.8   σ = 0.9   σ = 1
1       1.159     0.671     0.551     0.570     0.647     0.727     0.790     0.836     0.870     0.895
2       0.519     0.311     0.238     0.241     0.319     0.426     0.529     0.614     0.681     0.734
3       0.305     0.180     0.149     0.178     0.273     0.393     0.503     0.593     0.665     0.721
4       0.151     0.105     0.110     0.171     0.284     0.410     0.522     0.611     0.681     0.735
5       4.103     2.552     2.045     1.770     1.580     1.442     1.343     1.271     1.219     1.179
6       2.113     1.308     1.088     1.052     1.060     1.066     1.065     1.060     1.054     1.047
7       4.371     2.619     2.047     1.748     1.560     1.411     1.314     1.246     1.197     1.160
8       3.279     2.215     1.942     1.784     1.636     1.506     1.402     1.323     1.163     1.217
9       2.887     1.928     1.598     1.450     1.359     1.290     1.235     1.192     1.158     1.131
10      1.998     1.273     1.083     1.062     1.071     1.081     1.079     1.072     1.064     1.056
Table 3. Weighting scores corresponding to different Gaussian weighting factors when the driver is in a fatigue state.

Video   σ = 0.1   σ = 0.2   σ = 0.3   σ = 0.4   σ = 0.5   σ = 0.6   σ = 0.7   σ = 0.8   σ = 0.9   σ = 1
1       4.925     2.768     2.061     1.742     1.553     1.423     1.331     1.263     1.214     1.176
2       6.254     3.532     2.543     2.052     1.756     1.562     1.429     1.337     1.269     1.220
3       4.873     2.560     1.843     1.517     1.346     1.247     1.188     1.142     1.113     1.092
4       7.417     3.878     2.732     2.184     1.856     1.638     1.487     1.383     1.307     1.251
5       7.857     4.181     2.941     2.306     1.920     1.671     1.504     1.391     1.310     1.252
6       8.292     4.416     3.126     2.456     2.038     1.762     1.576     1.447     1.356     1.289
7       7.583     4.017     2.817     2.232     1.882     1.652     1.496     1.387     1.310     1.253
8       10.578    5.419     3.649     2.757     2.227     1.889     1.666     1.515     1.408     1.331
9       8.752     4.621     3.183     2.448     2.010     1.732     1.548     1.423     1.335     1.271
10      7.572     3.935     2.729     2.138     1.791     1.571     1.427     1.329     1.261     1.211
Table 4. Gaussian weighting factors and the number of misclassified samples under different thresholds.

Gaussian Weighting Factor   State Determination Threshold (T)   Number of Misclassified Samples
σ = 0.1                     4.749                               0
σ = 0.2                     2.624                               1
σ = 0.3                     1.924                               4
σ = 0.4                     1.593                               4
σ = 0.5                     1.408                               4
σ = 0.6                     1.295                               4
σ = 0.7                     1.221                               5
σ = 0.8                     1.172                               5
σ = 0.9                     1.137                               5
σ = 1                       1.111                               5
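Table 4 pairs each Gaussian weighting factor with a state determination threshold T and the resulting number of misclassified videos. The sketch below is one simple way to search for such a threshold on the scores of Tables 2 and 3, assuming fatigue is predicted when S ≥ T. It reproduces the zero misclassifications reported for σ = 0.1 (any T between the largest normal score, 4.371, and the smallest fatigue score, 4.873, works), although the rule the authors used to arrive at the specific T values in Table 4 is not reproduced here.

```python
import numpy as np

def best_threshold(normal_scores, fatigue_scores):
    """Search for a state determination threshold T that minimizes the number
    of misclassified videos, assuming 'fatigue' is predicted when S >= T.
    Candidate thresholds are midpoints between consecutive sorted scores.
    """
    normal_scores = np.asarray(normal_scores, dtype=float)
    fatigue_scores = np.asarray(fatigue_scores, dtype=float)
    scores = np.sort(np.concatenate([normal_scores, fatigue_scores]))
    candidates = (scores[:-1] + scores[1:]) / 2.0
    best_t, best_err = None, np.inf
    for t in candidates:
        err = np.sum(normal_scores >= t) + np.sum(fatigue_scores < t)
        if err < best_err:
            best_t, best_err = float(t), int(err)
    return best_t, best_err

# Scores for sigma = 0.1, taken from Tables 2 and 3.
normal  = [1.159, 0.519, 0.305, 0.151, 4.103, 2.113, 4.371, 3.279, 2.887, 1.998]
fatigue = [4.925, 6.254, 4.873, 7.417, 7.857, 8.292, 7.583, 10.578, 8.752, 7.572]
print(best_threshold(normal, fatigue))  # (4.622, 0): sigma = 0.1 separates both groups
```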
Table 5. Detection results of our proposed method on the EFSD test set.

Video Sample   Gaussian Weighting Score (S)   State Determination Threshold (T)   Actual Driving State   Predicted Driving State
1              2.126                          5.823                               Normal                 Normal
2              1.288                          5.823                               Normal                 Normal
3              5.051                          5.823                               Normal                 Normal
4              4.130                          5.823                               Normal                 Normal
5              4.460                          5.823                               Normal                 Normal
6              0.603                          5.823                               Normal                 Normal
7              0.430                          5.823                               Normal                 Normal
8              8.276                          5.823                               Normal                 Fatigue
9              7.198                          5.823                               Normal                 Fatigue
10             5.508                          5.823                               Normal                 Normal
11             1.295                          5.823                               Normal                 Normal
12             0.820                          5.823                               Normal                 Normal
13             0.589                          5.823                               Normal                 Normal
14             3.436                          5.823                               Normal                 Normal
15             0.067                          5.823                               Normal                 Normal
16             1.121                          5.823                               Normal                 Normal
17             0.827                          5.823                               Normal                 Normal
18             5.064                          5.823                               Normal                 Normal
19             8.424                          5.823                               Fatigue                Fatigue
20             7.942                          5.823                               Fatigue                Fatigue
21             6.222                          5.823                               Fatigue                Fatigue
22             6.848                          5.823                               Fatigue                Fatigue
23             6.160                          5.823                               Fatigue                Fatigue
24             6.222                          5.823                               Fatigue                Fatigue
25             9.527                          5.823                               Fatigue                Fatigue
26             10.150                         5.823                               Fatigue                Fatigue
27             10.653                         5.823                               Fatigue                Fatigue
28             10.747                         5.823                               Fatigue                Fatigue
29             8.249                          5.823                               Fatigue                Fatigue
30             5.974                          5.823                               Fatigue                Fatigue
31             8.251                          5.823                               Fatigue                Fatigue
32             8.318                          5.823                               Fatigue                Fatigue
33             9.274                          5.823                               Fatigue                Fatigue
34             8.925                          5.823                               Fatigue                Fatigue
35             6.949                          5.823                               Fatigue                Fatigue
36             6.160                          5.823                               Fatigue                Fatigue
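Table 5 implies a simple decision rule: a test video is predicted as Fatigue when its Gaussian weighting score S reaches the state determination threshold T = 5.823, and as Normal otherwise. The short sketch below applies this rule to the 36 scores in the table and recovers the 94.4% accuracy reported in Table 6 (34 of 36 videos correct; samples 8 and 9 are the two false alarms).

```python
def predict_state(score: float, threshold: float = 5.823) -> str:
    """Decision rule implied by Table 5: predict 'Fatigue' when the Gaussian
    weighting score S reaches the state determination threshold T."""
    return "Fatigue" if score >= threshold else "Normal"

# The 36 scores of Table 5, in order (samples 1-18 are normal, 19-36 fatigue).
scores = [2.126, 1.288, 5.051, 4.130, 4.460, 0.603, 0.430, 8.276, 7.198, 5.508,
          1.295, 0.820, 0.589, 3.436, 0.067, 1.121, 0.827, 5.064,
          8.424, 7.942, 6.222, 6.848, 6.160, 6.222, 9.527, 10.150, 10.653, 10.747,
          8.249, 5.974, 8.251, 8.318, 9.274, 8.925, 6.949, 6.160]
labels = ["Normal"] * 18 + ["Fatigue"] * 18
correct = sum(predict_state(s) == y for s, y in zip(scores, labels))
print(f"accuracy = {correct / len(scores):.1%}")  # 94.4% (34/36), matching Table 6
```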
Table 6. Statistical results of our proposed method tested on the EFSD test set.

Method     Accuracy (%)   FLOPs (M)    Params (M)   FPS
Proposed   94.4           74,520.89    11.62        33.5
Table 7. Detection statistics of the YOLOv4-tiny model on the EIMDSD dataset.

Method        AP_0 (%)   AP_1 (%)   mAP (%)   FLOPs (M)    Params (M)   FPS
YOLOv4-tiny   81.43      79.91      80.67     34,734.37    5.78         37.86
Table 8. Detection statistics of the YOLOv4-tiny model on the CASIA-Iris-Distance infrared face image dataset.

Method        AP_0 (%)   AP_1 (%)   mAP (%)   FLOPs (M)    Params (M)   FPS
YOLOv4-tiny   99.31      97.83      98.57     34,734.37    5.78         33.2
Table 9. Experimental results of the IDLN model for eye semantic segmentation on EIMDSD.

Method   mIoU    mPA     FLOPs (M)    Params (M)   FPS
IDLN     0.883   0.962   39,786.52    5.84         34.1
Table 10. Segmentation performance of the eye semantic segmentation model on the infrared eye image test set.

Method   mIoU    mPA     FLOPs (M)    Params (M)   FPS
IDLN     0.902   0.943   39,786.52    5.84         32.6
Table 11. Comparison of experimental results with other fatigue-driving detection algorithms.

Method               Accuracy (%)
Pandey et al. [22]   92.5
Kaur et al. [23]     91
Miah et al. [24]     93
Proposed             97.5