Article

Multi-User Identification-Based Eye-Tracking Algorithm Using Position Estimation

Suk-Ju Kang
Department of Electronic Engineering, Sogang University, Seoul 04107, Korea
Sensors 2017, 17(1), 41; https://doi.org/10.3390/s17010041
Submission received: 13 December 2016 / Revised: 22 December 2016 / Accepted: 23 December 2016 / Published: 27 December 2016
(This article belongs to the Special Issue Video Analysis and Tracking Using State-of-the-Art Sensors)

Abstract
This paper proposes a new multi-user eye-tracking algorithm using position estimation. Conventional eye-tracking algorithms are typically designed for a single user and therefore cannot be applied directly to a multi-user system. Even when they can track the eyes of multiple users, their detection accuracy is low and they cannot identify individual users. The proposed algorithm solves these problems and enhances the detection accuracy. Specifically, the proposed algorithm adopts a classifier to detect faces in the red, green, and blue (RGB) and depth images. It then calculates features based on the histogram of oriented gradients for the detected facial region to identify multiple users, and selects the template that best matches each user from a pre-determined face database. Finally, the proposed algorithm extracts the final eye positions based on anatomical proportions. Simulation results show that the proposed algorithm improved the average F1 score by up to 0.490 compared with the benchmark algorithms.

1. Introduction

Various fields currently require information obtained from human eye recognition. In particular, eye recognition is one of the most important features in vehicle applications, because it can be used to estimate a person's fatigue state, which has a direct impact on the safety of the driver and passengers. For example, Figure 1a shows a system that checks drowsiness by analyzing the driver's eyes. In addition, human eyes can be used as an interface to control the operation of a display in the vehicle; Figure 1b shows the eyes of multiple users controlling the display of the center console. In these cases, precise eye positions for multiple users are required. To obtain them, the eye-tracking algorithm should calculate accurate positional information in the horizontal (x), vertical (y), and depth (z) directions with respect to the camera device [1,2].
Various eye-tracking algorithms have been proposed. A video-based eye-tracking algorithm has been proposed [3] to track the eye positions in input frames. This algorithm detects the user’s face using eigenspaces, and estimates motion based on a block-matching algorithm to track the user’s face. However, this algorithm is only suitable for a single user. Another algorithm uses depth and color image sequences for depth-camera–based multi-user eye tracking [4]. This algorithm uses an object-tracking algorithm and eye localization. However, it requires considerable computation time to track multiple users, and it cannot distinguish between them—i.e., it does not associate any particular facial region with a single discrete user.
Generally, eye-tracking algorithms require an accurate face-detection algorithm for high performance. There are two representative face-detection algorithms. A local binary pattern–based algorithm [5,6] uses local image textures in an input image. Hence, it is robust to gray-scale variations, and it is efficient insofar as it uses simple binary patterns. Another approach is a robust real-time face-detection algorithm [7,8]. It uses an integral imaging technique for fast computation. In addition, it uses cascade classifiers based on an adaptive boost-learning algorithm (AdaBoost) to improve the detection accuracy. Eye-tracking algorithms can adopt either of these face-detection algorithms.
In this paper, a new multi-user eye-tracking algorithm is proposed. It is based on a previous study [9], but the overall processing blocks have been substantially redesigned to enhance performance. The proposed algorithm calibrates the red, green, and blue (RGB) and depth images to prevent distortion, and uses a user-classification module and several features to enhance performance. Specifically, it selects the candidate regions (in which faces may exist) from an input image. It then applies an AdaBoost-based face-detection algorithm [7], and extracts histogram of oriented gradients (HOG) features from each facial region. Then, it searches a pre-calculated face database for the template that best matches the input face. Finally, it estimates and extracts the users' eye positions based on anatomical proportions.
This paper is organized as follows. Section 2 describes the proposed multi-user eye-tracking algorithm. Section 3 presents performance evaluations comparing the proposal with benchmark algorithms. Section 4 concludes the paper.

2. Proposed Algorithm

Figure 2 shows a conceptual block diagram for the proposed algorithm. First, in the pre-processing module, the proposed algorithm calibrates the RGB and depth images, which are captured by RGB and depth cameras. Second, the face-detection module performs face extraction from the input images. Third, the user-classification module identifies multiple users. Finally, the 3D eye positions are extracted. Figure 3 shows a detailed block diagram for the proposed algorithm. The specific operations are described in the following sub-sections.

2.1. Pre-Processing Module

The proposed algorithm uses RGB and depth cameras. In some cases, the pixel resolutions of the RGB and depth images differ; the resolution of depth images is generally lower than that of RGB images. Hence, the resolutions must be calibrated, and the proposed algorithm upscales the low-resolution depth image so that its resolution matches that of the RGB image. To do so, the proposed algorithm uses a bilinear interpolation algorithm [10], as shown in Figure 4. For example, if the resolution is doubled, the interpolation is defined as follows:
$$I_{x+\frac{1}{2},\,y} = \lambda_1 \times \{ I_{x,y} + I_{x+1,y} \}, \quad I_{x,\,y+\frac{1}{2}} = \lambda_2 \times \{ I_{x,y} + I_{x,y+1} \}, \quad I_{x+\frac{1}{2},\,y+\frac{1}{2}} = \lambda_3 \times \{ I_{x,y} + I_{x+1,y} + I_{x,y+1} + I_{x+1,y+1} \},$$
where λ1, λ2, and λ3 denote the horizontal, vertical, and diagonal weights (0.5, 0.5, and 0.25, respectively), and Ix+1/2,y, Ix,y+1/2, and Ix+1/2,y+1/2 denote the horizontally, vertically, and diagonally interpolated pixels, respectively.
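For illustration, the following is a minimal sketch of this 2× bilinear upscaling step in Python/NumPy. The function name upscale_depth_2x and the wrap-around border handling via np.roll are simplifying assumptions made for brevity, not details given in the paper.

```python
import numpy as np

def upscale_depth_2x(depth):
    """Double the resolution of a depth image with the bilinear weights above.

    Horizontal and vertical half-pixel samples average two neighbors
    (weight 0.5); diagonal samples average four neighbors (weight 0.25).
    Border pixels wrap around via np.roll, which is a simplification.
    """
    depth = depth.astype(np.float32)            # avoid overflow for integer depth maps
    h, w = depth.shape
    out = np.zeros((2 * h, 2 * w), dtype=np.float32)
    right = np.roll(depth, -1, axis=1)          # I(x+1, y)
    down = np.roll(depth, -1, axis=0)           # I(x, y+1)
    down_right = np.roll(right, -1, axis=0)     # I(x+1, y+1)
    out[0::2, 0::2] = depth                                         # original samples I(x, y)
    out[0::2, 1::2] = 0.5 * (depth + right)                         # I(x+1/2, y)
    out[1::2, 0::2] = 0.5 * (depth + down)                          # I(x, y+1/2)
    out[1::2, 1::2] = 0.25 * (depth + right + down + down_right)    # I(x+1/2, y+1/2)
    return out
```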
Then, the proposed algorithm extracts the candidate search region. In the input image captured by the cameras, the region where users are likely to be when watching a large-sized display such as a television is restricted to a certain area. Therefore, the proposed algorithm uses this region to search for users’ faces, thereby reducing the computation time. The detailed operation for detecting faces is described in the following sub-section.

2.2. Face-Detection Module

The proposed algorithm uses the classifier-based face-detection algorithm proposed in [7], which offers a high detection rate and real-time operation. In addition, the proposed algorithm analyzes only the facial candidate regions selected during pre-processing, thereby enhancing the detection accuracy while reducing the search region. Specifically, the face-detection algorithm uses several rectangular features, which are calculated from an integral image [7,11]. The integral image technique builds a summed-area table so that the sum of the pixel values in any rectangular window can be computed efficiently. In addition, the algorithm uses simple classifiers generated by the AdaBoost algorithm [7] to select features from the detected face. Finally, the face-detection algorithm arranges the classifiers in a cascading structure, which detects faces more accurately while reducing the operation time. Figure 5 shows the concept of the cascading structure of the face-detection module in the proposed algorithm. The first classifier rejects negative inputs using only a few operations, and the later stages of the cascade also reject negative inputs while gradually enhancing the detection accuracy. Therefore, the proposed algorithm can detect the facial region accurately.
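As an illustration, the sketch below runs a Viola-Jones-style cascade, as cited in [7], over the candidate search region using OpenCV's pretrained frontal-face model. The function detect_faces, the candidate_roi tuple, and the detectMultiScale parameter values are illustrative assumptions rather than the paper's exact configuration.

```python
import cv2

# Load OpenCV's pretrained frontal-face cascade (a Viola-Jones-style classifier [7]).
# The XML file is the one bundled with the opencv-python package.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(color_image, candidate_roi):
    """Run the cascade only inside the candidate search region from Section 2.1.

    candidate_roi = (x, y, w, h) is a hypothetical bounding box; the paper
    derives it from where viewers typically sit in front of the display.
    """
    x, y, w, h = candidate_roi
    # Assuming OpenCV's BGR channel order for the captured color frame.
    gray = cv2.cvtColor(color_image[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                     minSize=(30, 30))
    # Shift detections back to full-image coordinates.
    return [(fx + x, fy + y, fw, fh) for (fx, fy, fw, fh) in faces]
```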

2.3. User-Classification Module

After the faces are detected, they are classified individually based on a pre-calculated database. Figure 6 provides an overall block diagram for this process. The histogram of oriented gradients (HOG) is used as a classification feature because of its robustness in classifying faces [12]. Specifically, the horizontal and vertical gradients for the facial region are calculated as follows:
$$HG = [-1 \ \ 0 \ \ 1] * B_F, \qquad VG = [-1 \ \ 0 \ \ 1]^{T} * B_F,$$
where HG and VG respectively denote the horizontal and vertical gradients filtered with a 1D-centered discrete derivative mask, and BF denotes a detected face block. Using the gradients, the HOGs of magnitude and orientation for each pixel are generated as follows:
$$M_{x,y} = \left( HG_{x,y}^{2} + VG_{x,y}^{2} \right)^{\frac{1}{2}}, \qquad \theta_{x,y} = \tan^{-1}\!\left( \frac{VG_{x,y}}{HG_{x,y}} \right) + \frac{\pi}{2},$$
where Mx,y and θx,y denote the magnitude and orientation at each pixel, respectively. Histograms of these two properties are generated, and the histograms of several blocks are combined into one feature vector. The feature vector is then classified using a support vector machine (SVM) [13], which separates the classes with a maximum margin, thereby determining the class of the input face.
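To make the pipeline concrete, the following sketch extracts HOG features with scikit-image and classifies them with a linear SVM from scikit-learn. The function names, the HOG cell/block parameters, and the placeholder variables face_db and user_ids are assumptions; the paper does not specify these settings.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_feature(face_block):
    """HOG feature vector for a detected face block (grayscale, fixed size).

    Gradient magnitude/orientation histograms are pooled over cells and
    blocks, as in Section 2.3; the cell and block sizes here are assumptions.
    """
    return hog(face_block, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

def train_user_classifier(face_db, user_ids):
    """Train a maximum-margin linear SVM [13] on a pre-calculated face database.

    face_db is a list of equally sized grayscale face crops and user_ids the
    corresponding user labels; both are hypothetical placeholders here.
    """
    features = np.array([hog_feature(face) for face in face_db])
    clf = LinearSVC()
    clf.fit(features, user_ids)
    return clf

def identify_user(clf, face_block):
    """Return the identification number of the best-matching user."""
    return int(clf.predict([hog_feature(face_block)])[0])
```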

2.4. Three-Dimensional Eye-Position Extraction Module

In this module, the proposed algorithm calculates the left and right eye positions. Specifically, it uses the anatomical proportions for the eye position in a human face. Figure 7 shows a conceptual image of this module. First, it computes the horizontal and vertical positions (x and y axes), and then it calculates the depth position (z axis). The image on the left in Figure 7 includes several parameters for calculating the 3D eye position, and these are derived as follows:
$$p_{x1} = x_i + \alpha, \qquad p_{x2} = x_{i+1} - \alpha, \qquad p_y = y_i + \beta, \qquad p_z = d_{\max} \times \frac{I_{depth}}{I_{\max}},$$
where xi and yi denote the initial pixel point of the detected facial region and xi+1 denotes its opposite horizontal boundary, α and β denote the horizontal and vertical offsets, respectively, Imax and Idepth denote the maximum intensity level and the intensity level of the detected face, respectively, and dmax denotes the maximum real-world distance. Using these parameters, the final left and right eye positions are as follows:
$$p_{eye}^{L} = (p_{x1},\, p_y,\, p_z), \qquad p_{eye}^{R} = (p_{x2},\, p_y,\, p_z).$$
Using this module, the proposed algorithm can extract the final 3D eye positions.
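As a sketch of this step, the function below converts a detected face box and its depth intensity into the two 3D eye positions. The offset ratios alpha_ratio and beta_ratio stand in for the anatomical proportions, and the box width substitutes for the opposite boundary xi+1; these numeric values and names are illustrative assumptions, not the paper's parameters.

```python
def extract_eye_positions(face_box, depth_intensity, d_max=3.5, i_max=255,
                          alpha_ratio=0.3, beta_ratio=0.4):
    """Estimate the left and right 3D eye positions from a detected face.

    face_box = (x_i, y_i, w, h) in pixels; depth_intensity is the intensity
    level of the face in the depth image; d_max is the maximum real distance
    in meters. The offset ratios are placeholders for the anatomical
    proportions used in the paper.
    """
    x_i, y_i, w, h = face_box
    alpha = alpha_ratio * w                    # horizontal eye offset
    beta = beta_ratio * h                      # vertical eye offset
    p_x1 = x_i + alpha                         # left eye, horizontal
    p_x2 = x_i + w - alpha                     # right eye, mirrored offset
    p_y = y_i + beta                           # shared vertical position
    p_z = d_max * depth_intensity / i_max      # depth from intensity level
    return (p_x1, p_y, p_z), (p_x2, p_y, p_z)
```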

3. Simulation Results

The detection accuracy of the proposed algorithm was evaluated by comparing it with benchmark algorithms. In addition, the identification ratio with multiple users was calculated for the proposed algorithm. The RGB camera had a resolution of 1280 × 960 pixels, and the depth camera had a resolution of 640 × 480 pixels. The dataset consisted of image sequences captured directly with the RGB and depth cameras while the distance between the cameras and the users was varied. Three benchmark algorithms were used: the classifier-based detection algorithm (Algorithm 1) [7], the improved Haar feature-based detection algorithm (Algorithm 2) [8,9], and the local binary pattern (LBP)-based detection algorithm (Algorithm 3) [6]. For an objective evaluation, the algorithms were assessed using precision, recall, and F1 scores [14,15], which are defined as follows:
$$\mathrm{Precision} = \frac{TP}{TP+FP}, \qquad \mathrm{Recall} = \frac{TP}{TP+FN}, \qquad F1\ \mathrm{Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$
where TP, FP, and FN denote the numbers of detected true positives, false positives, and false negatives, respectively. Using these values, the F1 score was calculated; a value of one indicates perfect accuracy. The test data comprised several sequences captured at different distances (ranging from 1 m to 3.5 m) between the cameras and multiple users.
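The following small helper, offered only as an illustration of these definitions, computes the three metrics from detection counts; the counts in the usage line are made up and do not correspond to the paper's data.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 score from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Hypothetical example: 68 correct detections, 0 false alarms, 2 missed faces.
print(detection_metrics(68, 0, 2))   # -> (1.0, 0.971..., 0.985...)
```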
First, the detection accuracy of the proposed and benchmark algorithms was compared. Table 1 shows the average precision and recall values for the proposed and benchmark algorithms at different distances, and Table 2 shows the average F1 scores, which combine precision and recall, at different distances. In terms of precision, the overall averages for benchmark Algorithms 1, 2, and 3 were 0.669, 0.849, and 0.726, respectively, whereas the proposed algorithm achieved a perfect score of 1.000. In terms of recall, the overall averages for benchmark Algorithms 1, 2, and 3 were 0.988, 0.993, and 0.738, respectively, whereas the proposed algorithm achieved 0.988. Consequently, the average F1 score of the proposed algorithm was up to 0.294, 0.151, and 0.490 higher than those of Algorithms 1, 2, and 3, respectively. This means that the detection accuracy of the proposed algorithm was higher than that of the benchmark algorithms. Figure 8 shows the same trend: the precision and recall values of the proposed algorithm were higher than those of the benchmark algorithms. This is because the proposed algorithm accurately classified foreground and background by using several cascade classifiers after calibrating the RGB and depth images.
Figure 9 and Figure 10 show the resulting RGB and depth images from the proposed and benchmark algorithms at different distances (2.5 m and 3.5 m). The benchmark algorithms detected false regions as faces, and some faces remained undetected. In addition, these algorithms could not associate any particular facial region with a single discrete user. On the other hand, the proposed algorithm accurately detected the faces of multiple users and classified each of them by assigning each face a different number, as shown in Figure 9d and Figure 10d (here, 1, 2, and 3 are the identification numbers for the users).
The identification accuracy of the proposed algorithm for each face from multiple users was also evaluated. Table 3 shows the identification number and ratio for multiple users with the proposed algorithm; the maximum number of users was three. The identification ratios for Faces 1, 2, and 3 were 0.987, 0.985, and 0.997, respectively, and the overall average ratio was 0.990, which is highly accurate. This is because the proposed algorithm pre-trains on the required users, and hence achieves higher performance than the conventional algorithms.

4. Conclusions

This paper presented a robust multi-user eye-tracking algorithm using position estimation. The algorithm determines a candidate eye-position region from the input RGB and depth images. Within this region, it adopts a classifier-based face-detection algorithm and computes histogram of oriented gradients features for each detected facial region. It then selects the template that best matches the input face from a pre-determined database, and extracts the final eye positions based on anatomical proportions. Simulation results demonstrated that the proposed algorithm is highly accurate, with an average F1 score up to 0.490 higher than those of the benchmark algorithms.

Acknowledgments

This research was supported by a grant (16CTAP-C114672-01) from the Infrastructure and Transportation Technology Promotion Research Program funded by the Ministry of Land, Infrastructure and Transport of the Korean government.

Author Contributions

The author would like to thank Yong-Woo Jeong of Hanyang University for providing a set of image data.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Lopez-Basterretxea, A.; Mendez-Zorrilla, A.; Garcia-Zapirain, B. Eye/Head Tracking Technology to Improve HCI with iPad Applications. Sensors 2015, 15, 2244–2264.
2. Lee, J.W.; Heo, H.; Park, K.R. A Novel Gaze Tracking Method Based on the Generation of Virtual Calibration Points. Sensors 2013, 13, 10802–10822.
3. Chen, Y.-S.; Su, C.-H.; Chen, J.-H.; Chen, C.-S.; Hung, Y.-P.; Fuh, C.-S. Video-based eye tracking for autostereoscopic displays. Opt. Eng. 2001, 40, 2726–2734.
4. Li, L.; Xu, Y.; Konig, A. Robust depth camera based multi-user eye tracking for autostereoscopic displays. In Proceedings of the 9th International Multi-Conference on Systems, Signals & Devices, Chemnitz, Germany, 20–23 March 2012.
5. Ojala, T.; Pietikainen, M.; Maenpaa, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987.
6. Bilaniuk, O.; Fazl-Ersi, E.; Laganiere, R.; Xu, C.; Laroche, D.; Moulder, C. Fast LBP face detection on low-power SIMD architectures. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, 23–28 June 2014.
7. Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, HI, USA, 8–14 December 2001.
8. Jain, A.; Bharti, J.; Gupta, M.K. Improvements in OpenCV's Viola-Jones algorithm in face detection: Tilted face detection. Int. J. Signal Image Proc. 2014, 5, 21–28.
9. Kang, S.-J.; Jeong, Y.-W.; Yun, J.-J.; Bae, S. Real-time eye tracking technique for multiview 3D systems. In Proceedings of the 2016 IEEE International Conference on Consumer Electronics, Las Vegas, NV, USA, 8–11 January 2016.
10. Lehmann, T.M.; Gonner, C.; Spitzer, K. Survey: Interpolation methods in medical image processing. IEEE Trans. Med. Imaging 1999, 18, 1049–1075.
11. Crow, F. Summed-area tables for texture mapping. ACM SIGGRAPH Comput. Graph. 1984, 18, 207–212.
12. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–26 June 2005.
13. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
14. Kang, S.-J.; Cho, S.I.; Yoo, S.; Kim, Y.H. Scene change detection using multiple histograms for motion-compensated frame rate up-conversion. J. Disp. Technol. 2012, 8, 121–126.
15. Yang, Y.; Liu, X. A re-examination of text categorization methods. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, CA, USA, 15–19 August 1999; pp. 42–49.
Figure 1. Various examples in a vehicle application: (a) a drowsiness warning system; and (b) an interface control system using multi-user eye tracking.
Figure 2. Overall concept for the proposed multi-user eye tracking algorithm.
Figure 3. Overall block diagram for the proposed algorithm.
Figure 4. Pixel arrangement in bilinear interpolation algorithm when an input image resolution is doubled.
Figure 5. Concept for the cascading structure of the face-detection module in the proposed algorithm.
Figure 6. Overall block diagram for the multi-user classification module.
Figure 7. Concept for extracting 3D eye position from the RGB and depth images.
Figure 8. The data distribution of the precision-recall graph for the proposed and benchmark algorithms.
Figure 9. Comparing the detection accuracy of the proposed and benchmark algorithms at a distance of 2.5 m from the RGB and depth cameras (top: RGB image; bottom: depth image): (a) Algorithm 1; (b) Algorithm 2; (c) Algorithm 3; and (d) proposed algorithm.
Figure 10. Comparing the detection accuracy of the proposed and benchmark algorithms at a distance of 3.5 m from RGB and depth cameras (top: RGB image; bottom: depth image): (a) Algorithm 1; (b) Algorithm 2; (c) Algorithm 3; and (d) proposed algorithm.
Table 1. Average precision and recall values for the proposed and benchmark algorithms at different distances.

Distance (m) | Algorithm 1        | Algorithm 2        | Algorithm 3        | Proposed Algorithm
             | Precision | Recall | Precision | Recall | Precision | Recall | Precision | Recall
1.000        | 0.741     | 0.981  | 0.877     | 0.991  | 0.730     | 0.619  | 1.000     | 0.981
1.500        | 0.573     | 0.985  | 0.732     | 0.991  | 0.493     | 0.514  | 1.000     | 0.985
2.000        | 0.637     | 0.975  | 0.833     | 0.981  | 0.825     | 0.789  | 1.000     | 0.975
2.500        | 0.664     | 1.000  | 0.853     | 1.000  | 0.713     | 0.938  | 1.000     | 1.000
3.000        | 0.717     | 0.991  | 0.886     | 1.000  | 0.824     | 0.828  | 1.000     | 0.991
3.500        | 0.806     | 0.991  | 0.972     | 0.995  | 0.708     | 0.800  | 1.000     | 0.991
Random       | 0.544     | 0.994  | 0.792     | 0.994  | 0.792     | 0.677  | 1.000     | 0.994
Table 2. F1 score values for the proposed and benchmark algorithms at different distances.

Distance (m) | Algorithm 1           | Algorithm 2           | Algorithm 3           | Proposed Algorithm
             | F1 Score | Difference | F1 Score | Difference | F1 Score | Difference | F1 Score
1.000        | 0.844    | −0.147     | 0.931    | −0.060     | 0.674    | −0.320     | 0.991
1.500        | 0.725    | −0.268     | 0.842    | −0.151     | 0.503    | −0.490     | 0.993
2.000        | 0.771    | −0.216     | 0.901    | −0.086     | 0.807    | −0.180     | 0.987
2.500        | 0.798    | −0.202     | 0.921    | −0.079     | 0.811    | −0.189     | 1.000
3.000        | 0.832    | −0.164     | 0.939    | −0.057     | 0.826    | −0.170     | 0.996
3.500        | 0.889    | −0.107     | 0.983    | −0.013     | 0.751    | −0.245     | 0.996
Random       | 0.703    | −0.294     | 0.882    | −0.115     | 0.731    | −0.266     | 0.997
Table 3. Identification number and ratio for multiple users with the proposed algorithm.

Distance (m) | Face 1                             | Face 2                             | Face 3
             | Detection Number | Detection Ratio | Detection Number | Detection Ratio | Detection Number | Detection Ratio
1.000        | 70/70            | 1.000           | 68/70            | 0.970           | 70/70            | 1.000
1.500        | 70/70            | 1.000           | 69/70            | 0.980           | 69/70            | 0.980
2.000        | 64/68            | 0.940           | 67/68            | 0.980           | 68/68            | 1.000
2.500        | 70/70            | 1.000           | 70/70            | 1.000           | 70/70            | 1.000
3.000        | 70/70            | 1.000           | 70/70            | 1.000           | 70/70            | 1.000
3.500        | 69/70            | 0.980           | 70/70            | 1.000           | 70/70            | 1.000
Random       | 89/90            | 0.990           | 87/90            | 0.970           | 90/90            | 1.000
