An Adaptive Face Tracker with Application in Yawning Detection
Abstract
1. Introduction
- A face tracker that can track the face and its facial landmarks under challenging conditions.
- A tracking scheme that uses the target face samples collected during tracking to update the appearance model online, adapting to shape and appearance changes of the tracked face over time.
- A dynamic error prediction scheme that evaluates the correctness of the tracking process during face tracking.
- A resyncing mechanism based on Constrained Local Models (CLM), invoked when the error predictor indicates a high error.
- An improvement on the classical CLM approach, namely the Weighted CLM (W-CLM), which improves facial landmark localization.
- An improved yawning detection scheme that uses facial landmarks and imposes multiple conditions to avoid false positives (a generic sketch of such conditions follows this list).
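To make the last item concrete, here is a minimal, self-contained sketch of landmark-based yawning detection with multiple gating conditions. It is illustrative only: the 68-point iBUG landmark indexing, the aspect-ratio test, and the thresholds are assumptions made for the sketch, not the conditions the paper actually imposes (those are described in Section 3.4).

```python
# Illustrative landmark-based yawning detection with multiple gating
# conditions. The 68-point iBUG landmark indexing and all thresholds are
# assumptions for this sketch, not the paper's actual conditions.

import numpy as np

def mouth_aspect_ratio(landmarks):
    """Vertical inner-mouth opening divided by mouth width.

    Assumes iBUG 68-point indexing: inner lips are points 60-67
    (60 = left corner, 64 = right corner, 62 = top, 66 = bottom).
    """
    top, bottom = landmarks[62], landmarks[66]
    left, right = landmarks[60], landmarks[64]
    return np.linalg.norm(top - bottom) / np.linalg.norm(left - right)

def detect_yawn(landmark_seq, open_thresh=0.6, min_frames=15):
    """Flag a yawn only if the mouth stays wide open for several
    consecutive frames; the duration condition suppresses false
    positives caused by talking or laughing."""
    consecutive = 0
    for lm in landmark_seq:
        consecutive = consecutive + 1 if mouth_aspect_ratio(lm) > open_thresh else 0
        if consecutive >= min_frames:
            return True
    return False
```

Requiring the mouth to stay open for `min_frames` consecutive frames is one example of a condition that filters out brief mouth openings such as speech.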
2. Proposed Adaptive Face Tracking Method
- Block 1: In the first video frame, the initial target face, its affine parameters and the landmarks are localized using W-CLM (for details on W-CLM, see Section 2.2).
- Block 2: To track the target face in the subsequent video frames, new affine parameter values are drawn around those of the initial/tracked target face in the previous video frames (see details in Section 2.3).
- Block 3: The affine parameters computed in Block 2 are used to warp the current video frame into candidate target face samples of size u × u (with u as in the parameter table of Section 3).
- Block 4: If a specific number of new target face samples (the batch size) has been gathered, the eigenbases are built.
- Block 5: If the condition in Block 4 is satisfied, the candidate target face samples are decomposed into patches of size v × v, because the eigenbases are built using patches (see Section 2.1).
- Block 6: The tracked target face is found among the candidate target face samples by maximizing the likelihood function in Equation (10) (see details in Section 2.3).
- Block 7: If the condition in Block 4 is not satisfied, the tracked target face is estimated by the mean of the previously tracked target face samples (see Equation (8)).
- Block 8: The proposed error predictor checks whether resyncing of the tracked target face is required to correct the tracking process (see details in Section 2.4).
- Blocks 9-10: If resyncing is not required, the eigenbases are updated once a sufficient number of new tracked target face samples has been accumulated (see details in Section 2.1).
- Blocks 12-13: If the tracking error exceeds a given threshold, W-CLM is used to re-locate the tracked target face landmarks and correct the tracking process (see details in Section 2.2).
- Block 14: Yawning is detected (see details in Section 3.4).
- Block 15: If more frames remain to process, the tracked target face and its affine parameters seed the tracking in the next video frame. (A structural sketch of the whole pipeline in code follows this list.)
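The control flow of Blocks 1-15 can be restated as code. The skeleton below is a structural sketch only: every callable in `components` (wclm_fit, draw_particles, warp, likelihood, update_model, predict_error, detect_yawn) is a hypothetical placeholder for the corresponding component in Sections 2.1-2.4 and 3.4, not the authors' implementation.

```python
# Structural sketch of Blocks 1-15. Every callable in `components` is a
# hypothetical placeholder for the component described in the referenced
# section; this is not the authors' implementation.

import numpy as np

def run_tracker(frames, components, batch_size=5, error_threshold=1.0):
    c = components  # dict with keys: "wclm_fit", "draw_particles", "warp",
                    # "likelihood", "update_model", "predict_error", "detect_yawn"

    # Block 1: locate the initial face, affine parameters, landmarks (W-CLM).
    face, affine, landmarks = c["wclm_fit"](frames[0])
    new_samples, model = [face], None

    for frame in frames[1:]:
        # Block 2: draw candidate affine parameters around the previous ones.
        candidates = c["draw_particles"](affine)
        # Block 3: warp the current frame into candidate target face samples.
        faces = [c["warp"](frame, a) for a in candidates]

        if model is not None:
            # Blocks 5-6: keep the candidate that maximizes the appearance
            # likelihood under the eigenbasis model (cf. Equation (10)).
            i = int(np.argmax([c["likelihood"](x, model) for x in faces]))
            face, affine = faces[i], candidates[i]
        else:
            # Block 7: no eigenbases yet, fall back to the mean of the
            # previously tracked samples (cf. Equation (8)).
            face = np.mean(new_samples, axis=0)

        # Block 8: predict the tracking error.
        if c["predict_error"](face, landmarks) > error_threshold:
            # Blocks 12-13: resync the face and landmarks with W-CLM.
            face, affine, landmarks = c["wclm_fit"](frame)
        else:
            # Blocks 4, 9-10: once a full batch of new samples has been
            # gathered, build/update the eigenbases (Section 2.1).
            new_samples.append(face)
            if len(new_samples) >= batch_size:
                model = c["update_model"](model, new_samples)
                new_samples = []

        # Block 14: yawning detection on the tracked landmarks (Section 3.4).
        yawning = c["detect_yawn"](landmarks)

        # Block 15: `face` and `affine` seed the next iteration.
        yield face, landmarks, yawning
```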
2.1. Incremental Update of the Eigenbases and the Mean
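As a point of reference for this subsection, below is a minimal numpy sketch of the kind of incremental mean and eigenbasis update used in incremental-learning trackers such as Ross et al. [7], with the forgetting factor f from the parameter table in Section 3. It is a simplified variant that drops the singular-value bookkeeping of the full algorithm, and all names are illustrative.

```python
# A minimal sketch of an incremental eigenbasis/mean update with a
# forgetting factor, in the spirit of Ross et al. [7]; the paper's exact
# update may differ. The merged mean follows the standard incremental form:
#   new_mean = (f * n * old_mean + m * batch_mean) / (f * n + m)

import numpy as np

def incremental_update(mean, basis, n_seen, new_samples, f=0.95, k=16):
    """Update the mean and eigenbases with a batch of new samples.

    mean        : current mean vector (d,), or None on the first batch
    basis       : current eigenbasis, eigenvectors as columns (d, k), or None
    n_seen      : effective number of samples summarized so far
    new_samples : new batch, one sample per row (m, d)
    f           : forgetting factor down-weighting old observations
    k           : number of eigenvectors to retain
    """
    B = np.asarray(new_samples, dtype=float)
    m = B.shape[0]
    if mean is None:                       # first batch: plain PCA
        mean = B.mean(axis=0)
        U, s, _ = np.linalg.svd((B - mean).T, full_matrices=False)
        return mean, U[:, :k], m

    mu_B = B.mean(axis=0)
    n_eff = f * n_seen                     # old data decays by f
    new_mean = (n_eff * mean + m * mu_B) / (n_eff + m)

    # Augment the down-weighted old basis with the centered new batch and
    # a correction term for the mean shift, then re-diagonalize via SVD.
    correction = np.sqrt(n_eff * m / (n_eff + m)) * (mean - mu_B)
    A = np.hstack([f * basis, (B - mu_B).T, correction[:, None]])
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return new_mean, U[:, :k], n_eff + m
```

With the optimal batch size of 5 (Section 3.1), the tracker would invoke such an update once per accumulated batch in Blocks 9-10.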
2.2. Weighted Constrained Local Model (W-CLM) as the Feature Detector Used for Resyncing
2.2.1. CLM Model Building
2.2.2. Weighted CLM Search Method
2.3. The Proposed Tracking Method Applied to Human Faces
Algorithm 1. Incremental Learning for Face Tracking (ILFT).
2.4. Tracking Error Prediction and Resyncing Mechanism
Algorithm 2. Adaptive Face Tracker with Resyncing Mechanism Using W-CLM (AFTRM-W).
3. Experimental Results and Discussion
3.1. Choice of Batch Size
3.2. Discussion on Error Prediction and Resyncing
3.3. Quantitative Evaluation of the Proposed Face Tracking Method
3.4. Evaluation of the Proposed Face Tracking Method in Yawning Detection
4. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
1. Liu, H.; Li, S.; Fang, L. Robust Object Tracking Based on Principal Component Analysis and Local Sparse Representation. IEEE Trans. Instrum. Meas. 2015, 64, 2863–2875.
2. Chrysos, G.G.; Antonakos, E.; Snape, P.; Asthana, A.; Zafeiriou, S. A comprehensive performance evaluation of deformable face tracking "In-the-Wild". Int. J. Comput. Vis. 2018, 126, 198–232.
3. Omidyeganeh, M.; Shirmohammadi, S.; Abtahi, S.; Khurshid, A.; Farhan, M.; Scharcanski, J.; Hariri, B.; Laroche, D.; Martel, L. Yawning Detection Using Embedded Smart Cameras. IEEE Trans. Instrum. Meas. 2016, 65, 570–582.
4. Liu, S.; Wang, Y.; Wu, X.; Li, J.; Lei, T. Discriminative dictionary learning algorithm based on sample diversity and locality of atoms for face recognition. J. Vis. Commun. Image Represent. 2020, 102763.
5. Terissi, L.D.; Gomez, J.C. Facial motion tracking and animation: An ICA-based approach. In Proceedings of the European Signal Processing Conference (EUSIPCO), Poznan, Poland, 3–7 September 2007.
6. Babenko, B.; Yang, M.H.; Belongie, S. Visual tracking with online multiple instance learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami Beach, FL, USA, 22–24 June 2009; pp. 983–990.
7. Ross, D.A.; Lim, J.; Lin, R.S.; Yang, M.H. Incremental Learning for Robust Visual Tracking. Int. J. Comput. Vis. 2008, 77, 125–141.
8. Ou, W.; Yuan, D.; Li, D.; Liu, B.; Xia, D.; Zeng, W. Patch-based visual tracking with online representative sample selection. J. Electron. Imaging 2017, 26, 1–12.
9. Cootes, T.; Edwards, G.J.; Taylor, C. Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 681–685.
10. Kim, M.; Kumar, S.; Pavlovic, V.; Rowley, H. Face tracking and recognition with visual constraints in real-world videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, USA, 24–26 June 2008; pp. 1–8.
11. Li, X.; Liu, Q.; He, Z.; Wang, H.; Zhang, C.; Chen, W.S. A multi-view model for visual tracking via correlation filters. Knowl.-Based Syst. 2016, 113, 88–99.
12. Danelljan, M.; Häger, G.; Khan, F.S.; Felsberg, M. Learning Spatially Regularized Correlation Filters for Visual Tracking. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Las Condes, Chile, 11–18 December 2015; pp. 4310–4318.
13. Wang, Q.; Gao, J.; Xing, J.; Zhang, M.; Hu, W. DCFNet: Discriminant Correlation Filters Network for Visual Tracking. arXiv 2017, arXiv:1704.04057.
14. Kart, U.; Lukežič, A.; Kristan, M.; Kämäräinen, J.K.; Matas, J. Object tracking by reconstruction with view-specific discriminative correlation filters. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019.
15. Sanchez, L.E.; Martinez, B.; Tzimiropoulos, G.; Valstar, M. Cascaded Continuous Regression for Real-Time Incremental Face Tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 645–661.
16. Cootes, T.F.; Taylor, C.J.; Cooper, D.H.; Graham, J. Active shape models: their training and application. Comput. Vis. Image Underst. 1995, 61, 38–59.
17. Cristinacce, D.; Cootes, T.F. Feature Detection and Tracking with Constrained Local Models. In Proceedings of the British Machine Vision Conference (BMVC), Edinburgh, Scotland, 4–7 September 2006; p. 3.
18. Lucey, S.; Wang, Y.; Cox, M.; Sridharan, S.; Cohn, J.F. Efficient constrained local model fitting for non-rigid face alignment. Image Vis. Comput. 2009, 27, 1804–1813.
19. Shirmohammadi, S.; Ferrero, A. Camera as the instrument: The rising trend of vision based measurement. IEEE Instrum. Meas. Mag. 2014, 17, 41–47.
20. Van de Cruys, T. Two multivariate generalizations of pointwise mutual information. In Proceedings of the Workshop on Distributional Semantics and Compositionality, Association for Computational Linguistics, Portland, OR, USA, 11–13 June 2011; pp. 16–20.
21. Dryden, I.L. Shape analysis. Wiley StatsRef: Statistics Reference Online 2014.
22. Tipping, M.E.; Bishop, C.M. Probabilistic principal component analysis. J. R. Stat. Soc. Ser. B 1999, 61, 611–622.
23. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
24. Gu, L.; Kanade, T. A Generative Shape Regularization Model for Robust Face Alignment; Springer: Berlin, Germany, 2008; pp. 413–426.
25. Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 815–823.
26. Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Kauai, HI, USA, 8–14 December 2001.
27. Yuan, Y.; Emmanuel, S.; Lin, W.; Fang, Y. Visual object tracking based on appearance model selection. In Proceedings of the IEEE International Conference on Multimedia and Expo Workshops (ICMEW), San Jose, CA, USA, 15–19 July 2013; pp. 1–4.
28. Abtahi, S.; Omidyeganeh, M.; Shirmohammadi, S.; Hariri, B. YawDD: A Yawning Detection Dataset. In Proceedings of the 5th ACM Multimedia Systems Conference, Singapore, 19–21 March 2014.
29. Talking Face Video. Available online: http://www-prima.inrialpes.fr/FGnet/data/01-TalkingFace/talking_face.html (accessed on 20 August 2019).
30. Zheng, S.; Sturgess, P.; Torr, P.H. Approximate structured output learning for constrained local models with application to real-time facial feature detection and tracking on low-power devices. In Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Shanghai, China, 22–26 April 2013; pp. 1–8.
31. Khurshid, A.; Scharcanski, J. Incremental multi-model dictionary learning for face tracking. In Proceedings of the 2018 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Houston, TX, USA, 14–17 May 2018; pp. 1–6.
32. Chiang, C.; Tai, W.; Yang, M.; Huang, Y.; Huang, C. A novel method for detecting lips, eyes and faces in real time. Real-Time Imaging 2003, 9, 277–287.
33. Bouvier, C.; Benoit, A.; Caplier, A.; Coulon, P.Y. Open or closed mouth state detection: Static supervised classification based on log-polar signature. In Proceedings of the International Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS), Juan-les-Pins, France, 20–24 October 2008; pp. 1093–1102.
| Param | Min | Max | Optimal |
|---|---|---|---|
| Batch size | 1 | 16 | 5 |
| Face size u | 16 | 64 | 32 |
| Patch size v | 4 | 32 | 8 |
| Forgetting factor f | 0.5 | 1.0 | 0.95 |
| Method \ Batch Size | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| AFTRM | 0.09 | 0.21 | 0.25 | 0.31 | 0.29 | 0.38 | 0.53 | 0.62 | 0.79 | 0.98 |
| AFTRM-W | 0.07 | 0.23 | 0.20 | 0.23 | 0.33 | 0.45 | 0.56 | 0.54 | 0.83 | 1.10 |
| No. of resyncs | 1002 | 474 | 279 | 231 | 194 | 171 | 161 | 146 | 117 | 108 |
| Method | Video 1 | Video 2 | Video 3 | Video 4 | Video 5 | Video 6 | Average |
|---|---|---|---|---|---|---|---|
| Terissi et al. [5] | 38.43 | 26.93 | 50.38 | 66.44 | 66.12 | 16.75 | 34.24 |
| Ross et al. [7] | 21.43 | 10.56 | 183.7 | 30.12 | 6.23 | 12.17 | 44.04 |
| Zheng et al. [30] | 33.93 | 11.46 | 12.41 | 17.05 | 12.26 | 14.02 | 16.86 |
| Sanchez et al. [15] | 16.42 | 11.48 | 10.33 | 22.07 | 14.49 | 9.84 | 14.10 |
| Li et al. [11] | 23.14 | 11.20 | 15.58 | 21.46 | 14.97 | 9.61 | 14.74 |
| Wang et al. [13] | 20.32 | 11.35 | 7.51 | 10.84 | 15.33 | 18.07 | 13.57 |
| MMDL-FT [31] | 10.12 | 7.19 | 7.63 | 22.02 | 8.06 | 30.75 | 14.29 |
| MMDL-FTU [31] | 9.73 | 6.50 | 7.76 | 16.62 | 7.76 | 19.37 | 11.29 |
| AFTRM | 15.01 | 9.22 | 13.78 | 15.31 | 5.91 | 7.53 | 11.12 |
| AFTRM-W | 6.54 | 3.56 | 10.65 | 5.27 | 4.85 | 3.62 | 5.65 |
| Method | Male Videos | Female Videos | Average |
|---|---|---|---|
| Terissi et al. [5] | 25.92 | 18.37 | 22.15 |
| Ross et al. [7] | 14.74 | 11.33 | 13.03 |
| Zheng et al. [30] | 13.02 | 10.14 | 11.58 |
| Sanchez et al. [15] | 14.11 | 10.17 | 12.14 |
| Li et al. [11] | 17.23 | 14.93 | 16.08 |
| Wang et al. [13] | 15.52 | 13.58 | 14.55 |
| MMDL-FT [31] | 10.61 | 8.70 | 9.65 |
| MMDL-FTU [31] | 10.36 | 8.68 | 9.52 |
| AFTRM | 8.81 | 7.54 | 8.18 |
| AFTRM-W | 5.31 | 4.24 | 4.78 |
Talking Face Video dataset:

| Method | CLE | RMSE |
|---|---|---|
| Terissi et al. [5] | 27.79 | 26.43 |
| Ross et al. [7] | 12.05 | 10.09 |
| Zheng et al. [30] | 11.31 | 11.15 |
| Sanchez et al. [15] | 10.42 | 10.51 |
| Li et al. [11] | 9.27 | 11.39 |
| Wang et al. [13] | 11.63 | 13.46 |
| MMDL-FT [31] | 16.45 | 15.91 |
| MMDL-FTU [31] | 13.93 | 13.48 |
| AFTRM | 8.92 | 8.98 |
| AFTRM-W | 6.81 | 6.62 |