Multitask Learning Strategy with Pseudo-Labeling: Face Recognition, Facial Landmark Detection, and Head Pose Estimation
Abstract
1. Introduction
- We applied our pseudo-labeling framework to tasks that lack face IDs, namely facial landmark detection and head pose estimation, and combined them with a face recognition dataset that inherently contains a large number of face IDs.
- Within our pseudo-labeling framework, we prepared a pseudo-labeled training dataset by assigning entirely new class labels. This dataset contained 8.3 M images annotated with 93 K facial class labels, 68 landmarks, and Euler angles, generated using three pretrained networks.
- In our pseudo-labeling framework, we used an eye blink detection network to align the eye landmarks, making them more robust than the pretrained network alone could achieve. We then employed a selective fitting algorithm to generate high-quality pseudo-labels and validated them using pretrained networks to prevent duplicated face IDs.
- In our multitask learning framework, we designed a loss that creates synergy between tasks, enabling pose-invariant face recognition and pose-guided facial landmark detection. In addition, we developed a new pose-invariant face recognition evaluation protocol for the IJB-C [31] dataset.
- Finally, we constructed a novel pseudo-labeling and multitask learning framework and demonstrated its SOTA or near-SOTA performance, illustrating its applicability not only to the three tasks but also to the entire field of facial analysis.
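The multitask objective sketched in the bullets above combines the three task losses into one training signal. The following is a minimal illustration, assuming an ArcFace-style identity loss computed elsewhere, mean squared error for the 68 landmarks, and mean absolute error for the Euler angles; the specific loss forms and weights are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def multitask_loss(id_loss, landmark_pred, landmark_gt, pose_pred, pose_gt,
                   w_landmark=1.0, w_pose=1.0):
    """Weighted sum of the three task losses.

    id_loss: scalar margin-based face recognition loss (e.g. ArcFace),
             assumed to be computed by the recognition head.
    landmark_*: (68, 2) arrays of 2D landmark coordinates.
    pose_*: length-3 arrays of Euler angles (yaw, pitch, roll).
    The MSE/MAE choices and the weights are illustrative assumptions.
    """
    landmark_loss = np.mean((landmark_pred - landmark_gt) ** 2)  # per-point MSE
    pose_loss = np.mean(np.abs(pose_pred - pose_gt))             # MAE over angles
    return id_loss + w_landmark * landmark_loss + w_pose * pose_loss
```

In practice the weights would be the hyperparameters studied in the ablation section, which trade off how quickly each task's loss converges.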
2. Related Works
2.1. Single-Task Learning
2.1.1. Face Recognition
2.1.2. Head Pose Estimation
2.1.3. Facial Landmark Detection
2.2. Multitask Learning
3. Proposed Method
3.1. Pseudo-Labeling
3.1.1. Dataset Selection Strategy
3.1.2. Pseudo-Labeling Landmark and Pose
Algorithm 1 Sampling and Selective Fitting Algorithm for Pseudo-Labeling.
Input: number of total samples N, image for sampling I, a set of augmentations, a network pretrained on the 300W-LP [22] dataset, and a Euclidean distance threshold.
Output: pseudo-labels for the 68 landmarks and pose values.
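The sampling and selective fitting step can be sketched as follows: run the pretrained network on several augmented copies of the image, keep only the predictions that agree with one another within the distance threshold, and average the survivors. The consistency test against the per-point median and the helper names (`predict_fn`, `augment_fns`) are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def selective_fit(image, predict_fn, augment_fns, n_samples=8, dist_thresh=3.0):
    """Pseudo-label landmarks and pose by sampling augmentations.

    predict_fn(img) -> (landmarks (68, 2), pose (3,)) stands in for the
    network pretrained on 300W-LP; augment_fns is a list of callables.
    Predictions far from the median landmark shape are discarded.
    """
    preds = []
    for i in range(n_samples):
        aug = augment_fns[i % len(augment_fns)]
        preds.append(predict_fn(aug(image)))
    landmarks = np.stack([p[0] for p in preds])  # (n, 68, 2)
    poses = np.stack([p[1] for p in preds])      # (n, 3)
    median = np.median(landmarks, axis=0)
    # mean per-landmark Euclidean distance of each sample to the median shape
    dists = np.linalg.norm(landmarks - median, axis=-1).mean(axis=1)
    keep = dists < dist_thresh
    if not keep.any():  # no consistent fit: reject this image entirely
        return None
    return landmarks[keep].mean(axis=0), poses[keep].mean(axis=0)
```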
3.1.3. Pseudo-Labeling Face Identification
Algorithm 2 Identifying Duplicate IDs to Assign New IDs for the Integrated Dataset.
Input: a network pretrained on VGGFace2 [28], a network pretrained on MS1MV3 [29], the VGGFace2, MS1MV3, and 300W-LP [22] datasets, the number of IDs in each recognition dataset, and a similarity threshold for each pretrained network.
Output: our pseudo-labeled training dataset with self-curated and unduplicated IDs.
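The duplicate-ID check can be sketched as comparing identity-level embeddings across the two recognition datasets: any cross-dataset pair whose cosine similarity exceeds the threshold is treated as the same person and merged under one new ID. The brute-force pairwise loop and the threshold value below are illustrative assumptions; the paper uses one threshold per pretrained network.

```python
import numpy as np

def find_duplicate_ids(embeds_a, embeds_b, sim_thresh=0.6):
    """Flag identity pairs that appear in both datasets.

    embeds_a / embeds_b map ID -> L2-normalised mean face embedding from
    the corresponding pretrained network (e.g. VGGFace2 / MS1MV3).
    Returns the list of (id_a, id_b) pairs judged to be the same person.
    """
    duplicates = []
    for id_a, ea in embeds_a.items():
        for id_b, eb in embeds_b.items():
            # cosine similarity of unit vectors is just the dot product
            if float(ea @ eb) > sim_thresh:
                duplicates.append((id_a, id_b))
    return duplicates
```

When building the integrated dataset, each duplicate pair would keep a single new class label so the merged label space contains no repeated identities.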
3.2. Multitask Learning
3.2.1. Network Architecture
3.2.2. Multitask Loss Functions
4. Experimental Results
4.1. Face Recognition
4.1.1. Test Dataset
4.1.2. Experimental Results
4.1.3. Discussion
4.2. Head Pose Estimation
4.2.1. Test Dataset
4.2.2. Experimental Results
4.2.3. Discussion
4.3. Facial Landmark Detection
4.3.1. Test Dataset
4.3.2. Experimental Results
4.3.3. Discussion
4.4. Ablation Study
4.4.1. Influence of Multitask Learning
4.4.2. Influence of Regularization Term
4.4.3. Influence of Hyperparameters
4.5. Visualization
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Deng, J.; Guo, J.; Zhou, Y.; Yu, J.; Kotsia, I.; Zafeiriou, S. Retinaface: Single-stage dense face localisation in the wild. arXiv 2019, arXiv:1905.00641.
- Li, J.; Liu, L.; Li, J.; Feng, J.; Yan, S.; Sim, T. Toward a Comprehensive Face Detector in the Wild. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 104–114.
- Kim, T.K.; Kittler, J. Design and Fusion of Pose-Invariant Face-Identification Experts. IEEE Trans. Circuits Syst. Video Technol. 2006, 16, 1096–1106.
- Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4690–4699.
- An, X.; Deng, J.; Guo, J.; Feng, Z.; Zhu, X.; Yang, J.; Liu, T. Killing two birds with one stone: Efficient and robust training of face recognition cnns by partial fc. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 4042–4051.
- Lee, H.J.; Kim, S.T.; Lee, H.; Ro, Y.M. Lightweight and Effective Facial Landmark Detection using Adversarial Learning with Face Geometric Map Generative Network. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 771–780.
- Wu, C.Y.; Xu, Q.; Neumann, U. Synergy between 3dmm and 3d landmarks for accurate 3d facial geometry. In Proceedings of the 2021 International Conference on 3D Vision (3DV), London, UK, 1–3 December 2021; pp. 453–463.
- Bae, H.B.; Jeon, T.; Lee, Y.; Jang, S.; Lee, S. Non-visual to visual translation for cross-domain face recognition. IEEE Access 2020, 8, 50452–50464.
- Cho, M.; Kim, T.; Kim, I.J.; Lee, K.; Lee, S. Relational deep feature learning for heterogeneous face recognition. IEEE Trans. Inf. Forensics Secur. 2020, 16, 376–388.
- Hu, W.; Hu, H. Orthogonal Modality Disentanglement and Representation Alignment Network for NIR-VIS Face Recognition. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 3630–3643.
- Wu, W.; Yin, Y.; Wang, Y.; Wang, X.; Xu, D. Facial expression recognition for different pose faces based on special landmark detection. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 1524–1529.
- Jeon, T.; Bae, H.; Lee, Y.; Jang, S.; Lee, S. Stress recognition using face images and facial landmarks. In Proceedings of the 2020 International Conference on Electronics, Information, and Communication (ICEIC), Barcelona, Spain, 19–22 January 2020; pp. 1–3.
- Kuhnke, F.; Ostermann, J. Deep head pose estimation using synthetic images and partial adversarial domain adaption for continuous label spaces. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 10164–10173.
- Valle, R.; Buenaposada, J.M.; Valdés, A.; Baumela, L. Face alignment using a 3D deeply-initialized ensemble of regression trees. Comput. Vis. Image Underst. 2019, 189, 102846.
- Jin, H.; Liao, S.; Shao, L. Pixel-in-pixel net: Towards efficient facial landmark detection in the wild. Int. J. Comput. Vis. 2021, 129, 3174–3194.
- Bafti, S.M.; Chatzidimitriadis, S.; Sirlantzis, K. Cross-domain multitask model for head detection and facial attribute estimation. IEEE Access 2022, 10, 54703–54712.
- Wan, M.; Zhu, S.; Luan, L.; Prateek, G.; Huang, X.; Schwartz-Mette, R.; Hayes, M.; Zimmerman, E.; Ostadabbas, S. Infanface: Bridging the infant–adult domain gap in facial landmark estimation in the wild. In Proceedings of the 2022 26th International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada, 21–25 August 2022; pp. 4486–4492.
- Kuhnke, F.; Ostermann, J. Domain Adaptation for Head Pose Estimation Using Relative Pose Consistency. IEEE Trans. Biom. Behav. Identity Sci. 2023, 5, 348–359.
- Zhang, K.; Zhang, Z.; Li, Z.; Qiao, Y. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 2016, 23, 1499–1503.
- Wang, Z.; He, K.; Fu, Y.; Feng, R.; Jiang, Y.G.; Xue, X. Multi-task deep neural network for joint face recognition and facial attribute prediction. In Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, Bucharest, Romania, 6–9 June 2017; pp. 365–374.
- Qin, L.; Wang, M.; Deng, C.; Wang, K.; Chen, X.; Hu, J.; Deng, W. SwinFace: A Multi-task Transformer for Face Recognition, Expression Recognition, Age Estimation and Attribute Estimation. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 2223–2234.
- Zhu, X.; Lei, Z.; Liu, X.; Shi, H.; Li, S.Z. Face alignment across large poses: A 3d solution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 146–155.
- Pan, Z.; Wang, Y.; Zhang, S. Joint face detection and Facial Landmark Localization using graph match and pseudo label. Signal Process. Image Commun. 2022, 102, 116587.
- Lee, D.H. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Proceedings of the Workshop on Challenges in Representation Learning, Atlanta, GA, USA, 16–21 June 2013; Volume 3, p. 896.
- Yu, X.; Ouyang, B.; Principe, J.C.; Farrington, S.; Reed, J.; Li, Y. Weakly supervised learning of point-level annotation for coral image segmentation. In Proceedings of the Oceans 2019 MTS/IEEE, Seattle, WA, USA, 27–31 October 2019; pp. 1–7.
- Caron, M.; Bojanowski, P.; Joulin, A.; Douze, M. Deep clustering for unsupervised learning of visual features. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 132–149.
- Ranjan, R.; Sankaranarayanan, S.; Castillo, C.D.; Chellappa, R. An all-in-one convolutional neural network for face analysis. In Proceedings of the 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), Washington, DC, USA, 30 May–3 June 2017; pp. 17–24.
- Cao, Q.; Shen, L.; Xie, W.; Parkhi, O.M.; Zisserman, A. Vggface2: A dataset for recognising faces across pose and age. In Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; pp. 67–74.
- Deng, J.; Guo, J.; Zhang, D.; Deng, Y.; Lu, X.; Shi, S. Lightweight face recognition challenge. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019.
- Sagonas, C.; Tzimiropoulos, G.; Zafeiriou, S.; Pantic, M. 300 faces in-the-wild challenge: The first facial landmark localization challenge. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Sydney, Australia, 2–8 December 2013; pp. 397–403.
- Maze, B.; Adams, J.; Duncan, J.A.; Kalka, N.; Miller, T.; Otto, C.; Jain, A.K.; Niggel, W.T.; Anderson, J.; Cheney, J.; et al. Iarpa janus benchmark-c: Face dataset and protocol. In Proceedings of the 2018 International Conference on Biometrics (ICB), Gold Coast, Australia, 20–23 February 2018; pp. 158–165.
- Yi, D.; Lei, Z.; Liao, S.; Li, S.Z. Learning face representation from scratch. arXiv 2014, arXiv:1411.7923.
- Guo, Y.; Zhang, L.; Hu, Y.; He, X.; Gao, J. Ms-celeb-1m: A dataset and benchmark for large-scale face recognition. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part III 14; Springer: Cham, Switzerland, 2016; pp. 87–102.
- Nech, A.; Kemelmacher-Shlizerman, I. Level playing field for million scale face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7044–7053.
- Zhu, Z.; Huang, G.; Deng, J.; Ye, Y.; Huang, J.; Chen, X.; Zhu, J.; Yang, T.; Lu, J.; Du, D.; et al. Webface260m: A benchmark unveiling the power of million-scale deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10492–10502.
- Liu, W.; Wen, Y.; Yu, Z.; Yang, M. Large-margin softmax loss for convolutional neural networks. arXiv 2016, arXiv:1612.02295.
- Liu, W.; Wen, Y.; Yu, Z.; Li, M.; Raj, B.; Song, L. Sphereface: Deep hypersphere embedding for face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 212–220.
- Wang, H.; Wang, Y.; Zhou, Z.; Ji, X.; Gong, D.; Zhou, J.; Li, Z.; Liu, W. Cosface: Large margin cosine loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5265–5274.
- Martins, P.; Batista, J. Accurate single view model-based head pose estimation. In Proceedings of the 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition, Amsterdam, The Netherlands, 17–19 September 2008; pp. 1–6.
- Rocca, F.; Mancas, M.; Gosselin, B. Head pose estimation by perspective-n-point solution based on 2d markerless face tracking. In Proceedings of the Intelligent Technologies for Interactive Entertainment: 6th International Conference, INTETAIN 2014, Chicago, IL, USA, 9–11 July 2014; Proceedings 6; Springer: Cham, Switzerland, 2014; pp. 67–76.
- Gross, R.; Matthews, I.; Cohn, J.; Kanade, T.; Baker, S. Multi-PIE. Image Vis. Comput. 2010, 28, 807–813.
- Fanelli, G.; Weise, T.; Gall, J.; Van Gool, L. Real time head pose estimation from consumer depth cameras. In Proceedings of the Joint Pattern Recognition Symposium, Frankfurt, Germany, 31 August–2 September 2011; Springer: Cham, Switzerland, 2011; pp. 101–110.
- Joo, H.; Liu, H.; Tan, L.; Gui, L.; Nabbe, B.; Matthews, I.; Kanade, T.; Nobuhara, S.; Sheikh, Y. Panoptic studio: A massively multiview system for social motion capture. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3334–3342.
- Yang, T.Y.; Chen, Y.T.; Lin, Y.Y.; Chuang, Y.Y. Fsa-net: Learning fine-grained structure aggregation for head pose estimation from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 1087–1096.
- Cao, Z.; Chu, Z.; Liu, D.; Chen, Y. A vector-based representation to enhance head pose estimation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 1188–1197.
- Dai, D.; Wong, W.; Chen, Z. RankPose: Learning Generalised Feature with Rank Supervision for Head Pose Estimation; Ping An Technology: Shenzhen, China, 2020.
- Zhang, Z.; Luo, P.; Loy, C.C.; Tang, X. Facial landmark detection by deep multi-task learning. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part VI 13; Springer: Cham, Switzerland, 2014; pp. 94–108.
- Bulat, A.; Tzimiropoulos, G. How far are we from solving the 2d & 3d face alignment problem? (and a dataset of 230,000 3d facial landmarks). In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1021–1030.
- Feng, Z.H.; Kittler, J.; Awais, M.; Huber, P.; Wu, X.J. Wing loss for robust facial landmark localisation with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2235–2245.
- Zou, X.; Xiao, P.; Wang, J.; Yan, L.; Zhong, S.; Wu, Y. Towards Unconstrained Facial Landmark Detection Robust to Diverse Cropping Manners. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 2070–2075.
- Liao, X.; Wang, Y.; Wang, T.; Hu, J.; Wu, X. FAMM: Facial Muscle Motions for Detecting Compressed Deepfake Videos Over Social Networks. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 7236–7251.
- Blanz, V.; Vetter, T. Face recognition based on fitting a 3D morphable model. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 1063–1074.
- Guo, J.; Zhu, X.; Yang, Y.; Yang, F.; Lei, Z.; Li, S.Z. Towards fast, accurate and stable 3d dense face alignment. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 152–168.
- Zhu, X.; Liu, X.; Lei, Z.; Li, S.Z. Face alignment in full pose range: A 3d total solution. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 41, 78–92.
- Koestinger, M.; Wohlhart, P.; Roth, P.M.; Bischof, H. Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, 6–13 November 2011; pp. 2144–2151.
- Burgos-Artizzu, X.P.; Perona, P.; Dollár, P. Robust face landmark estimation under occlusion. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 1513–1520.
- Wu, W.; Qian, C.; Yang, S.; Wang, Q.; Cai, Y.; Zhou, Q. Look at boundary: A boundary-aware face alignment algorithm. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2129–2138.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Lugaresi, C.; Tang, J.; Nash, H.; McClanahan, C.; Uboweja, E.; Hays, M.; Zhang, F.; Chang, C.L.; Yong, M.; Lee, J.; et al. MediaPipe: A framework for perceiving and processing reality. In Proceedings of the Third Workshop on Computer Vision for AR/VR at IEEE Computer Vision and Pattern Recognition (CVPR) 2019, Long Beach, CA, USA, 17 June 2019.
- Hu, G.; Xiao, Y.; Cao, Z.; Meng, L.; Fang, Z.; Zhou, J.T.; Yuan, J. Towards real-time eyeblink detection in the wild: Dataset, theory and practices. IEEE Trans. Inf. Forensics Secur. 2019, 15, 2194–2208.
- Song, F.; Tan, X.; Liu, X.; Chen, S. Eyes closeness detection from still images with multi-scale histograms of principal oriented gradients. Pattern Recognit. 2014, 47, 2825–2838.
- Jesorsky, O.; Kirchberg, K.J.; Frischholz, R.W. Robust face detection using the hausdorff distance. In Proceedings of the Audio- and Video-Based Biometric Person Authentication: Third International Conference, AVBPA 2001, Halmstad, Sweden, 6–8 June 2001; Proceedings 3; Springer: Cham, Switzerland, 2001; pp. 90–95.
- Martinez, A.; Benavente, R. The AR Face Database: CVC Technical Report No. 24. 1998. Available online: https://portalrecerca.uab.cat/en/publications/the-ar-face-database-cvc-technical-report-24 (accessed on 8 May 2024).
- Gao, W.; Cao, B.; Shan, S.; Chen, X.; Zhou, D.; Zhang, X.; Zhao, D. The CAS-PEAL large-scale Chinese face database and baseline evaluations. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2007, 38, 149–161.
- Huang, G.B.; Mattar, M.; Berg, T.; Learned-Miller, E. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Proceedings of the Workshop on Faces in ‘Real-Life’ Images: Detection, Alignment, and Recognition, Marseille, France, 17–20 October 2008.
- Duta, I.C.; Liu, L.; Zhu, F.; Shao, L. Improved Residual Networks for Image and Video Recognition. arXiv 2020, arXiv:2004.04989.
- Andriyanov, N.; Dementev, V.; Vasiliev, K.; Tashlinskii, A. Investigation of methods for increasing the efficiency of convolutional neural networks in identifying tennis players. Pattern Recognit. Image Anal. 2021, 31, 496–505.
- Wu, W.; Peng, H.; Yu, S. Yunet: A tiny millisecond-level face detector. Mach. Intell. Res. 2023, 20, 656–665.
- Lynch, K.M.; Park, F.C. Modern Robotics; Cambridge University Press: Cambridge, UK, 2017.
- Sengupta, S.; Chen, J.C.; Castillo, C.; Patel, V.M.; Chellappa, R.; Jacobs, D.W. Frontal to profile face verification in the wild. In Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10 March 2016; pp. 1–9.
- Moschoglou, S.; Papaioannou, A.; Sagonas, C.; Deng, J.; Kotsia, I.; Zafeiriou, S. Agedb: The first manually collected, in-the-wild age database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 51–59.
- Sun, Y.; Cheng, C.; Zhang, Y.; Zhang, C.; Zheng, L.; Wang, Z.; Wei, Y. Circle loss: A unified perspective of pair similarity optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6398–6407.
- Huang, Y.; Wang, Y.; Tai, Y.; Liu, X.; Shen, P.; Li, S.; Li, J.; Huang, F. Curricularface: Adaptive curriculum learning loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5901–5910.
- Meng, Q.; Zhao, S.; Huang, Z.; Zhou, F. Magface: A universal representation for face recognition and quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14225–14234.
- Ruiz, N.; Chong, E.; Rehg, J.M. Fine-grained head pose estimation without keypoints. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2074–2083.
- Hsu, H.W.; Wu, T.Y.; Wan, S.; Wong, W.H.; Lee, C.Y. Quatnet: Quaternion-based head pose estimation with multiregression loss. IEEE Trans. Multimed. 2018, 21, 1035–1046.
- Bhagavatula, C.; Zhu, C.; Luu, K.; Savvides, M. Faster than real-time facial alignment: A 3d spatial transformer network approach in unconstrained poses. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3980–3989.
- Feng, Y.; Wu, F.; Shao, X.; Wang, Y.; Zhou, X. Joint 3D face reconstruction and dense alignment with position map regression network. In Proceedings of the ECCV, Munich, Germany, 8–14 September 2018.
Dataset | # Images | # IDs | Images/ID | # Landmarks | Pose
---|---|---|---|---|---
CASIA-WebFace [32] | 0.5 M | 10 K | 47 | - | -
VGGFace2 [28] | 3.3 M | 9 K | 363 | - | -
MS1MV3 [29] | 5.2 M | 93 K | 56 | - | -
MegaFace2 [34] | 4.7 M | 0.6 M | 7 | - | -
WebFace260M [35] | 260 M | 4 M | 21 | - | -
300W [30] | 3.8 K | - | - | 68 | -
AFLW [55] | 20 K | - | - | 21 | -
COFW [56] | 1.3 K | - | - | 29 | -
WFLW [57] | 7.5 K | - | - | 98 | -
BIWI [42] | 15 K | - | - | - | ✓
CMU-Panoptic [43] | 1.3 M | - | - | - | ✓
300W-LP [22] | 61 K | - | - | 68 | ✓
Dataset | # IDs | # Images | # Verification Pairs
---|---|---|---
LFW [65] | 5.7 K | 13 K | 6 K |
CFP-FP [70] | 500 | 7 K | 7 K |
AgeDB [71] | 568 | 16.5 K | 6 K |
IJB-C [31] | 3.5 K | 148.8 K | 15 M |
Method | Dataset | LFW | CFP-FP | AgeDB | IJB-C | FPS |
---|---|---|---|---|---|---|
SphereFace [37] | CASIA | 99.42 | - | - | - | - |
CosFace [38] | CASIA | 99.73 | - | - | - | - |
SphereFace (Re-Imp) | MS1MV2 | 99.67 | 98.46 | 98.17 | 91.77 | - |
CosFace (Re-Imp) | MS1MV2 | 99.78 | 98.26 | 98.17 | 95.56 | - |
CircleLoss [72] | MS1M | 99.73 | 96.02 | - | 93.95 | - |
CurricularFace [73] | MS1MV2 | 99.80 | 98.37 | 98.32 | 96.10 | 329.8 |
ArcFace [4] | MS1MV2 | 99.82 | 98.49 | 98.05 | 96.03 | 331.5 |
ArcFace [4] | MS1MV3 | 99.83 | 99.03 | 98.17 | 96.50 | 331.5
MagFace [74] | MS1MV2 | 99.83 | 98.46 | 98.17 | 95.97 | 254.3
Partial FC [5] | MS1MV3 | 99.85 | 98.70 | 98.11 | 96.08 | 255.1
SwinFace [21] | MS1MV2 | 99.87 | 98.60 | 98.15 | 96.73 | 70.3 |
Ours (pretrained)—ResNet50 | VGGFace2 | 99.33 | 92.38 | 94.20 | 88.05 | 321.7 |
Ours (pretrained)—ResNet50 | MS1MV3 | 99.85 | 98.70 | 98.11 | 96.08 | 321.7 |
Ours (STL)—IResNet50 | PLTD | 99.72 | 97.73 | 96.90 | 94.05 | 253.2 |
Ours (MTL)—IResNet50-Multi | PLTD | 99.70 | 97.26 | 96.53 | 94.17 | 221.2 |
Ours (MTL)—IResNet100-Multi | PLTD | 99.68 | 97.22 | 96.29 | 94.09 | 112.0 |
Method (trained on PLTD) | Case | IJB-C TAR@FAR (five operating points, ascending FAR) | | | |
---|---|---|---|---|---|---
Ours (STL) | All Cases | 90.73 | 94.05 | 96.28 | 97.89 | 98.95
Ours (STL) | Soft Case | 89.03 | 92.50 | 94.94 | 96.77 | 98.23
Ours (STL) | Medium Case | 76.41 | 84.19 | 89.31 | 93.51 | 96.47
Ours (STL) | Hard Case | 55.34 | 68.27 | 77.64 | 85.96 | 92.90
Ours (MTL) | All Cases | 89.86 | 94.17 | 96.60 | 97.95 | 98.95
Ours (MTL) | Soft Case | 87.18 | 92.42 | 95.26 | 96.96 | 98.43
Ours (MTL) | Medium Case | 77.21 | 86.03 | 91.27 | 94.57 | 96.99
Ours (MTL) | Hard Case | 56.64 | 71.48 | 81.49 | 88.62 | 94.23
Method | Yaw | Pitch | Roll | MAE | FPS |
---|---|---|---|---|---|
FAN (12 points) [48] | 6.36 | 12.3 | 8.71 | 9.12 | - |
HopeNet [75] | 6.47 | 6.56 | 5.44 | 6.16 | 323.2 |
FSANet [44] | 4.5 | 6.08 | 4.64 | 5.07 | 389.6 |
3DDFA-TPAMI [54] | 4.33 | 5.98 | 4.30 | 4.87 | - |
3DDFA-V2 [53] | 4.06 | 5.26 | 3.48 | 4.27 | - |
QuatNet [76] | 3.97 | 5.62 | 3.92 | 4.15 | - |
TriNet [45] | 4.2 | 5.77 | 4.04 | 3.97 | 311.3 |
RankPose [46] | 2.99 | 4.75 | 3.25 | 3.66 | 291.7 |
SynergyNet [7] | 3.42 | 4.09 | 2.55 | 3.35 | 128.1 |
Ours (Pretrained) | 2.89 | 4.77 | 3.35 | 3.67 | 320.8 |
Ours (MTL) | 2.80 | 4.27 | 2.9 | 3.32 | 221.2 |
Method | Yaw | Pitch | Roll | MAE | FPS |
---|---|---|---|---|---|
3D-FAN [48] | 8.53 | 7.48 | 7.63 | 7.89 | - |
HopeNet [75] | 4.81 | 6.61 | 3.27 | 4.90 | 323.2 |
FSANet [44] | 4.27 | 4.96 | 2.76 | 4.00 | 389.6 |
QuatNet [76] | 4.01 | 5.49 | 2.94 | 4.15 | - |
TriNet [45] | 4.11 | 4.76 | 3.05 | 3.97 | 311.3 |
RankPose [46] | 3.59 | 4.77 | 2.76 | 3.71 | 291.7 |
Ours (pretrained) | 4.34 | 5.18 | 2.61 | 4.04 | 320.8 |
Ours (MTL) | 3.23 | 5.03 | 2.36 | 3.54 | 221.2 |
Method | NME [0°, 30°] | NME [30°, 60°] | NME [60°, 90°] | Mean | FPS
---|---|---|---|---|---
3DSTN [77] | 3.15 | 4.33 | 5.98 | 4.49 | -
3D-FAN [48] | 3.16 | 3.53 | 4.60 | 3.76 | 84.3
PRNet [78] | 2.75 | 3.51 | 4.61 | 3.62 | 261.3
3DDFA-TPAMI [54] | 2.84 | 3.57 | 4.96 | 3.79 | -
3DDFA-V2 [53] | 2.63 | 3.42 | 4.48 | 3.51 | -
SynergyNet [7] | 2.65 | 3.30 | 4.27 | 3.41 | 128.1
Ours (pretrained) | 3.08 | 3.74 | 4.53 | 3.78 | 322.5
Ours (MTL) | 2.62 | 3.45 | 4.51 | 3.53 | 221.2
Method | FR | HPE | FLD | IJB-C | AFLW2000-3D (MSE) | AFLW2000-3D (MAE) | # Params | FPS |
---|---|---|---|---|---|---|---|---|
STL | ✓ | - | - | 94.05 | - | - | 24.6M | 321.7 |
STL | - | ✓ | - | - | 3.67 | - | 23.5M | 320.9 |
STL | - | - | ✓ | - | - | 3.78 | 23.8M | 322.5 |
MTL | ✓ | ✓ | - | 94.16 | 3.54 | - | 56.8M | 243.8 |
MTL | ✓ | - | ✓ | 94.17 | - | 3.34 | 60.1M | 237.9 |
MTL | - | ✓ | ✓ | - | 3.52 | 3.33 | 47.3M | 252.7 |
MTL | ✓ | ✓ | ✓ | 94.17 | 3.53 | 3.32 | 73.3M | 221.2 |
Method | IJB-C | AFLW2000-3D (NMSE) | AFLW2000-3D (MAE) | BIWI (MAE)
---|---|---|---|---
) | 94.16 | 3.55 | 3.58 | 3.70 |
) | 94.17 | 3.56 | 3.67 | 3.81 |
) | 94.17 | 3.53 | 3.32 | 3.54 |
 | | IJB-C | AFLW2000-3D (NMSE) | AFLW2000-3D (MAE) | Loss Convergence Ratio
---|---|---|---|---|---
0.1 | 0.1 | 94.22 | 5.12 | 4.14 | 0.98:0.00:0.02 |
0.1 | 1 | 94.21 | 4.77 | 3.42 | 0.86:0.00:0.14 |
0.1 | 10 | 94.22 | 4.56 | 3.35 | 0.37:0.00:0.63 |
0.1 | 100 | 94.23 | 4.98 | 3.35 | 0.08:0.00:0.92 |
1 | 0.1 | 94.20 | 4.61 | 3.98 | 0.96:0.01:0.03 |
1 | 1 | 94.19 | 4.21 | 3.41 | 0.85:0.01:0.14 |
1 | 10 | 94.12 | 4.37 | 3.39 | 0.37:0.01:0.62 |
1 | 100 | 94.18 | 4.27 | 3.38 | 0.08:0.01:0.91 |
10 | 0.1 | 94.88 | 3.99 | 3.94 | 0.94:0.05:0.02 |
10 | 1 | 94.19 | 3.65 | 3.54 | 0.82:0.04:0.14 |
10 | 10 | 94.17 | 3.61 | 3.33 | 0.35:0.02:0.63 |
10 | 100 | 94.17 | 3.60 | 3.33 | 0.06:0.00:0.93 |
100 | 0.1 | 94.18 | 3.74 | 3.99 | 0.67:0.32:0.01 |
100 | 1 | 94.19 | 3.55 | 3.59 | 0.61:0.29:0.10 |
100 | 10 | 94.17 | 3.53 | 3.32 | 0.31:0.15:0.54 |
100 | 100 | 94.17 | 3.54 | 3.32 | 0.04:0.03:0.94 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lee, Y.; Jang, S.; Bae, H.B.; Jeon, T.; Lee, S. Multitask Learning Strategy with Pseudo-Labeling: Face Recognition, Facial Landmark Detection, and Head Pose Estimation. Sensors 2024, 24, 3212. https://doi.org/10.3390/s24103212