A Systematic Review of CNN Architectures, Databases, Performance Metrics, and Applications in Face Recognition
Abstract
1. Introduction
- To determine which techniques have been applied in the face recognition domain.
- To identify which databases of face recognition are most common.
- To find out which areas have adopted face recognition.
- To assess and identify suitable evaluation metrics to use when comparative studies are carried out in the field of face recognition.
2. Face Recognition History
- 1964: American researchers investigated computer programming for face recognition. They envisioned a semi-automated process in which users input twenty computer measurements, such as the length of the mouth or the width of the eyes [9]. The computer would then automatically compare the distances in each picture, determine how much they differed, and return a potential match from the stored records [10].
- 1970: Takeo Kanade introduced a facial recognition system that used the spacing between facial features to identify anatomical landmarks such as the chin. Subsequent trials showed that the system did not always recognize facial characteristics consistently. Yet, as interest in the topic grew, Kanade produced the first comprehensive book on face recognition technology in 1977 [11].
- 1990: Research on face recognition increased dramatically due to advancements in technology and the growing significance of applications connected to security [5].
- 1993: The Defense Advanced Research Project Agency (DARPA) and the Army Research Laboratory (ARL) launched the Face Recognition Technology Program (FERET) with the goal of creating “automatic face recognition capabilities” that could be used in a real-world setting to productively support law enforcement, security, and intelligence personnel in carrying out their official duties [13].
- 1997: The PCA-based Eigenface approach to face recognition was refined with linear discriminant analysis (LDA) to produce Fisherfaces [14].
- 2000s: Hand-crafted features such as Gabor features [14,15], local binary patterns (LBPs), and their variants became popular for face recognition. The Viola–Jones object detection framework was applied to faces in 2001, making it feasible to detect faces in real time in video material [16].
- 2011: Deep learning, a machine learning technology based on artificial neural networks [17], accelerated progress in the field. Instead of relying on hand-picked measurements, the computer learns which points to compare, and it learns faster as more photos are provided. Later studies aimed to enhance the performance of existing approaches by exploring novel loss functions such as ArcFace.
- 2015: The Viola–Jones method was implemented on portable devices and embedded systems using tiny low-power detectors. This enabled new features in user interfaces and teleconferencing, further broadening the practical use of face recognition systems [18].
- 2022: Ukraine used US-based Clearview AI face recognition software to identify deceased Russian servicemen, undertaking 8600 searches and identifying the families of 582 Russian troops killed in action [19].
3. Application of Face Recognition
3.1. Security and Surveillance
3.2. Law Enforcement
3.3. Healthcare
3.4. Access Control
3.5. Automotive Industry
4. Face Recognition Systems
4.1. Face Recognition Systems Traditionally Consist of Four Main Stages
- The face is captured in an input photograph or video.
- Pre-processing applies one or more techniques to the image or video, such as alignment, noise reduction, contrast enhancement, or video frame selection.
- Extracting facial features from a picture or video. Holistic, model-based, or texture-based feature extraction approaches are used in image-based methods, whilst set-based or sequence-based approaches are used in video-based methods.
- Face matching compares the extracted features against a database of stored images: if the subject exists in the database, a match is returned; otherwise, no match is reported (a minimal pipeline sketch follows this list).
4.1.1. Face Detection
4.1.2. Face Preprocessing
4.1.3. Face Extraction
4.1.4. Face Matching
5. Methodology
5.1. Data Selection
5.1.1. Inclusion and Exclusion Criteria
5.1.2. Search Strategy
- “Facial recognition”;
- “Face recognition technology”;
- “Biometric identification”;
- “Deep learning in facial recognition”;
- “Privacy and facial recognition”;
- “Bias in facial recognition”;
- “Face recognition applications”;
- “Ethics of facial recognition”;
- “Convolutional Neural Networks for facial recognition”;
- “Deep learning”.
(“Facial recognition” OR “Face recognition technology” OR “Biometric identification”) AND (“Deep learning” OR “Deep learning in facial recognition” OR “Convolutional Neural Networks”) AND (“Privacy and facial recognition” OR “Bias in facial recognition” OR “Ethics of facial recognition” OR “Face recognition applications”).
5.1.3. Screening
5.1.4. Eligibility and Inclusion of Articles
5.1.5. Quality Assessment Rules (QAR)
6. Databases
6.1. ORL Database
6.2. FERET Database
6.3. Yale Face Database
6.4. AR Database
6.5. CVL Database
6.6. XM2VTS Database
6.7. BANCA Database
6.8. FRGC (Face Recognition Grand Challenge) Database
6.9. LFW (Labeled Faces in the Wild) Database
6.10. The MUCT Face Database
6.11. CMU Multi-PIE Database
6.12. CASIA-Webface Database
6.13. IARPA Janus Benchmark-A Database
6.14. MegaFace Database
6.15. IARPA Janus Benchmark-B Database
6.16. VGGFACE Database
6.17. CFP Database
6.18. MS-Celeb-1M Database
6.19. DMFD Database
6.20. VGGFACE 2 Database
6.21. IARPA Janus Benchmark-C Database
6.22. MF2 Database
6.23. DFW (Disguised Faces in the Wild) Database
6.24. LFR Database
6.25. CASIA-Mask Database
7. Face Recognition Methods
7.1. Traditional
7.1.1. Principal Component Analysis (PCA)
7.1.2. Gabor Filter
7.1.3. Viola–Jones Object Detection Framework
7.1.4. Support Vector Machine (SVM)
7.1.5. Histogram of Oriented Gradients (HOG)
7.2. Deep Learning
7.2.1. AlexNet
7.2.2. VGGNet
7.2.3. ResNet
7.2.4. FaceNet
7.2.5. LBPNet
7.2.6. Lightweight Convolutional Neural Network (LWCNN)
7.2.7. YOLO
7.2.8. MTCNN
7.2.9. DeepMaskNet
7.2.10. DenseNet
7.2.11. MobileNetV2
7.2.12. MobileFaceNets
7.2.13. Vision Transformer (ViT)
7.2.14. Face Transformer for Recognition Model
7.2.15. DeepFace
7.2.16. Attention Mechanism
7.2.17. Swin Face Recognition
8. Performance Measures
8.1. Accuracy
8.2. Precision
8.3. Recall
8.4. F1-Score
8.5. Sensitivity
8.6. Specificity
8.7. AUC (Area Under the Curve)
8.8. ROC Curve (Receiver Operating Characteristic Curve)
9. Face Recognition Loss Functions
9.1. Softmax Cross-Entropy Loss
9.2. Triplet Loss
9.3. Center Loss
9.4. ArcFace Loss
9.5. Contrastive Loss
9.6. Margin Cosine Loss
10. Recent Applications of CNN Architectures
11. Limitations of Face Recognition Models
12. Challenges, Results, and Discussion
13. Conclusions
13.1. Observations
- More sophisticated deep learning architectures like FaceNet, VGG16, and MTCNN have clearly replaced earlier, simpler models like SVM and Gabor wavelets. The newer models improve scalability and accuracy, reflecting the increasing sophistication of the face recognition field.
- As we have shown, there are many CNN designs, databases, and performance measures. In the face recognition field, different CNN architectures are applied to different tasks, and certain designs work well in specific scenarios such as low-quality images or videos and frontal face recognition. Most of the datasets used to train and evaluate face recognition algorithms contained more images than videos. We also noticed that most researchers employed Softmax as the verification measure.
- We observed that most of the databases used for training and testing face recognition models contained fewer Black subjects than subjects of other races. This lack of balance in the datasets leads to bias. The issue of privacy surrounding cameras is still being debated in the US [8,120]. One aspect of the discussion is the higher error rate of facial recognition software on Black faces compared with white faces, which has resulted in racial profiling and erroneous arrests [8,120].
- Occlusion, camera angle, pose, and lighting all remain open issues [8]. Researchers are attempting to devise solutions to these challenges; however, the varying resolutions of image or video sources, such as CCTV, create additional problems. Intelligent and automated face recognition systems work well in controlled contexts but poorly in uncontrolled ones [121]. The two main causes are reliance on facial alignment and researchers training on high- or low-resolution face images captured in controlled settings.
- Models trained on larger and more diversified datasets, like LFW and VGGFace, often perform better in real-world, large-scale recognition tasks. The comparison of simpler datasets such as ORL with more complex ones such as LFW indicates that dataset size and variety are important factors in the performance of face recognition systems.
- Little research has been conducted using Transformers and Mamba for face recognition.
- Most researchers use accuracy as their sole performance metric. Reporting accuracy without comparing it against other performance metrics raises questions about the validity of the results.
- The use of several models for face detection and recognition, as demonstrated in MTCNN, highlights the potential advantages of hybrid and ensemble techniques. Research into more advanced hybrid models might increase the robustness and adaptability of face recognition systems.
13.2. Contribution of Article
13.3. Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
MDPI | Multidisciplinary Digital Publishing Institute |
CNN | Convolutional neural networks |
VGG | Visual Geometry Group |
ORL | Olivetti Research Laboratory |
FERET | Face Recognition Technology |
VGG16 | Visual Geometry Group 16 |
LFW | Labeled Faces in the Wild |
VGGFace | Visual Geometry Group Face |
MTCNN | Multi-Task Cascaded Convolutional Networks |
FRGC | Face Recognition Grand Challenge |
CASIA | Chinese Academy of Sciences Institute of Automation |
IARPA | Intelligence Advanced Research Projects Activity |
DMFD | Disguise and Makeup Faces Database |
PCA | Principal Component Analysis |
SVM | Support Vector Machine |
HOG | Histogram of Oriented Gradients |
AlexNet | CNN architecture named after Alex Krizhevsky |
VGGNet | Visual Geometry Group Network (CNN Architecture) |
ResNet | Residual Network |
FaceNet | Face Recognition Network |
LBPNet | Local Binary Patterns Network |
LWCNN | Lightweight Convolutional Neural Network |
YOLO | You Only Look Once |
ViT | Vision Transformer |
AR | Aleix Martinez and Robert Benavente (face database) |
CVL | Computer Vision Laboratory |
AUC | Area Under the Curve |
GAN | Generative adversarial networks |
QAR | Quality Assessment Rules |
CBAM | Convolutional Block Attention Module |
DCR | Deep Coupled ResNet |
FPGA | Field Programmable Gate Array |
CPU | Central Processing Unit |
GPU | Graphics Processing Unit |
TPU | Tensor Processing Unit |
MHz | Megahertz |
FPS | Frames Per Second |
References
- Junayed, M.S.; Sadeghzadeh, A.; Islam, M.B. Deep covariance feature and cnn-based end-to-end masked face recognition. In Proceedings of the 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), Jodhpur, India, 15–18 December 2021; IEEE: New York, NY, USA, 2021; pp. 1–8. [Google Scholar]
- Chien, J.T.; Wu, C.C. Discriminant waveletfaces and nearest feature classifiers for face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 1644–1649. [Google Scholar] [CrossRef]
- Wan, L.; Liu, N.; Huo, H.; Fang, T. Face recognition with convolutional neural networks and subspace learning. In Proceedings of the 2017 2nd International Conference on Image, Vision and Computing (ICIVC), Chengdu, China, 2–4 June 2017; IEEE: New York, NY, USA, 2017; pp. 228–233. [Google Scholar]
- Jain, A.K.; Ross, A.A.; Nandakumar, K. Introduction to Biometrics; Springer: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
- Taskiran, M.; Kahraman, N.; Erdem, C.E. Face recognition: Past, present and future (a review). Digit. Signal Process. 2020, 106, 102809. [Google Scholar] [CrossRef]
- Kortli, Y.; Jridi, M.; Al Falou, A.; Atri, M. Face Recognition Systems: A Survey. Sensors 2020, 20, 342. [Google Scholar] [CrossRef]
- Saini, R.; Rana, N. Comparison of various biometric methods. Int. J. Adv. Sci. Technol. 2014, 2, 24–30. [Google Scholar]
- Nemavhola, A.; Viriri, S.; Chibaya, C. A Scoping Review of Literature on Deep Learning Techniques for Face Recognition. Hum. Behav. Emerg. Technol. 2025, 2025, 5979728. [Google Scholar] [CrossRef]
- Adjabi, I.; Ouahabi, A.; Benzaoui, A.; Taleb-Ahmed, A. Past, present, and future of face recognition: A review. Electronics 2020, 9, 1188. [Google Scholar] [CrossRef]
- Nilsson, N.J. The Quest for Artificial Intelligence; Cambridge University Press: Cambridge, UK, 2009. [Google Scholar]
- de Leeuw, K.M.M.; Bergstra, J. The History of Information Security: A Comprehensive Handbook; Elsevier: Amsterdam, The Netherlands, 2007. [Google Scholar]
- Turk, M.; Pentland, A. Eigenfaces for recognition. J. Cogn. Neurosci. 1991, 3, 71–86. [Google Scholar] [CrossRef]
- Gates, K.A. Our Biometric Future: Facial Recognition Technology and the Culture of Surveillance; NYU Press: New York, NY, USA, 2011; Volume 2. [Google Scholar]
- King, I. Neural Information Processing: 13th International Conference, ICONIP 2006, Hong Kong, China, 3–6 October 2006: Proceedings; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
- Xue-Fang, L.; Tao, P. Realization of face recognition system based on Gabor wavelet and elastic bunch graph matching. In Proceedings of the 2013 25th Chinese Control and Decision Conference (CCDC), Guiyang, China, 25–27 May 2013; IEEE: New York, NY, USA, 2013; pp. 3384–3386. [Google Scholar]
- Kundu, M.K.; Mitra, S.; Mazumdar, D.; Pal, S.K. Perception and Machine Intelligence: First Indo-Japan Conference, PerMIn 2012, Kolkata, India, 12–13 January 2011, Proceedings; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012; Volume 7143. [Google Scholar]
- Guo, G.; Zhang, N. A survey on deep learning based face recognition. Comput. Vis. Image Underst. 2019, 189, 102805. [Google Scholar] [CrossRef]
- Datta, A.K.; Datta, M.; Banerjee, P.K. Face Detection and Recognition: Theory and Practice; CRC Press: Boca Raton, FL, USA, 2015. [Google Scholar]
- Gofman, M.I.; Villa, M. Identity and War: The Role of Biometrics in the Russia-Ukraine Crisis. Int. J. Eng. Sci. Technol. (IJonEST) 2023, 5, 89. [Google Scholar] [CrossRef]
- Barnouti, N.H.; Al-Dabbagh, S.S.M.; Matti, W.E. Face recognition: A literature review. Int. J. Appl. Inf. Syst. 2016, 11, 21–31. [Google Scholar] [CrossRef]
- Lal, M.; Kumar, K.; Arain, R.H.; Maitlo, A.; Ruk, S.A.; Shaikh, H. Study of face recognition techniques: A survey. Int. J. Adv. Comput. Sci. Appl. 2018, 1–8. [Google Scholar] [CrossRef]
- Shamova, U. Face Recognition in Healthcare: General Overview. In Language in the Sphere of Professional Communication; Yekaterinburg, Russia, 2020; pp. 748–752. Available online: https://elar.urfu.ru/handle/10995/84113 (accessed on 17 March 2024). [Google Scholar]
- Elngar, A.A.; Kayed, M. Vehicle security systems using face recognition based on internet of things. Open Comput. Sci. 2020, 10, 17–29. [Google Scholar] [CrossRef]
- Xing, J.; Fang, G.; Zhong, J.; Li, J. Application of face recognition based on CNN in fatigue driving detection. In Proceedings of the 2019 International Conference on Artificial Intelligence and Advanced Manufacturing, Dublin, Ireland, 17–19 October 2019; pp. 1–5. [Google Scholar]
- Pabiania, M.D.; Santos, K.A.P.; Villa-Real, M.M.; Villareal, J.A.N. Face recognition system for electronic medical record to access out-patient information. J. Teknol. 2016, 78. [Google Scholar] [CrossRef]
- Aswis, A.; Morsy, M.; Abo-Elsoud, M. Face Recognition Based on PCA and DCT Combination Technique. Int. J. Eng. Res. Technol. 2018, 4, 1295–1298. [Google Scholar]
- Ranjan, R.; Sankaranarayanan, S.; Bansal, A.; Bodla, N.; Chen, J.C.; Patel, V.M.; Castillo, C.D.; Chellappa, R. Deep learning for understanding faces: Machines may be just as good, or better, than humans. IEEE Signal Process. Mag. 2018, 35, 66–83. [Google Scholar] [CrossRef]
- Loy, C.C. Face detection. In Computer Vision: A Reference Guide; Springer: Berlin/Heidelberg, Germany, 2021; pp. 429–434. [Google Scholar]
- Yang, M.H. Face Detection. In Encyclopedia of Biometrics; Li, S.Z., Jain, A.K., Eds.; Springer: Boston, MA, USA, 2015; pp. 447–452. [Google Scholar] [CrossRef]
- Calvo, G.; Baruque, B.; Corchado, E. Study of the pre-processing impact in a facial recognition system. In Proceedings of the Hybrid Artificial Intelligent Systems: 8th International Conference, HAIS 2013, Salamanca, Spain, 11–13 September 2013; Proceedings 8. Springer: Berlin/Heidelberg, Germany, 2013; pp. 334–344. [Google Scholar]
- Benedict, S.R.; Kumar, J.S. Geometric shaped facial feature extraction for face recognition. In Proceedings of the 2016 IEEE International Conference on Advances in Computer Applications (ICACA), Coimbatore, India, 24 October 2016; pp. 275–278. [Google Scholar] [CrossRef]
- Napoléon, T.; Alfalou, A. Pose invariant face recognition: 3D model from single photo. Opt. Lasers Eng. 2017, 89, 150–161. [Google Scholar] [CrossRef]
- Tricco, A.C.; Lillie, E.; Zarin, W.; O’Brien, K.K.; Colquhoun, H.; Levac, D.; Moher, D.; Peters, M.D.; Horsley, T.; Weeks, L.; et al. PRISMA extension for scoping reviews (PRISMA-ScR): Checklist and explanation. Ann. Intern. Med. 2018, 169, 467–473. [Google Scholar] [CrossRef] [PubMed]
- Phillips, P.; Moon, H.; Rizvi, S.; Rauss, P. The FERET evaluation methodology for face-recognition algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1090–1104. [Google Scholar] [CrossRef]
- Belhumeur, P.; Hespanha, J.; Kriegman, D. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19, 711–720. [Google Scholar] [CrossRef]
- Huang, G.B.; Mattar, M.; Berg, T.; Learned-Miller, E. Labeled faces in the wild: A database forstudying face recognition in unconstrained environments. In Proceedings of the Workshop on Faces in ‘Real-Life’ Images: Detection, Alignment, and Recognition, Marseille, France, 17–20 October 2008. [Google Scholar]
- Zou, J.; Ji, Q.; Nagy, G. A comparative study of local matching approach for face recognition. IEEE Trans. Image Process. 2007, 16, 2617–2628. [Google Scholar] [CrossRef]
- Peer, P. CVL Face Database; University of Ljubljana: Ljubljana, Slovenia, 2010. [Google Scholar]
- Fox, N.; Reilly, R.B. Audio-visual speaker identification based on the use of dynamic audio and visual features. In Proceedings of the International Conference on Audio-and Video-Based Biometric Person Authentication, Guildford, UK, 9–11 June 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 743–751. [Google Scholar]
- Bailly-Bailliére, E.; Bengio, S.; Bimbot, F.; Hamouz, M.; Kittler, J.; Mariéthoz, J.; Matas, J.; Messer, K.; Popovici, V.; Porée, F.; et al. The BANCA database and evaluation protocol. In Proceedings of the Audio-and Video-Based Biometric Person Authentication: 4th International Conference, AVBPA 2003, Guildford, UK, 9–11 June 2003; Proceedings 4. Springer: Berlin/Heidelberg, Germany, 2003; pp. 625–638. [Google Scholar]
- Phillips, P.; Flynn, P.; Scruggs, T.; Bowyer, K.; Chang, J.; Hoffman, K.; Marques, J.; Min, J.; Worek, W. Overview of the face recognition grand challenge. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 947–954. [Google Scholar] [CrossRef]
- Milborrow, S.; Morkel, J.; Nicolls, F. The MUCT Landmarked Face Database. In Proceedings of the Pattern Recognition Association of South Africa, 2010. Available online: http://www.milbo.org/muct (accessed on 12 April 2024). [Google Scholar]
- Gross, R.; Matthews, I.; Cohn, J.; Kanade, T.; Baker, S. Multi-PIE. Image Vis. Comput. 2010, 28, 807–813. [Google Scholar] [CrossRef]
- Yi, D.; Lei, Z.; Liao, S.; Li, S.Z. Learning face representation from scratch. arXiv 2014, arXiv:1411.7923. [Google Scholar]
- Klare, B.F.; Klein, B.; Taborsky, E.; Blanton, A.; Cheney, J.; Allen, K.; Grother, P.; Mah, A.; Jain, A.K. Pushing the frontiers of unconstrained face detection and recognition: Iarpa janus benchmark a. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1931–1939. [Google Scholar]
- Kemelmacher-Shlizerman, I.; Seitz, S.M.; Miller, D.; Brossard, E. The MegaFace Benchmark: 1 Million Faces for Recognition at Scale. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 4873–4882. [Google Scholar] [CrossRef]
- Whitelam, C.; Taborsky, E.; Blanton, A.; Maze, B.; Adams, J.; Miller, T.; Kalka, N.; Jain, A.K.; Duncan, J.A.; Allen, K.; et al. Iarpa janus benchmark-b face dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 90–98. [Google Scholar]
- Parkhi, O.; Vedaldi, A.; Zisserman, A. Deep face recognition. In Proceedings of the BMVC 2015-Proceedings of the British Machine Vision Conference, Swansea, UK, 7–10 September 2015; British Machine Vision Association: Durham, UK, 2015. [Google Scholar]
- Sengupta, S.; Chen, J.C.; Castillo, C.; Patel, V.M.; Chellappa, R.; Jacobs, D.W. Frontal to profile face verification in the wild. In Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10 March 2016; IEEE: New York, NY, USA, 2016; pp. 1–9. [Google Scholar]
- Guo, Y.; Zhang, L.; Hu, Y.; He, X.; Gao, J. Ms-celeb-1m: A dataset and benchmark for large-scale face recognition. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part III 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 87–102. [Google Scholar]
- Al-ghanim, F.; Aljuboori, A. Face Recognition with Disguise and Makeup Variations Using Image Processing and Machine Learning. In Proceedings of the Advances in Computing and Data Sciences: 5th International Conference, ICACDS 2021, Nashik, India, 23–24 April 2021; pp. 386–400. [Google Scholar] [CrossRef]
- Cao, Q.; Shen, L.; Xie, W.; Parkhi, O.M.; Zisserman, A. Vggface2: A dataset for recognising faces across pose and age. In Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; IEEE: New York, NY, USA, 2018; pp. 67–74. [Google Scholar]
- Maze, B.; Adams, J.; Duncan, J.A.; Kalka, N.; Miller, T.; Otto, C.; Jain, A.K.; Niggel, W.T.; Anderson, J.; Cheney, J.; et al. Iarpa janus benchmark-c: Face dataset and protocol. In Proceedings of the 2018 International Conference on Biometrics (ICB), Gold Coast, Australia, 20–23 February 2018; IEEE: New York, NY, USA, 2018; pp. 158–165. [Google Scholar]
- Nech, A.; Kemelmacher-Shlizerman, I. Level Playing Field for Million Scale Face Recognition. arXiv 2017, arXiv:1705.00393. [Google Scholar]
- Kushwaha, V.; Singh, M.; Singh, R.; Vatsa, M.; Ratha, N.K.; Chellappa, R. Disguised Faces in the Wild. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 1–18. [Google Scholar]
- Elharrouss, O.; Almaadeed, N.; Al-Maadeed, S. LFR face dataset: Left-Front-Right dataset for pose-invariant face recognition in the wild. In Proceedings of the 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT), Doha, Qatar, 2–5 February 2020; pp. 124–130. [Google Scholar] [CrossRef]
- Ayad, W.; Qays, S.; Al-Naji, A. Generating and Improving a Dataset of Masked Faces Using Data Augmentation. J. Tech. 2023, 5, 46–51. [Google Scholar] [CrossRef]
- Gottumukkal, R.; Asari, V.K. An improved face recognition technique based on modular PCA approach. Pattern Recognit. Lett. 2004, 25, 429–436. [Google Scholar] [CrossRef]
- Yang, J.; Liu, C.; Zhang, L. Color space normalization: Enhancing the discriminating power of color spaces for face recognition. Pattern Recognit. 2010, 43, 1454–1466. [Google Scholar] [CrossRef]
- Ye, J.; Janardan, R.; Li, Q. Two-dimensional linear discriminant analysis. Adv. Neural Inf. Process. Syst. 2004, 17, 1–8. [Google Scholar]
- Rahman, M.T.; Bhuiyan, M.A. Face recognition using Gabor Filters. In Proceedings of the 2008 11th International Conference on Computer and Information Technology, Khulna, Bangladesh, 24–27 December 2008; pp. 510–515. [Google Scholar] [CrossRef]
- Olshausen, B.A.; Field, D.J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 1996, 381, 607–609. [Google Scholar] [CrossRef]
- Hammouche, R.; Attia, A.; Akhrouf, S.; Akhtar, Z. Gabor filter bank with deep autoencoder based face recognition system. Expert Syst. Appl. 2022, 197, 116743. [Google Scholar] [CrossRef]
- Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001, Kauai, HI, USA, 8–14 December 2001; Volume 1, pp. 1–8. [Google Scholar] [CrossRef]
- Ruppert, D. The elements of statistical learning: Data mining, inference, and prediction. J. Am. Stat. Assoc. 2004, 99, 567. [Google Scholar] [CrossRef]
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–26 June 2005; Volume 1, pp. 886–893. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems; Pereira, F., Burges, C., Bottou, L., Weinberger, K., Eds.; Curran Associates, Inc.: Nice, France, 2012; Volume 25. [Google Scholar]
- Levi, G.; Hassner, T. Age and gender classification using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, USA, 7–12 June 2015; pp. 34–42. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Srivastava, R.K.; Greff, K.; Schmidhuber, J. Highway networks. arXiv 2015, arXiv:1505.00387. [Google Scholar]
- Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823. [Google Scholar]
- Xi, M.; Chen, L.; Polajnar, D.; Tong, W. Local binary pattern network: A deep learning approach for face recognition. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; IEEE: New York, NY, USA, 2016; pp. 3224–3228. [Google Scholar]
- Wu, X.; He, R.; Sun, Z.; Tan, T. A light CNN for deep face representation with noisy labels. IEEE Trans. Inf. Forensics Secur. 2018, 13, 2884–2896. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Kumar, K.K.; Kasiviswanadham, Y.; Indira, D.; Palesetti, P.P.; Bhargavi, C. Criminal face identification system using deep learning algorithm multi-task cascade neural network (MTCNN). Mater. Today Proc. 2023, 80, 2406–2410. [Google Scholar] [CrossRef]
- Ullah, N.; Javed, A.; Ali Ghazanfar, M.; Alsufyani, A.; Bourouis, S. A novel DeepMaskNet model for face mask detection and masked facial recognition. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 9905–9914. [Google Scholar] [CrossRef]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. arXiv 2019, arXiv:1801.04381. [Google Scholar]
- Chen, S.; Liu, Y.; Gao, X.; Han, Z. Mobilefacenets: Efficient cnns for accurate real-time face verification on mobile devices. In Proceedings of the Biometric Recognition: 13th Chinese Conference, CCBR 2018, Urumqi, China, 11–12 August 2018; Proceedings 13. Springer: Berlin/Heidelberg, Germany, 2018; pp. 428–438. [Google Scholar]
- Dosovitskiy, A.; Fischer, P.; Springenberg, J.T.; Riedmiller, M.; Brox, T. Discriminative unsupervised feature learning with exemplar convolutional neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 1734–1747. [Google Scholar]
- Zhou, D.; Kang, B.; Jin, X.; Yang, L.; Lian, X.; Jiang, Z.; Hou, Q.; Feng, J. DeepViT: Towards Deeper Vision Transformer. arXiv 2021, arXiv:2103.11886. [Google Scholar]
- Zhong, Y.; Deng, W. Face transformer for recognition. arXiv 2021, arXiv:2103.14803. [Google Scholar]
- Sun, Z.; Tzimiropoulos, G. Part-based face recognition with vision transformers. arXiv 2022, arXiv:2212.00057. [Google Scholar]
- Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. Deepface: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1701–1708. [Google Scholar]
- Zhao, H.; Jia, J.; Koltun, V. Exploring self-attention for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10076–10085. [Google Scholar]
- Zhu, B.; Li, L.; Hu, X.; Wu, F.; Zhang, Z.; Zhu, S.; Wang, Y.; Wu, J.; Song, J.; Li, F.; et al. DEFOG: Deep Learning with Attention Mechanism Enabled Cross-Age Face Recognition. Tsinghua Sci. Technol. 2024, 30, 1342–1358. [Google Scholar] [CrossRef]
- Voita, E.; Talbot, D.; Moiseev, F.; Sennrich, R.; Titov, I. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned. arXiv 2019, arXiv:1905.09418. [Google Scholar]
- Lin, K.; Li, L.; Lin, C.C.; Ahmed, F.; Gan, Z.; Liu, Z.; Lu, Y.; Wang, L. Swinbert: End-to-end transformers with sparse attention for video captioning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 17949–17958. [Google Scholar]
- Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 10347–10357. [Google Scholar]
- Sharma, S.; Shanmugasundaram, K.; Ramasamy, S.K. FAREC—CNN based efficient face recognition technique using Dlib. In Proceedings of the 2016 International Conference on Advanced Communication Control and Computing Technologies (ICACCCT), Ramanathapuram, India, 25–27 May 2016; pp. 192–195. [Google Scholar] [CrossRef]
- Arsenovic, M.; Sladojevic, S.; Anderla, A.; Stefanovic, D. FaceTime—Deep learning based face recognition attendance system. In Proceedings of the 2017 IEEE 15th International Symposium on Intelligent Systems and Informatics (SISY), Subotica, Serbia, 14–16 September 2017; pp. 53–58. [Google Scholar] [CrossRef]
- Al-Azzawi, A.; Hind, J.; Cheng, J. Localized Deep-CNN Structure for Face Recognition. In Proceedings of the 2018 11th International Conference on Developments in eSystems Engineering (DeSE), Cambridge, UK, 2–5 September 2018; pp. 52–57. [Google Scholar] [CrossRef]
- Lu, Z.; Jiang, X.; Kot, A. Deep coupled resnet for low-resolution face recognition. IEEE Signal Process. Lett. 2018, 25, 526–530. [Google Scholar] [CrossRef]
- Qu, X.; Wei, T.; Peng, C.; Du, P. A Fast Face Recognition System Based on Deep Learning. In Proceedings of the 2018 11th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 8–9 December 2018; Volume 1, pp. 289–292. [Google Scholar] [CrossRef]
- Talab, M.A.; Awang, S.; Najim, S.A.d.M. Super-Low Resolution Face Recognition using Integrated Efficient Sub-Pixel Convolutional Neural Network (ESPCN) and Convolutional Neural Network (CNN). In Proceedings of the 2019 IEEE International Conference on Automatic Control and Intelligent Systems (I2CACIS), Selangor, Malaysia, 29 June 2019; pp. 331–335. [Google Scholar] [CrossRef]
- Feng, Y.; Pang, T.; Li, M.; Guan, Y. Small sample face recognition based on ensemble deep learning. In Proceedings of the 2020 Chinese Control and Decision Conference (CCDC), Hefei, China, 22–24 August 2020; pp. 4402–4406. [Google Scholar] [CrossRef]
- Lin, M.; Zhang, Z.; Zheng, W. A Small Sample Face Recognition Method Based on Deep Learning. In Proceedings of the 2020 IEEE 20th International Conference on Communication Technology (ICCT), Nanning, China, 28–31 October 2020; pp. 1394–1398. [Google Scholar] [CrossRef]
- Yudita, S.I.; Mantoro, T.; Ayu, M.A. Deep Face Recognition for Imperfect Human Face Images on Social Media using the CNN Method. In Proceedings of the 2021 4th International Conference of Computer and Informatics Engineering (IC2IE), Depok, Indonesia, 14–15 September 2021; pp. 412–417. [Google Scholar] [CrossRef]
- Szmurlo, R.; Osowski, S. Deep CNN ensemble for recognition of face images. In Proceedings of the 2021 22nd International Conference on Computational Problems of Electrical Engineering (CPEE), Hradek u Susice, Czech Republic, 15–17 September 2021; pp. 1–4. [Google Scholar] [CrossRef]
- Wu, C.; Zhang, Y. MTCNN and FACENET based access control system for face detection and recognition. Autom. Control. Comput. Sci. 2021, 55, 102–112. [Google Scholar]
- Sanchez-Moreno, A.S.; Olivares-Mercado, J.; Hernandez-Suarez, A.; Toscano-Medina, K.; Sanchez-Perez, G.; Benitez-Garcia, G. Efficient face recognition system for operating in unconstrained environments. J. Imaging 2021, 7, 161. [Google Scholar] [CrossRef] [PubMed]
- Malakar, S.; Chiracharit, W.; Chamnongthai, K.; Charoenpong, T. Masked Face Recognition Using Principal component analysis and Deep learning. In Proceedings of the 2021 18th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Online, 19–22 May 2021; pp. 785–788. [Google Scholar] [CrossRef]
- Marjan, M.A.; Hasan, M.; Islam, M.Z.; Uddin, M.P.; Afjal, M.I. Masked Face Recognition System using Extended VGG-19. In Proceedings of the 2022 4th International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE), Rajshahi, Bangladesh, 29–31 December 2022; pp. 1–4. [Google Scholar] [CrossRef]
- Li, Y.; Liu, S.; Yang, J.; Yang, M.H. Generative face completion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3911–3919. [Google Scholar]
- Pann, V.; Lee, H.J. Effective attention-based mechanism for masked face recognition. Appl. Sci. 2022, 12, 5590. [Google Scholar] [CrossRef]
- Yuan, L.; Li, F. Face recognition with occlusion via support vector discrimination dictionary and occlusion dictionary based sparse representation classification. In Proceedings of the 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), Wuhan, China, 11–13 November 2016; IEEE: New York, NY, USA, 2016; pp. 110–115. [Google Scholar]
- Deng, W.; Hu, J.; Guo, J. Extended SRC: Undersampled face recognition via intraclass variant dictionary. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 1864–1870. [Google Scholar] [CrossRef]
- Alzu’bi, A.; Albalas, F.; Al-Hadhrami, T.; Younis, L.B.; Bashayreh, A. Masked face recognition using deep learning: A review. Electronics 2021, 10, 2666. [Google Scholar] [CrossRef]
- Li, S.; Lee, H.J. Effective Attention-Based Feature Decomposition for Cross-Age Face Recognition. Appl. Sci. 2022, 12, 4816. [Google Scholar] [CrossRef]
- Boutros, F.; Damer, N.; Kirchbuchner, F.; Kuijper, A. Self-restrained triplet loss for accurate masked face recognition. Pattern Recognit. 2022, 124, 108473. [Google Scholar] [CrossRef]
- Deng, H.; Feng, Z.; Qian, G.; Lv, X.; Li, H.; Li, G. MFCosface: A masked-face recognition algorithm based on large margin cosine loss. Appl. Sci. 2021, 11, 7310. [Google Scholar] [CrossRef]
- Wu, G. Masked face recognition algorithm for a contactless distribution cabinet. Math. Probl. Eng. 2021, 2021, 5591020. [Google Scholar] [CrossRef]
- Yanhun, Z.; Chongqing, L. Face recognition based on support vector machine and nearest neighbor classifier. J. Syst. Eng. Electron. 2003, 14, 73–76. [Google Scholar]
- Kepenekci, B. Face Recognition Using Gabor Wavelet Transform. Master’s Thesis, Middle East Technical University, Ankara, Turkey, 2001. [Google Scholar]
- Lou, G.; Shi, H. Face image recognition based on convolutional neural network. China Commun. 2020, 17, 117–124. [Google Scholar] [CrossRef]
- Mahesh, S.; Ramkumar, G. Smart Face Detection and Recognition in Illumination Invariant Images using AlexNet CNN Compare Accuracy with SVM. In Proceedings of the 2022 3rd International Conference on Intelligent Engineering and Management (ICIEM), London, UK, 27–29 April 2022; pp. 572–575. [Google Scholar] [CrossRef]
- Garvie, C.; Bedoya, A.; Frankle, J. Unregulated Police Face Recognition in America. Perpetual Line Up. 2016. Available online: https://www.perpetuallineup.org/ (accessed on 17 March 2024).
- Korshunov, P.; Marcel, S. DeepFakes: A New Threat to Face Recognition? Assessment and Detection. arXiv 2018, arXiv:1812.08685. [Google Scholar]
- Sikhakhane, N. Joburg Hostels and Townships Coming under Surveillance by Facial Recognition Cameras and Drones. Daily Maverick, 13 August 2023. Available online: https://www.dailymaverick.co.za/article/2023-08-13-joburg-hostels-and-townships-coming-under-surveillance-by-facial-recognition-cameras-and-drones/ (accessed on 12 April 2024). [Google Scholar]
- Masud, M.; Muhammad, G.; Alhumyani, H.; Alshamrani, S.S.; Cheikhrouhou, O.; Ibrahim, S.; Hossain, M.S. Deep learning-based intelligent face recognition in IoT-cloud environment. Comput. Commun. 2020, 152, 215–222. [Google Scholar] [CrossRef]
Application Areas | Use |
---|---|
Security | Office access, email authentication on multimedia workstations, flight boarding systems, and building access management [20]. |
Surveillance | CCTV control, power grid surveillance, portal control, and drug offender monitoring and search [20]. |
Health | To identify patients and manage patients’ medical records [25]. |
Cell phone and gaming consoles | Unlocking devices, gaming, and mobile banking [10]. |
Question | Purpose |
---|---|
Q1 What are the most prevalent methods used for face recognition? How do they compare in performance? | To determine which techniques have been applied in the face recognition domain |
Q2 What databases are used in face recognition? | To identify which databases of face recognition are most common |
Q3 In which areas in the real world have face recognition techniques been applied? | To find out which areas have adopted face recognition |
Q4 What are the most common evaluation metrics of face recognition systems? | To assess and identify suitable evaluation metrics to use when comparative studies are carried out in the field of face recognition |
Inclusion | Criteria | Explanation |
---|---|---|
IC1 | Publicly available articles. | This criterion guarantees that only publicly available papers or those published in open-access journals are examined. The purpose is to make the articles accessible to all researchers and readers, thus fostering transparency and reproducibility. |
IC2 | English-language studies. | This promotes uniformity and avoids problems with linguistic hurdles. It also streamlines the review process because all research can be examined in a single language, making it more efficient and useful. |
IC3 | Papers about face recognition research. | The review’s goal is to give particular insights on facial recognition. This assures that the gathered research immediately adds to the knowledge and improvement of face recognition systems. |
IC4 | Articles published between 2013 and 2023. | Face recognition has advanced significantly, particularly with the introduction of deep learning and convolutional neural networks (CNNs). Research published during the previous decade is more likely to represent the most recent advances, trends, and breakthroughs in facial recognition. Excluding older papers guarantees that the evaluation focuses on current methodologies and technology. |
Exclusion | Criteria | Explanation |
---|---|---|
EC1 | Articles published in languages other than English. | This exclusion is mostly intended for efficiency. English-language studies are significantly more accessible to a worldwide audience, ensuring uniformity in terminology, research methodologies, and conclusions. Translating non-English papers would take time and might include mistakes or misinterpretations, jeopardizing the review’s credibility. |
EC2 | Studies that do not offer the complete article. | The complete paper is required for a comprehensive evaluation because it offers specific information about the approach, findings, and conclusions. Relying just on abstracts or summaries may result in a misleading or partial assessment of the study’s quality and significance. Excluding incomplete papers guarantees that the review includes only fully accessible and transparent research. |
EC3 | Studies without an abstract. | An abstract is a short overview of a study’s aims, methodology, findings, and conclusions. It enables reviewers to swiftly assess a study’s relevance to the research subject. Without an abstract, it is impossible to evaluate if the study is aligned with the review’s aims, and such studies may lack clarity or organization, resulting in exclusion. |
EC4 | Articles published before 2013. | The removal of studies published before 2013 assures that the study is focused on the most current developments in facial recognition technology. Over the last decade, there have been major advances in deep learning, notably the use of convolutional neural networks (CNNs) for facial recognition. Older articles may not reflect these improvements and may be out of date in comparison to current cutting-edge procedures. |
QAR | Quality Questions | Explanation |
---|---|---|
QAR1 | Are the study’s objectives well defined? | The study’s objectives must be well-defined in order to focus the research and link it with the research topic. Ambiguity in the study’s aims might result in imprecise or inconclusive findings. |
QAR2 | Is the study design appropriate for the research question? | This question asks if the study design is appropriate for solving the research topic. A well-chosen design (e.g., experimental, observational, etc.) assures that the study will provide valid and trustworthy findings. |
QAR3 | Is there any comparative study conducted on deep learning methods for video processing? | This determines whether the study compares different deep learning algorithms used for image and video processing. Comparative studies assist in determining which strategies are most effective and give deeper insights into the topic matter. |
QAR4 | Are the facial recognition algorithms or methods clearly described? | This question determines if the facial recognition techniques or methodologies utilized in the study are adequately presented. A thorough explanation is essential for understanding the technique used, reproducing the study, and assessing its success. |
QAR5 | Does the study have an adequate average citation count per year? | This question assesses the study’s academic effect by assessing its average citation count annually. A greater citation count might suggest that the study is well known and significant in the area, but it should also be seen in context (for example, the study’s age and field of research). |
QAR6 | Are the outcome measures clearly defined and valid? | This evaluates if the study’s outcome measures (the variables or metrics used to assess success) are properly defined and valid (meaning they accurately assess what they are meant to measure). Valid outcome measurements guarantee that the study’s findings are significant and dependable. |
Database | Year | Images | Videos | Subjects | Clean/Occlusion | Accessible |
---|---|---|---|---|---|---|
ORL [5] | 1994 | 400 | 0 | 40 | Both | Public |
FERET [5,34] | 1996 | 14,126 | 0 | 1199 | Clean | Public |
Yale [35] | 1997 | 165 | 0 | 15 | Both | Public |
AR [20,36,37] | 1998 | >3000 | 0 | 116 | Both | Public |
CVL [38] | 1999 | 798 | 0 | 114 | Clean | Public |
XM2VTS [39] | 1999 | 2360 | 0 | 295 | Both | Public |
BANCA [40] | 2003 | - | 0 | 208 | Clean | Public |
FRGC [41] | 2006 | 50,000 | 0 | 7143 | Clean | Public |
LFW [36] | 2007 | 13,233 | 0 | 5749 | Both | Public |
MUCT [42] | 2008 | 3755 | 0 | 276 | Both | Public |
CMU Multi PIE [43] | 2009 | 750,000 | 0 | 337 | Both | Public |
CASIA Webface [44] | 2014 | 494,414 | 0 | 10,575 | Both | Public |
IARPA Janus Benchmark-A [45] | 2015 | 5712 | 2085 | 500 | Both | Public |
MegaFace [46] | 2016 | 1,000,000 | 0 | 690,572 | Both | Public |
CFP [49] | 2016 | 7000 | 0 | 500 | Both | Public |
MS-Celeb-1M [50] | 2016 | 10,000,000 | 0 | 100,000 | Both | Public |
DMFD [51] | 2016 | 2460 | 0 | 410 | Both | Private |
VGGFACE [48] | 2016 | 2,600,000 | 0 | 2600 | Both | Public |
VGGFACE 2 [52] | 2017 | 3,310,000 | 0 | 9131 | Both | Public |
IARPA Janus Benchmark-B [47] | 2017 | 21,798 | 7011 | 1845 | Both | Public |
MF2 [54] | 2017 | 4,700,000 | 0 | 672,000 | Both | Public |
DFW [55] | 2018 | 11,157 | 0 | 1000 | Both | Public |
IARPA Janus Benchmark-C [53] | 2018 | 31,334 | 11,779 | 3531 | Both | Public |
CASIA-Mask [57] | 2021 | 494,414 | 0 | 10,575 | Occluded | Public |
Database | Strengths | Limitations |
---|---|---|
ORL [5] | Small dataset, good for controlled experiments | Limited number of subjects (40), low-resolution images, restricted pose variation |
FERET [5,34] | Diverse faces, widely used in face recognition research | Limited pose variation, restricted illumination conditions, outdated data |
Yale [35] | Good for face recognition in controlled settings | Limited number of images, poses and expressions not varied enough |
AR [20,36,37] | Large number of subjects and images, includes both clean and occluded faces | Significant noise due to occlusion, limited ethnic diversity, low-quality images |
CVL [38] | Includes a variety of ethnicities, poses, and lighting conditions | Limited number of subjects (114), less variation in environmental conditions |
XM2VTS [39] | High-quality and -resolution images, widely used in benchmarking | Small sample of subjects, data are not diverse enough for real-world scenarios |
BANCA [40] | Focused on low-impersonation tasks, balanced dataset | Restricted in terms of pose variation, focused mostly on controlled settings |
FRGC [41] | Large scale, high-resolution images, variety of facial expressions and lighting | High computational cost due to the large number of images, limited in diversity of subjects |
LFW [36] | Large-scale dataset, commonly used for benchmarking face recognition | Limited to frontal face images, performance drops in challenging real-world scenarios |
MUCT [42] | Diverse dataset in terms of ethnicity, good for real-world scenarios | Limited to images with visible faces, low variation in poses |
CMU Multi PIE [43] | Includes a variety of poses, lighting, and expressions | Faces with extreme poses or occlusions underrepresented, limited lighting conditions |
CASIA Webface [44] | Large dataset with a variety of subjects and images | Imbalanced data distribution, low-quality images, faces mostly frontal with limited lighting |
IARPA Janus Benchmark-A [45] | High-quality data, diverse subjects and poses | Relatively limited number of subjects, focused mostly on controlled settings |
MegaFace [46] | Extremely large-scale dataset with many identities | Faces may be poorly annotated or of low resolution, data quality varies |
CFP [49] | Includes challenging frontal-to-profile verification tasks with varied images | Limited number of subjects (500), performance drops in challenging real-world conditions |
MS-Celeb-1M [50] | Very large dataset, includes a wide variety of subjects | Mislabeled or noisy data, limited to celebrity faces, limited diversity in real-world scenarios |
DMFD [51] | Focuses on disguise and makeup variations, high-quality data | Small number of subjects (410), limited ethnic diversity, restricted to disguise and makeup scenarios |
VGGFACE [48] | Large-scale dataset with diverse identities, popular for face recognition | Limited variation in lighting conditions, faces mostly frontal with minimal pose changes |
VGGFACE 2 [52] | Large dataset with a good variety of subjects and poses | Faces are mostly frontal, variation in poses and lighting conditions not well covered |
IARPA Janus Benchmark-B [47] | High-quality images, good variety of facial poses and conditions | Mislabeled data in some cases, faces from controlled settings, limited pose variation |
MF2 [54] | Large-scale dataset, useful for testing large-scale face recognition systems | Faces are of low resolution and poor quality in certain instances, not diverse enough |
DFW [55] | Large-scale dataset, includes challenging scenarios for face recognition | Faces with extreme poses or occlusions underrepresented, limited facial expressions |
IARPA Janus Benchmark-C [53] | Includes high-quality data, diverse set of subjects | Data may be noisy or mislabeled, limited variation in facial expressions |
CASIA-Mask [57] | Focuses on occlusion, valuable for studying occluded faces | Occlusion focus limits the dataset’s applicability to face recognition in unconstrained environments |
Algorithm | Strength | Weakness |
---|---|---|
PCA [14,16,58,59,60] | Works well in low-dimensional spaces and is easy on resources for small, well-aligned face datasets | Difficulties with big datasets and non-frontal faces; sensitive to changes in illumination, emotion, and posture |
Gabor filter [61,62,63] | Withstands variations in illumination and is capable of capturing spatial details and face texture at various scales. | Complex and resource-intensive, with challenges managing wide pose variations and non-frontal faces |
Viola–Jones [64] | Fast, efficient, real-time face detection that is resistant to lighting fluctuations and effective for frontal faces. | Low accuracy with non-frontal faces, sensitive to position changes, and challenged by occlusions and crowded backdrops. |
SVM [65] | Outstanding generalization, adaptation to classification challenges, handling of non-linear data using kernels, and accuracy. | Computationally costly, sensitive to feature selection, has trouble handling big datasets, and needs precise parameter adjustment. |
HOG [66] | Excellent at capturing edge and texture details, and resilient to minor changes in pose and lighting. | Demands precise adjustment of parameters (e.g., cell size, block normalization). Less efficient when there are significant pose variations or occlusions. |
AlexNet [67,68] | High accuracy, can handle enormous datasets, and is resistant to position, lighting, and expression changes. | High computing costs, extensive training data needs, and sensitivity to overfitting with limited datasets. |
VGGNet [69] | High accuracy, deep architecture, and robust performance on complicated datasets with a variety of faces. | Computationally costly, requires huge datasets, and may suffer from overfitting with insufficient data. |
ResNet [70,71] | Deep architecture improves accuracy, handles complicated features, and reduces vanishing gradient concerns. | Computationally complex, requires massive datasets, and can be difficult to train and infer. |
FaceNet [72] | Real-time performance, strong feature extraction, high accuracy, and efficacy in large-scale face recognition. | Needs a lot of computing power, big datasets, and could have trouble with significant occlusions or position changes. |
LBPNet [73] | Excels in texture categorization, capturing local patterns with great performance and economical calculation. | Difficulties with high computational cost and may underperform on complicated, extremely diverse materials. |
LWCNN [74] | Excels in capturing spatial data and using lightweight, effective convolutional layers to increase classification accuracy. | Due to its lightweight design and parameter limitations, it struggles with extremely complicated patterns. |
YOLO [75] | Excels in real-time object identification, providing quick, precise, and effective results for a range of activities. | Has trouble detecting little objects, being accurate in busy environments, and having limited precision in complicated situations. |
MTCNN [76] | Demonstrates exceptional proficiency in multi-task facial detection, providing great precision in facial alignment and identification. | Performs poorly in complicated or obstructed face circumstances and has trouble in real time. |
DeepMaskNet [77] | Specializes in precise object segmentation and uses deep learning to produce high-quality, accurate mask predictions. | Has significant computing needs and may struggle to execute in real time with complicated situations. |
DenseNet [78] | Excels in feature reuse, increasing efficiency by densely linking layers for efficient information flow. | Has significant memory usage and computational complexity, which limits scalability to large-scale models or datasets. |
MobileNetV2 [79] | Specializes in lightweight, efficient architecture, providing quick performance with minimal computational and memory expenses. | May compromise accuracy for efficiency, difficulty with complicated jobs that need great precision and intricacy. |
MobileFaceNets [80] | Excels at facial recognition in real time, combining cheap computational cost and great accuracy. | Decreased accuracy under difficult circumstances, like rapid changes in posture and occlusions. |
ViT [81,82] | Uses the Transformer architecture to achieve high accuracy in collecting global context for picture recognition. | Struggles with smaller datasets or a lack of training data, and demands huge datasets and computing resources. |
Face Transformer [83,84] | Excels in facial recognition, using a Transformer-based architecture to capture context and fine features. | Demands a lot of processing power and big datasets, having trouble with efficiency or smaller datasets. |
DeepFace [85] | Achieves great accuracy even in difficult conditions like low-resolution inputs or big datasets. | High computing cost and complexity, especially with huge datasets or lengthy sequences, which might be a constraint. |
Attention [86,87] | Excels in end-to-end learning for effective face recognition, precision, and robustness to variances. | Requires huge datasets, has trouble with high occlusions, and demands a lot of computing power during training. |
Swin [89] | Is excellent at collecting hierarchical features and provides great scalability and accuracy for image processing. | Requires a lot of resources, has high computational complexity, and might have trouble with jobs that need to be completed in real time. |
Performance Metric | Strength | Weakness |
---|---|---|
Accuracy | Simple, intuitive, and easy to compute and comprehend. | Does not offer a complete picture, particularly on skewed datasets. High accuracy can be deceptive in situations such as facial occlusion or aging. |
Precision and Recall | Useful for unbalanced datasets. Helps quantify false positives (precision) and correct identifications (recall). | Precision and recall are typically inversely related. Neither alone provides a balanced perspective when both false positives and false negatives must be reduced. |
F1-Score | Balances precision and recall. Effective when both false positives and false negatives are essential. | Does not discriminate between the relative relevance of precision and recall. In some circumstances, discrepancies may be masked. |
Receiver Operating Characteristic (ROC) Curve | Visualizes the trade-off between the true acceptance rate and the false acceptance rate as the decision threshold varies. Aids in comparing models across various thresholds. | Does not offer direct insight into absolute performance. AUC-ROC might be deceptive on unbalanced datasets. |
Area Under the Curve (AUC) | Provides a single number that summarizes performance. Resistant to skewed datasets. | Does not consider operational thresholds. Some subgroups may be overestimated in terms of model performance. |
Architectures | Training Set | Year | Authors | Convolutional Layers | Verif. Metric | Accuracy |
---|---|---|---|---|---|---|
SVM | ORL | 2003 | Yanhun and Chongqing [114] | - | - | 96% |
Gabor Wavelets | ORL | 2001 | Kepenekci [115] | - | - | 95.25% |
ESPCN + CNN | ORL | 2019 | Talab et al. [96] | - | - | 93.5% |
Ensemble CNN | ORL | 2020 | Feng et al. [97] | 4 | Softmax | 88.5% |
VGG16 | ORL | 2020 | Lou and Shi [116] | 16 | Center Loss and Softmax | 99.02% |
DeepFace | LFW | 2014 | Taigman et al. [85] | - | Softmax | 97.35% |
FaceNet | LFW | 2015 | Schroff et al. [72] | - | Triplet Loss | 98.87% |
DCR | LFW | 2018 | Lu et al. [94] | - | CM Loss | 93.6% |
ResNet | LFW | 2018 | Lu et al. [94] | - | CM Loss | 72.7% |
Localized Deep-CNN | LFW | 2018 | Al-Azzawi et al. [93] | - | Softmax | 97.13% |
FaceNet | LFW | 2021 | Malakar et al. [103] | - | - | 70–80% |
MTCNN | LFW | 2021 | Wu and Zhang [101] | 9 | Triplet Loss and ArcFace Loss | 97.83% |
MTCNN + FaceNet | LFW | 2021 | Wu and Zhang [101] | 9 | Triplet Loss and ArcFace Loss | 99.85% |
VGG16 | VGGFace | 2015 | Parkhi et al. [48] | - | Triplet Loss | 98.95% |
Light-CNN | MS-Celeb-1M | 2015 | Wu et al. [74] | - | Softmax | 98.8% |
Traditional CNN | CMU-PIE | 2018 | Qu et al. [95] | 5 | Sigmoid | 99.25% |
ESPCN + CNN | Yale | 2019 | Talab et al. [96] | - | - | 95.3% |
VGG16 | Yale | 2020 | Lou and Shi [116] | 16 | Center Loss and Softmax | 97.62% |
LWCNN | Yale Face Database | 2020 | Lin et al. [98] | 9 | Softmax | 96.19% |
VGG16 | CASIA | 2020 | Lou and Shi [116] | 16 | Center Loss + Softmax | 98.65% |
AlexNet | Own dataset | 2021 | Szmurlo and Osowski [100] | 9 | Softmax | 97.8% |
AlexNet | Own dataset | 2022 | Mahesh and Ramkumar [117] | 8 | Softmax | 96% |
Face Transformer | - | 2022 | Sun and Tzimiropoulos [84] | - | - | 99.83% |
YOLO-Face | FDDB | 2021 | Sanchez-Moreno et al. [102] | - | - | 72.8% |
PCA + FaceNet | Yale Face Database B | 2021 | Malakar et al. [103] | - | - | 85–95% |
Extended VGG19 | - | 2022 | Marjan et al. [104] | 19 | Softmax | 96% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).