Article

Novel Deep Learning-Based Facial Forgery Detection for Effective Biometric Recognition

by
Hansoo Kim
Department of Forensic Information Science and Technology, and Legal Informatics and Forensic Science Institute, Hallym University, Chuncheon 24252, Republic of Korea
Appl. Sci. 2025, 15(7), 3613; https://doi.org/10.3390/app15073613
Submission received: 18 February 2025 / Revised: 12 March 2025 / Accepted: 21 March 2025 / Published: 26 March 2025
(This article belongs to the Special Issue Application of Artificial Intelligence in Face Recognition Research)

Abstract
Advancements in science, technology, and computer engineering have significantly influenced biometric identification systems, particularly facial recognition. However, these systems are increasingly vulnerable to sophisticated forgery techniques. This study presents a novel deep learning framework optimized for texture analysis to detect facial forgeries effectively. The proposed method leverages high-frequency texture features, such as roughness, color variation, and randomness, which are more challenging to replicate than specific facial features. The network employs a shallow architecture with wide feature maps to enhance efficiency and precision. Furthermore, a binary classification approach combined with supervised contrastive learning addresses data imbalance and strengthens generalization capabilities. Experiments conducted on three benchmark datasets (CASIA-FASD, CelebA-Spoof, and NIA-ILD) demonstrate the model’s robustness, achieving an Average Classification Error Rate (ACER) of approximately 0.06, significantly outperforming existing methods. This approach ensures practical applicability for real-time biometric systems, providing a reliable and efficient solution for forgery detection.

1. Introduction

With the continuous development of science, technology, and computer engineering, many traditional techniques in society are rapidly transitioning to methods utilizing digital technology. The methods for personal identification and authentication are no exception. Biometric recognition technology offers a compelling solution by leveraging an individual’s unique physiological or behavioral traits for identification and verification. Providing a secure, accurate, fast, and convenient means of distinguishing and verifying individuals, it has undergone significant advancements in academic research and across industries in response to societal needs and interests.
It is estimated that the global biometrics market will grow to 86.1 billion USD by 2028 from its projected 2023 value of 47.8 billion USD [1], showcasing the increasing influence of digital technology in everyday life. Biometric recognition technology can be classified into physiological features, which directly extract information from the body, and behavioral features, which measure specific individual behavior patterns. Examples of physiological features include fingerprints, faces, irises, retinas, veins, hand geometry, palm prints, and brainwaves, while behavioral features encompass signatures, voiceprints, gait, and keystroke dynamics. Key methods and characteristics of the commonly used biometrics are described in Table 1 [2].
As biometric recognition technology continues to evolve, its security and convenience are expected to improve further. However, addressing privacy concerns and ensuring ethical practices are crucial for its widespread adoption. By striking a balance between innovation and responsibility, biometric authentication has the potential to revolutionize personal identification, enhancing security and convenience across various industries and aspects of daily life. Among these biometric recognition and human identification technologies, the face is the most common characteristic used by humans for recognition [3]. Facial recognition technology has gained considerable attention in daily life due to its simplicity, convenience, and the widespread use of high-resolution smart devices [4]. With the increasing deployment of face recognition in cyber-physical systems and industrial control systems, such as remote-sensing entrance guards, surveillance systems, and intelligent human–machine interfaces, its security has become an increasingly important concern [5].
Concurrently, advances in computing technology, personal computers and digital cameras, high-performance image processing hardware and software, and the revolutionary progress of 3D printing have empowered individuals to easily create, modify, and manipulate various types of information. Consequently, facial information can also be forged easily and with great sophistication. Forgery, also known as spoofing, counterfeiting, or manipulation, involves intentionally altering or creating false information with deceptive intent. The use of forged information can lead to the irreparable loss of property and trust, necessitating significant capital, time, and effort for recovery [6]. This technological progress thus presents a paradox: the same tools that empower individuals for legitimate purposes have also opened a Pandora’s box of manipulation, making facial information, once considered a reliable identifier, susceptible to sophisticated forgery. Significant facial forgery attack types and their characteristics are described in Table 2 [7,8,9].
Therefore, the verification of the authenticity of biometric information has become a critical issue. In alignment with the utilization and advancement of facial recognition technology, various facial recognition forgery techniques have emerged, involving manipulated photos, digital reproductions on smart devices, 3D masks, and cosmetic makeup [4,10].
Anti-spoofing refers to technologies that detect such forgeries to ascertain the authenticity of facial features. Traditional methods, utilizing techniques like Local Binary Patterns (LBP) or Histogram of Oriented Gradients (HOG), focus on detecting alterations in images. However, these methods are often limited to specific forgery types or specific scenarios, lacking adaptability to newly discovered forgery methods. Additionally, research is actively addressing the detection of human-specific features, such as eye blinking and subtle movements, which are characteristic of genuine individuals [11]. Nevertheless, these methods may require several seconds of continuous video capture and may be vulnerable to video replay attacks. While significant progress has been made in the field of biometric facial recognition, challenges persist in developing comprehensive anti-spoofing measures that can adapt to evolving forgery techniques and effectively address real-world scenarios.
Over the years, advancements in fields such as mathematics, mechanics, and cognitive science have paved the way for the continuous development of various image processing and computer vision algorithms [12]. In particular, recent breakthroughs in deep learning technology have significantly benefited the domain of image processing, witnessing diverse applications. Notably, convolutional neural networks (CNNs) have emerged as one of the most successful models in deep learning, particularly excelling in object recognition and decision-making tasks [13]. This integration of deep learning and computer vision, providing machines with visual perception, offers promising solutions to long-standing problems that humanity has sought to address [14].
Additionally, research continues on deep learning approaches to identify forged features. One notable example is an algorithm that efficiently distinguishes facial forgery by self-learning genuine faces [15]. Another proposed algorithm efficiently distinguishes facial forgery by adaptively learning the weights of the 3 × 3 gradient filter used in convolution [16]. Furthermore, deep learning techniques have driven significant advances in face recognition itself [3]. However, research on facial forgery prevention is still largely focused on addressing known forgery techniques, while studies on effectively responding to emerging forgery techniques remain limited. Although technological advancement enables ever more diverse and sophisticated facial forgery methods, research has not kept pace with these developments. Proactive research and responses are needed to address this issue.
The accuracy and reliability of deep learning results are significantly influenced by the quality and quantity of the data used for training. A key bottleneck in deep learning research is securing a sufficient amount of high-quality data, as this directly impacts the performance of models. Without adequate and well-prepared data, even the most advanced algorithms can underperform, leading to inaccurate or biased predictions. In classification tasks, achieving a balance in the number of data samples across different categories is crucial. Imbalanced data can skew the learning process, causing the model to favor the majority class and neglect the minority class, resulting in poor generalization and inaccurate predictions. Therefore, both data quality and balance are fundamental to deep learning and essential for producing objective and reasonable results. The importance of these factors cannot be overstated, as they form the foundation for reliable and fair analysis in machine learning systems.
Data imbalance refers to the situation where the number of data samples in certain classes is significantly smaller than in other classes. This issue can negatively impact the training of machine learning models, particularly by reducing the prediction performance for the minority class. Models trained on imbalanced data tend to be biased toward the majority class, as they encounter it more frequently during training. This can result in poor generalization and inaccurate predictions for the minority class. Data imbalance is a common problem across various domains, but it is especially prevalent in the following areas [17]:
  • Medical Field: Medical datasets often suffer from imbalance because rare diseases or abnormal findings are, by nature, scarce. For example, images of rare medical conditions or abnormal test results are much harder to collect than those of normal or healthy cases. This disparity can significantly affect the diagnostic accuracy of deep learning models. Models may become highly adept at identifying common conditions but fail to accurately diagnose rare or critical conditions, which can lead to harmful misdiagnoses.
  • Finance: In fraud detection or credit risk prediction, fraudulent transactions or cases of high credit risk occur far less frequently than normal transactions. This imbalance creates challenges in developing models that are sensitive enough to detect fraudulent behavior without overfitting to the majority (normal transactions), where the vast majority of data resides. This imbalance can severely compromise the model’s ability to detect fraudulent activities, which are often the most critical to identify.
  • Cybersecurity: In areas such as malware detection, intrusion detection, or DDoS (Distributed Denial of Service) attack prevention, the amount of malicious activity data is typically much smaller compared to normal network traffic data. Since malicious events are relatively rare compared to regular network activity, this data imbalance can hinder a security system’s ability to accurately detect attacks, leaving the system vulnerable to undetected threats.
  • Law Enforcement and Forensics: Data imbalance also exists in fields such as crime pattern detection and forensic analysis. For example, rare events such as serial killings or specific criminal behaviors (e.g., anonymous threat letters) are difficult to model due to their infrequency. This scarcity of data makes it hard for predictive models to identify or generalize patterns in such cases, which can have significant real-world implications, such as delays in identifying criminal activity.
To address the issue of data imbalance, various methods have been proposed, each with its own set of advantages and disadvantages [18]. One of the most common approaches is oversampling, where additional samples are generated for the minority class to balance the dataset. This can be achieved by duplicating existing samples or through techniques like data augmentation, where synthetic samples are created by slightly altering existing data points. Image augmentation methods include random shifts, flips, rotations, and brightness adjustments applied to the designated image data, as well as combinations of these. While oversampling is simple to implement, it can increase the risk of overfitting, especially if duplicate data leads the model to “memorize” specific patterns in the minority class rather than learning to generalize.
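For illustration, a minimal oversampling-by-augmentation pipeline of this kind can be sketched with TensorFlow/Keras preprocessing layers (the library used for implementation in Section 3.1); the specific transforms and parameters below are illustrative assumptions rather than the exact pipeline used in this study.

```python
import tensorflow as tf

# Assumed oversampling-by-augmentation sketch (not the paper's exact pipeline):
# random shifts, flips, rotations, and brightness changes applied to minority-class images.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomTranslation(height_factor=0.1, width_factor=0.1),  # random shifts
    tf.keras.layers.RandomFlip("horizontal"),                                # random flips
    tf.keras.layers.RandomRotation(0.05),                                    # small random rotations
    tf.keras.layers.RandomBrightness(0.2),                                   # brightness adjustment
])

def oversample_minority(minority_ds: tf.data.Dataset, factor: int) -> tf.data.Dataset:
    """Repeat the minority-class dataset `factor` times, applying random augmentation."""
    return (minority_ds
            .repeat(factor)
            .map(lambda img, label: (augment(img, training=True), label),
                 num_parallel_calls=tf.data.AUTOTUNE))
```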
Another approach is undersampling, which involves reducing the number of samples in the majority class to create a more balanced dataset. This can be conducted randomly or using algorithms that selectively remove redundant data. Undersampling reduces the likelihood of overfitting but comes with the disadvantage of potentially discarding valuable information from the majority class, which may negatively affect the model’s overall performance.
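A generic random undersampling routine is sketched below: it simply drops majority-class samples until all classes are equally represented. The study does not prescribe a specific undersampling implementation, so this is only an illustration of the idea.

```python
import numpy as np

def random_undersample(features: np.ndarray, labels: np.ndarray, seed: int = 0):
    """Randomly drop majority-class samples so every class has the minority-class count.
    A generic sketch; not a routine described in this paper."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.where(labels == c)[0], size=n_min, replace=False) for c in classes
    ])
    rng.shuffle(keep)
    return features[keep], labels[keep]
```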
Addressing data imbalance is critical for ensuring that machine learning models perform well, especially in applications where the accurate detection of rare events is vital. Choosing the right technique for handling data imbalance depends on the specific problem, the available data, and the performance requirements of the model. Both oversampling and undersampling have their place, but careful consideration must be given to avoid overfitting or the loss of important data. As machine learning continues to be applied in high-stakes domains such as healthcare, finance, and cybersecurity, finding effective solutions to the data imbalance problem remains an essential challenge for researchers and practitioners alike.
In this study, a novel facial forgery detection technique is proposed for effective biometric recognition using innovative deep learning networks and algorithms. The proposed algorithm effectively addresses the chronic issue of data imbalance in deep learning. Additionally, the approach allows for the determination of forgery authenticity without being constrained by specific forgery methods. The implementation and experimental results demonstrate an ACER of approximately 0.06, a marked performance improvement over previous studies.
The rest of this paper is organized as follows. In Section 2, dataset acquisition and the proposed algorithm are described. The experimental results are analyzed in Section 3, and the overall discussion is presented in Section 4. The terms “forgery”, “spoofing”, “counterfeiting”, and “manipulation” can have different definitions depending on the context, in both general and technical usage. In this paper, the term “forgery” is used to encompass all of the meanings of the terms mentioned above, as they all involve the creation of false or counterfeit facial information. Also, the term “genuine” is used to refer to original, unaltered facial information. This clarification of terms helps to avoid confusion and ensures that the results of this study are interpreted correctly.

2. Materials and Methods

2.1. Dataset and Experimental Environment

Qualified datasets that have been thoroughly validated and widely used in previous studies are employed in the experiments on the proposed algorithm.
  • CASIA-FASD: CASIA-FASD [19] is a face database covering a diverse range of potential attack variations. The database contains 50 genuine subjects, and fake faces are created from the genuine ones. Each subject is recorded at three designated quality levels (low, normal, and high), and each quality level comprises one genuine video and three forgeries (warped photo attack, cut photo attack, and video attack) derived from it. Therefore, each subject contributes 12 videos (3 genuine and 9 fake), and the final database contains 600 video clips. To apply the deep learning module properly and improve the effectiveness and convenience of training, appropriate still images are extracted and organized based on the characteristics of the video frames [20] (see the frame-extraction sketch after this list).
  • CelebA-Spoof: CelebA-Spoof [21] is a large-scale face anti-spoofing dataset that has 625,537 images from 10,177 subjects, which includes 43 rich attributes on face, illumination, environment, and forgery types. Among 43 rich attributes, 40 attributes belong to genuine images including all facial components and accessories such as skin, nose, eyes, eyebrows, lips, hair, hat, and eyeglasses. A total of 3 attributes belong to forged images including forgery types, environments, and illumination conditions. The forgery types include print (photo, poster, and A4), papercut (facemask, upper body mask, and region mask), replay (PC, pad, and phone), and 3D masks for various lighting conditions (4 brightness and 2 locations).
  • NIA-ILD: NIA-ILD is a facial image dataset consisting of over 1.6 million images, obtained from the Images for Liveness Detection project [22] by the National Information Society Agency [23]. Facial image data are acquired by capturing videos and then extracting still frames. Various capture devices, including smartphones and tablet PCs, are used evenly to avoid bias towards a specific device. The lighting conditions are categorized into three levels, and an equal gender distribution is maintained among subjects, covering age groups from 10s to 60s and including individuals wearing glasses or masks to simulate real-world facial recognition scenarios. The forgery types include print attacks (directly attaching a printed photo, with flattening and bending, and attaching a printed photo with holes cut in the eye, nose, and mouth areas, with flattening and bending), replay attacks (using a smartphone or a tablet PC), and 3D masks.
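Since the video-based portions of these datasets are converted to still images before training, a hedged frame-extraction sketch is given below using OpenCV; the sampling interval and resizing step are assumptions for illustration, as only the general procedure is described above.

```python
import cv2
import os

def extract_frames(video_path: str, out_dir: str, every_n: int = 10, size=(512, 512)) -> int:
    """Sample every `every_n`-th frame from a video, resize it to the target resolution,
    and save it as JPG. The sampling interval and resize step are illustrative assumptions;
    the paper only states that appropriate still images are extracted from the video frames."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            frame = cv2.resize(frame, size)
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```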
Examples of the collected dataset used in this paper are shown in Figure 1, Figure 2 and Figure 3. Details regarding the quantity, types, resolution, and the forgery methods of the collected dataset are provided in Table 3.
As for the experimental environment, the hardware specifications and operating system information used in this study are listed below:
  • GPU: NVIDIA GeForce RTX 4080 (16 GB VRAM)
  • CPU: Intel Core i9-13900K
  • RAM: Samsung DDR5 5600 MHz 32 GB ×2 (total 64 GB)
  • Operating System: Ubuntu 22.04

2.2. Proposed Algorithm

2.2.1. Classification and Control Criteria

One of the significant challenges in combating facial forgery, particularly with the rise of deepfakes, lies in achieving a balance between specificity and generalizability in deep learning network design. Overly specific classification criteria, while offering detailed information on the forgery method, can be susceptible to overfitting and struggle to adapt to emerging forgery techniques. Conversely, overly broad criteria may lack the necessary discriminatory power to effectively distinguish genuine from forged images.
This paper proposes a novel approach that addresses this challenge by adopting a binary classification strategy. This approach focuses solely on differentiating between authentic and forged images, bypassing the need to categorize specific acquisition methods or individual facial forgery attack techniques. By employing a binary classification framework, the deep learning network can concentrate on learning the core characteristics that differentiate genuine and forged facial data.
This design choice offers several key advantages. First, it reduces overfitting risk. By focusing on a binary classification task, the model complexity is reduced, mitigating the risk of overfitting to the training data. This allows the network to generalize more effectively to unseen examples, including potential future forgery methods. Second, it enhances adaptability to evolving forgery techniques. The binary classification approach prioritizes the identification of core forgery signatures, regardless of the specific manipulation method employed. This enables the network to adapt and detect new forgery techniques, even if they differ in their technical details from those encountered during training, as long as they share fundamental characteristics with previously observed forgeries. Third, it achieves computational efficiency. A binary classification task requires a simpler final layer with only two output nodes, compared to a multi-class approach with numerous output nodes. This translates to a more computationally efficient model, facilitating faster training and deployment.
The final layer of the proposed deep learning network utilizes a Softmax activation function. This function ensures that the network’s output probabilities sum to one, providing a clear distinction between the genuine and forged image classes. This paper advocates for a binary classification approach in deep learning networks for facial forgery detection. This strategy prioritizes generalizability and adaptability to emerging forgery methods, while maintaining effectiveness in distinguishing authentic from manipulated facial data. By striking a balance between specificity and generalizability, the proposed approach offers a robust and future-proof solution for combating the evolving threat of facial forgery.
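A minimal Keras sketch of such a two-node Softmax output layer is shown below; the input feature dimension and hidden width are placeholders, and the full two-stage network is sketched in Section 2.2.2.

```python
import tensorflow as tf
from tensorflow.keras import layers

def binary_head(feature_dim: int = 1280) -> tf.keras.Model:
    """Sketch of a binary (genuine vs. forged) classification head with a two-node
    Softmax output; the feature dimension and hidden width are assumed placeholders."""
    inputs = layers.Input(shape=(feature_dim,))
    x = layers.Dense(256, activation="relu")(inputs)    # assumed hidden width
    outputs = layers.Dense(2, activation="softmax")(x)  # output probabilities sum to one
    return tf.keras.Model(inputs, outputs, name="binary_head")
```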

2.2.2. Proposed Deep Learning Network

Following a rigorous review of the acquired facial image dataset and a comprehensive analysis of the features present in forged samples, a critical observation emerged. While the presence of eyes, nose, mouth, and ears play a role in facial recognition, the texture of the facial surface proved to be a far more reliable indicator of forgery. The term “texture” refers not to specific objects within an image but to signals characterized by relatively high-frequency noise levels distributed across the entire image. This textural information encompasses characteristics such as roughness, color variations, pattern type, size, frequency of occurrence, and the level of randomness or periodicity within the patterns [24,25].
This pivotal insight informed the design of a novel deep learning network specifically optimized for texture analysis. The network architecture prioritizes the extraction of these textural features, enabling it to effectively distinguish between genuine and forged facial images. The two key design choices are shown below.
  • Focus on Texture over Objects: By prioritizing texture analysis, the network is less susceptible to inconsistencies in the rendering of individual facial features, which can sometimes be meticulously crafted in forgeries. Texture, on the other hand, is often more challenging to replicate perfectly, especially for forgeries generated using less sophisticated techniques.
  • Small Number of Layers with Wide Feature Maps: To achieve efficient implementation and reduce computational complexity, the network employs a relatively shallow architecture with a smaller number of layers. However, to compensate for the reduced depth, these layers utilize wider feature maps. Wider feature maps allow the network to capture a richer and more nuanced representation of the textural information within the images.
This combination of a focus on texture analysis and a network architecture optimized for efficiency ensures that the proposed solution can effectively detect forgeries while maintaining a practical implementation footprint. The network’s ability to prioritize these subtle textural cues, often overlooked by traditional forgery detection methods, empowers it to achieve a high degree of accuracy in distinguishing genuine from forged facial data.
One promising solution to the data imbalance issue is supervised contrastive learning [26], a contrastive representation learning technique applied particularly to images. The algorithm positions data points of the same class close together in the embedding space while pushing data points of different classes farther apart; in other words, it organizes the embedding space based on class labels, enabling efficient and accurate learning and thereby demonstrating superior performance compared to traditional methods. Once training is complete, this approach establishes an embedding space in which each class occupies a defined region regardless of the number of data points per class, providing a solution to the class imbalance issue.
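The following sketch implements the supervised contrastive loss of Khosla et al. [26] in TensorFlow, as described above; the temperature value and the embedding normalization follow common practice for this loss rather than settings reported in this paper.

```python
import tensorflow as tf

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Supervised contrastive (SupCon) loss [26]: pulls same-class embeddings together
    and pushes different-class embeddings apart. Sketch implementation; the temperature
    is an assumed, commonly used value."""
    features = tf.math.l2_normalize(features, axis=1)                # unit-norm embeddings
    logits = tf.matmul(features, features, transpose_b=True) / temperature
    batch = tf.shape(features)[0]
    logits_mask = 1.0 - tf.eye(batch)                                # exclude self-comparisons
    logits = logits - tf.reduce_max(logits, axis=1, keepdims=True)   # numerical stability
    exp_logits = tf.exp(logits) * logits_mask
    log_prob = logits - tf.math.log(tf.reduce_sum(exp_logits, axis=1, keepdims=True) + 1e-12)
    labels = tf.reshape(labels, (-1, 1))
    positives = tf.cast(tf.equal(labels, tf.transpose(labels)), tf.float32) * logits_mask
    mean_log_prob_pos = tf.reduce_sum(positives * log_prob, axis=1) / tf.maximum(
        tf.reduce_sum(positives, axis=1), 1.0)
    return -tf.reduce_mean(mean_log_prob_pos)
```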
On the other hand, when applying deep learning networks to facial forgery detection, excessively fine-grained classification criteria can lead to overfitting. This overfitting occurs because the model becomes overly focused on existing forgery techniques, resulting in limited capability to detect new types of forgeries. To address this, the proposed method employs a binary classification approach for deep learning networks. Specifically, the model is trained to classify data into two categories: genuine and forged. This approach avoids distinguishing between different acquisition methods for genuine data or various forgery techniques, thus allowing the model to detect forgeries that share similar characteristics with known attack methods. By adopting a binary classification scheme, the proposed method mitigates the overfitting problem and improves the model’s ability to handle novel forgery attacks.
The architecture of the proposed network, designed according to the considerations above, is illustrated in Figure 4. The network consists of two stages. In Stage 1, EfficientNetB0 [27] is used as the base model, and supervised contrastive learning is adopted to tackle the data imbalance issue. To effectively address both the wide range of existing forgery methods and emerging ones, a large-sized CNN designed for texture analysis is introduced. The output of Stage 1, having sufficient dimensionality, is passed as input to Stage 2, where it is used to perform the binary classification task. This two-stage approach serves the dual purpose of handling diverse and newly emerging forgery methods while addressing data imbalance.
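A hedged Keras sketch of this two-stage arrangement is given below. EfficientNetB0 is the stated base model, with its parameters kept fixed as noted below; the layer widths and embedding dimension are illustrative assumptions, not the exact published configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_stage1(input_shape=(512, 512, 3), embed_dim=128) -> Model:
    """Stage 1 sketch: EfficientNetB0 base [27] followed by a shallow, wide head that
    produces embeddings trained with the supervised contrastive loss. Widths and the
    embedding dimension are assumptions for illustration."""
    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights="imagenet", input_shape=input_shape, pooling="avg")
    base.trainable = False  # base-model parameters are excluded from training
    inputs = layers.Input(shape=input_shape)
    x = base(inputs)
    x = layers.Dense(2048, activation="relu")(x)   # wide feature map for texture cues (assumed width)
    outputs = layers.Dense(embed_dim)(x)           # embedding passed on to Stage 2
    return Model(inputs, outputs, name="stage1")

def build_stage2(stage1: Model) -> Model:
    """Stage 2 sketch: reuses the entire Stage-1 network (the 'model' block in Figure 4)
    and adds the two-node Softmax binary classifier."""
    inputs = layers.Input(shape=stage1.input_shape[1:])
    x = stage1(inputs)
    outputs = layers.Dense(2, activation="softmax")(x)
    return Model(inputs, outputs, name="stage2")
```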
During training, the proposed deep learning network continuously updates the weights of each layer, excluding the parameters of the base model, to find optimal values. The total number of parameters is 13,282,725, occupying approximately 50.67 megabytes.

3. Results

3.1. Implementation Details

The proposed algorithm and the related deep learning network are implemented using the TensorFlow 2.14.0 library in Python 3.9.2. The deep learning network illustrated in Figure 4 is trained with a batch size of 16 for 100 epochs. A total of 70% of the available data are used for training and 20% for validating the deep learning network, while the remaining 10% are used for testing. The total training time is approximately 71 h for the complete dataset.
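The stated configuration can be reproduced with a data-split and training sketch such as the one below; the shuffling seed, optimizer, and loss are assumptions for illustration, while the batch size, epoch count, and 70/20/10 split follow the description above.

```python
import tensorflow as tf

def split_dataset(ds: tf.data.Dataset, n_total: int):
    """Shuffle once and split into 70% train / 20% validation / 10% test, batched at 16.
    The shuffle seed is an arbitrary choice for reproducibility of the sketch."""
    n_train, n_val = int(0.7 * n_total), int(0.2 * n_total)
    ds = ds.shuffle(n_total, seed=42, reshuffle_each_iteration=False)
    train = ds.take(n_train)
    val = ds.skip(n_train).take(n_val)
    test = ds.skip(n_train + n_val)
    return train.batch(16), val.batch(16), test.batch(16)

# Assumed training call (optimizer and loss not specified in the paper):
# model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train, validation_data=val, epochs=100)
```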

3.2. Internal Analysis

3.2.1. Performance Evaluation

As the main method for performance assessment, the Average Classification Error Rate (ACER) is employed; it is widely recognized for evaluating face forgery detection methods, has been utilized in a representative study [16], and is referenced in the guidelines of major standardization bodies [28].
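For reference, ACER is the mean of the Attack Presentation Classification Error Rate (APCER) and the Bona Fide Presentation Classification Error Rate (BPCER) defined in ISO/IEC 30107-3 [28]; a small helper for computing it from counted decisions is sketched below.

```python
def acer(n_attack_as_genuine: int, n_attacks: int,
         n_genuine_as_attack: int, n_genuine: int) -> float:
    """Average Classification Error Rate per ISO/IEC 30107-3 [28]: the mean of APCER
    (attacks accepted as genuine) and BPCER (genuine presentations rejected as attacks)."""
    apcer = n_attack_as_genuine / n_attacks
    bpcer = n_genuine_as_attack / n_genuine
    return (apcer + bpcer) / 2.0
```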
The proposed algorithm is trained, evaluated, and tested using the collected dataset, and the overall experimental results based on detailed metrics are presented in Table 4 and Figure 5.

3.2.2. Leave-One-Out Cross-Validation

To verify the generalization capability of the proposed algorithm, a LOOCV (Leave-One-Out Cross-Validation) using the forgery methods present in the collected dataset is conducted. It is examined whether newly introduced forgery data is effectively detected when the model is trained without a specific forgery method. In this experimental analysis, one forgery method is excluded from the training process, the model is trained on the remaining dataset, and the excluded method is used as the test set.
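A sketch of this leave-one-forgery-method-out split is shown below; the sample representation and field names are illustrative assumptions, not taken from released code, and the handling of genuine samples in the test set is left to the evaluation setup.

```python
from typing import List, Tuple

def leave_one_method_out(samples: List[Tuple[str, str, int]], held_out_method: str):
    """Split samples for the leave-one-forgery-method-out protocol described above.
    Each sample is an assumed (path, forgery_method, label) tuple, with the method set
    to "genuine" for real images."""
    train, test = [], []
    for path, method, label in samples:
        if method == held_out_method:
            test.append((path, label))   # unseen forgery method, used only for testing
        else:
            train.append((path, label))  # genuine data and all remaining forgery methods
    return train, test

# Example: hold out print attacks and train on everything else.
# train_set, test_set = leave_one_method_out(all_samples, held_out_method="print")
```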
For this evaluation, print attack and replay attack, which are commonly applied forgery techniques in the collected datasets, are used for testing. The results of this experiment are presented in Table 5 and Table 6. As observed in the results, strong generalization capability is demonstrated by the proposed approach, effectively detecting forged data generated using previously unseen methods.

3.3. Comparative Analysis

Table 7 shows a comparative analysis of the proposed approach’s ACER against previous studies in the field of facial forgery detection. The collected dataset, which is used for training, evaluation, and testing of the proposed algorithm, is also employed in the comparative analysis. Compared to the representative previous studies, the proposed deep learning network shows a remarkable achievement in its ACER.

4. Discussion

The experimental outcomes clearly demonstrate the efficacy of the proposed deep learning network in detecting facial forgeries. Notably, the texture-focused approach enables the system to capture subtle anomalies often overlooked by conventional methods. The comparative analysis (Table 7) highlights a substantial improvement over previous approaches, with the proposed algorithm achieving an ACER of 0.06.
The databases used play a crucial role in achieving these results. CASIA-FASD provides diverse attack variations, allowing the model to learn from a range of forgery scenarios. CelebA-Spoof contributes a vast variety of environmental and forgery attributes, further enhancing model robustness. NIA-ILD, with its real-world liveness detection conditions, ensures that the model generalizes well to practical scenarios. These diverse datasets enable the model to adapt effectively to unseen forgery methods, as validated through the Leave-One-Out Cross-Validation experiments.
Despite the superior performance, certain factors might have influenced the outcomes. The balance and diversity of the datasets significantly contribute to the model’s learning capability. However, variations in lighting conditions, device types, and environmental factors could have introduced biases. Addressing these potential biases in future research will be critical for improving the model’s adaptability.
Furthermore, computational efficiency is achieved by employing a network architecture with fewer layers but wider feature maps, ensuring that the system can be implemented in real-time applications without significant hardware constraints. The binary classification approach not only simplifies the training process but also enhances the model’s ability to generalize to new types of forgeries.

Funding

This research was supported by Hallym University Research Fund, 2024 (HRF-202410-004).

Institutional Review Board Statement

This research was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board of Hallym University (protocol code HIRB-2024-108 and date of approval: 8 January 2025).

Informed Consent Statement

Informed consent has been obtained from all subjects involved in the study.

Data Availability Statement

All the data used in this study can be found and downloaded at CASIA-FASD [19], CelebA-Spoof [21], and NIA-ILD [22] by appropriate requests.

Conflicts of Interest

The author declares no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Biometrics Market Reports. Available online: https://www.biometricupdate.com/biometric-news/biometric-research (accessed on 2 February 2024).
  2. Rui, Z.; Yan, Z. A Survey on Biometric Authentication: Toward Secure and Privacy-Preserving Identification. IEEE Access 2019, 7, 5994–6009.
  3. Guo, G.; Zhang, N. A survey on deep learning based face recognition. Comput. Vis. Image Underst. 2019, 189, 102805.
  4. Jourabloo, A.; Liu, Y.; Liu, X. Face de-spoofing: Anti-spoofing via noise modeling. Lect. Notes Comput. Sci. 2018, 11217, 297–315.
  5. Pei, M.; Yan, B.; Hao, H.; Zhao, M. Person-specific face spoofing detection based on a Siamese network. Pattern Recognit. 2023, 135, 109148.
  6. Kim, H. A study of cross-verification method for authenticating digital image evidence using inconsistent image frequency distribution. J. Korean Assoc. Sci. Crim. Investig. 2018, 12, 167–173.
  7. Naitali, A.; Ridouani, M.; Salahdine, F.; Kaabouch, N. Deepfake attacks: Generation, detection, datasets, challenges, and research directions. Computers 2023, 12, 216.
  8. Bai, T.; Luo, J.; Zhao, J. Inconspicuous adversarial patches for fooling image recognition systems on mobile devices. IEEE Internet Things J. 2022, 9, 9515–9524.
  9. Sharma, D.; Selwal, A. A survey on face presentation attack detection mechanisms: Hitherto and future perspectives. Multimed. Syst. 2023, 29, 1527–1577.
  10. Wang, Z.; Yu, Z.; Zhao, C.; Zhu, X.; Qin, Y.; Zhou, Q.; Zhou, F.; Lei, Z. Deep spatial gradient and temporal depth learning for face anti-spoofing. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14 June 2020.
  11. Patel, K.; Han, H.; Jain, A.K. Cross-database face antispoofing with robust feature representation. In Proceedings of the 11th Chinese Conference on Biometric Recognition (CCBR), Chengdu, China, 14 October 2016.
  12. Gonzalez, R.C.; Woods, R.E. Digital Image Processing, 4th ed.; Pearson: New York, NY, USA, 2018; pp. 17–46.
  13. Oh, I.S.; Lee, J.S. Artificial Intelligence by Python; Hanbit Academy: Seoul, Republic of Korea, 2021; pp. 249–320.
  14. Kim, H. A skin condition inspection method using convolutional neural network suitable for surface analysis. J. Next-Gener. Converg. Technol. Assoc. 2022, 6, 1526–1531.
  15. Qin, Y.; Zhang, W.; Shi, J.; Wang, Z.; Yan, L. One-class adaptation face anti-spoofing with loss function search. Neurocomputing 2020, 417, 384–395.
  16. Wang, C.; Yu, B.; Zhou, J. A learnable gradient operator for face presentation attack detection. Pattern Recognit. 2023, 135, 109146.
  17. Johnson, J.M.; Khoshgoftaar, T.M. Survey on Deep Learning with Class Imbalance. J. Big Data 2019, 6, 27.
  18. Kulkarni, A.; Chong, D.; Batarseh, F.A. Data Democracy; Academic Press: Philadelphia, PA, USA, 2020; pp. 83–106.
  19. Zhang, Z.; Yan, J.; Liu, S.; Lei, Z.; Yi, D.; Li, S.Z. A face antispoofing database with diverse attacks. In Proceedings of the 5th IAPR International Conference on Biometrics (ICB), New Delhi, India, 29 March 2012.
  20. CASIA-FASD. Available online: https://www.kaggle.com/datasets/minhnh2107/casiafasd (accessed on 1 July 2024).
  21. Zhang, Y.; Yin, Z.; Li, Y.; Yin, G.; Yan, J.; Shao, J.; Liu, Z. CelebA-Spoof: Large-scale face anti-spoofing dataset with rich annotations. In Proceedings of the 16th European Conference on Computer Vision (ECCV), Virtual, 23 August 2020.
  22. Image for Liveness Detection. Available online: https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=data&dataSetSn=161 (accessed on 9 November 2022).
  23. National Information Society Agency (NIA). Available online: https://www.nia.or.kr/ (accessed on 9 November 2022).
  24. Shang, C.; Lieping, Z.; Gepreel, K.A.; Yi, H. Surface roughness measurement using microscopic vision and deep learning. Front. Phys. 2024, 12, 1444266.
  25. Tatzel, L.; Leóna, F.B. Image-based roughness estimation of laser cut edges with a convolutional neural network. In Proceedings of the 11th CIRP Conference on Photonic Technologies, Virtual, 7 September 2020.
  26. Khosla, P.; Teterwak, P.; Wang, C.; Sarna, A.; Tian, Y.; Isola, P.; Maschinot, A.; Liu, C.; Krishnan, D. Supervised contrastive learning. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6 December 2020.
  27. Tan, M.; Le, Q.V. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9 June 2019.
  28. ISO/IEC 30107-3:2023; Information Technology—Biometric Presentation Attack Detection—Part 3: Testing and Reporting; ISO: Geneva, Switzerland, 2023. Available online: https://www.iso.org/standard/79520.html (accessed on 7 January 2025).
  29. Guo, X.; Liu, Y.; Jain, A.; Liu, X. Multi-domain learning for updating face anti-spoofing models. In Proceedings of the 17th European Conference on Computer Vision, Tel Aviv, Israel, 23 October 2022.
  30. He, X.; Liang, D.; Yang, S.; Hao, Z. Joint physical-digital facial attack detection via simulating spoofing clues. In Proceedings of the 2024 Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 17 June 2024.
  31. Ma, Y.; Wu, L.; Li, Z.; Liu, F. A novel face presentation attack detection scheme based on multi-regional convolutional neural networks. Pattern Recognit. Lett. 2020, 131, 261–267.
Figure 1. Examples of the forged and genuine facial image in CASIA-FASD; (a) print attack, (b) print attack with holes cut in the eyes, (c) replay attack, and (d) the genuine image. Note: A blue line is added to secure personal privacy.
Figure 2. Examples of the forged and genuine facial image in CelebA-Spoof; (a) print attack, (b) print attack with a face cut, (c) replay attack, and (d) the genuine image. Note: A blue line is added to secure personal privacy.
Figure 3. Examples of the forged and genuine facial image in NIA-ILD; (a) print attack directly attaching a printed photo, (b) same as (a) with bending, (c) print attack attaching a printed photo with holes cut in the eyes, nose, and mouth areas, (d) same as (c) with bending, (e) replay attack using a smartphone, (f) replay attack using a tablet PC, and (g) the genuine image. Note: A blue line is added to secure personal privacy.
Figure 4. Network architecture of the proposed deep learning algorithm. In Stage 2, the black rounded rectangle labeled “model” represents the entire network of Stage 1.
Figure 5. The confusion matrix as the overall experimental result of the proposed algorithm.
Table 1. Advantages and weaknesses of selected biometrics.
Features      | Methods     | Advantages                      | Weaknesses
Physiological | Fingerprint | High resolution, low price      | Easy to duplicate
              | Face        | Contactless, low price          | Sensitive to lighting, weak at disguise
              | Iris        | Not duplicable, high resolution | Inconvenient to use, high price
              | Vein        | Contactless, high resolution    | High price
              | Palmprint   | Easy to use                     | Low resolution, high price
Behavioral    | Signature   | Easy to use, low price          | Low resolution, easy to duplicate
              | Voiceprint  | Contactless, low price          | Sensitive to human variation, low resolution
Table 2. Significant facial forgery attacks.
Types | Techniques | Use Cases | Analysis
Deepfake Attacks | Face swapping, expression manipulation, lip sync manipulation, etc. | Fraud, malicious information dissemination, identity theft, etc. | Deepfake technology is becoming increasingly sophisticated, and detection techniques are also evolving. Various techniques are used, including deep learning-based detection models, facial feature analysis, and biometric signal analysis.
GAN (Generative Adversarial Network) Attacks | The generator model creates fake faces similar to real faces, and the discriminator model distinguishes between fake and real faces | Identity theft, fake profile creation, malware injection, etc. | GAN attacks can generate more sophisticated fake faces than Deepfake attacks, and defense techniques against them require more advanced technology.
Presentation Attacks | 2D image attacks, 3D mask attacks, replay attacks, etc. | Illegal access, financial fraud, personal information leakage, etc. | Presentation attacks are constantly evolving, and defense techniques must also evolve to counter new attack methods.
Table 3. Characteristics of dataset used in the experiment.
Dataset          | CASIA-FASD              | CelebA-Spoof                | NIA-ILD
Number of images | 8126 ¹                  | 625,537                     | 1,624,500
Image type       | JPG                     | PNG, JPG                    | JPG
Resolution ²     | 512 × 512               | 512 × 512                   | 512 × 512
Forgery methods  | print, papercut, replay | print, papercut, replay, 3D | print, replay, 3D
¹ Extracted from 600 video clips; ² in order to maximize the effect of the deep learning algorithm and the accuracy of the results, the face area is extracted from the provided images and resized to the resolution of 512 × 512 pixels.
Table 4. The overall experimental results of the proposed algorithm.
Classification | ACER  | Accuracy | Precision | Recall | f1-Score
Genuine        | 0.057 | 0.92     | 0.99      | 0.91   | 0.95
Forged         |       |          | 0.70      | 0.98   | 0.82
Table 5. The experimental results of the proposed algorithm regarding print attack as test dataset.
Classification | ACER | Accuracy | Precision | Recall | f1-Score
Genuine        | 0.10 | 0.93     | 0.97      | 0.95   | 0.96
Forged         |      |          | 0.74      | 0.85   | 0.79
Table 7. The ACER of the proposed algorithm compared to the previous studies.
Algorithm                        | ACER
One-class adaptation [15]        | 1.69
Learnable gradient operator [16] | 0.8
Multi-domain [29]                | 2.6
Simulating spoofing clues [30]   | 2.33
Multi-regional [31]              | 1.3
Proposed                         | 0.06
Table 6. The experimental results of the proposed algorithm regarding replay attack as test dataset.
Classification | ACER | Accuracy | Precision | Recall | f1-Score
Genuine        | 0.15 | 0.89     | 0.94      | 0.92   | 0.93
Forged         |      |          | 0.74      | 0.79   | 0.76
