Deepfake Image Forensics for Privacy Protection and Authenticity Using Deep Learning
Abstract
1. Introduction
1.1. Contributions
- We focused on data augmentation and utilized a variety of techniques to enhance the training of deep learning models, making them scalable and adaptable to different types of media.
- We employed multimedia analysis to develop effective methods for distinguishing between real and fake images, thereby ensuring the reliability of information.
1.2. Artifacts Captured
- CNN: CNNs extract spatial features from images, which helps detect texture anomalies and inconsistencies in facial landmarks (eyes, nose, mouth).
- RNN (LSTM/GRU): These models capture temporal inconsistencies such as unnatural eye blinking and mismatched lip movement.
- GAN-based autoencoders: These models help to detect manipulated features by reconstructing input data and comparing discrepancies.
- TCNs (temporal convolutional networks): TCNs analyze long-range dependencies in video sequences (a minimal sketch of a TCN-style block follows this list).
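Keras has no built-in TCN layer; a common way to realize one is a stack of dilated causal 1-D convolutions. The following is a minimal sketch under that assumption (filter counts, dilation rates, and the 30-frame/48-feature input shape are illustrative, not the paper's configuration):

```python
# Minimal TCN-style block: stacked dilated causal 1-D convolutions (illustrative sizes).
import tensorflow as tf
from tensorflow.keras import layers

def tcn_block(num_filters=64, dilations=(1, 2, 4, 8)):
    # Increasing dilation widens the receptive field exponentially,
    # letting the network relate frames that are far apart in the sequence.
    block = tf.keras.Sequential(name="tcn_block")
    for d in dilations:
        block.add(layers.Conv1D(num_filters, kernel_size=3, dilation_rate=d,
                                padding="causal", activation="relu"))
    block.add(layers.GlobalAveragePooling1D())
    return block

# Example: 30 frames, 48 landmark-derived features per frame.
seq = tf.random.normal((1, 30, 48))
print(tcn_block()(seq).shape)   # (1, 64)
```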
1.3. Feature Fusion
- The extracted spatial features (from CNN) and temporal features (from RNN/TCN) are fused using a concatenation layer, followed by fully connected layers for classification.
- This fusion enhances robustness by combining frame-level features (textures, distortions) with sequence-level cues (motion inconsistencies); a minimal sketch of the fused architecture follows this list.
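As a concrete illustration of this concatenation-based fusion, the following Keras sketch joins a CNN branch over a face crop with an LSTM branch over per-frame landmark features; the input shapes, layer sizes, and branch names are illustrative assumptions rather than the paper's exact architecture:

```python
# Minimal sketch of spatial-temporal feature fusion (illustrative shapes and sizes).
import tensorflow as tf
from tensorflow.keras import layers, Model

# Spatial branch: CNN over a single aligned face crop.
frame_input = layers.Input(shape=(128, 128, 3), name="frame_input")
x = layers.Conv2D(32, 3, activation="relu")(frame_input)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)          # frame-level spatial features

# Temporal branch: LSTM over a sequence of per-frame landmark features.
seq_input = layers.Input(shape=(30, 48), name="landmark_sequence")  # 30 frames x 48 features
t = layers.LSTM(64)(seq_input)                   # sequence-level temporal features

# Fusion: concatenate spatial and temporal features, then classify.
fused = layers.Concatenate()([x, t])
fused = layers.Dense(64, activation="relu")(fused)
output = layers.Dense(1, activation="sigmoid")(fused)  # real vs. fake

model = Model(inputs=[frame_input, seq_input], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

The optimizer, loss, and metric match the training parameters reported in the paper's tables; everything else is a sketch.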
2. Background
- (a) CNN-LSTM;
- (b) CNN-GRU;
- (c) TCN (temporal convolutional network);
- (d) GAN-autoencoded features.
- Accuracy: Accuracy measures the proportion of true results (both true positives and true negatives) among the total number of cases. Equation (1) defines the accuracy metric, which counts correctly predicted cases among all available instances: Accuracy = (TP + TN) / (TP + TN + FP + FN).
- Precision: This indicates the proportion of true positives among all predicted positives. It is defined in Equation (2) as Precision = TP / (TP + FP).
- Recall: Recall measures the proportion of true positives identified among the actual positives, as given in Equation (3): Recall = TP / (TP + FN).
- F1 score: This is the harmonic mean of precision and recall, providing a single metric that balances both. The F1 score, defined in Equation (4) as F1 = 2 × (Precision × Recall) / (Precision + Recall), gives a realistic measure of the tradeoff between precision and recall. (A minimal code sketch for computing these metrics follows this list.)
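As referenced above, the four metrics can be computed directly from predicted and true labels with scikit-learn; a minimal sketch (the label arrays are illustrative):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # 1 = fake, 0 = real (illustrative labels)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))   # (TP + TN) / all cases, Eq. (1)
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP), Eq. (2)
print("Recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN), Eq. (3)
print("F1 score :", f1_score(y_true, y_pred))         # harmonic mean of precision and recall, Eq. (4)
```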
3. Related Work
4. Proposed Models and Forensics Techniques
4.1. Dataset Description
4.1.1. Manipulation Methods
- Deepfakes: Utilizes deep learning for face swapping, replacing a target face with one from another video.
- Face2Face: Alters facial expressions in the target video to match those of a source actor.
- FaceSwap: A traditional face-swapping technique that does not rely on deep learning.
- Neural Textures: Uses GAN-based techniques to manipulate facial features, producing highly realistic details.
4.1.2. Dataset Scale
4.2. Data Preprocessing
4.3. Artifact Landmark Detection
4.4. Correlation Between the Artifacts to Identify Correlated Pairs
- A strong positive correlation (features move together) is represented by deep red.
- Deep blue means a strong negative correlation (the first feature increases while the second decreases).
- Darker shades therefore indicate strong correlations (positive or negative), while lighter shades suggest weak or no correlation.
- Individual feature analysis:
- Nose: Parameters such as width, height, tip location, and nostril symmetry correlate strongly, meaning these facial dimensions often change together during various movements and facial expressions.
- Mouth: The coordinated variations in the height and width of the upper and lower jaw, together with the changes that occur during mouth movements (speaking or smiling), suggest very strong correlations among these parameters.
- Eyes: Eye-related indicators such as the eye aspect ratio (EAR), blink frequency, amplitude, and duration, as well as pupil size and movement, typically exhibit high correlations, implying that blinks and eye movements are closely related.
- Inter-feature correlation analysis:
- Nose and eyes: Exploring the relationship between nose position/dimensions and eye movements/closures can reveal coordination between these features during blinks or facial expressions.
- Nose and mouth: This analysis checks whether movements of the mouth correlate with changes in the nose area, which might occur during various expressions.
- Eyes and mouth: The focus here is on whether movements in the eyes (like blinking) are synchronized with mouth movements, which would be common during expressions or speech.
- Strength of correlations:
- Strong correlation (>0.7): This indicates that features move in tandem. For example, a strong correlation between the position of the nose tip X and nose bridge X suggests synchronized movements in these features.
- Moderate correlation (0.3 to 0.7): This suggests a relationship but with less consistency. For instance, a moderate correlation between mouth aspect ratio and average eyelid movement might indicate that certain expressions affecting the mouth could also impact eyelid movements.
- Weak correlation (<0.3): This shows little to no linear relationship. For example, a weak correlation between left EAR and nose shape Y implies that eye closures do not consistently correlate with the nose’s vertical dimensions.
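The pairwise correlations described above can be reproduced by computing a Pearson correlation matrix over the extracted landmark features and plotting it as a heatmap. A minimal sketch with pandas and seaborn follows; the column names and the random stand-in data are illustrative, not the paper's feature set:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Illustrative feature table: one row per frame, one column per landmark-derived feature.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "left_EAR": rng.normal(0.3, 0.05, 500),
    "right_EAR": rng.normal(0.3, 0.05, 500),
    "MAR": rng.normal(0.4, 0.1, 500),
    "nose_tip_x": rng.normal(0.0, 1.0, 500),
    "nose_bridge_x": rng.normal(0.0, 1.0, 500),
})

corr = df.corr(method="pearson")            # values in [-1, 1]

# Heatmap: deep red = strong positive, deep blue = strong negative, light = weak.
sns.heatmap(corr, cmap="coolwarm", vmin=-1, vmax=1, annot=True, fmt=".2f")
plt.title("Correlation between landmark-derived artifacts")
plt.tight_layout()
plt.show()
```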
4.5. Artifact Annotations
- Collect: Gathers the relevant artifacts and features, along with related information necessary for deepfake detection.
- Noise Remover: Removes noise and irrelevant data, improving overall data quality.
- Transform: Formats and standardizes the data so that it is consistent across the dataset.
- Enrich: Augments the dataset to increase the robustness of the deep learning models.
4.6. Data Preparation in Deepfake Forensics
- Noise removal;
- Data transformation;
- Data enrichment.
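Before each step is detailed in the subsections below, a minimal pandas sketch shows how the three steps might be chained; the IQR-based outlier rule, the min-max scaling, and the jitter-based enrichment are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Placeholder artifact features (one row per frame); in practice these would be
# loaded from the extracted-landmark CSV.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(1000, 4)),
                  columns=["EAR", "MAR", "nose_tip_x", "nose_tip_y"])
df.iloc[::97] = np.nan                                     # simulate missing values

# Noise removal: drop incomplete rows and filter IQR outliers per column.
df = df.dropna()
q1, q3 = df.quantile(0.25), df.quantile(0.75)
iqr = q3 - q1
df = df[~((df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)).any(axis=1)]

# Data transformation: scale every feature to [0, 1] for consistent model inputs.
scaled = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)

# Data enrichment: add jittered copies of each row as simple synthetic augmentation.
jitter = rng.normal(0, 0.01, size=scaled.shape)
enriched = pd.concat([scaled, scaled + jitter], ignore_index=True)
print(len(df), "cleaned rows ->", len(enriched), "enriched rows")
```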
4.6.1. Noise Removal
4.6.2. Data Transformation
4.6.3. Data Enrichment
4.7. Artifact Sample Augmentation
4.8. Artifact Balancing
- Synthetic handling of imbalanced features with SMOTE: Artifact-tag balancing is carried out to handle imbalanced classes. This step removes any class imbalance in the dataset so that the model receives an equal number of authentic and manipulated artifacts. To improve generalizability, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to the training set, generating synthetic samples for the under-represented class. SMOTE is applied only to the training data, so the test set retains the original, unbalanced distribution and evaluation reflects unseen, real-world conditions (a minimal sketch follows this list).
- Artifact distributions: For analytical and modeling tasks, the data are split into training and test sets. The training set is used to train the model, while the test set evaluates its performance. Table 7 shows that data augmentation techniques were used to balance the dataset, providing a significantly larger training sample size.
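Applying SMOTE to the training split only, as described above, could look like the following; the feature matrix, labels, and class ratio are placeholders, and imbalanced-learn is assumed to be available:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# X: landmark/artifact feature matrix, y: 0 = real, 1 = fake (placeholders).
X = np.random.rand(1000, 48)
y = np.array([0] * 800 + [1] * 200)          # imbalanced classes

# 80/20 split, as in the dataset table.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Oversample ONLY the training set; the test set keeps its natural distribution.
X_train_bal, y_train_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)

print("Before:", np.bincount(y_train), "After:", np.bincount(y_train_bal))
```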
4.8.1. K-Fold Cross-Validation
4.8.2. Artifact Transformation
4.9. Workflow Using Models
4.10. Model Training
Model Training Pseudocode
Algorithm 1: Detailed Pseudocode for GAN and Deep Learning Model Operations
Algorithm 2: Detailed Pseudocode for GAN and Deep Learning Model Operations—Part 2
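The bodies of Algorithms 1 and 2 did not survive extraction here. Purely as an illustration of the kind of procedure such pseudocode typically describes, and not the authors' exact algorithm, the sketch below trains a small GAN over artifact feature vectors and then draws synthetic samples that could augment the detector's training set; all dimensions and hyperparameters are assumptions:

```python
# Illustrative GAN training loop over artifact feature vectors (TF2 GradientTape style).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

feat_dim, noise_dim, batch = 48, 16, 32            # placeholder dimensions

generator = tf.keras.Sequential([
    layers.Input(shape=(noise_dim,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(feat_dim, activation="sigmoid"),   # outputs a synthetic feature vector
])
discriminator = tf.keras.Sequential([
    layers.Input(shape=(feat_dim,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),                                # logit: real vs. generated
])
g_opt = tf.keras.optimizers.Adam(1e-3)
d_opt = tf.keras.optimizers.Adam(1e-3)
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

real_feats = tf.random.uniform((2000, feat_dim))    # placeholder artifact features

for step in range(200):
    idx = np.random.randint(0, real_feats.shape[0], batch)
    real = tf.gather(real_feats, idx)
    noise = tf.random.normal((batch, noise_dim))
    with tf.GradientTape() as d_tape, tf.GradientTape() as g_tape:
        fake = generator(noise, training=True)
        d_real = discriminator(real, training=True)
        d_fake = discriminator(fake, training=True)
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
        g_loss = bce(tf.ones_like(d_fake), d_fake)   # generator wants fakes scored as real
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))

# Synthetic feature vectors from the trained generator can augment the detector's training set.
synthetic = generator(tf.random.normal((500, noise_dim))).numpy()
```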
5. Results and Discussion
- Experiment 1: Eye landmarks;
- Experiment 2: Fusion of eyes and nose landmark facial region;
- Experiment 3: Fusion of eyes, nose, and mouth landmark facial region.
5.1. Experiment 1: Eye Landmarks
- The training and validation curves of CNN-LSTM demonstrate the highest accuracy and the smallest gap between training and validation, indicating minimal overfitting.
- Next is TCN, which performs nearly as well and shows stable and reliable learning for the temporal analysis of artifacts.
5.2. Experiment 2: Fusion of Eyes and Nose Landmark Facial Region
5.3. Experiment 3: Fusion of Eyes, Nose, and Mouth Landmark Facial Region
5.4. Ablation Studies
- CNN-only: Extracts spatial features such as texture anomalies and inconsistencies in facial landmarks but lacks temporal awareness.
- CNN + RNN (LSTM/GRU): Incorporates temporal inconsistencies (e.g., unnatural blinking and lip-sync issues) but lacks generative reconstruction for deeper forgery detection.
- CNN + GAN: Detects manipulation artifacts through reconstruction loss but does not capture temporal dependencies.
- CNN + TCN: Captures long-range dependencies in video sequences but lacks generative forgery detection.
- Full model (CNN + RNN + GAN + TCN): Integrates all components to leverage spatial, temporal, generative, and sequential dependencies.
5.5. Comparative Study of Landmark-Based Deepfake Detection Techniques
5.6. State-of-the-Art Table
5.7. Real-World Applicability
6. Conclusions
Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
CNN | Convolutional neural network |
GRU | Gated recurrent unit |
GANs | Generative adversarial networks |
TCN | Temporal convolutional network |
AUC | Area under the curve |
RNN | Recurrent neural network |
LSTM | Long short-term memory |
VAE | Variational autoencoder |
MLP | Multi-layer perceptron |
SMOTE | Synthetic Minority Oversampling Technique |
FF++ | FaceForensics++ |
MAR | Mouth aspect ratio |
EAR | Eye aspect ratio |
DNN | Deep neural network |
ROC | Receiver operating characteristic |
MSE | Mean squared error |
FF-LPBH | Fisherface Local Binary Pattern Histogram |
YOLO | You Only Look Once |
FCC-GAN | Fully connected convolutional generative adversarial network |
PGGAN | Progressive Growing of GANs |
CRNN | Convolutional recurrent neural network |
DBN | Deep belief network |
OC-FakeDetect | One-Class Fake Detection |
C-GAN | Conditional generative adversarial network |
AddNets | Attention-based deepfake detection networks |
KL-Divergence | Kullback–Leibler divergence |
IQR | Interquartile range |
CSV | Comma-separated value |
ReLU | Rectified linear unit |
SVM | Support vector machine |
References
- Koopman, M.; Rodriguez, A.M.; Geradts, Z. Detection of deepfake video manipulation. In Proceedings of the 20th Irish Machine Vision and Image Processing Conference (IMVIP), Belfast, Northern Ireland, 10–12 September 2018; pp. 133–136.
- Chesney, B.; Citron, D. Deep fakes: A looming challenge for privacy, democracy, and national security. Calif. Law Rev. 2019, 107, 1753.
- Harris, D. Deepfakes: False pornography is here and the law cannot protect you. Duke Law Technol. Rev. 2018, 17, 99.
- Masood, M.; Nawaz, M.; Malik, K.M.; Javed, A.; Irtaza, A.; Malik, H. Deepfakes Generation and Detection: State-of-the-art, open challenges, countermeasures, and way forward. Appl. Intell. 2022, 53, 3974–4026.
- Guarnera, L.; Giudice, O.; Battiato, S.; Guarnera, F.; Ortis, A.; Puglisi, G.; Paratore, A.; Bui, L.M.Q.; Fontani, M.; Coccomini, D.A.; et al. The Face Deepfake Detection Challenge. J. Imaging 2022, 8, 263.
- Patel, M.; Gupta, A.; Tanwar, S.; Obaidat, M. Trans-DF: A transfer learning-based end-to-end deepfake detector. In Proceedings of the 2020 IEEE 5th International Conference on Computing Communication and Automation (ICCCA), Greater Noida, India, 30–31 October 2020; pp. 796–801.
- CB Insights. The Future of Information Warfare. 2024. Available online: https://www.cbinsights.com/research/future-of-information-warfare/ (accessed on 3 November 2024).
- Bracken, B. Deepfake Attacks Are About to Surge, Experts Warn. 2021. Available online: https://threatpost.com/deepfake-attacks-surge-experts-warn/165798/ (accessed on 30 November 2022).
- FakeApp. Available online: https://www.fakeapp.org/ (accessed on 30 November 2022).
- FaceApp. Available online: https://www.faceapp.com/ (accessed on 30 November 2022).
- Korshunov, P.; Marcel, S. Deepfakes: A new threat to face recognition? Assessment and detection. arXiv 2018, arXiv:1812.08685.
- Verdoliva, L. Media forensics and deepfakes: An overview. IEEE J. Sel. Top. Signal Process. 2020, 14, 910–932.
- Li, L.; Bao, J.; Zhang, T.; Yang, H.; Chen, D.; Wen, F.; Guo, B. Face X-ray for more general face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 5001–5010.
- Dang, H.; Liu, F.; Stehouwer, J.; Liu, X.; Jain, A.K. On the detection of digital face manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 5781–5790.
- Suganthi, S.T.; Ayoobkhan, M.U.A.; Bacanin, N.; Venkatachalam, K.; Štěpán, H.; Pavel, T. Deep learning model for deep fake face recognition and detection. PeerJ Comput. Sci. 2022, 8, e881.
- Ismail, A.; Elpeltagy, M.; Zaki, M.; ElDahshan, K.A. Deepfake video detection: YOLO-Face convolution recurrent approach. PeerJ Comput. Sci. 2021, 7, e730.
- Chauhan, S.S.; Jain, N.; Pandey, S.C.; Chabaque, A. Deepfake Detection in Videos and Pictures: Analysis of Deep Learning Models and Dataset. In Proceedings of the 2022 IEEE International Conference on Data Science and Information System (ICDSIS), Hassan, India, 29–30 July 2022; pp. 1–5.
- Groh, M.; Epstein, Z.; Firestone, C.; Picard, R. Deepfake detection by human crowds, machines, and machine-informed crowds. Proc. Natl. Acad. Sci. USA 2022, 119, e2110013119.
- Kiran, B.R.; Thomas, D.M.; Parakkal, R. An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos. J. Imaging 2018, 4, 36.
- Raza, A.; Munir, K.; Almutairi, M. A Novel Deep Learning Approach for Deepfake Image Detection. Appl. Sci. 2022, 12, 9820.
- Khochare, J.; Joshi, C.; Yenarkar, B.; Suratkar, S.; Kazi, F. A deep learning framework for audio deepfake detection. Arab. J. Sci. Eng. 2022, 47, 3447–3458.
- Rana, M.S.; Sung, A.H. Deepfakestack: A deep ensemble-based learning technique for deepfake detection. In Proceedings of the 2020 7th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/2020 6th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom), New York, NY, USA, 1–3 August 2020; pp. 70–75.
- Huang, B.; Wang, Z.; Yang, J.; Ai, J.; Zou, Q.; Wang, Q.; Ye, D. Implicit identity driven deepfake face swapping detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 4490–4499.
- Kong, C.; Chen, B.; Li, H.; Wang, S.; Rocha, A.; Kwong, S. Detect and locate: Exposing face manipulation by semantic- and noise-level telltales. IEEE Trans. Inf. Forensics Secur. 2022, 17, 1741–1756.
- Yan, Z.; Zhang, Y.; Fan, Y.; Wu, B. UCF: Uncovering common features for generalizable deepfake detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 22412–22423.
- Luo, A.; Cai, R.; Kong, C.; Kang, X.; Huang, J.; Kot, A.C. Forgery-aware adaptive vision transformer for face forgery detection. arXiv 2023, arXiv:2309.11092.
- Jia, S.; Lyu, R.; Zhao, K.; Chen, Y.; Yan, Z.; Ju, Y.; Hu, C.; Li, X.; Wu, B.; Lyu, S. Can ChatGPT detect deepfakes? A study of using multimodal large language models for media forensics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 4324–4333.
- Matern, F.; Riess, C.; Stamminger, M. Exploiting visual artifacts to expose deepfakes and face manipulations. In Proceedings of the 2019 IEEE Winter Applications of Computer Vision Workshops (WACVW), Waikoloa Village, HI, USA, 7–11 January 2019; pp. 83–92.
- Nguyen, H.H.; Yamagishi, J.; Echizen, I. Use of a capsule network to detect fake images and videos. arXiv 2019, arXiv:1910.12467.
- Sabir, E.; Cheng, J.; Jaiswal, A.; AbdAlmageed, W.; Masi, I.; Natarajan, P. Recurrent convolutional strategies for face manipulation detection in videos. Interfaces (GUI) 2019, 3, 80–87.
- Trinh, L.; Tsang, M.; Rambhatla, S.; Liu, Y. Interpretable and trustworthy deepfake detection via dynamic prototypes. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021.
- Li, Y.; Chang, M.C.; Lyu, S. In ictu oculi: Exposing AI-generated fake face videos by detecting eye blinking. arXiv 2018, arXiv:1806.02877.
- Alshaikh, A. Application of Cortical Learning Algorithms to Movement Classification Towards Automated Video Forensics; Diss. Staffordshire University: Stafford, UK, 2019. Available online: https://eprints.staffs.ac.uk/5577/ (accessed on 3 November 2024).
- Chintha, A.; Thai, B.; Sohrawardi, S.J.; Bhatt, K.; Hickerson, A.; Wright, M.; Ptucha, R. Recurrent convolutional structures for audio spoof and video deepfake detection. IEEE J. Sel. Top. Signal Process. 2020, 14, 1024–1037.
- Afchar, D.; Nozick, V.; Yamagishi, J.; Echizen, I. MesoNet: A compact facial video forgery detection network. In Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security (WIFS), Hong Kong, 11–13 December 2018; pp. 1–7.
- Yang, X.; Li, Y.; Lyu, S. Exposing deepfakes using inconsistent head poses. In Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 8261–8265.
- Kshirsagar, M.; Suratkar, S.; Kazi, F. Deepfake Video Detection Methods using Deep Neural Networks. In Proceedings of the 2022 Third International Conference on Intelligent Computing Instrumentation and Control Technologies (ICICICT), Kannur, India, 11–12 August 2022; pp. 27–34.
- Rana, M.S.; Nobi, M.N.; Murali, B.; Sung, A.H. Deepfake detection: A systematic literature review. IEEE Access 2022, 10, 25494–25513.
- Koçak, A.; Alkan, M. Deepfake Generation, Detection and Datasets: A Rapid-review. In Proceedings of the 2022 15th International Conference on Information Security and Cryptography (ISCTURKEY), Ankara, Turkey, 19–20 October 2022; pp. 86–91.
- Zi, B.; Chang, M.; Chen, J.; Ma, X.; Jiang, Y.G. WildDeepfake: A challenging real-world dataset for deepfake detection. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 2382–2390.
- Shahzad, H.F.; Rustam, F.; Flores, E.S.; Mazón, J.L.V.; Diez, I.T.; Ashraf, I. A Review of Image Processing Techniques for Deepfakes. Sensors 2022, 22, 4556.
- Khalid, H.F.; Woo, S.S. OC-FakeDect: Classifying deepfakes using one-class variational autoencoder. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 656–657.
- Malik, Y.S.; Sabahat, N.; Moazzam, M.O. Image Animations on Driving Videos with DeepFakes and Detecting DeepFakes Generated Animations. In Proceedings of the 2020 IEEE 23rd International Multitopic Conference (INMIC), Bahawalpur, Pakistan, 5–7 November 2020; pp. 1–6.
- Hashmi, M.F.; Ashish, B.K.K.; Keskar, A.G.; Bokde, N.D.; Yoon, J.H.; Geem, Z.W. An exploratory analysis on visual counterfeits using Conv-LSTM hybrid architecture. IEEE Access 2020, 8, 101293–101308.
- Rössler, A.; Cozzolino, D.; Verdoliva, L.; Riess, C.; Thies, J.; Nießner, M. FaceForensics++: Learning to Detect Manipulated Facial Images. 2020. Available online: http://github.com/ondyari/FaceForensics (accessed on 3 November 2024).
- Karras, T.; Laine, S.; Aila, T. Analyzing and Improving the Image Quality of StyleGAN. arXiv 2020, arXiv:1912.04958.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929.
- Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.H.; Buchatskaya, E.; Doersch, C.; Pires, B.A.; Guo, Z.D.; Azar, M.G.; et al. Bootstrap Your Own Latent—A New Approach to Self-Supervised Learning. arXiv 2020, arXiv:2006.07733.
- Ho, J.; Salimans, T.; Chan, W.; Chen, B.; Schulman, J.; Sutskever, I.; Abbeel, P. Cascaded Diffusion Models for High Fidelity Image Generation. arXiv 2021, arXiv:2102.00732.
Training Parameter | Value |
---|---|
Epochs | 10 |
Batch Size | 32 |
Optimizer | Adam |
Loss Function | Binary cross-entropy |
Metric | Accuracy |
Training Parameter | Value |
---|---|
Epochs | 10 |
Batch Size | 32 |
Optimizer | Adam |
Loss Function | Binary cross-entropy |
Metric | Accuracy |
Training Parameter | Value |
---|---|
Epochs | 10 |
Batch Size | 32 |
Optimizer | Adam |
Loss Function | Binary cross-entropy |
Metric | Accuracy |
Training Parameter | Value |
---|---|
Epochs | 10 |
Batch Size | 32 |
Optimizer | Adam |
Loss Function | Mean squared error (MSE) |
Metric | Accuracy |
Ref. | Feature-Based Methodology | Classifier | Best Performance | Datasets |
---|---|---|---|---|
[28] | Combined Visual Features of eyes and teeth | Logistic Regression, MLP | AUC = 0.851 Accuracy = 0.854 Precision = 0.807 Recall = 0.849 F1 Score = 0.828 | FaceForensics++ |
[29] | Deep learning features | Capsule Network | AUC = 0.91 Accuracy = 0.91 F1 Score = 0.91 Precision = 0.92 Recall = 0.08 | FaceForensics++ |
[30] | Image + Temporal features | CNN + RNN | AUC = 0.93 Accuracy = 0.939 Precision = 0.92 Recall = 0.08 F1 Score = 0.91 | FF++ (FaceSwap, DeepFakes, LQ) |
[31] | Image + Temporal features | Dynamic Prototype Network | AUC = 0.718 Accuracy = 0.72 Precision = 0.73 Recall = 0.26 F1 Score = 0.73 | FF++ (Face2Face, FaceSwap, HQ) |
[32] | Eye-blinking features | LRCN | AUC = 0.78 Accuracy = 0.76 Precision = 0.77 Recall = 0.22 | FaceForensics++ (Face Synthesis) |
[33] | Eye-blinking features | Distance | AUC = 0.875 Precision = 0.875 Recall = 0.778 F1 Score = 0.824 Accuracy = 0.85 | FaceForensics++ (Face Synthesis with the unnatural movement of the eye) |
Eyes | Nose | Mouth |
---|---|---|
Eye Aspect Ratio (EAR) | Nose Tip | Mouth Aspect Ratio (MAR) |
Blink Frequency and Amplitude | Nostril Symmetry | Mouth Symmetry |
Pupil Dilation | Nasal Base | Mouth Position (X, Y) |
Eyelid Creases and Movement | Nasal Sides | Lip Spacing |
Iris Texture and Diameter | Nasal Septum | Lip Boundary |
Eye Position and Aspect Ratio | Nasal Shape | Mouth Shape Dynamics |
Sclera-to-Iris Ratio | Nostrils Position (X, Y) | Mouth-to-Face Proportion |
Pupil-to-Iris Ratio | Nose Bridge | Corner of Mouth (Left X, Y; Right X, Y) |
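The eye aspect ratio (EAR) and mouth aspect ratio (MAR) listed above are typically computed from detected landmark coordinates. A minimal sketch of one common formulation follows; the landmark ordering assumes dlib-style 68-point annotations, and the random coordinates merely stand in for a real detector's output:

```python
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """EAR from 6 eye landmarks ordered p1..p6 (dlib 68-point convention)."""
    a = np.linalg.norm(eye[1] - eye[5])   # vertical distance p2-p6
    b = np.linalg.norm(eye[2] - eye[4])   # vertical distance p3-p5
    c = np.linalg.norm(eye[0] - eye[3])   # horizontal distance p1-p4
    return (a + b) / (2.0 * c)

def mouth_aspect_ratio(mouth: np.ndarray) -> float:
    """MAR from the 8 inner-mouth landmarks (one common formulation)."""
    a = np.linalg.norm(mouth[1] - mouth[7])
    b = np.linalg.norm(mouth[2] - mouth[6])
    c = np.linalg.norm(mouth[3] - mouth[5])
    d = np.linalg.norm(mouth[0] - mouth[4])  # mouth width
    return (a + b + c) / (2.0 * d)

# Example with random coordinates standing in for detected landmarks.
eye = np.random.rand(6, 2)
mouth = np.random.rand(8, 2)
print(eye_aspect_ratio(eye), mouth_aspect_ratio(mouth))
```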
Dataset | Ratio | Samples (Before Augmentation) | Samples (After Augmentation) |
---|---|---|---|
Training set | 80% | 33246 | 66492 |
Testing set | 20% | 7379 | 7379 |
Layer Type | Parameters |
---|---|
Input | shape = (input_dim,) |
Dense | units = 64, activation = ‘relu’ |
Dense | units = 32, activation = ‘relu’ |
Dense | units = 64, activation = ‘relu’ |
Dense | units = input_dim, activation = ‘sigmoid’ |
Training Parameter | Value |
---|---|
Epochs | 50 |
Batch Size | 256 |
Optimizer | Adam |
Loss Function | Mean squared error (MSE) |
Metric | Accuracy |
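The autoencoder layer table and its training parameters above translate almost directly into Keras; a minimal sketch (the 48-dimensional feature size and the random training data are placeholders):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

input_dim = 48                                    # placeholder feature dimension
autoencoder = tf.keras.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),          # bottleneck
    layers.Dense(64, activation="relu"),
    layers.Dense(input_dim, activation="sigmoid"),
])
# Adam + MSE + accuracy, as listed in the training-parameter table.
autoencoder.compile(optimizer="adam", loss="mse", metrics=["accuracy"])

# Train to reconstruct (min-max scaled) artifact features; a large reconstruction
# error on a sample then flags a likely manipulation.
X_train = np.random.rand(1000, input_dim).astype("float32")   # placeholder data
autoencoder.fit(X_train, X_train, epochs=50, batch_size=256, verbose=0)
```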
Model | Precision (without GAN) | Recall (without GAN) | F1 Score (without GAN) | Precision (with GAN) | Recall (with GAN) | F1 Score (with GAN) |
---|---|---|---|---|---|---|
CNN | 0.896 | 0.884 | 0.890 | 0.915 | 0.902 | 0.908 |
CNN-GRU | 0.902 | 0.890 | 0.896 | 0.920 | 0.910 | 0.915 |
CNN-LSTM | 0.910 | 0.902 | 0.906 | 0.928 | 0.916 | 0.922 |
TCN | 0.917 | 0.910 | 0.913 | 0.935 | 0.920 | 0.927 |
Model | Precision (without GAN) | Recall (without GAN) | F1 Score (without GAN) | Precision (with GAN) | Recall (with GAN) | F1 Score (with GAN) |
---|---|---|---|---|---|---|
CNN | 0.875 | 0.860 | 0.867 | 0.895 | 0.880 | 0.887 |
CNN-GRU | 0.890 | 0.875 | 0.882 | 0.910 | 0.895 | 0.902 |
CNN-LSTM | 0.898 | 0.882 | 0.890 | 0.918 | 0.902 | 0.910 |
TCN | 0.905 | 0.890 | 0.897 | 0.925 | 0.910 | 0.917 |
Model | Precision (without GAN) | Recall (without GAN) | F1 Score (without GAN) | Precision (with GAN) | Recall (with GAN) | F1 Score (with GAN) |
---|---|---|---|---|---|---|
CNN | 0.865 | 0.850 | 0.857 | 0.885 | 0.870 | 0.877 |
CNN-GRU | 0.880 | 0.865 | 0.872 | 0.900 | 0.885 | 0.892 |
CNN-LSTM | 0.890 | 0.875 | 0.882 | 0.910 | 0.895 | 0.902 |
TCN | 0.900 | 0.885 | 0.892 | 0.920 | 0.905 | 0.912 |
Ref. | Feature-Based Methodology | Classifier | Best Performance | Datasets |
---|---|---|---|---|
[31] | Image + temporal features | Dynamic Prototype Network | AUC = 0.718; Accuracy = 0.72; Precision = 0.73; Recall = 0.26; F1 score = 0.73 | FF++ (Face2Face, FaceSwap, HQ) |
[32] | Eye-blinking features | LRCN | AUC = 0.78; Accuracy = 0.76; Precision = 0.77; Recall = 0.22 | FaceForensics++ (Face Synthesis) |
[28] | Combined visual features of eyes and teeth | Logistic Regression, MLP | AUC = 0.851; Accuracy = 0.854; Precision = 0.807; Recall = 0.849; F1 Score = 0.828 | FaceForensics++ |
[33] | Eye-blinking features | Distance | AUC = 0.875; Precision = 0.875; Recall = 0.778; F1 Score = 0.824; Accuracy = 0.85 | FaceForensics++ (Face Synthesis with unnatural movement of the eye) |
[29] | Deep learning features | Capsule Network | AUC = 0.91; Accuracy = 0.91; F1 Score = 0.91; Precision = 0.92; Recall = 0.08 | FaceForensics++ |
[30] | Image + temporal features | CNN + RNN | AUC = 0.93; Accuracy = 0.939; Precision = 0.92; Recall = 0.08; F1 score = 0.91 | FF++ (FaceSwap, DeepFakes, LQ) |
This work | Spatiotemporal features + augmented facial landmarks with GAN model | TCN model for spatiotemporal analysis with augmentation + GAN | AUC = 0.93; Accuracy = 0.96; Precision = 0.98; F1 score = 0.98 | FF++ |