Exposing Face Manipulation Based on Generative Adversarial Network–Transformer and Fake Frequency Noise Traces
Abstract
1. Introduction
- We introduce FFC, a novel fusion framework for deepfake face detection. The framework consists of three modules: a GAN–transformer image feature extraction module, a frequency-domain and noise feature extraction module, and a real/fake classification module. The coherence and consistency of the global and local features of the detected face image are exploited to strengthen forgery detection.
- We use reconstructed GAN and transformer blocks to extract features from fake face images. A ResNet augmented with dilated convolutions serves as the generator of fake images, while global and local convolutional networks serve as discriminators; transformer blocks are attached to capture the features of the generated fakes, improving the detection of subtle forgery traces.
- We design a frequency-domain and noise feature detection module that detects the spectral anomalies and noise discrepancies present in forged images.
- In tests on the FF++, Celeb-DF, and DFDC datasets, our FFC fusion network detects deepfake face images more accurately and robustly than competing state-of-the-art models.
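As a rough illustration of the frequency-domain and noise cues listed above, the sketch below extracts an azimuthally averaged FFT power spectrum (a common proxy for the periodic upsampling artifacts GANs leave behind) and a high-pass noise residual. The function names and the 3×3 mean filter are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def radial_spectrum(img: np.ndarray, n_bins: int = 64) -> np.ndarray:
    """Azimuthally averaged log power spectrum of a grayscale image.

    GAN upsampling often leaves periodic high-frequency artifacts that
    appear as bumps in this 1-D profile, which a classifier can learn.
    """
    f = np.fft.fftshift(np.fft.fft2(img))
    power = np.log1p(np.abs(f) ** 2)
    h, w = img.shape
    cy, cx = h // 2, w // 2
    y, x = np.indices((h, w))
    r = np.hypot(y - cy, x - cx)
    bins = np.linspace(0.0, r.max(), n_bins + 1)
    idx = np.clip(np.digitize(r.ravel(), bins) - 1, 0, n_bins - 1)
    profile = np.bincount(idx, weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return profile / np.maximum(counts, 1)

def noise_residual(img: np.ndarray) -> np.ndarray:
    """High-pass residual (image minus 3x3 local mean) as a crude noise map."""
    k = np.ones((3, 3)) / 9.0
    pad = np.pad(img, 1, mode="edge")
    smooth = sum(
        pad[i:i + img.shape[0], j:j + img.shape[1]] * k[i, j]
        for i in range(3) for j in range(3)
    )
    return img - smooth
```

In a pipeline like the one described, the 1-D spectrum and simple statistics of the residual map would be concatenated with the backbone features before the classification head.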
2. Related Works
2.1. Deepfake Methods
2.2. Face Forgery Detection
3. Materials and Methods
3.1. Feature Extraction Backbone
3.2. Frequency Domain and Noise Feature Prediction Module and Classification Module
4. Results
4.1. Datasets and Implementation Details
4.2. Evaluation Metrics
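The results tables below report accuracy (ACC), area under the ROC curve (AUC), and F1-score. A minimal, dependency-free sketch of how these three metrics are computed for binary real/fake labels (variable names are illustrative):

```python
def accuracy(y_true, y_pred):
    """Fraction of correct binary predictions."""
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (fake) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc(y_true, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a random
    fake sample is scored higher than a random real one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Unlike ACC and F1, AUC is threshold-free: it is computed from the raw classifier scores rather than from hard predictions.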
5. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
2. Perov, I.; Gao, D.; Chervoniy, N.; Liu, K.; Marangonda, S.; Umé, C.; Facenheim, C.S.; RP, L.; Jiang, J.; Zhang, S.; et al. DeepFaceLab: Integrated, flexible and extensible face-swapping framework. arXiv 2020, arXiv:2005.05535.
3. FaceSwap. Available online: https://github.com/MarekKowalski/FaceSwap/ (accessed on 1 October 2024).
4. FaceFusion. Available online: https://github.com/facefusion/facefusion (accessed on 1 October 2024).
5. Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10684–10695.
6. Luo, Z.; Chen, D.; Zhang, Y.; Huang, Y.; Wang, L.; Shen, Y.; Zhao, D.; Zhou, J.; Tan, T. VideoFusion: Decomposed diffusion models for high-quality video generation. arXiv 2023, arXiv:2303.08320.
7. Megahed, A.; Han, Q.; Fadl, S. Exposing deepfake using fusion of deep-learned and hand-crafted features. Multimed. Tools Appl. 2024, 83, 26797–26817.
8. Theerthagiri, P.; basha Nagaladinne, G. Deepfake face detection using deep InceptionNet learning algorithm. In Proceedings of the 2023 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS), Bhopal, India, 18–19 February 2023; pp. 1–6.
9. Patel, Y.; Tanwar, S.; Bhattacharya, P.; Gupta, R.; Alsuwian, T.; Davidson, I.E.; Mazibuko, T.F. An improved dense CNN architecture for deepfake image detection. IEEE Access 2023, 11, 22081–22095.
10. Ismail, A.; Elpeltagy, M.; Zaki, M.; ElDahshan, K.A. Deepfake video detection: YOLO-Face convolution recurrent approach. PeerJ Comput. Sci. 2021, 7, e730.
11. Wu, J.; Zhu, Y.; Jiang, X.; Liu, Y.; Lin, J. Local attention and long-distance interaction of rPPG for deepfake detection. Vis. Comput. 2024, 40, 1083–1094.
12. Ismail, A.; Elpeltagy, M.; Zaki, M.; Eldahshan, K. A new deep learning-based methodology for video deepfake detection using XGBoost. Sensors 2021, 21, 5413.
13. Soudy, A.H.; Sayed, O.; Tag-Elser, H.; Ragab, R.; Mohsen, S.; Mostafa, T.; Abohany, A.A.; Slim, S.O. Deepfake detection using convolutional vision transformers and convolutional neural networks. Neural Comput. Appl. 2024, 36, 19759–19775.
14. Xue, Z.; Jiang, X.; Liu, Q.; Wei, Z. Global–local facial fusion based GAN generated fake face detection. Sensors 2023, 23, 616.
15. Bonettini, N.; Cannas, E.D.; Mandelli, S.; Bondi, L.; Bestagini, P.; Tubaro, S. Video face manipulation detection through ensemble of CNNs. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 5012–5019.
16. Zhao, T.; Xu, X.; Xu, M.; Ding, H.; Xiong, Y.; Xia, W. Learning self-consistency for deepfake detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 15023–15033.
17. Beal, J.; Kim, E.; Tzeng, E.; Park, D.H.; Zhai, A.; Kislyuk, D. Toward transformer-based object detection. arXiv 2020, arXiv:2012.09958.
18. Wang, B.; Wu, X.; Tang, Y.; Shan, Z.; Wei, F. Frequency domain filtered residual network for deepfake detection. Mathematics 2023, 11, 816.
19. Verdoliva, L. Media forensics and deepfakes: An overview. IEEE J. Sel. Top. Signal Process. 2020, 14, 910–932.
20. Thies, J.; Zollhofer, M.; Stamminger, M.; Theobalt, C.; Nießner, M. Face2Face: Real-time face capture and reenactment of RGB videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2387–2395.
21. Li, L.; Bao, J.; Yang, H.; Chen, D.; Wen, F. FaceShifter: Towards high fidelity and occlusion aware face swapping. arXiv 2019, arXiv:1912.13457.
22. Nirkin, Y.; Keller, Y.; Hassner, T. FSGAN: Subject agnostic face swapping and reenactment. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7184–7193.
23. Chen, R.; Chen, X.; Ni, B.; Ge, Y. SimSwap: An efficient framework for high fidelity face swapping. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 2003–2011.
24. Kim, W.; Suh, S.; Han, J.J. Face liveness detection from a single image via diffusion speed model. IEEE Trans. Image Process. 2015, 24, 2456–2465.
25. Li, H.; Li, B.; Tan, S.; Huang, J. Identification of deep network generated images using disparities in color components. Signal Process. 2020, 174, 107616.
26. Lugstein, F.; Baier, S.; Bachinger, G.; Uhl, A. PRNU-based deepfake detection. In Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security, Virtual, 22–25 June 2021; pp. 7–12.
27. Yu, M.; Ju, S.; Zhang, J.; Li, S.; Lei, J.; Li, X. Patch-DFD: Patch-based end-to-end DeepFake discriminator. Neurocomputing 2022, 501, 583–595.
28. Alkishri, W.; Widyarto, S.; Yousif, J.H. Evaluating the effectiveness of a GAN fingerprint removal approach in fooling deepfake face detection. J. Internet Serv. Inf. Secur. (JISIS) 2024, 14, 85–103.
29. Zhao, Y.; Jin, X.; Gao, S.; Wu, L.; Yao, S.; Jiang, Q. TAN-GFD: Generalizing face forgery detection based on texture information and adaptive noise mining. Appl. Intell. 2023, 53, 19007–19027.
30. Kohli, A.; Gupta, A. Detecting deepfake, faceswap and face2face facial forgeries using frequency CNN. Multimed. Tools Appl. 2021, 80, 18461–18478.
31. Li, Y.; Yang, X.; Sun, P.; Qi, H.; Lyu, S. Celeb-DF: A large-scale challenging dataset for deepfake forensics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3207–3216.
32. Rossler, A.; Cozzolino, D.; Verdoliva, L.; Riess, C.; Thies, J.; Nießner, M. FaceForensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1–11.
33. Dolhansky, B.; Howes, R.; Pflaum, B. The Deepfake Detection Challenge (DFDC) preview dataset. arXiv 2019, arXiv:1910.08854.
34. Deng, J.; Guo, J.; Zhou, Y.; Yu, J.; Kotsia, I.; Zafeiriou, S. RetinaFace: Single-stage dense face localisation in the wild. arXiv 2019, arXiv:1905.00641.
35. Wang, L.; Xiong, Y.; Wang, Z.; Qiao, Y.; Lin, D.; Tang, X.; Van Gool, L. Temporal segment networks: Towards good practices for deep action recognition. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 20–36.
36. Tran, D.; Bourdev, L.; Fergus, R.; Torresani, L.; Paluri, M. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 4489–4497.
37. Wang, X.; Miao, Z.; Zhang, R.; Hao, S. I3D-LSTM: A new model for human action recognition. IOP Conf. Ser. Mater. Sci. Eng. 2019, 569, 032035.
38. Zhou, P.; Han, X.; Morariu, V.I.; Davis, L.S. Two-stream neural networks for tampered face detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1831–1839.
39. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 2020, 128, 336–359.
40. Man, Q.; Cho, Y.I.; Gee, S.J.; Kim, W.J.; Jang, K.A. GAN-based high-quality face-swapping composite network. Electronics 2024, 13, 3092.
| Database | Total Videos | Real Videos | Fake Videos |
|---|---|---|---|
| FaceForensics++ (FF++) | 5000 | 1000 | 4000 |
| Celeb-DF | 6229 | 590 | 5639 |
| DFDC | 128,154 | 23,957 | 104,500 |
| Model | ACC (%) | AUC (%) |
|---|---|---|
| TSN [35] | 61.1 | 62.8 |
| C3D [36] | 64.3 | 65.4 |
| I3D [37] | 68.7 | 69.7 |
| Two-stream [38] | 70.10 | 72.35 |
| Ours | 78.91 | 84.06 |
| Model | FF++ ACC | FF++ AUC | FF++ F1-Score | Celeb-DF ACC | Celeb-DF AUC | Celeb-DF F1-Score | DFDC ACC | DFDC AUC | DFDC F1-Score |
|---|---|---|---|---|---|---|---|---|---|
| Xception | 84.91 | 85.35 | 84.23 | 67.40 | 70.05 | 68.57 | 69.90 | 70.83 | 68.65 |
| Wu et al. | 85.45 | 85.82 | 83.23 | 68.94 | 72.90 | 70.39 | 73.02 | 77.11 | 74.10 |
| Patch-DFD | 86.83 | 88.20 | 87.01 | 72.85 | 75.38 | 73.15 | 75.06 | 78.93 | 75.33 |
| TAN-GFD | 90.37 | 92.51 | 90.35 | 83.71 | 87.17 | 84.59 | 84.63 | 86.79 | 84.05 |
| fCNN | 92.12 | 93.08 | 90.86 | 86.33 | 86.92 | 81.30 | 82.73 | 84.74 | 80.18 |
| GANS | 96.09 | 97.95 | 95.17 | 88.05 | 90.30 | 87.93 | 86.01 | 89.43 | 86.13 |
| FEB (ours) | 97.42 | 97.57 | 96.88 | 88.39 | 90.83 | 89.34 | 85.95 | 89.62 | 86.54 |
| FEB+FFM (ours) | 98.75 | 99.43 | 98.67 | 89.54 | 91.81 | 89.70 | 86.79 | 90.67 | 87.45 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Man, Q.; Cho, Y.-I. Exposing Face Manipulation Based on Generative Adversarial Network–Transformer and Fake Frequency Noise Traces. Sensors 2025, 25, 1435. https://doi.org/10.3390/s25051435