MFF-Net: Deepfake Detection Network Based on Multi-Feature Fusion
Abstract
1. Introduction
- We are the first to design a custom convolution that adaptively learns textural and frequency features for the deepfake detection task, drawing on signal-processing methods; this brings a novel perspective on the use of textural and frequency features.
- We propose a new multi-feature fusion network to combine RGB features with textural and frequency features. We also introduce a new diversity loss to encourage the feature extraction module to learn features of different scales and directions.
- Extensive experiments demonstrate that our method outperforms the binary classification baselines and achieves state-of-the-art detection performance.
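The diversity loss named in the contributions is defined later in Section 4.3; as a hedged sketch of the idea (the pairwise-cosine formulation and the name `diversity_loss` are illustrative assumptions, not necessarily the paper's exact definition), one way to push a bank of learned kernels toward different scales and orientations is to penalize their mutual cosine similarity. A real network would compute this under an autograd framework, but NumPy suffices to show the computation:

```python
import numpy as np

def diversity_loss(filters: np.ndarray) -> float:
    """Mean squared off-diagonal cosine similarity between filters.

    filters: (N, k, k) bank of N kernels. The pairwise-cosine penalty
    is an illustrative assumption, not the paper's exact definition.
    """
    flat = filters.reshape(filters.shape[0], -1)                 # (N, k*k)
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)    # unit-norm rows
    gram = flat @ flat.T                                         # (N, N) cosines
    off = gram - np.eye(gram.shape[0])                           # drop self-similarity
    return float((off ** 2).mean())

# Identical filters are maximally penalized; filters with disjoint
# support (zero cosine similarity) incur no penalty.
high = diversity_loss(np.ones((4, 3, 3)))
```

Minimizing this term alongside the classification loss discourages the feature extraction module from collapsing all kernels onto one scale or orientation.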
2. Related Work
2.1. Deepfake Generation Technology
2.2. Deepfake Detection Technology
3. Background
3.1. Discrete Cosine Transform
3.2. Frequency Domain Defects
3.3. Gabor Filter
- Wavelength (λ): the wavelength of the sinusoidal factor, specified in pixels and usually not less than 2.
- Orientation (θ): the orientation of the normal to the parallel stripes of the Gabor function.
- Phase shift (ψ): the phase offset of the sinusoidal factor in the modulated signal.
- Aspect ratio (γ): the spatial aspect ratio, which specifies the ellipticity of the support of the Gabor function.
- Standard deviation (σ): the standard deviation of the Gaussian envelope.
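The five parameters above fully determine a Gabor kernel. A minimal NumPy sketch of the standard real-valued Gabor function built from them (the kernel size and default parameter values are illustrative choices, not the paper's settings):

```python
import numpy as np

def gabor_kernel(ksize=21, wavelength=4.0, theta=0.0, psi=0.0,
                 gamma=0.5, sigma=2.0):
    """Real part of a 2-D Gabor filter: Gaussian envelope x cosine carrier.

    wavelength (lambda) is in pixels (usually >= 2), theta in radians.
    Defaults are illustrative, not the paper's settings.
    """
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)      # rotate coordinates by theta
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_t / wavelength + psi)
    return envelope * carrier

# A small bank over 4 orientations, as one might use for textural features
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
```

Convolving an image with such a bank responds strongly to stripes of the matching wavelength and orientation, which is why Gabor responses are a natural carrier of textural cues.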
4. Method
4.1. Overview
4.2. Multi-Feature Fusion Framework
4.2.1. Feature Extraction Module
4.2.2. Textural Feature Enhancement
4.2.3. Attention Module
4.3. Diversity Loss
5. Experiments
5.1. Experimental Setup
5.1.1. Datasets
5.1.2. Evaluation Standard
5.1.3. Experimental Parameters
5.2. Within-Dataset Experiment
5.3. Ablation Experiment
5.4. Generalization Ability Evaluation
5.5. Robustness Experiment
5.5.1. Experimental Setup
- Blur: the image was filtered with a Gaussian filter whose kernel size was randomly sampled from (3, 5, 7, 9);
- Cropping: the image was randomly cropped along the x- and y-axes, with the cropping percentage sampled from U(5, 20); the cropped image was then resized to the original resolution;
- Compression (JPEG): JPEG compression was applied with a quality factor sampled from U(8, 80);
- Noise: i.i.d. Gaussian noise was added to the image, with the variance of the Gaussian distribution randomly sampled from U(5.0, 20.0).
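The four perturbations above can be sketched with Pillow and NumPy (the function names and the kernel-size-to-radius mapping for blur are illustrative assumptions; the paper's exact implementation may differ):

```python
import io
import random

import numpy as np
from PIL import Image, ImageFilter

def blur(img: Image.Image) -> Image.Image:
    # Kernel size sampled from (3, 5, 7, 9); Pillow's GaussianBlur takes
    # a radius, so kernel size k is mapped to radius k // 2 (an assumption).
    k = random.choice((3, 5, 7, 9))
    return img.filter(ImageFilter.GaussianBlur(radius=k // 2))

def crop(img: Image.Image) -> Image.Image:
    # Crop a percentage sampled from U(5, 20) off each axis, resize back.
    w, h = img.size
    p = random.uniform(0.05, 0.20)
    dx, dy = int(w * p), int(h * p)
    return img.crop((dx, dy, w - dx, h - dy)).resize((w, h))

def jpeg(img: Image.Image) -> Image.Image:
    # Re-encode as JPEG with a quality factor sampled from U(8, 80).
    q = random.randint(8, 80)
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=q)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def noise(img: Image.Image) -> Image.Image:
    # Add Gaussian noise with variance sampled from U(5.0, 20.0).
    var = random.uniform(5.0, 20.0)
    arr = np.asarray(img, dtype=np.float64)
    arr += np.random.normal(0.0, var ** 0.5, arr.shape)
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
```

Each perturbation preserves the image resolution, so perturbed images can be fed to the detector without further preprocessing.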
5.5.2. Experimental Results
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
- Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive Growing of GANs for Improved Quality, Stability, and Variation. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4401–4410. [Google Scholar]
- Brock, A.; Donahue, J.; Simonyan, K. Large Scale GAN Training for High Fidelity Natural Image Synthesis. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- West, J.; Bergstrom, C. Which Face is Real? 2019. Available online: http://www.whichfaceisreal.com (accessed on 11 May 2021).
- FaceApp. Available online: https://faceapp.com/app (accessed on 11 May 2021).
- FaceSwap. Available online: https://github.com/MarekKowalski/FaceSwap/ (accessed on 11 May 2021).
- Rossler, A.; Cozzolino, D.; Verdoliva, L.; Riess, C.; Thies, J.; Nießner, M. Faceforensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 1–11. [Google Scholar]
- Thies, J.; Zollhöfer, M.; Nießner, M. Deferred neural rendering: Image synthesis using neural textures. ACM Trans. Graph. 2019, 38, 1–12. [Google Scholar] [CrossRef]
- Afchar, D.; Nozick, V.; Yamagishi, J.; Echizen, I. Mesonet: A compact facial video forgery detection network. In Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security (WIFS), Hong Kong, China, 11–13 December 2018; pp. 1–7. [Google Scholar]
- Li, L.; Bao, J.; Zhang, T.; Yang, H.; Chen, D.; Wen, F.; Guo, B. Face X-ray for more general face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–18 June 2020; pp. 5001–5010. [Google Scholar]
- Matern, F.; Riess, C.; Stamminger, M. Exploiting visual artifacts to expose deepfakes and face manipulations. In Proceedings of the 2019 IEEE Winter Applications of Computer Vision Workshops (WACVW), Waikoloa Village, HI, USA, 7–11 January 2019; pp. 83–92. [Google Scholar]
- Tolosana, R.; Vera-Rodriguez, R.; Fierrez, J.; Morales, A.; Ortega-Garcia, J. Deepfakes and beyond: A survey of face manipulation and fake detection. Inf. Fusion 2020, 64, 131–148. [Google Scholar] [CrossRef]
- Nguyen, H.H.; Fang, F.; Yamagishi, J.; Echizen, I. Multi-task learning for detecting and segmenting manipulated facial images and videos. In Proceedings of the 2019 IEEE 10th International Conference on Biometrics Theory, Applications and Systems (BTAS), Tampa, FL, USA, 23–26 September 2019; pp. 1–8. [Google Scholar]
- Frank, J.; Eisenhofer, T.; Schönherr, L.; Fischer, A.; Kolossa, D.; Holz, T. Leveraging frequency analysis for deep fake image recognition. In Proceedings of the International Conference on Machine Learning, PMLR, Montréal, QC, Canada, 13–18 July 2020; pp. 3247–3258. [Google Scholar]
- Zhang, X.; Karaman, S.; Chang, S.F. Detecting and simulating artifacts in gan fake images. In Proceedings of the 2019 IEEE International Workshop on Information Forensics and Security (WIFS), Delft, The Netherlands, 9–12 December 2019; pp. 1–6. [Google Scholar]
- Liu, Z.; Qi, X.; Torr, P.H. Global texture enhancement for fake face detection in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 8060–8069. [Google Scholar]
- Durall, R.; Keuper, M.; Pfreundt, F.J.; Keuper, J. Unmasking deepfakes with simple features. arXiv 2019, arXiv:1911.00686. [Google Scholar]
- Qian, Y.; Yin, G.; Sheng, L.; Chen, Z.; Shao, J. Thinking in frequency: Face forgery detection by mining frequency-aware clues. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 86–103. [Google Scholar]
- Li, Y.; Yang, X.; Sun, P.; Qi, H.; Lyu, S. Celeb-DF: A large-scale challenging dataset for deepfake forensics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3207–3216. [Google Scholar]
- Deepfakedetection. Available online: https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html (accessed on 11 May 2021).
- Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 214–223. [Google Scholar]
- Berthelot, D.; Schumm, T.; Metz, L. Began: Boundary equilibrium generative adversarial networks. arXiv 2017, arXiv:1703.10717. [Google Scholar]
- Kodali, N.; Abernethy, J.; Hays, J.; Kira, Z. On convergence and stability of gans. arXiv 2017, arXiv:1705.07215. [Google Scholar]
- Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8110–8119. [Google Scholar]
- Miyato, T.; Kataoka, T.; Koyama, M.; Yoshida, Y. Spectral Normalization for Generative Adversarial Networks. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Li, C.L.; Chang, W.C.; Cheng, Y.; Yang, Y.; Póczos, B. MMD GAN: Towards deeper understanding of moment matching network. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 2200–2210. [Google Scholar]
- Yang, X.; Li, Y.; Lyu, S. Exposing deep fakes using inconsistent head poses. In Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 8261–8265. [Google Scholar]
- Agarwal, S.; Farid, H.; Gu, Y.; He, M.; Nagano, K.; Li, H. Protecting World Leaders Against Deep Fakes. In Proceedings of the CVPR Workshops, Long Beach, CA, USA, 16 June 2019; pp. 38–45. [Google Scholar]
- Carvalho, T.; Faria, F.A.; Pedrini, H.; Torres, R.d.S.; Rocha, A. Illuminant-based transformed spaces for image forensics. IEEE Trans. Inf. Forensics Secur. 2015, 11, 720–733. [Google Scholar] [CrossRef]
- Durall, R.; Keuper, M.; Keuper, J. Watch your up-convolution: Cnn based generative deep neural networks are failing to reproduce spectral distributions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7890–7899. [Google Scholar]
- Huang, Y.; Juefei-Xu, F.; Wang, R.; Guo, Q.; Ma, L.; Xie, X.; Li, J.; Miao, W.; Liu, Y.; Pu, G. Fakepolisher: Making deepfakes more detection-evasive by shallow reconstruction. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 1217–1226. [Google Scholar]
- Geirhos, R.; Rubisch, P.; Michaelis, C.; Bethge, M.; Wichmann, F.A.; Brendel, W. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Odena, A.; Dumoulin, V.; Olah, C. Deconvolution and checkerboard artifacts. Distill 2016, 1, e3. [Google Scholar] [CrossRef]
- Zhao, H.; Zhou, W.; Chen, D.; Wei, T.; Zhang, W.; Yu, N. Multi-attentional deepfake detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 2185–2194. [Google Scholar]
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Liu, Z.; Luo, P.; Wang, X.; Tang, X. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3730–3738. [Google Scholar]
- Fridrich, J.; Kodovsky, J. Rich models for steganalysis of digital images. IEEE Trans. Inf. Forensics Secur. 2012, 7, 868–882. [Google Scholar] [CrossRef] [Green Version]
- Cozzolino, D.; Poggi, G.; Verdoliva, L. Recasting residual-based local descriptors as convolutional neural networks: An application to image forgery detection. In Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security, Philadelphia, PA, USA, 20–21 June 2017; pp. 159–164. [Google Scholar]
- Bayar, B.; Stamm, M.C. A deep learning approach to universal image manipulation detection using a new convolutional layer. In Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security, Vigo, Spain, 20–22 June 2016; pp. 5–10. [Google Scholar]
- Rahmouni, N.; Nozick, V.; Yamagishi, J.; Echizen, I. Distinguishing computer graphics from natural images using convolution neural networks. In Proceedings of the 2017 IEEE Workshop on Information Forensics and Security (WIFS), Rennes, France, 4–7 December 2017; pp. 1–6. [Google Scholar]
- Gunawan, T.S.; Hanafiah, S.A.M.; Kartiwi, M.; Ismail, N.; Za’bah, N.F.; Nordin, A.N. Development of photo forensics algorithm by detecting photoshop manipulation using error level analysis. Indones. J. Electr. Eng. Comput. Sci. 2017, 7, 131–137. [Google Scholar] [CrossRef]
- Chen, M.; Sedighi, V.; Boroumand, M.; Fridrich, J. JPEG-phase-aware convolutional neural network for steganalysis of JPEG images. In Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security, Philadelphia, PA, USA, 20–22 June 2017; pp. 75–84. [Google Scholar]
- Liu, H.; Li, X.; Zhou, W.; Chen, Y.; He, Y.; Xue, H.; Zhang, W.; Yu, N. Spatial-phase shallow learning: Rethinking face forgery detection in frequency domain. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 772–781. [Google Scholar]
- Deepfakes. Available online: https://github.com/deepfakes/faceswap (accessed on 11 May 2021).
- Thies, J.; Zollhofer, M.; Stamminger, M.; Theobalt, C.; Nießner, M. Face2face: Real-time face capture and reenactment of rgb videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2387–2395. [Google Scholar]
- Feichtenhofer, C.; Fan, H.; Malik, J.; He, K. Slowfast networks for video recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 6202–6211. [Google Scholar]
- Trinh, L.; Tsang, M.; Rambhatla, S.; Liu, Y. Interpretable and trustworthy deepfake detection via dynamic prototypes. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 5–9 January 2021; pp. 1973–1983. [Google Scholar]
- Chen, C.; Li, O.; Tao, D.; Barnett, A.; Rudin, C.; Su, J.K. This looks like that: Deep learning for interpretable image recognition. Adv. Neural Inf. Process. Syst. 2019, 32, 8930–8941. [Google Scholar]
- Haliassos, A.; Vougioukas, K.; Petridis, S.; Pantic, M. Lips Don’t Lie: A Generalisable and Robust Approach To Face Forgery Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 5039–5049. [Google Scholar]
| Method | LQ ACC | LQ AUC | HQ ACC | HQ AUC |
|---|---|---|---|---|
| Steg. Features [39] | 55.98% | - | 70.97% | - |
| LD-CNN [40] | 58.69% | - | 78.45% | - |
| Constrained Conv [41] | 66.84% | - | 82.97% | - |
| CustomPooling CNN [42] | 61.18% | - | 79.08% | - |
| MesoNet [10] | 70.47% | - | 83.10% | - |
| Face X-ray [11] | - | 61.60% | - | 87.40% |
| Xception [36] | 86.86% | 89.30% | 95.73% | 96.30% |
| Xception-ELA [43] | 79.63% | 82.90% | 93.86% | 94.80% |
| Xception-PAFilters [44] | 87.16% | 90.20% | - | - |
| SPSL [45] | 81.57% | 82.82% | 91.50% | 95.32% |
| F-net [19] | 90.43% | 93.30% | 97.52% | 98.10% |
| Multi-attentional Detection [35] | 88.69% | 90.40% | 97.60% | 99.29% |
| MFF-Net | 92.21% | 95.58% | 98.18% | 99.62% |
| Method | DF | F2F | FS | NT |
|---|---|---|---|---|
| Steg. Features [39] | 67.00% | 48.00% | 49.00% | 56.00% |
| LD-CNN [40] | 75.00% | 56.00% | 51.00% | 62.00% |
| Constrained Conv [41] | 87.00% | 82.00% | 74.00% | 74.00% |
| CustomPooling CNN [42] | 80.00% | 62.00% | 59.00% | 59.00% |
| MesoNet [10] | 90.00% | 83.00% | 83.00% | 75.00% |
| Xception [36] | 96.01% | 93.29% | 94.71% | 79.14% |
| SlowFast [48] | 97.53% | 94.93% | 95.01% | 82.55% |
| SPSL [45] | 93.48% | 86.02% | 92.26% | 76.78% |
| F-net (Xception) [19] | 97.97% | 95.32% | 96.53% | 83.32% |
| F-net (SlowFast) [19] | 98.62% | 95.84% | 97.23% | 86.01% |
| MFF-Net | 99.73% | 96.38% | 98.20% | 91.79% |
| Method | LQ ACC | LQ AUC | HQ ACC | HQ AUC |
|---|---|---|---|---|
| Backbone (Xception) | 86.86% | 89.30% | 95.73% | 96.30% |
| + Feature extraction and enhancement module | 91.10% | 93.39% | 97.60% | 98.74% |
| + Attention module | 91.32% | 94.23% | 97.94% | 99.15% |
| + Diversity loss | 92.21% | 95.58% | 98.18% | 99.62% |
| Method | FF++ | DFD | Celeb-DF |
|---|---|---|---|
| Xception [36] | 96.30% | 91.27% | 65.50% |
| ProtoPNet [50] | 97.95% | 84.46% | 69.33% |
| DPNet [49] | 99.20% | 92.44% | 68.20% |
| SPSL [45] | 96.91% | - | 76.88% |
| F-net [19] | 98.10% | - | 65.17% |
| Multi-attentional Detection [35] | 99.80% | - | 67.44% |
| MFF-Net | 99.73% | 92.53% | 75.07% |
| Method | Train | Test | Blur | Cropping | JPEG | Noise | Combined |
|---|---|---|---|---|---|---|---|
| ResNet | SNGAN vs. CelebA | SNGAN vs. CelebA | 82.87% | 94.40% | 97.12% | 87.37% | 88.98% |
| LipForensics [51] | FF++ | FF++ | 96.10% | 96.21% | 95.60% | 73.80% | - |
| Gram-Net [17] | StyleGAN vs. CelebA-HQ | StyleGAN vs. CelebA-HQ | 94.20% | 97.10% | 99.05% | 92.47% | - |
| MFF-Net | SNGAN vs. CelebA (CD) | SNGAN vs. CelebA | 94.64% | 99.99% | 99.98% | 98.80% | 98.73% |
| MFF-Net | SNGAN vs. CelebA (PD) | SNGAN vs. CelebA | 97.95% | 99.23% | 99.38% | 98.79% | 99.74% |
Zhao, L.; Zhang, M.; Ding, H.; Cui, X. MFF-Net: Deepfake Detection Network Based on Multi-Feature Fusion. Entropy 2021, 23, 1692. https://doi.org/10.3390/e23121692