Detection of Image Level Forgery with Various Constraints Using DFDC Full and Sample Datasets
Abstract
1. Introduction
2. Methods
2.1. State of the Art Pretrained Models
- The last fully connected layer (fc2) of the VGG-19 model, which outputs a 4096-dimensional feature vector, was replaced with a dense layer having a 1024-dimensional output. In addition, the output layer was redesigned to predict two classes (real or fake) instead of the original 1000 classes.
- For Inception-ResNet-v2, we did not remove or modify any layer; we only replaced the final output layer so that it predicts two classes instead of 1000. All other layers and parameters were kept as in the original Inception-ResNet-v2.
- Unlike Inception-ResNet-v2, the Xception model received one extra dense layer with a 1024-dimensional output, inserted after its average-pooling layer and before the output layer. The output layer itself was modified in the same way as for VGG-19 and Inception-ResNet-v2. A sketch of all three adaptations follows this list.
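These three head adaptations can be sketched in Keras, which the paper lists among its tools. Only the `keras.applications` constructors and the VGG-19 layer name `fc1` are part of the actual Keras API; the helper function, its arguments, and the ReLU activation of the added 1024-dimensional layers are illustrative assumptions, not the authors' released code.

```python
# Sketch: adapting ImageNet-pretrained backbones for binary (real/fake)
# classification as described above. Helper name and activation choices
# are assumptions.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19, InceptionResNetV2, Xception

def with_binary_head(backbone, add_dense_1024):
    """Append an optional 1024-d dense layer and a sigmoid (real/fake) output."""
    x = backbone.output
    if add_dense_1024:
        x = layers.Dense(1024, activation='relu')(x)
    out = layers.Dense(1, activation='sigmoid')(x)  # two classes via sigmoid
    return models.Model(backbone.input, out)

# VGG-19: drop fc2 (4096-d) and the 1000-class head; add a 1024-d dense layer.
vgg = VGG19(weights='imagenet')
vgg_19 = with_binary_head(models.Model(vgg.input, vgg.get_layer('fc1').output),
                          add_dense_1024=True)

# Inception-ResNet-v2: keep all layers; replace only the 1000-class output.
inc_res = with_binary_head(
    InceptionResNetV2(weights='imagenet', include_top=False, pooling='avg'),
    add_dense_1024=False)

# Xception: extra 1024-d dense layer between average pooling and the output.
xception = with_binary_head(
    Xception(weights='imagenet', include_top=False, pooling='avg'),
    add_dense_1024=True)
```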
2.2. Proposed CNN Model Architecture
2.2.1. Model-A Architecture
2.2.2. Model-B
2.2.3. Model-C
3. Experiments and Results
3.1. Experimental Procedure
3.1.1. DFDC Sample Dataset Preprocessing
3.1.2. DFDC Full Dataset Preprocessing
3.1.3. Training Procedure and Parameter Setting for Proposed CNN and Pretrained Models for Deepfake Image Detection
3.2. Performance Evaluation Metrics
3.3. Experimental Results and Discussion
4. Limitations and Future Work
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Results Obtained with Model-A and Model-C
| Model | Input Image Resolution | Train F1 | Train Acc. (%) | Train AUC | Train Recall | Train Prec. | Val. F1 | Val. Acc. (%) | Val. AUC | Val. Recall | Val. Prec. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Model-A | 1080 × 1920 | 0.904 | 89.08 | 0.913 | 0.903 | 0.905 | 0.959 | 95.25 | 0.970 | 0.990 | 0.930 |
| | 540 × 960 | 0.808 | 78.18 | 0.866 | 0.807 | 0.808 | 0.873 | 84.64 | 0.925 | 0.926 | 0.825 |
| | 270 × 480 | 0.756 | 72.13 | 0.805 | 0.758 | 0.754 | 0.813 | 78.15 | 0.861 | 0.836 | 0.791 |
| | 135 × 240 | 0.767 | 73.56 | 0.823 | 0.765 | 0.769 | 0.817 | 76.50 | 0.868 | 0.922 | 0.733 |
| | 224 × 224 | 0.721 | 67.89 | 0.759 | 0.729 | 0.714 | 0.753 | 64.52 | 0.774 | 0.949 | 0.624 |
| | 299 × 299 | 0.660 | 53.95 | 0.498 | 0.783 | 0.570 | 0.726 | 57.03 | 0.5 | 1.0 | 0.570 |
| Model-C | 1080 × 1920 | 0.924 | 91.35 | 0.950 | 0.924 | 0.924 | 0.924 | 91.57 | 0.974 | 0.903 | 0.946 |
| | 540 × 960 | 0.876 | 85.84 | 0.937 | 0.879 | 0.873 | 0.902 | 88.51 | 0.963 | 0.935 | 0.872 |
| | 270 × 480 | 0.856 | 83.78 | 0.924 | 0.850 | 0.863 | 0.884 | 86.51 | 0.948 | 0.900 | 0.867 |
| | 135 × 240 | 0.829 | 80.60 | 0.902 | 0.829 | 0.830 | 0.853 | 82.40 | 0.929 | 0.898 | 0.812 |
| | 224 × 224 | 0.833 | 81.24 | 0.903 | 0.822 | 0.844 | 0.855 | 82.80 | 0.924 | 0.893 | 0.820 |
| | 299 × 299 | 0.869 | 85.16 | 0.936 | 0.867 | 0.871 | 0.889 | 87.18 | 0.950 | 0.903 | 0.875 |
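The metrics reported in this and the other results tables can be reproduced from model outputs with scikit-learn. A minimal sketch, assuming `y_true` holds the ground-truth labels (1 = fake) and `y_prob` the models' sigmoid outputs — both names are illustrative, not variables from the paper's code:

```python
# Sketch: computing F1, accuracy, AUC, recall, and precision as reported
# in the results tables. `y_true` and `y_prob` are assumed inputs.
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

def report(y_true, y_prob, threshold=0.5):
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)  # binarize scores
    return {
        'F1 Score': f1_score(y_true, y_pred),
        'Accuracy (%)': 100 * accuracy_score(y_true, y_pred),
        'AUC': roc_auc_score(y_true, y_prob),  # AUC uses raw scores, not labels
        'Recall': recall_score(y_true, y_pred),
        'Precision': precision_score(y_true, y_pred),
    }
```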
References
1. Verdoliva, L. Media Forensics and DeepFakes: An Overview. IEEE J. Sel. Top. Signal Process. 2020, 14, 910–932.
2. Westerlund, M. The Emergence of Deepfake Technology: A Review. Technol. Innov. Manag. Rev. 2019, 9, 39–52.
3. FaceApp—Free Neural Face Transformation Filters. Available online: https://www.faceapp.com/ (accessed on 15 September 2022).
4. Karras, T.; Laine, S.; Aila, T. A Style-Based Generator Architecture for Generative Adversarial Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4401–4410.
5. Zamorski, M.; Zięba, M.; Klukowski, P.; Nowak, R.; Kurach, K.; Stokowiec, W.; Trzciński, T. Adversarial Autoencoders for Compact Representations of 3D Point Clouds. Comput. Vis. Image Underst. 2020, 193, 102921.
6. Kaggle Deepfake Detection Challenge. Available online: https://www.kaggle.com/c/deepfake-detection-challenge/data (accessed on 20 July 2021).
7. Li, L.; Bao, J.; Zhang, T.; Yang, H.; Chen, D.; Wen, F.; Guo, B. Face X-Ray for More General Face Forgery Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5001–5010.
8. Pashine, S.; Mandiya, S.; Gupta, P.; Sheikh, R. Deep Fake Detection: Survey of Facial Manipulation Detection Solutions. Int. Res. J. Eng. Technol. (IRJET) 2021, 8, 12605.
9. Bonettini, N.; Bondi, L.; Cannas, E.D.; Bestagini, P.; Mandelli, S.; Tubaro, S. Video Face Manipulation Detection through Ensemble of CNNs. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milano, Italy, 10–15 January 2021; pp. 5012–5019.
10. Hashmi, M.F.; Ashish, B.K.K.; Keskar, A.G.; Bokde, N.D.; Yoon, J.H.; Geem, Z.W. An Exploratory Analysis on Visual Counterfeits Using Conv-LSTM Hybrid Architecture. IEEE Access 2020, 8, 101293–101308.
11. Zhu, X.; Wang, H.; Fei, H.; Lei, Z.; Li, S.Z. Face Forgery Detection by 3D Decomposition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 2929–2939.
12. Skibba, R. Accuracy Eludes Competitors in Facebook Deepfake Detection Challenge. Engineering 2020, 6, 1339–1340.
13. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015; pp. 1–14.
14. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 4278–4284.
15. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
16. Shah, Y.; Shah, P.; Patel, M.; Khamkar, C.; Kanani, P. Deep Learning Model-Based Multimedia Forgery Detection. In Proceedings of the 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, India, 7–9 October 2020; pp. 564–572.
17. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015.
18. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
19. Van Rossum, G.; Python Development Team. The Python Language Reference, Release 3.6.4; 12th Media Services: Hong Kong, China, 2018.
20. Chollet, F. Keras. 2015. Available online: https://github.com/fchollet/keras (accessed on 6 May 2020).
21. Marra, F.; Gragnaniello, D.; Verdoliva, L.; Poggi, G. A Full-Image Full-Resolution End-to-End-Trainable CNN Framework for Image Forgery Detection. IEEE Access 2020, 8, 133488–133502.
22. Alsaffar, M.; Jarallah, E.M. Isolation and Characterization of Lytic Bacteriophages Infecting Pseudomonas aeruginosa from Sewage Water. Int. J. PharmTech Res. 2016, 9, 220–230.
23. Fekri-Ershad, S.; Ramakrishnan, S. Cervical Cancer Diagnosis Based on Modified Uniform Local Ternary Patterns and Feed Forward Multilayer Network Optimized by Genetic Algorithm. Comput. Biol. Med. 2022, 144, 105392.
Model-A

| Layer | Type | Output Shape | Kernel Size | Strides | Dropout Rate |
|---|---|---|---|---|---|
| 1 | Convolution + ReLU | 1076 × 1916 × 32 | 5 × 5 | 1 × 1 | |
| 2 | Batch Normalization | 1076 × 1916 × 32 | | | |
| 3 | Max-Pooling | 538 × 958 × 32 | 2 × 2 | 2 × 2 | |
| 4 | Dropout | 538 × 958 × 32 | | | 0.8 |
| 5 | Convolution + ReLU | 536 × 956 × 64 | 3 × 3 | 1 × 1 | |
| 6 | Batch Normalization | 536 × 956 × 64 | | | |
| 7 | Max-Pooling | 268 × 478 × 64 | 2 × 2 | 2 × 2 | |
| 8 | Dropout | 268 × 478 × 64 | | | 0.6 |
| 9 | Flatten | 8,198,656 | | | |
| 10 | Dense + Sigmoid | 1 | | | |
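Read row by row, this table maps directly onto a Keras `Sequential` model. A minimal sketch, assuming 'valid' padding and the default convolution stride of 1 (these defaults reproduce the tabulated output shapes); it is an illustration, not the authors' released code:

```python
# Sketch of Model-A as tabulated above; a 1080 × 1920 × 3 (RGB) input is
# assumed. Comments give the output shapes from the table.
from tensorflow.keras import layers, models

model_a = models.Sequential([
    layers.Input(shape=(1080, 1920, 3)),
    layers.Conv2D(32, 5, activation='relu'),    # 1076 × 1916 × 32
    layers.BatchNormalization(),
    layers.MaxPooling2D(2),                     # 538 × 958 × 32
    layers.Dropout(0.8),
    layers.Conv2D(64, 3, activation='relu'),    # 536 × 956 × 64
    layers.BatchNormalization(),
    layers.MaxPooling2D(2),                     # 268 × 478 × 64
    layers.Dropout(0.6),
    layers.Flatten(),                           # 8,198,656 features
    layers.Dense(1, activation='sigmoid'),      # real-vs-fake score
])
```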
Model-B

| Layer | Type | Output Shape | Kernel Size | Strides | Dropout Rate |
|---|---|---|---|---|---|
| 1 | Convolution + ReLU | 1070 × 1910 × 32 | 11 × 11 | 1 × 1 | |
| 2 | Batch Normalization | 1070 × 1910 × 32 | | | |
| 3 | Max-Pooling | 535 × 955 × 32 | 2 × 2 | 2 × 2 | |
| 4 | Dropout | 535 × 955 × 32 | | | 0.8 |
| 5 | Convolution + ReLU | 531 × 951 × 64 | 5 × 5 | 1 × 1 | |
| 6 | Batch Normalization | 531 × 951 × 64 | | | |
| 7 | Max-Pooling | 265 × 475 × 64 | 2 × 2 | 2 × 2 | |
| 8 | Dropout | 265 × 475 × 64 | | | 0.6 |
| 9 | Flatten | 8,056,000 | | | |
| 10 | Dense + Sigmoid | 1 | | | |
Model-C

| Layer | Type | Output Shape | Kernel Size | Strides | Dropout Rate |
|---|---|---|---|---|---|
| 1 | Convolution + ReLU | 1078 × 1918 × 32 | 3 × 3 | 1 × 1 | |
| 2 | Batch Normalization | 1078 × 1918 × 32 | | | |
| 3 | Max-Pooling | 539 × 959 × 32 | 2 × 2 | 2 × 2 | |
| 4 | Dropout | 539 × 959 × 32 | | | 0.8 |
| 5 | Convolution + ReLU | 537 × 957 × 64 | 3 × 3 | 1 × 1 | |
| 6 | Batch Normalization | 537 × 957 × 64 | | | |
| 7 | Max-Pooling | 268 × 478 × 64 | 2 × 2 | 2 × 2 | |
| 8 | Dropout | 268 × 478 × 64 | | | 0.6 |
| 9 | Convolution + ReLU | 266 × 476 × 128 | 3 × 3 | | |
| 10 | Batch Normalization | 266 × 476 × 128 | | | |
| 11 | Max-Pooling | 133 × 238 × 128 | 2 × 2 | | |
| 12 | Dropout | 133 × 238 × 128 | | | 0.25 |
| 13 | Flatten | 4,051,712 | | | |
| 14 | Dense + Sigmoid | 1 | | | |
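Because Model-B and Model-C share Model-A's Conv–BatchNorm–MaxPool–Dropout block structure and differ only in kernel sizes, dropout rates, and block count, both tables can be expressed with one parameterized builder. A minimal sketch, assuming the strides left blank in the tables take the Keras defaults (1 for convolutions, the pool size for max-pooling), which reproduces the tabulated output shapes; the builder itself is a hypothetical helper, not the authors' code:

```python
# Hypothetical builder expressing Model-B and Model-C from their tables;
# each (filters, kernel_size, dropout_rate) tuple is one
# Conv-BatchNorm-MaxPool-Dropout block. Same Keras defaults as the
# Model-A sketch above.
from tensorflow.keras import layers, models

def build_model(blocks, input_shape=(1080, 1920, 3)):
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    for filters, kernel_size, dropout_rate in blocks:
        model.add(layers.Conv2D(filters, kernel_size, activation='relu'))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D(2))
        model.add(layers.Dropout(dropout_rate))
    model.add(layers.Flatten())
    model.add(layers.Dense(1, activation='sigmoid'))
    return model

model_b = build_model([(32, 11, 0.8), (64, 5, 0.6)])
model_c = build_model([(32, 3, 0.8), (64, 3, 0.6), (128, 3, 0.25)])
```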
| Model | Input Image Resolution | Train F1 | Train Acc. (%) | Train AUC | Train Recall | Train Prec. | Val. F1 | Val. Acc. (%) | Val. AUC | Val. Recall | Val. Prec. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| VGG-19 [13] | 224 × 224 | 0.716 | 68.17 | 0.751 | 0.705 | 0.727 | 0.751 | 67.95 | 0.773 | 0.849 | 0.673 |
| Inception-ResNet-v2 [14] | 299 × 299 | 0.789 | 75.97 | 0.850 | 0.790 | 0.788 | 0.798 | 74.41 | 0.854 | 0.890 | 0.724 |
| Xception [15] | 299 × 299 | 0.948 | 94.15 | 0.988 | 0.946 | 0.950 | 0.916 | 90.12 | 0.967 | 0.953 | 0.882 |
| Model-B | 1080 × 1920 | 0.922 | 91.14 | 0.936 | 0.921 | 0.923 | 0.973 | 96.96 | 0.984 | 0.989 | 0.958 |
| | 540 × 960 | 0.856 | 83.57 | 0.911 | 0.856 | 0.855 | 0.904 | 88.83 | 0.941 | 0.924 | 0.884 |
| | 270 × 480 | 0.750 | 71.75 | 0.803 | 0.745 | 0.756 | 0.790 | 73.02 | 0.823 | 0.891 | 0.709 |
| | 135 × 240 | 0.724 | 68.54 | 0.771 | 0.725 | 0.723 | 0.761 | 68.71 | 0.791 | 0.875 | 0.673 |
| | 224 × 224 | 0.727 | 69.05 | 0.777 | 0.723 | 0.731 | 0.767 | 71.24 | 0.794 | 0.832 | 0.711 |
| | 299 × 299 | 0.738 | 70.42 | 0.791 | 0.732 | 0.744 | 0.771 | 69.86 | 0.804 | 0.891 | 0.679 |
| DFDC Data Type | Input Image Resolution | F1 Score | Accuracy (%) | Recall | Precision |
|---|---|---|---|---|---|
| Full DFDC | 1080 × 1920 | 0.268 | 37.56 | 0.708 | 0.165 |
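As a consistency check, the reported F1 score follows from the precision and recall in this row: F1 = 2 · Precision · Recall / (Precision + Recall) = (2 × 0.165 × 0.708) / (0.165 + 0.708) ≈ 0.268.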
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).