Real-Time Advanced Computational Intelligence for Deep Fake Video Detection
Abstract
1. Introduction
- We propose a new model architecture consisting of a linear stack of separable 2D convolution and max-pooling layers, with XGBoost as the classifier and a modified Swish as the activation function.
- Rather than targeting any single facial manipulation technique, we focus on building a robust, scalable, and generalizable deepfake video detection model by training it on an augmented and generalized dataset.
- The proposed model outperforms existing approaches on a deepfake dataset, with lower training and validation loss.
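The architecture in the first bullet can be sketched in PyTorch. The layer counts and channel widths below are illustrative assumptions (the paper's exact configuration appears in Section 4.3); the sketch only shows the stated pattern: a linear stack of separable 2D convolutions and max-pooling with Swish activations, whose pooled features would be handed to an XGBoost classifier.

```python
import torch
import torch.nn as nn

class SeparableConv2d(nn.Module):
    """Separable convolution: depthwise conv followed by a 1x1 pointwise conv."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class Swish(nn.Module):
    """Swish activation: x * sigmoid(x)."""
    def forward(self, x):
        return x * torch.sigmoid(x)

class DeepFakeFeatureExtractor(nn.Module):
    """Linear stack of separable-conv / max-pool blocks (sizes are illustrative)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            SeparableConv2d(3, 32), Swish(), nn.MaxPool2d(2),
            SeparableConv2d(32, 64), Swish(), nn.MaxPool2d(2),
            SeparableConv2d(64, 128), Swish(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, x):
        # The resulting feature vector is what an XGBoost classifier would consume.
        return self.features(x)

feats = DeepFakeFeatureExtractor()(torch.randn(2, 3, 224, 224))
print(feats.shape)  # torch.Size([2, 128])
```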
2. Literature Review
3. Preliminaries
3.1. Generative Adversarial Network (GAN)
3.2. EfficientNet Model
3.3. Xception Model
4. Proposed Methodology
4.1. Image Augmentation
4.2. Construction of Fully Connected Output Layers
4.3. Architecture of the Proposed DeepFake Detection Model
Algorithm 1: DeepFake Video Detection
Result: Real/Fake video label
1. Read a video from the dataset.
2. Divide the video into frames using OpenCV and label all the image frames as fake or real using one-to-one mapping.
3. Detect the face in each frame using the BlazeFace [31,32] library.
4. Apply image augmentation techniques while training the model.
5. Provide the frames with faces as input to our model for classification.
6. The model classifies each frame as real (1) or fake (0).
4.4. Computational Complexity of Our Model
5. Experiments
5.1. Dataset
5.2. Experimental Parameters
5.3. Experimental Settings
- Drop Out: This parameter is used to reduce overfitting. To train all nodes equally, we randomly drop some neurons during each training pass. This forces the network to share information across weights, which increases its ability to generalize to new data. A node with a large weight may be turned off multiple times, while a node with a small weight may never be turned off. The dropout rate lies between 0 and 1 and determines the fraction of neurons from the previous layer that are turned off. In our proposed model, we used a dropout value of 0.45.
- Learning Rate: This parameter controls how much the model changes each time the weights are updated, and it largely decides how long the model takes to converge: it is the size of the step taken toward convergence. If the learning rate is too large, the model takes big steps and converges quickly, but it may overshoot and miss the minimum. If it is too small, the model takes a very long time to converge. The best learning rate is one that decreases as the model gets closer to the solution. We used the ReduceLROnPlateau learning-rate scheduler in PyTorch, which reduces the learning rate automatically if accuracy does not improve for 2–3 epochs. In our proposed model, we used an initial learning rate of 0.001.
- Epochs: This parameter is the number of complete passes over the training dataset. A greater number of epochs improves performance up to a point, since in each epoch the weights are adjusted based on the error from the previous one, reducing the error in each subsequent epoch. The number of epochs must be chosen wisely to balance training time against performance gains. In our proposed model, we used 30 epochs.
- Batch Size: This parameter defines the number of input samples sent to the model at once for training. The complete training dataset is divided into batches of this size, and the model is trained on one batch at a time. A large batch size speeds up training but requires more memory and can hurt generalization, whereas too small a batch size slows down training and yields noisier weight updates. The batch size must therefore be selected to balance accuracy against training time and memory. In our proposed model, we used a batch size of 128.
- Optimizer: We use a loss function to quantify the wrong predictions made by our model and then try to reduce it by tuning the hyperparameters. Optimizers tie the loss function to the parameters of the model, updating the model in response to the outcome of the loss function. We used the Adam optimizer, an adaptive method that combines AdaGrad and RMSProp and hence inherits the advantages of both.
- Activation Function: Activation functions are an integral part of artificial neural networks; they help the model learn complex patterns and decide what is fired as input to the next neuron. The Swish activation function is the product of the identity and sigmoid functions, swish(x) = x·sigmoid(x), and it avoids the ReLU problem of nullifying all negative values to zero. In our model architecture, we found that Swish performs better than the ReLU activation function.
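The settings above (dropout 0.45, Adam at 0.001 with ReduceLROnPlateau, 30 epochs, batch size 128, Swish) translate to the following PyTorch sketch. The two-layer model body is a stand-in and the training loop is elided; only the configuration objects reflect the section.

```python
import torch
import torch.nn as nn

# Hyper-parameters from Section 5.3; the model body is a stand-in.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.SiLU(),              # SiLU is PyTorch's built-in Swish: x * sigmoid(x)
    nn.Dropout(p=0.45),     # drop 45% of the previous layer's neurons in training
    nn.Linear(64, 2),
)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# ReduceLROnPlateau shrinks the learning rate when the monitored metric stalls;
# patience=2 mirrors "no improvement in 2-3 epochs" described above.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.1, patience=2)

EPOCHS, BATCH_SIZE = 30, 128
for epoch in range(EPOCHS):
    # ... train on batches of BATCH_SIZE and compute validation accuracy ...
    val_accuracy = 0.9      # placeholder; a real loop would evaluate here
    scheduler.step(val_accuracy)
```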
5.4. Interpretation of Our Model
5.5. Performance Evaluation
5.6. Performance Evaluation and Comparison of Different Architectures
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. arXiv 2014, arXiv:1406.2661.
- Faceswap: Deepfakes Software for All. Available online: https://github.com/deepfakes/faceswap (accessed on 10 November 2020).
- Kietzmann, J.; Lee, L.W.; McCarthy, I.P.; Kietzmann, T.C. Deepfakes: Trick or treat? Bus. Horiz. 2020, 63, 135–146.
- Suwajanakorn, S.; Seitz, S.M.; Kemelmacher-Shlizerman, I. Synthesizing Obama: Learning lip sync from audio. ACM Trans. Graph. 2017, 36, 1–13.
- FakeApp 2.2.0. Available online: https://www.malavida.com/en/soft/fakeapp/ (accessed on 28 October 2020).
- Bloomberg. How Faking Videos Became Easy and Why That’s So Scary. 11 September 2018. Available online: https://fortune.com/2018/09/11/deep-fakes-obama-video/ (accessed on 2 October 2020).
- Chesney, R.; Citron, D. Deepfakes and the new disinformation war: The coming age of post-truth geopolitics. Foreign Aff. 2019, 98, 147.
- Tucker, P. The Newest AI-Enabled Weapon: Deep-Faking Photos of the Earth. Defense One, March 2019. Available online: https://www.defenseone.com/technology/2019/03/next-phase-ai-deep-faking-whole-world-and-china-ahead/155944/ (accessed on 3 January 2023).
- Kumar, A.; Dadheech, P.; Chaudhary, U. Energy conservation in WSN: A review of current techniques. In Proceedings of the 2020 3rd International Conference on Emerging Technologies in Computer Engineering: Machine Learning and Internet of Things (ICETCE), Jaipur, India, 7–8 February 2020.
- Wu, Z.; Shen, C.; Van Den Hengel, A. Wider or deeper: Revisiting the ResNet model for visual recognition. Pattern Recognit. 2019, 90, 119–133.
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
- Yadav, D.P.; Jalal, A.S.; Prakash, V. Human burn depth and grafting prognosis using ResNeXt topology based deep learning network. Multimed. Tools Appl. 2022, 81, 18897–18914.
- Rathor, S.; Agrawal, S. Sense understanding of text conversation using temporal convolution neural network. Multimed. Tools Appl. 2022, 81, 9897–9914.
- Singh, L.K.; Garg, H.; Khanna, M. Deep learning system applicability for rapid glaucoma prediction from fundus images across various data sets. Evol. Syst. 2022, 13, 807–836.
- Gupta, N.; Garg, H.; Agarwal, R. A robust framework for glaucoma detection using CLAHE and EfficientNet. Vis. Comput. 2022, 38, 2315–2328.
- Tolosana, R.; Vera-Rodriguez, R.; Fierrez, J.; Morales, A.; Ortega-Garcia, J. Deepfakes and beyond: A survey of face manipulation and fake detection. Inf. Fusion 2020, 64, 131–148.
- Ismail, A.; Elpeltagy, M.; Zaki, M.S.; Eldahshan, K. A New Deep Learning-Based Methodology for Video Deepfake Detection Using XGBoost. Sensors 2021, 21, 5413.
- Hsu, C.-C.; Zhuang, Y.-X.; Lee, C.-Y. Deep fake image detection based on pairwise learning. Appl. Sci. 2020, 10, 370.
- Chopra, S.; Hadsell, R.; LeCun, Y. Learning a similarity metric discriminatively, with application to face verification. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–26 June 2005; Volume 1, pp. 539–546.
- Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv 2019, arXiv:1905.11946.
- Li, Y.; Lyu, S. Exposing deepfake videos by detecting face warping artifacts. arXiv 2018, arXiv:1811.00656.
- Mirsky, Y.; Lee, W. The creation and detection of deepfakes: A survey. ACM Comput. Surv. 2021, 54, 1–41.
- Güera, D.; Delp, E.J. Deepfake video detection using recurrent neural networks. In Proceedings of the 15th IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS), Auckland, New Zealand, 27–30 November 2018; pp. 1–6.
- Jung, T.; Kim, S.; Kim, K. DeepVision: Deepfakes Detection Using Human Eye Blinking Pattern. IEEE Access 2020, 8, 83144–83154.
- Nguyen, T.T.; Nguyen, C.M.; Nguyen, D.T.; Nguyen, D.T.; Nahavandi, S. Deep learning for deepfakes creation and detection. arXiv 2019, arXiv:1909.11573.
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456.
- Khormali, A.; Yuan, J.-S. ADD: Attention-Based DeepFake Detection Approach. Big Data Cogn. Comput. 2021, 5, 49.
- Montserrat, D.M.; Hao, H.; Yarlagadda, S.K.; Baireddy, S.; Shao, R.; Horvath, J.; Bartusiak, E.; Yang, J.; Guera, D.; Zhu, F.; et al. Deepfakes Detection with Automatic Face Weighting. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 2851–2859.
- Yu, P.; Xia, Z.; Fei, J.; Lu, Y. A Survey on Deepfake Video Detection. IET Biom. 2021, 10, 607–624.
- Su, Y.; Xia, H.; Liang, Q.; Nie, W. Exposing DeepFake Videos Using Attention Based Convolutional LSTM Network. Neural Process. Lett. 2021, 53, 4159–4175.
- Wodajo, D.; Atnafu, S. Deep fake video detection using convolutional vision transformer. arXiv 2021, arXiv:2102.11126.
- Bonettini, N.; Cannas, E.D.; Mandelli, S.; Bondi, L.; Bestagini, P.; Tubaro, S. Video face manipulation detection through ensemble of CNNs. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 5012–5019.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Chen, W.; Huang, H.; Peng, S.; Zhou, C.; Zhang, C. YOLO-Face: A Real-Time Face Detector. Vis. Comput. 2020, 37, 805–813. Available online: https://link.springer.com/article/10.1007/s00371-020-01831-7 (accessed on 7 August 2021).
- Kumar, R.; Arora, R.; Bansal, V.; Sahayasheela, V.J.; Buckchash, H.; Imran, J.; Narayanan, N.; Pandian, G.N.; Raman, B. Accurate prediction of COVID-19 using chest X-ray images through deep feature learning model with SMOTE and machine learning classifiers. medRxiv 2020.
- Kumar, A.; Kumar, A.; Bashir, A.K.; Rashid, M.; Kumar, V.A.; Kharel, R. Distance based pattern driven mining for outlier detection in high dimensional big dataset. ACM Trans. Manag. Inf. Syst. (TMIS) 2021, 13, 1–17.
- Kumar, A.; Dadheech, P.; Singh, V.; Raja, L. Performance modeling for secure migration processes of legacy systems to the cloud computing. In Data Deduplication Approaches; Academic Press: Cambridge, MA, USA, 2021; pp. 255–279.
- Afchar, D.; Nozick, V.; Yamagishi, J.; Echizen, I. MesoNet: A compact facial video forgery detection network. In Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security (WIFS), Hong Kong, China, 11–13 December 2018; pp. 1–7.
- Kumar, A.; Dadheech, P.; Beniwal, M.K.; Agarwal, B.; Patidar, P.K. A fuzzy logic-based control system for detection and mitigation of blackhole attack in vehicular Ad Hoc network. In Microservices in Big Data Analytics: Proceedings of the Second International, ICETCE 2019, Rajasthan, India, 1–2 February 2019; Springer: Singapore, 2019; pp. 163–178.
- Yang, X.; Li, Y.; Lyu, S. Exposing deep fakes using inconsistent head poses. In Proceedings of the ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 8261–8265.
- Charitidis, P.; Kordopatis-Zilos, G.; Papadopoulos, S.; Kompatsiaris, I. A face preprocessing approach for improved deepfake detection. arXiv 2020, arXiv:2006.07084.
- Kumar, A.; Bhavsar, A.; Verma, R. Detecting deepfakes with metric learning. In Proceedings of the 2020 8th International Workshop on Biometrics and Forensics (IWBF), Porto, Portugal, 29–30 April 2020; pp. 1–6.
- Li, Y.; Chang, M.C.; Lyu, S. In ictu oculi: Exposing AI created fake videos by detecting eye blinking. In Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security (WIFS), Hong Kong, China, 11–13 December 2018; pp. 1–7.
- Vamsi, V.V.V.N.S.; Shet, S.S.; Reddy, S.S.M.; Rose, S.S.; Shetty, S.R.; Sathvika, S.; Supriya, M.S.; Shankar, S.P. Deepfake Detection in Digital Media Forensics. Glob. Transit. Proc. 2022, 3, 74–79.
- Rana, M.S.; Nobi, M.N.; Murali, B.; Sung, A.H. Deepfake Detection: A Systematic Literature Review. IEEE Access 2022, 10, 25494–25513.
Parameters | Used Value |
---|---|
Drop Out | 0.65 |
Learning Rate | 0.001 |
Epochs | 15 |
Batch Size | 64 |
Optimizer | Adam |
Parameters | Used Value |
---|---|
Drop Out | 0.75 |
Learning Rate | 0.001 |
Epochs | 20 |
Batch Size | 128 |
Optimizer | Adam |
Dataset: DFDC (DeepFake Detection Challenge); Size: 470 GB

Split | Frames having faces (Real) | Frames having faces (Fake) |
---|---|---|
Training | 65,234 | 68,258 |
Validation | 5876 | 5698 |
Testing | 9785 | 9542 |
Parameters | Used Value |
---|---|
Drop Out | 0.45 |
Learning Rate | 0.001 |
Epochs | 30 |
Batch Size | 128 |
Optimizer | Adam |
Activation Function | Swish |
N = 19,327 | Predicted Real | Predicted Fake |
---|---|---|
Actual Real | 8543 | 1242 |
Actual Fake | 1094 | 8448 |
N = 19,327 | Predicted Real | Predicted Fake |
---|---|---|
Actual Real | 8792 | 993 |
Actual Fake | 922 | 8620 |
N = 19,327 | Predicted Real | Predicted Fake |
---|---|---|
Actual Real | 8908 | 877 |
Actual Fake | 663 | 8879 |
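The scores in the comparison table below can be derived from confusion matrices like the ones above. A small helper, taking "fake" as the positive class; the paper does not state its averaging convention, so these single-class values land close to, but not exactly on, the reported figures.

```python
def metrics_from_confusion(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from raw confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Third matrix above, with "fake" as positive: tp = fake predicted fake,
# fp = real predicted fake, fn = fake predicted real, tn = real predicted real.
m = metrics_from_confusion(tp=8879, fp=877, fn=663, tn=8908)
print({k: round(v, 4) for k, v in m.items()})
# → {'precision': 0.9101, 'recall': 0.9305, 'f1': 0.9202, 'accuracy': 0.9203}
```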
Models | Precision | Recall | F1-Score | Accuracy |
---|---|---|---|---|
BlazeFace + EfficientNet B5 [43] + SVM [39] | 0.7985 | 0.8094 | 0.8021 | 0.8246 |
YOLO + Xception [36] + SVM [39] | 0.8030 | 0.7902 | 0.7853 | 0.8337 |
MTCNN [31,41,43] + InceptionResNetV2 + XGBoost | 0.8265 | 0.8045 | 0.8218 | 0.8445 |
YOLO + InceptionResNetV2 + XGBoost [YIX] [17] | 0.8736 | 0.8539 | 0.8636 | 0.9073 |
YOLO + ResNet152 [38] + SVM [39] | 0.7828 | 0.8012 | 0.7878 | 0.8250 |
YOLO + ResNet152 [38] + XGBoost | 0.8043 | 0.8129 | 0.8083 | 0.8488 |
BlazeFace + EfficientNet B5 [43] + XGBoost | 0.8285 | 0.8145 | 0.8012 | 0.8458 |
BlazeFace + Xception [36] + XGBoost | 0.8728 | 0.8945 | 0.8986 | 0.9037 |
YOLO + Xception [36] + Log Reg | 0.7645 | 0.7724 | 0.7954 | 0.8102 |
YOLO + EfficientNet B5 [43] + Log Reg | 0.7827 | 0.8152 | 0.8021 | 0.8342 |
BlazeFace + EfficientNet B5 + XGBoost | 0.8985 | 0.9094 | 0.9021 | 0.9146 |
YOLO + DFN + SVM | 0.8503 | 0.8469 | 0.8517 | 0.8528 |
YOLO + DFN + XGBoost | 0.8627 | 0.8745 | 0.8468 | 0.8762 |
BlazeFace + DFN + SVM | 0.8971 | 0.9069 | 0.8823 | 0.9028 |
BlazeFace + DFN + DenseLayer [36,37,38] | 0.8192 | 0.8363 | 0.8241 | 0.8152 |
BlazeFace + DFN + Log Reg | 0.9078 | 0.9186 | 0.9254 | 0.9105 |
Proposed: BlazeFace + DFN + XGBoost | 0.9103 | 0.9269 | 0.9217 | 0.9328 |
Parameters | EfficientNet | Xception | Proposed Model |
---|---|---|---|
Training Accuracy | 0.9562 | 0.9645 | 0.9676 |
Testing Accuracy | 0.9086 | 0.9263 | 0.9328 |
Training Loss | 0.1624 | 0.1345 | 0.1422 |
Testing Loss | 0.2172 | 0.2062 | 0.1865 |
Share and Cite
Bansal, N.; Aljrees, T.; Yadav, D.P.; Singh, K.U.; Kumar, A.; Verma, G.K.; Singh, T. Real-Time Advanced Computational Intelligence for Deep Fake Video Detection. Appl. Sci. 2023, 13, 3095. https://doi.org/10.3390/app13053095