Neural Network Ensemble Method for Deepfake Classification Using Golden Frame Selection
Abstract
1. Introduction
- A golden frame selection technique is introduced, identifying the most informative video frames based on grayscale intensity differences between consecutive frames. This approach improves training efficiency while maintaining good classification performance.
- An ensemble of deep learning models is constructed, combining ResNet50, EfficientNetB0, Xception, InceptionV3, and FaceNet architectures to capture diverse spatial and textural features relevant to deepfake detection.
- A meta-classification model based on XGBoost is applied to aggregate predictions from the base models, enhancing overall accuracy and robustness.
- The application of Grad-CAM enables the interpretation of model behavior by visualizing the regions of interest across different neural networks.
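The golden frame selection idea in the first contribution can be sketched in a few lines. This is a minimal illustration, not the authors' exact implementation: the function name `select_golden_frames`, the top-k selection criterion, and the use of mean absolute pixel difference are assumptions; frames are taken to be already-decoded grayscale numpy arrays.

```python
import numpy as np

def select_golden_frames(frames, max_frames=5):
    """Pick the most informative frames based on grayscale intensity
    differences between consecutive frames (larger change = more informative)."""
    if len(frames) <= max_frames:
        return list(frames)
    # Mean absolute intensity difference between each frame and its predecessor.
    diffs = [np.abs(frames[i].astype(float) - frames[i - 1].astype(float)).mean()
             for i in range(1, len(frames))]
    # Indices of the frames with the largest change, kept in temporal order.
    top = sorted(np.argsort(diffs)[-max_frames:] + 1)
    return [frames[i] for i in top]

# Synthetic example: 10 "frames" of 4x4 grayscale pixels with one abrupt change.
frames = [np.full((4, 4), 10, dtype=np.uint8) for _ in range(10)]
frames[6] = np.full((4, 4), 200, dtype=np.uint8)  # abrupt scene change
golden = select_golden_frames(frames, max_frames=3)
print(len(golden))  # 3
```

Because only frame indices with the largest inter-frame change survive, a mostly static video contributes few frames to training, which is the source of the efficiency gain the contribution describes.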
2. Literature Review
- First, most deep learning models, particularly CNN architectures, demonstrate high accuracy on specific datasets but significantly lose effectiveness when the data source or video quality changes.
- Second, some approaches, such as methods based on spectral analysis or manual feature extraction, are sensitive to lighting changes and variations in video streams.
- Third, most methods process videos frame by frame, which results in significant computational costs and limits their practical applicability in real-time scenarios.
3. Method Description
- Check whether the file exists in the directory.
- If the file exists, apply the golden frame extraction function (with a parameter setting the maximum number of frames) to extract the following:
- Assign the class label based on the label field:
- Append all extracted frames to the frame list, and append the corresponding label to the label list for each frame.
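The steps above can be sketched as a dataset-assembly loop. This is a hedged, in-memory illustration: the `metadata` dictionary shape, the `"FAKE"`/`"REAL"` label values, and the `extract_golden_frames` callable are hypothetical stand-ins for the actual pipeline.

```python
import os
import tempfile

def build_dataset(metadata, video_dir, extract_golden_frames, max_frames=5):
    """Follow the listed steps: check file existence, extract golden frames,
    read the class label from the metadata, and collect parallel lists."""
    X, y = [], []
    for filename, info in metadata.items():
        path = os.path.join(video_dir, filename)
        if not os.path.exists(path):                      # step 1: skip missing files
            continue
        frames = extract_golden_frames(path, max_frames)  # step 2: golden frames
        label = 1 if info["label"] == "FAKE" else 0       # step 3: class label
        for frame in frames:                              # step 4: one label per frame
            X.append(frame)
            y.append(label)
    return X, y

# Demo with a temporary directory and a stub extractor returning two dummy frames.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "a.mp4"), "w").close()
    meta = {"a.mp4": {"label": "FAKE"}, "missing.mp4": {"label": "REAL"}}
    X, y = build_dataset(meta, d, lambda p, m: ["f1", "f2"], max_frames=2)
print(len(X), y)  # 2 [1, 1]
```

Note that the frame and label lists stay parallel by construction, which is what the per-frame append in the last step guarantees.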
4. Implementation
4.1. Case Study
4.2. Feature Impact
4.3. Accuracy Evaluation
4.4. Experimental Results
- With golden frames: using selected keyframes from videos that provide the most important visual features.
- Without golden frames: training without selected keyframes.
- Without batch optimization: turning off batch-level optimizations during training.
- Without dropout: removing dropout layers to assess their effect on regularization.
- Without Mish activation: replacing Mish activation functions with ReLU to assess their impact.
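The Mish-versus-ReLU ablation refers to the smooth activation Mish(x) = x · tanh(softplus(x)). A minimal numpy sketch of both functions, for reference (framework-independent; the actual models apply these inside their layers):

```python
import numpy as np

def mish(x):
    # Mish(x) = x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x)
    return x * np.tanh(np.log1p(np.exp(x)))

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(mish(x))  # smooth, slightly negative for negative inputs
print(relu(x))  # [0. 0. 2.]
```

Unlike ReLU, Mish is smooth everywhere and lets small negative values pass through, which is the behavioral difference this ablation probes.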
4.5. Module Description and Operation
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Teneo. Deepfakes in 2024 Are Suddenly Deeply Real: An Executive Briefing on the Threat and Trends. 2024. Available online: https://www.teneo.com/insights/articles/deepfakes-in-2024-are-suddenly-deeply-real-an-executive-briefing-on-the-threat-and-trends/ (accessed on 24 February 2025).
- Centre for Strategic & International Studies (CSIS). The Future of Hybrid Warfare. 2024. Available online: https://www.csis.org/analysis/future-hybrid-warfare (accessed on 24 February 2025).
- Brookings. Deepfakes and International Conflict. 2023. Available online: https://www.brookings.edu/wp-content/uploads/2023/01/FP_20230105_deepfakes_international_conflict.pdf (accessed on 24 February 2025).
- Geneva Centre for Security Policy (GCSP). The War in Ukraine: Reality Check for Emerging Technologies and the Future of Warfare. 2024. Available online: https://www.gcsp.ch/publications/war-ukraine-reality-check-emerging-technologies-and-future-warfare (accessed on 24 February 2025).
- RAND Corporation. Ukraine’s Lessons for the Future of Hybrid Warfare. 2024. Available online: https://www.rand.org/pubs/commentary/2022/11/ukraines-lessons-for-the-future-of-hybrid-warfare.html (accessed on 24 February 2025).
- Lipianina-Honcharenko, K.; Maika, N.; Sachenko, S.; Kopania, L.; Soia, M. A Cyclical Approach to Legal Document Analysis: Leveraging AI for Strategic Policy Evaluation. CEUR-WS 2024, 3736, 201–211. [Google Scholar]
- Babeshko, I.; Illiashenko, O.; Kharchenko, V.; Leontiev, K. Towards Trustworthy Safety Assessment by Providing Expert and Tool-Based XMECA Techniques. Mathematics 2022, 10, 2297. [Google Scholar] [CrossRef]
- Tran, V.-N.; Lee, S.-H.; Le, H.-S.; Kwon, K.-R. High Performance DeepFake Video Detection on CNN-Based with Attention Target-Specific Regions and Manual Distillation Extraction. Appl. Sci. 2021, 11, 7678. [Google Scholar] [CrossRef]
- Zhang, J.; Cheng, K.; Sovernigo, G.; Lin, X. A Heterogeneous Feature Ensemble Learning Based Deepfake Detection Method. In Proceedings of the ICC 2022—IEEE International Conference on Communications, Seoul, Republic of Korea, 16–20 May 2022; pp. 2084–2089. [Google Scholar] [CrossRef]
- Nadimpalli, A.V.; Rattani, A. Facial Forgery-Based Deepfake Detection Using Fine-Grained Features. In Proceedings of the 2023 International Conference on Machine Learning and Applications, Jacksonville, FL, USA, 15–17 December 2023; pp. 2174–2181. [Google Scholar] [CrossRef]
- Guan, L.; Liu, F.; Zhang, R.; Liu, J.; Tang, Y. MCW: A Generalizable Deepfake Detection Method for Few-Shot Learning. Sensors 2023, 23, 8763. [Google Scholar] [CrossRef] [PubMed]
- Khan, R.; Sohail, M.; Usman, I.; Sandhu, M.; Raza, M.; Yaqub, M.A.; Liotta, A. Comparative study of deep learning techniques for DeepFake video detection. ICT Express 2024, 10, 1226–1239. [Google Scholar] [CrossRef]
- Chakraborty, R.; Naskar, R. Role of human physiology and facial biomechanics towards building robust deepfake detectors: A comprehensive survey and analysis. Comput. Sci. Rev. 2024, 54, 100677. [Google Scholar] [CrossRef]
- Abbas, F.; Taeihagh, A. Unmasking deepfakes: A systematic review of deepfake detection and generation techniques using artificial intelligence. Expert Syst. Appl. 2024, 252, 124260. [Google Scholar] [CrossRef]
- Casu, M.; Guarnera, L.; Caponnetto, P.; Battiato, S. GenAI mirage: The impostor bias and the deepfake detection challenge in the era of artificial illusions. Forensic Sci. Int. Digit. Investig. 2024, 50, 301795. [Google Scholar] [CrossRef]
- Firc, A.; Malinka, K.; Hanáček, P. Deepfakes as a threat to a speaker and facial recognition: An overview of tools and attack vectors. Heliyon 2023, 9, e15090. [Google Scholar] [CrossRef] [PubMed]
- Lee, E.-G.; Lee, I.; Yoo, S.-B. ClueCatcher: Catching Domain-Wise Independent Clues for Deepfake Detection. Mathematics 2023, 11, 3952. [Google Scholar] [CrossRef]
- Naitali, A.; Ridouani, M.; Salahdine, F.; Kaabouch, N. Deepfake Attacks: Generation, Detection, Datasets, Challenges, and Research Directions. Computers 2023, 12, 216. [Google Scholar] [CrossRef]
- Dincer, S.; Ulutas, G.; Ustubioglu, B.; Tahaoglu, G.; Sklavos, N. Golden ratio based deep fake video detection system with fusion of capsule networks. Comput. Electr. Eng. 2024, 117, 109234. [Google Scholar] [CrossRef]
- Sumanth, S.; Durga, T.C.; Sai, C.Y.; Manne, S. Temporal Convolutional Network & Content-Based Frame Sampling Fusion for Semantically Enriched Video Summarization. Research Square. 2023. Available online: https://www.researchsquare.com/article/rs-3010938/latest (accessed on 24 February 2025).
- Gong, B.; Chao, W.-L.; Grauman, K. Diverse Sequential Subset Selection for Supervised Video Summarization. NeurIPS. 2014. Available online: https://proceedings.neurips.cc/paper_files/paper/2014/file/5d3b9e06117de70a7e5076cc3ed89e18-Paper.pdf (accessed on 24 February 2025).
- Leszczuk, M.I.; Duplaga, M. Algorithm for video summarization of bronchoscopy procedures. BioMed. Eng. OnLine 2011, 10, 110. [Google Scholar] [CrossRef] [PubMed]
- Kim, H.H.; Kim, Y.H. Toward a conceptual framework of key-frame extraction and storyboard display for video summarization. J. Am. Soc. Inf. Sci. Technol. 2010, 61, 1130–1142. [Google Scholar] [CrossRef]
- Alarfaj, F.K.; Khan, J.A. Deep Dive into Fake News Detection: Feature-Centric Classification with Ensemble and Deep Learning Methods. Algorithms 2023, 16, 507. [Google Scholar] [CrossRef]
- Koonce, B. ResNet 50. In Convolutional Neural Networks with Swift for Tensorflow: Image Recognition and Dataset Categorization; Apress: Berkeley, CA, USA, 2021; pp. 63–72. [Google Scholar]
- Kansal, K.; Chandra, T.B.; Singh, A. ResNet-50 vs. EfficientNet-B0: Multi-Centric Classification of Various Lung Abnormalities Using Deep Learning “Session id: ICMLDsE. 004”. Procedia Comput. Sci. 2024, 235, 70–80. [Google Scholar] [CrossRef]
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
- Xia, X.; Xu, C.; Nan, B. Inception-v3 for flower classification. In Proceedings of the 2017 2nd International Conference on Image, Vision and Computing (ICIVC), Chengdu, China, 2–4 June 2017; pp. 783–787. [Google Scholar]
- Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823. [Google Scholar]
- Peng, C.; Liu, Y.; Yuan, X.; Chen, Q. Research of image recognition method based on enhanced inception-ResNet-V2. Multimed. Tools Appl. 2022, 81, 34345–34365. [Google Scholar] [CrossRef]
- Komar, M.; Dorosh, V.; Hladiy, G.; Sachenko, A. Deep neural network for detection of cyber attacks. In Proceedings of the 2018 IEEE 1st International Conference on System Analysis and Intelligent Computing, SAIC 2018—Proceedings, Kyiv, Ukraine, 8–12 October 2018; p. 8516753. [Google Scholar]
- Lipianina-Honcharenko, K.; Yarych, V.; Ivasechko, A.; Filinyuk, A.; Yurkiv, K.; Lebid, T.; Soia, M. Evaluating the Effectiveness of Attention-Gated-CNN-BGRU Models for Historical Manuscript Recognition in Ukraine. In Proceedings of the First International Workshop of Young Scientists on Artificial Intelligence for Sustainable Development, Ternopil, Ukraine, 10–11 May 2024; pp. 99–108. [Google Scholar]
- Lipianina-Honcharenko, K.; Telka, M.; Melnyk, N. Comparison of ResNet, EfficientNet, and Xception architectures for deepfake detection. In Proceedings of the 1st International Workshop on Advanced Applied Information Technologies CEUR-WS, Khmelnytskyi, Ukraine, Zilina, Slovakia, 5 December 2024; pp. 26–34. Available online: https://ceur-ws.org/Vol-3899/paper3.pdf (accessed on 24 February 2025).
- Ni, W.; Wang, T.; Wu, Y.; Liu, X.; Li, Z.; Yang, R.; Zhang, K.; Yang, J.; Zeng, M.; Hu, N.; et al. Multi-task deep learning model for quantitative volatile organic compounds analysis by feature fusion of electronic nose sensing. Sens. Actuators B Chem. 2024, 417, 136206. [Google Scholar] [CrossRef]
- Kaggle. Deepfake Detection Challenge. 2020. Available online: https://www.kaggle.com/competitions/deepfake-detection-challenge/data (accessed on 24 February 2025).
- Ni, W.; Wang, T.; Wu, Y.; Chen, X.; Cai, W.; Zeng, M.; Yang, J.; Hu, N.; Yang, Z. Classification and concentration predictions of volatile organic compounds using an electronic nose based on XGBoost-random forest algorithms. IEEE Sens. J. 2023, 24, 671–678. [Google Scholar] [CrossRef]
- TruScanAI. (n.d.). Available online: https://sci-proj.wunu.edu.ua/truscanai/ (accessed on 24 February 2025).
- Illiashenko, O.; Kharchenko, V.; Kovalenko, A. Cyber security lifecycle and assessment technique for FPGA-based I&C systems. In Proceedings of the East-West Design & Test Symposium (EWDTS 2013), Rostov-on-Don, Russia, 27–30 September 2013; pp. 1–5. [Google Scholar] [CrossRef]
Author(s) | Year | Method Description | Method Features | Method Novelty |
---|---|---|---|---|
Tran et al. [8] | 2021 | The use of CNN with a focus on specific target regions. | High performance achieved through manual feature distillation. | Model size optimization for improved performance. |
Zhang et al. [9] | 2022 | Heterogeneous feature ensemble for detection. | Integration of various features (gray gradient, spectral features, and texture features) to improve accuracy. | Improving accuracy through feature ensemble. |
Guan et al. [11] | 2023 | Multifunctional weighted model based on meta-learning. | Utilizing RGB and frequency domains to enhance generalization. | High generalization and adaptation to diverse data. |
Lee et al. [17] | 2023 | “ClueCatcher” with domain-independent feature selection. | Focus on facial color mismatches, synthesis boundaries artifacts, and quality differences between faces and non-facial regions. | Effectiveness in real-world scenarios. |
Feature Type | ResNet50 | EfficientNetB0 | Xception | InceptionV3 | FaceNet |
---|---|---|---|---|---|
Edges and contours of objects | Object edge detection. | Edge detection with computational resource optimization. | Contour detection with deep detailing. | Edge detection through multichannel operations for general objects. | Precise edge detection of faces for feature recognition. |
Textural patterns | Surface texture detection. | Textural details at different scales. | Complex texture detection. | Comprehensive texture analysis, adapted for general objects. | Texture extraction tailored for facial features. |
Shapes and sizes | Recognition of basic shapes and sizes. | Efficient shape recognition at multiple levels. | Detection of complex shapes and geometric features. | Generalized shape detection for diverse objects. | Detailed face shape recognition for accurate identification. |
Lighting and shadows | Lighting variation analysis. | Balancing local and global lighting variations. | High sensitivity to shadows and lighting. | Robustness to lighting changes through multiscaling. | Analysis of subtle lighting variations specific to faces. |
Conceptual features | Face, vehicle, and other object recognition. | Object recognition based on simplified features. | High efficiency in recognizing complex objects. | Generalized object recognition across different categories. | Optimized for face recognition and identity identification. |
Depth-related features | Complex patterns (shapes, textures, contours). | Complex patterns at different scales. | Structural and textural features. | Generalized features at deep levels, effective for a wide range of objects. | Deep facial traits for differentiating even similar faces. |
Spatial features | Geometric details (straight lines, angles). | Local details and global structure. | Complex interactions between elements. | Geometric patterns are adapted for different object scales. | Spatial features for precise facial detail detection. |
Color features | - | Color channel correlations. | Separate extraction of spatial and channel features. | Sensitivity to color channels for general objects. | Consideration of subtle color nuances is important for faces. |
Contextual features | Interaction of objects in the scene. | Local and global relationships between objects. | Interactions between objects in the scene. | Detection of relationships between objects in complex scenes. | Determining the context of faces in the scene to improve accuracy. |
Features across different levels | - | Balance between details and general patterns. | Local details and general characteristics. | Different feature levels for different object categories. | Local and global features for face identification. |
Detailing features | General characteristics of objects. | High detailing of small objects. | Precise extraction of details and textures. | Different levels of detailing for a wide range of objects. | Super-detailing of facial features for accurate recognition. |
Local structure features | Focus on general shapes and textures. | Balance between local and global features. | Focus on local details. | Local and global features for accurate scene analysis. | Focus on local facial features for reliable identification. |
Model | Main Activation Zones | Main Type of Anomalies |
---|---|---|
ResNet50 | Forehead, cheeks | Textural distortions, uneven lighting |
EfficientNetB0 | Contours of the mouth, eyes | Facial contour deformations, unnatural light transitions |
Xception | Eye area, lips | Motion anomalies, unnatural texture details |
InceptionV3 | Central part of the face | Color instability, blurring of details |
FaceNet | Entire face structure | Global distortions of shape and proportions |
Model (Accuracy/Loss) | With Golden Frames | Without Golden Frames | Without Batch Optimization | Without Dropout | Without Mish Activation |
---|---|---|---|---|---|
ResNet50 | 0.597/0.673 | 0.486/0.701 | 0.431/0.700 | 0.528/0.686 | 0.528/0.692 |
EfficientNetB0 | 0.486/0.700 | 0.597/0.679 | 0.569/0.662 | 0.528/0.684 | 0.514/0.725 |
Xception | 0.625/0.654 | 0.486/0.714 | 0.528/0.691 | 0.556/0.689 | 0.583/0.675 |
InceptionV3 | 0.500/0.722 | 0.486/0.767 | 0.583/0.704 | 0.431/0.746 | 0.514/1.088 |
FaceNet | 0.597/0.676 | 0.569/0.714 | 0.583/0.678 | 0.486/0.688 | 0.528/0.778 |
Meta-Model XGBoost | 0.911/0.224 | 0.905/0.334 | 0.903/0.354 | 0.885/0.386 | 0.874/0.456 |
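The meta-model row above reflects stacking: per-sample fake probabilities from the five base networks become the feature vector for the XGBoost meta-classifier. A minimal sketch of assembling that stacked feature matrix (the probability values below are illustrative, not the paper's outputs):

```python
import numpy as np

# Hypothetical per-sample fake probabilities from the five base models.
resnet50   = np.array([0.62, 0.31, 0.90])
effnet_b0  = np.array([0.55, 0.40, 0.85])
xception   = np.array([0.70, 0.28, 0.95])
inception3 = np.array([0.48, 0.52, 0.80])
facenet    = np.array([0.66, 0.35, 0.88])

# Stack into an (n_samples, n_models) matrix: one meta-feature row per sample.
meta_X = np.column_stack([resnet50, effnet_b0, xception, inception3, facenet])
print(meta_X.shape)  # (3, 5)

# The meta-classifier is then trained on meta_X against the true labels,
# e.g. xgboost.XGBClassifier().fit(meta_X, y)  (not run here).
```

The table makes the payoff visible: individual base models hover around 0.5–0.6 accuracy, while the meta-model trained on their combined outputs reaches 0.911, since the base networks attend to complementary facial regions and anomaly types.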
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lipianina-Honcharenko, K.; Melnyk, N.; Ivasechko, A.; Telka, M.; Illiashenko, O. Neural Network Ensemble Method for Deepfake Classification Using Golden Frame Selection. Big Data Cogn. Comput. 2025, 9, 109. https://doi.org/10.3390/bdcc9040109