Search Results (64)

Search Parameters:
Keywords = forgery localization

21 pages, 13964 KB  
Article
Towards Generalizable Deepfake Detection via Facial Landmark-Guided Convolution and Local Structure Awareness
by Hao Chen, Zhengxu Zhang, Qin Li and Chunhui Feng
Algorithms 2026, 19(4), 270; https://doi.org/10.3390/a19040270 - 1 Apr 2026
Abstract
As deepfakes become increasingly realistic, there is a growing need for robust and highly accurate facial forgery detection algorithms. Existing studies show that global feature modeling approaches (Transformer, VMamba) are effective in capturing long-range dependencies, yet they often lack sufficient sensitivity to localized facial tampering artifacts. Meanwhile, traditional convolutional methods excel at extracting local image features but struggle to incorporate prior knowledge about facial anatomy, resulting in limited representational capability. To address these limitations, this paper proposes LGMamba, a novel detection framework that combines global modeling with facial guidance focused on the key facial components and fine-grained detail regions most commonly manipulated in deepfakes. First, we introduce an innovative Landmark-Guided Convolution (LGConv), which adaptively adjusts convolutional sampling positions using facial landmark information. This allows the model to attend to forgery-prone facial regions, such as the eyes and mouth. Second, we design a parallel Facial Structure Awareness Block (FSAB) to operate alongside the VMamba-based visual State-Space Model. Equipped with a multi-stage residual design and a CBAM attention mechanism, FSAB enhances the model’s sensitivity to subtle facial artifacts, enabling joint exploitation of global semantic consistency and fine-grained forgery cues within a unified architecture. The proposed LGMamba achieves superior performance compared to existing mainstream approaches. In cross-dataset evaluations, it attains AUC scores of 92.34% on CD1 and 96.01% on CD2, outperforming all compared methods.
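The landmark-guided sampling idea can be sketched as follows: shift each sampling position toward its nearest facial landmark so that convolution attends to landmark-adjacent regions. This is a toy NumPy illustration under assumed parameters (`strength`, `max_shift` and the function name are hypothetical), not the authors' learned LGConv.

```python
import numpy as np

def landmark_guided_offsets(grid_hw, landmarks, strength=0.5, max_shift=3.0):
    """For each pixel, compute a 2-D offset pulling the sampling position
    toward its nearest facial landmark (a toy stand-in for learned,
    landmark-conditioned convolution offsets)."""
    h, w = grid_hw
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    pix = np.stack([ys, xs], axis=-1)                       # (h, w, 2)
    # distance from every pixel to every landmark: (h, w, L)
    d = np.linalg.norm(pix[..., None, :] - landmarks[None, None, :, :], axis=-1)
    nearest = landmarks[np.argmin(d, axis=-1)]              # (h, w, 2)
    offset = strength * (nearest - pix)                     # pull toward landmark
    norm = np.linalg.norm(offset, axis=-1, keepdims=True)
    scale = np.minimum(1.0, max_shift / np.maximum(norm, 1e-8))
    return offset * scale                                   # clipped to max_shift

landmarks = np.array([[12.0, 20.0], [40.0, 44.0]])          # e.g. eye and mouth corners
off = landmark_guided_offsets((64, 64), landmarks)
print(off.shape)                                            # (64, 64, 2)
```

In the paper's setting such offsets would feed a deformable convolution; here they are only computed and clipped.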

22 pages, 3493 KB  
Article
Deepfake Detection Using Multimodal CLIP-Based SigLIP-2 Vision Transformers
by Joe Soundararajan and Dong Xu
AI 2026, 7(3), 115; https://doi.org/10.3390/ai7030115 - 19 Mar 2026
Abstract
Background: Deepfakes pose a growing threat to the integrity of visual media, motivating detectors that remain reliable as forgeries become increasingly realistic. Methods: We propose a deepfake detection framework built on CLIP-derived SigLIP-2 vision transformers and a multi-task design that jointly performs (i) classification and (ii) manipulated-region localization when pixel-level supervision is available. We evaluated the approach on three public benchmarks of increasing complexity—HiDF, SID_Set (SIDA), and CiFake—using each dataset’s official partitions where provided (SID_Set uses the predefined train/validation split) and a standardized preprocessing and training pipeline across experiments. Results: On HiDF, our model achieved strong performance on both video and image tracks (AUC up to 0.931 on video and 0.968 on images), yielding large gains relative to previously reported HiDF baselines under their published settings. On SID_Set, the model achieved 99.1% three-class accuracy (real/synthetic/tampered) and produced accurate localization masks for many tampered regions, while we explicitly documented the split protocol and leakage checks to support the validity of the evaluation. On CiFake, the model exceeded 95% accuracy and attained an AUC of 0.986. Conclusions: Overall, the results indicate that SigLIP-2 representations combined with multi-task training can deliver high detection accuracy and interpretable localization on challenging, realistic forgeries, while highlighting the importance of clearly stated evaluation protocols for fair comparison.
(This article belongs to the Section AI Systems: Theory and Applications)

25 pages, 3276 KB  
Article
SIDWA: Synthetic Image Detection Based on Discrete Wavelet Transform Stem and Deformable Sliding Window Cross-Attention
by Luo Li, Tianyi Lu, Jiaxin Song and Ke Cheng
Electronics 2026, 15(4), 891; https://doi.org/10.3390/electronics15040891 - 21 Feb 2026
Abstract
With the rapid evolution of Generative Adversarial Networks (GANs) and diffusion models (DMs), the detection of synthetic images faces significant challenges due to non-rigid artifacts and complex frequency biases. In this paper, we propose SIDWA, a novel dual-branch detection framework that leverages the synergy between frequency and spatial domains. Within the spatial branch, we design a Deformable Sliding Window Cross-Attention (DSWA) module, which utilizes a learnable offset mechanism to dynamically warp the receptive field, effectively capturing distorted edges and non-linear texture features. Simultaneously, the Discrete Wavelet Transform (DWT) Stem decomposes input images into multi-scale sub-bands to preserve crucial high-frequency residues. Through a Frequency-Semantic Resonance Projector (FSRP) strategy, the semantic priors from the spatial branch act as queries to guide the model toward localized frequency anomalies, achieving a unified “where to look” and “how to analyze” approach. Experimental results on the SIDataset (SIDset) benchmark demonstrate that SIDWA achieves superior performance, with an average accuracy exceeding 95% and a competitive inference time of 18.2 ms on an NVIDIA A100 GPU. Ablation studies further validate the critical role of learnable offsets and frequency integration in enhancing robustness and generalization. SIDWA offers an efficient and reliable forensic solution for combating the growing threats of sophisticated generative forgeries.
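The DWT stem's sub-band decomposition can be illustrated with a single-level 2-D Haar transform; a minimal NumPy sketch (the paper's stem is multi-scale and feeds a learned network, and real implementations often use PyWavelets):

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar discrete wavelet transform, returning the
    four sub-bands (LL, LH, HL, HH) via 2x2 block averages/differences.
    Detectors keep the high-frequency bands, where generation artifacts
    tend to concentrate."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0    # low-pass approximation
    lh = (a - b + c - d) / 4.0    # detail along one axis
    hl = (a + b - c - d) / 4.0    # detail along the other axis
    hh = (a - b - c + d) / 4.0    # diagonal detail
    return ll, lh, hl, hh

img = np.arange(64.0).reshape(8, 8)
ll, lh, hl, hh = haar_dwt2(img)
print(ll.shape)   # (4, 4)
```

Normalization and sub-band orientation conventions vary between libraries; this sketch fixes one arbitrary choice.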

21 pages, 3872 KB  
Article
IoT-Oriented Security for Small Sensor Systems Using DnCNN Denoising and Multimodal Feature Fusion for Image Forgery Detection
by Nimra Nasir, Syeda Sitara Waseem, Muhammad Bilal and Syed Rizwan Hassan
Sensors 2026, 26(4), 1172; https://doi.org/10.3390/s26041172 - 11 Feb 2026
Abstract
With the ongoing growth of CCTV networks, miniature sensors, and IoT devices, the authenticity of captured images has become a major security issue. Advanced editing tools and generative models now make it possible to produce highly sophisticated forgeries that evade both human perception and traditional algorithms, especially for sensor-generated content. State-of-the-art methods typically rely on a single forensic cue, such as local noise statistics or structural disruption patterns, making them susceptible to varied forms of manipulation. To address this issue, we have developed MultiFusion, a new forgery detection framework that combines complementary forensic cues: SRM-based noise residuals, hierarchical texture features from EfficientNet-B0, and global structural relationships from a vision transformer. A dedicated DnCNN denoising preprocessing layer suppresses sensor noise while preserving fine traces of tampering. For better interpretability, we combine Grad-CAM maps from the convolutional stream with transformer attention maps to create unified interpretable heatmaps that highlight regions of manipulation. Experimental verification on the CASIA 2.0 benchmark shows high detection accuracy (96.69%) and good generalization. Through combined denoising, multimodal feature fusion, and explainable AI, our framework advances CCTV, sensor forensics, and IoT image authentication.
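The SRM noise-residual cue can be illustrated with one classic SRM high-pass kernel; a NumPy sketch of the idea (forensic models typically use a bank of ~30 such filters, so this is not the paper's exact setup):

```python
import numpy as np

# A classic second-order SRM high-pass kernel.
SRM_K = np.array([[-1,  2, -1],
                  [ 2, -4,  2],
                  [-1,  2, -1]], dtype=float) / 4.0

def srm_residual(img):
    """Valid-mode 2-D correlation with the SRM kernel: smooth image content
    is suppressed, leaving noise residuals where tampering tends to show."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += SRM_K[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return out

flat = np.full((16, 16), 128.0)     # constant region -> zero residual
res = srm_residual(flat)
print(np.abs(res).max())            # 0.0
```

Because the kernel's rows and columns each sum to zero, constant and linearly varying regions are annihilated, which is exactly why the residual isolates noise-like traces.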

22 pages, 4477 KB  
Article
Robust Detection and Localization of Image Copy-Move Forgery Using Multi-Feature Fusion
by Kaiqi Lu and Qiuyu Zhang
J. Imaging 2026, 12(2), 75; https://doi.org/10.3390/jimaging12020075 - 10 Feb 2026
Abstract
Copy-move forgery detection (CMFD) is a crucial image forensics analysis technique. The rapid development of deep learning algorithms has led to impressive advancements in CMFD. However, existing models suffer from two key limitations. First, their feature fusion modules insufficiently exploit the complementary nature of features from the RGB domain and noise domain, resulting in suboptimal feature representations. Second, during decoding, they simply classify pixels as authentic or forged, without aggregating cross-layer information or integrating local and global attention mechanisms, leading to unsatisfactory detection precision. To overcome these limitations, a robust detection and localization approach to image copy-move forgery using multi-feature fusion is proposed. Firstly, a Multi-Feature Fusion Network (MFFNet) was designed. Within its feature fusion module, features from both the RGB domain and noise domain were fused to enable mutual complementarity between distinct characteristics, yielding richer feature information. Then, a Lightweight Multi-layer Perceptron Decoder (LMPD) was developed for image reconstruction and forgery localization map generation. Finally, by aggregating information from different layers and combining local and global attention mechanisms, more accurate prediction masks were obtained. The experimental results demonstrate that the proposed MFFNet model exhibits enhanced robustness and superior detection and localization performance compared to existing methods when faced with JPEG compression, noise addition, and resizing operations.
(This article belongs to the Section Image and Video Processing)

19 pages, 1747 KB  
Article
Video Deepfake Detection Based on Multimodality Semantic Consistency Fusion
by Fang Sun, Xiaoxuan Guo, Tong Zhang, Yang Liu and Jing Zhang
Future Internet 2026, 18(2), 67; https://doi.org/10.3390/fi18020067 - 23 Jan 2026
Abstract
Deepfake detection in video data typically relies on mining deep embedded representations across multiple modalities to obtain discriminative fused features and thereby improve detection accuracy. However, existing approaches predominantly focus on how to exploit complementary information across modalities to ensure effective fusion, while often overlooking the impact of noise and interference present in the data. For instance, issues such as small objects, blurring, and occlusions in the visual modality can disrupt the semantic consistency of the fused features. To address this, we propose a Multimodality Semantic Consistency Fusion model for video forgery detection. The model introduces a semantic consistency gating mechanism to enhance the embedding of semantically aligned information across modalities, thereby improving the discriminability of the fused representations. Furthermore, we incorporate an event-level weakly supervised loss to strengthen the global semantic discrimination of the video data. Extensive experiments on standard video forgery detection benchmarks demonstrate the effectiveness of the proposed method, achieving superior performance in both forgery event detection and localization compared to state-of-the-art approaches.
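A semantic consistency gate of the kind described can be sketched as follows, under the (hypothetical) assumption that cross-modal agreement is measured by cosine similarity between per-position modality features; the names and the gate's scale factor are illustrative, not the paper's formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def consistency_gated_fusion(feat_a, feat_b):
    """Toy semantic-consistency gate: the gate opens where the two modality
    feature vectors agree (high cosine similarity) and suppresses the fused
    response where they conflict, so noisy, inconsistent positions contribute
    less to the fused representation."""
    num = np.sum(feat_a * feat_b, axis=-1)
    den = np.linalg.norm(feat_a, axis=-1) * np.linalg.norm(feat_b, axis=-1) + 1e-8
    cos = num / den                     # per-position agreement in [-1, 1]
    gate = sigmoid(4.0 * cos)           # squash to (0, 1); 4.0 is an assumed sharpness
    return gate[..., None] * (feat_a + feat_b)

a = np.ones((5, 8)); b = np.ones((5, 8))
fused = consistency_gated_fusion(a, b)  # fully aligned -> gate near 1
```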

22 pages, 3358 KB  
Article
Driving into the Unknown: Investigating and Addressing Security Breaches in Vehicle Infotainment Systems
by Minrui Yan, George Crane, Dean Suillivan and Haoqi Shan
Sensors 2026, 26(1), 77; https://doi.org/10.3390/s26010077 - 22 Dec 2025
Abstract
The rise of connected and automated vehicles has transformed in-vehicle infotainment (IVI) systems into critical gateways linking user interfaces, vehicular networks, and cloud-based fleet services. A concerning architectural reality is that hardcoded credentials like access point names (APNs) in IVI firmware create a cross-layer attack surface where local exposure can escalate into entire vehicle fleets being remotely compromised. To address this risk, we propose a cross-layer security framework that integrates firmware extraction, symbolic execution, and targeted fuzzing to reconstruct authentic IVI-to-backend interactions and uncover high-impact web vulnerabilities such as server-side request forgery (SSRF) and broken access control. Applied across seven diverse automotive systems, including major original equipment manufacturers (OEMs) (Mercedes-Benz, Tesla, SAIC, FAW-VW, Denza), Tier-1 supplier Bosch, and advanced driver assistance systems (ADAS) vendor Minieye, our approach exposes systemic anti-patterns and demonstrates a fully realized exploit that enables remote control of approximately six million Mercedes-Benz vehicles. All 23 discovered vulnerabilities, including seven CVEs, were patched within one month. In closed automotive ecosystems, we argue that the true measure of efficacy lies not in maximizing code coverage but in discovering actionable, fleet-wide attack paths, which is precisely what our approach delivers.
(This article belongs to the Section Internet of Things)

14 pages, 2365 KB  
Article
Seam Carving Forgery Detection Through Multi-Perspective Explainable AI
by Miguel José das Neves, Felipe Rodrigues Perche Mahlow, Renato Dias de Souza, Paulo Roberto G. Hernandes, José Remo Ferreira Brega and Kelton Augusto Pontara da Costa
J. Imaging 2025, 11(11), 416; https://doi.org/10.3390/jimaging11110416 - 18 Nov 2025
Abstract
This paper addresses the critical challenge of detecting content-aware image manipulations, specifically focusing on seam carving forgery. While deep learning models, particularly Convolutional Neural Networks (CNNs), have shown promise in this area, their black-box nature limits their trustworthiness in high-stakes domains like digital forensics. To address this gap, we propose and validate a framework for interpretable forgery detection, termed E-XAI (Ensemble Explainable AI). Conceptually inspired by Ensemble Learning, our framework’s novelty lies not in combining predictive models, but in integrating a multi-perspective ensemble of explainability techniques. Specifically, we combine SHAP for fine-grained, pixel-level feature attribution with Grad-CAM for region-level localization to create a more robust and holistic interpretation of a single, custom-trained CNN’s decisions. Our approach is validated on a purpose-built, balanced, binary-class dataset of 10,300 images. The results demonstrate high classification performance on an unseen test set, with a 95% accuracy and a 99% precision for the forged class. Furthermore, we analyze the model’s robustness against JPEG compression, a common real-world perturbation. More importantly, the application of the E-XAI framework reveals how the model identifies subtle forgery artifacts, providing transparent, visual evidence for its decisions. This work contributes a robust end-to-end pipeline for interpretable image forgery detection, enhancing the trust and reliability of AI systems in information security.
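The multi-perspective combination can be sketched as a normalize-and-average of two explanation maps; a minimal illustration assuming both maps cover the same spatial grid (not the paper's exact pipeline, and the function name is hypothetical):

```python
import numpy as np

def ensemble_heatmap(shap_map, gradcam_map):
    """Min-max normalize each explanation to [0, 1] and average them, so
    pixel-level (SHAP, signed) and region-level (Grad-CAM, non-negative)
    evidence are viewed on a common scale."""
    def norm01(m):
        m = m - m.min()
        rng = m.max()
        return m / rng if rng > 0 else m
    return 0.5 * (norm01(np.abs(shap_map)) + norm01(gradcam_map))

s = np.random.default_rng(0).normal(size=(8, 8))   # stand-in SHAP attributions
g = np.random.default_rng(1).random((8, 8))        # stand-in Grad-CAM activations
h = ensemble_heatmap(s, g)
```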

24 pages, 1034 KB  
Article
MMFD-Net: A Novel Network for Image Forgery Detection and Localization via Multi-Stream Edge Feature Learning and Multi-Dimensional Information Fusion
by Haichang Yin, KinTak U, Jing Wang and Zhuofan Gan
Mathematics 2025, 13(19), 3136; https://doi.org/10.3390/math13193136 - 1 Oct 2025
Cited by 2
Abstract
With the rapid advancement of image processing techniques, digital image forgery detection has emerged as a critical research area in information forensics. This paper proposes a novel deep learning model based on Multi-view Multi-dimensional Forgery Detection Networks (MMFD-Net), designed to simultaneously determine whether an image has been tampered with and precisely localize the forged regions. By integrating a Multi-stream Edge Feature Learning module with a Multi-dimensional Information Fusion module, MMFD-Net employs joint supervised learning to extract semantics-agnostic forgery features, thereby enhancing both detection performance and model generalization. Extensive experiments demonstrate that MMFD-Net achieves state-of-the-art results on multiple public datasets, excelling in both pixel-level localization and image-level classification tasks, while maintaining robust performance in complex scenarios.
(This article belongs to the Special Issue Applied Mathematics in Data Science and High-Performance Computing)

32 pages, 28257 KB  
Article
Reconstruction of Security Patterns Using Cross-Spectral Constraints in Smartphones
by Tianyu Wang, Hong Zheng, Zhenhua Xiao and Tao Tao
Appl. Sci. 2025, 15(18), 10085; https://doi.org/10.3390/app151810085 - 15 Sep 2025
Abstract
The widespread presence of security patterns in modern anti-forgery systems has given rise to an urgent need for reliable smartphone authentication. However, persistent recognition inaccuracies occur because of the inherent degradation of patterns during smartphone capture. These acquisition-related artifacts are manifested as both spectral distortions in high-frequency components and structural corruption in the spatial domain, which fundamentally limit current verification systems. This paper addresses these two challenges through four key innovations: (1) It introduces a chromatic-adaptive coupled oscillation mechanism to reduce noise. (2) It develops a DFT-domain processing pipeline. This pipeline includes micro-feature degradation modeling to detect high-frequency pattern elements and directional energy concentration for characterizing motion blur. (3) It utilizes complementary spatial-domain constraints. These involve brightness variation for local consistency and edge gradients for local sharpness, which are jointly optimized by combining maximum a posteriori estimation and maximum likelihood estimation. (4) It proposes an adaptive graph-based partitioning strategy. This strategy enables spatially variant kernel estimation, while maintaining computational efficiency. Experimental results showed that our method achieved excellent performance in terms of deblurring effectiveness, runtime, and recognition accuracy. This achievement enables near real-time processing on smartphones, without sacrificing restoration quality, even under difficult blurring conditions.
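The "directional energy concentration" cue for characterizing motion blur can be sketched by binning the FFT magnitude by spectral angle; a NumPy sketch with hypothetical parameters (the bin count and the function name are assumptions, not the paper's method):

```python
import numpy as np

def directional_energy(img, n_bins=18):
    """Histogram of centred FFT magnitude by spectral angle, folded to
    [0, pi). A strongly dominant bin suggests spectral energy concentrated
    along one direction, as produced by linear motion blur."""
    f = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ang = np.arctan2(ys - h // 2, xs - w // 2) % np.pi
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    energy = np.bincount(bins.ravel(), weights=f.ravel(), minlength=n_bins)
    return energy / energy.sum()        # normalized directional distribution

img = np.random.default_rng(0).random((64, 64))
e = directional_energy(img)
```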

30 pages, 16517 KB  
Article
An Attention-Based Framework for Detecting Face Forgeries: Integrating Efficient-ViT and Wavelet Transform
by Yinfei Xiao, Yanbing Zhou, Pengzhan Cheng, Leqian Ni, Xusheng Wu and Tianxiang Zheng
Mathematics 2025, 13(16), 2576; https://doi.org/10.3390/math13162576 - 12 Aug 2025
Abstract
As face forgery techniques, particularly DeepFake methods, progress, effective detection of the manipulations that enable hyper-realistic facial representations becomes imperative for mitigating security threats. Current spatial domain approaches commonly encounter difficulties in generalizing across various forgery methods and compression artifacts, whereas frequency-based analyses show promise in identifying nuanced local cues but lack the global context needed for strong generalization. This study introduces a hybrid architecture that integrates Efficient-ViT and multi-level wavelet transform to dynamically merge spatial and frequency features through a dynamic adaptive multi-branch attention (DAMA) mechanism, thereby improving the deep interaction between the two modalities. We devise a joint loss function and a training strategy to address the imbalanced data issue and improve the training process. Experimental results on FaceForensics++ and Celeb-DF (V2) have validated the effectiveness of our approach, attaining 97.07% accuracy in intra-dataset evaluations and a 74.7% AUC score in cross-dataset assessments, surpassing our baseline Efficient-ViT by 14.1% and 7.7%, respectively. The findings indicate that our approach excels in generalization across various datasets and methodologies, while also effectively minimizing feature redundancy through an innovative orthogonal loss that regularizes the feature space, as evidenced by the ablation study and parameter analysis.

14 pages, 1632 KB  
Article
Try It Before You Buy It: A Non-Invasive Authenticity Assessment of a Purported Phoenician Head-Shaped Pendant (Cáceres, Spain)
by Valentina Lončarić, Pedro Barrulas, José Miguel González Bornay and Mafalda Costa
Heritage 2025, 8(8), 308; https://doi.org/10.3390/heritage8080308 - 1 Aug 2025
Abstract
Museums may acquire archaeological artefacts discovered by non-specialists or amateur archaeologists, holding the potential to promote the safeguarding of cultural heritage by integrating the local community in their activities. However, this also creates an opportunity for the fraudulent sale of modern forgeries presented as archaeological artefacts, resulting in the need for a critical assessment of the artefact’s authenticity prior to acquisition by the museum. In 2019, the regional museum in Cáceres (Spain) was offered the opportunity to acquire a Phoenician-Punic head pendant, allegedly discovered in the vicinity of the city. The artefact’s authenticity was assessed by traditional approaches, including typological analysis and analysis of manufacture technique, which raised doubts about its purported age. VP-SEM-EDS analysis of the chemical composition of the different glass portions comprising the pendant was used for non-invasive determination of glassmaking recipes, enabling the identification of glass components incompatible with known Iron Age glassmaking recipes from the Mediterranean. Further comparison with historical and modern glassmaking recipes allowed for the identification of the artefact as a recent forgery made from glasses employing modern colouring and opacifying techniques.

18 pages, 5013 KB  
Article
Enhancing Document Forgery Detection with Edge-Focused Deep Learning
by Yong-Yeol Bae, Dae-Jea Cho and Ki-Hyun Jung
Symmetry 2025, 17(8), 1208; https://doi.org/10.3390/sym17081208 - 30 Jul 2025
Cited by 1
Abstract
Detecting manipulated document images is essential for verifying the authenticity of official records and preventing document forgery. However, forgery artifacts are often subtle and localized in fine-grained regions, such as text boundaries or character outlines, where visual symmetry and structural regularity are typically expected. These manipulations can disrupt the inherent symmetry of document layouts, making the detection of such inconsistencies crucial for forgery identification. Conventional CNN-based models face limitations in capturing such edge-level asymmetric features, as edge-related information tends to weaken through repeated convolution and pooling operations. To address this issue, this study proposes an edge-focused method composed of two components: the Edge Attention (EA) layer and the Edge Concatenation (EC) layer. The EA layer dynamically identifies channels that are highly responsive to edge features in the input feature map and applies learnable weights to emphasize them, enhancing the representation of boundary-related information, thereby emphasizing structurally significant boundaries. Subsequently, the EC layer extracts edge maps from the input image using the Sobel filter and concatenates them with the original feature maps along the channel dimension, allowing the model to explicitly incorporate edge information. To evaluate the effectiveness and compatibility of the proposed method, it was initially applied to a simple CNN architecture to isolate its impact. Subsequently, it was integrated into various widely used models, including DenseNet121, ResNet50, Vision Transformer (ViT), and a CAE-SVM-based document forgery detection model. Experiments were conducted on the DocTamper, Receipt, and MIDV-2020 datasets to assess classification accuracy and F1-score using both original and forged text images. Across all model architectures and datasets, the proposed EA–EC method consistently improved model performance, particularly by increasing sensitivity to asymmetric manipulations around text boundaries. These results demonstrate that the proposed edge-focused approach is not only effective but also highly adaptable, serving as a lightweight and modular extension that can be easily incorporated into existing deep learning-based document forgery detection frameworks. By reinforcing attention to structural inconsistencies often missed by standard convolutional networks, the proposed method provides a practical solution for enhancing the robustness and generalizability of forgery detection systems.
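The EC layer's Sobel-based edge concatenation can be sketched in NumPy for a single-channel map; the paper applies the idea inside a CNN on multi-channel feature maps, so names and the zero-padding choice here are illustrative assumptions:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2_same(img, k):
    """3x3 'same' correlation with zero padding (helper for the sketch)."""
    p = np.pad(img, 1)
    h, w = img.shape
    out = np.zeros_like(img, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def edge_concat(feat):
    """Toy Edge Concatenation: compute a Sobel gradient-magnitude map and
    stack it as an extra channel, so later layers see boundary information
    explicitly rather than having it wash out through pooling."""
    gx, gy = conv2_same(feat, SOBEL_X), conv2_same(feat, SOBEL_Y)
    edge = np.hypot(gx, gy)
    return np.stack([feat, edge], axis=0)    # (2, H, W)

x = np.zeros((10, 10)); x[:, 5:] = 1.0       # vertical step edge at column 5
out = edge_concat(x)
```

The edge channel responds strongly along the step and is zero in flat interior regions, which is the property the EC layer exploits.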

18 pages, 2502 KB  
Article
Learning Local Texture and Global Frequency Clues for Face Forgery Detection
by Xin Jin, Yuru Kou, Yuhao Xie, Yuying Zhao, Miss Laiha Mat Kiah, Qian Jiang and Wei Zhou
Biomimetics 2025, 10(8), 480; https://doi.org/10.3390/biomimetics10080480 - 22 Jul 2025
Cited by 1
Abstract
In recent years, the rapid advancement of deep learning techniques has significantly propelled the development of face forgery methods, drawing considerable attention to face forgery detection. However, existing detection methods still struggle with generalization across different datasets and forgery techniques. In this work, we address this challenge by leveraging both local texture cues and global frequency domain information in a complementary manner to enhance the robustness of face forgery detection. Specifically, we introduce a local texture mining and enhancement module. The input image is segmented into patches, and a subset is strategically masked and then texture-enhanced. This joint masking and enhancement strategy forces the model to focus on generalizable localized texture traces, mitigating overfitting to specific identity features and enabling the model to capture more meaningful subtle traces of forgery. Additionally, we extract multi-scale frequency domain features from the face image using wavelet transform, thereby preserving various frequency domain characteristics of the image. We also propose an innovative frequency-domain processing strategy to adjust the contributions of different frequency-domain components through frequency-domain selection and dynamic weighting. This facilitates the model’s ability to uncover frequency-domain inconsistencies across various global frequency layers. Furthermore, we propose an integrated framework that combines these two feature modalities, enhanced with spatial attention and channel attention mechanisms, to foster a synergistic effect. Extensive experiments conducted on several benchmark datasets show that the proposed technique achieves superior performance and generalization capabilities compared to existing methods.
(This article belongs to the Special Issue Exploration of Bioinspired Computer Vision and Pattern Recognition)
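The patch masking part of the strategy can be sketched as follows; `patch` and `ratio` are hypothetical values, and the texture-enhancement step is omitted, so this is only an illustration of the masking idea:

```python
import numpy as np

def mask_patches(img, patch=8, ratio=0.3, seed=0):
    """Split an image into non-overlapping patches and zero out a random
    subset, forcing a model trained on the result to rely on texture cues
    spread across the face rather than a few identity-specific regions."""
    h, w = img.shape
    gh, gw = h // patch, w // patch
    rng = np.random.default_rng(seed)
    keep = rng.random((gh, gw)) >= ratio            # False = masked patch
    mask = np.kron(keep, np.ones((patch, patch)))   # upsample to pixel grid
    return img * mask, keep

img = np.ones((32, 32))
masked, keep = mask_patches(img)
```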

21 pages, 2308 KB  
Article
Forgery-Aware Guided Spatial–Frequency Feature Fusion for Face Image Forgery Detection
by Zhenxiang He, Zhihao Liu and Ziqi Zhao
Symmetry 2025, 17(7), 1148; https://doi.org/10.3390/sym17071148 - 18 Jul 2025
Cited by 1
Abstract
The rapid development of deepfake technologies has led to the widespread proliferation of facial image forgeries, raising significant concerns over identity theft and the spread of misinformation. Although recent dual-domain detection approaches that integrate spatial and frequency features have achieved noticeable progress, they still suffer from limited sensitivity to local forgery regions and inadequate interaction between spatial and frequency information in practical applications. To address these challenges, we propose a novel forgery-aware guided spatial–frequency feature fusion network. A lightweight U-Net is employed to generate pixel-level saliency maps by leveraging structural symmetry and semantic consistency, without relying on ground-truth masks. These maps dynamically guide the fusion of spatial features (from an improved Swin Transformer) and frequency features (via Haar wavelet transforms). Cross-domain attention, channel recalibration, and spatial gating are introduced to enhance feature complementarity and regional discrimination. Extensive experiments conducted on two benchmark face forgery datasets, FaceForensics++ and Celeb-DFv2, show that the proposed method consistently outperforms existing state-of-the-art techniques in terms of detection accuracy and generalization capability. Future work includes improving robustness under compression, incorporating temporal cues, extending to multimodal scenarios, and evaluating model efficiency for real-world deployment.
(This article belongs to the Section Computer)
