Search Results (2)

Search Parameters:
Keywords = NeXt-TDNN

25 pages, 4385 KB  
Article
Robust DeepFake Audio Detection via an Improved NeXt-TDNN with Multi-Fused Self-Supervised Learning Features
by Gul Tahaoglu
Appl. Sci. 2025, 15(17), 9685; https://doi.org/10.3390/app15179685 - 3 Sep 2025
Viewed by 1000
Abstract
Deepfake audio refers to speech that has been synthetically generated or altered through advanced neural network techniques, often with a degree of realism sufficient to convincingly imitate genuine human voices. As these manipulations become increasingly indistinguishable from authentic recordings, they present significant threats to security, undermine media integrity, and challenge the reliability of digital authentication systems. In this study, a robust detection framework is proposed, which leverages the power of self-supervised learning (SSL) and attention-based modeling to identify deepfake audio samples. Specifically, audio features are extracted from input speech using two powerful pretrained SSL models: HuBERT-Large and WavLM-Large. These distinctive features are then integrated through an Attentional Multi-Feature Fusion (AMFF) mechanism. The fused features are subsequently classified using a NeXt-Time Delay Neural Network (NeXt-TDNN) model enhanced with Efficient Channel Attention (ECA), enabling improved temporal and channel-wise feature discrimination. Experimental results show that the proposed method achieves a 0.42% EER and 0.01 min-tDCF on ASVspoof 2019 LA, a 1.01% EER on ASVspoof 2019 PA, and a pooled 6.56% EER on the cross-channel ASVspoof 2021 LA evaluation, thus highlighting its effectiveness for real-world deepfake detection scenarios. Furthermore, on the ASVspoof 5 dataset, the method achieved a 7.23% EER, outperforming strong baselines and demonstrating strong generalization ability. Moreover, the macro-averaged F1-score of 96.01% and balanced accuracy of 99.06% were obtained on the ASVspoof 2019 LA dataset, while the proposed method achieved a macro-averaged F1-score of 98.70% and balanced accuracy of 98.90% on the ASVspoof 2019 PA dataset. 
On the highly challenging ASVspoof 5 dataset, which includes crowdsourced, non-studio-quality audio and novel adversarial attacks, the proposed method achieves macro-averaged metrics exceeding 92%: a precision of 92.07%, a recall of 92.63%, an F1-measure of 92.35%, and a balanced accuracy of 92.63%.
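The abstract does not spell out the Attentional Multi-Feature Fusion (AMFF) step, only that two SSL feature streams (HuBERT-Large and WavLM-Large) are blended before classification. A minimal sketch of the general idea, a learned per-channel gate that mixes two frame-level feature streams, is shown below. The function name `attentional_fusion`, the sigmoid gate, and the weight shapes are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def attentional_fusion(feat_a, feat_b, w, b):
    """Blend two (T, D) feature streams with a learned per-channel gate.

    The gate is computed from the concatenated streams and squashed with a
    sigmoid, so the output is an element-wise convex combination of the inputs.
    """
    # Concatenate along the channel axis: (T, 2D), then project to (T, D).
    logits = np.concatenate([feat_a, feat_b], axis=-1) @ w + b
    gate = 1.0 / (1.0 + np.exp(-logits))  # values in (0, 1)
    return gate * feat_a + (1.0 - gate) * feat_b

# Toy usage with random frame-level features of shape (T=4, D=3).
rng = np.random.default_rng(0)
fa = rng.normal(size=(4, 3))
fb = rng.normal(size=(4, 3))
w = rng.normal(size=(6, 3))   # projection from concatenated 2D channels to D
b = np.zeros(3)
fused = attentional_fusion(fa, fb, w, b)
```

In the paper's setting the two streams would be the pretrained SSL features; the gate decides, per channel and per frame, how much of each stream to keep before the NeXt-TDNN classifier.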

20 pages, 32621 KB  
Article
A Novel Rapeseed Mapping Framework Integrating Image Fusion, Automated Sample Generation, and Deep Learning in Southwest China
by Ruolan Jiang, Xingyin Duan, Song Liao, Ziyi Tang and Hao Li
Land 2025, 14(1), 200; https://doi.org/10.3390/land14010200 - 19 Jan 2025
Cited by 2 | Viewed by 1785
Abstract
Rapeseed mapping is crucial for refined agricultural management and food security. However, existing remote sensing-based methods for rapeseed mapping in Southwest China are severely limited by insufficient training samples and persistent cloud cover. To address the above challenges, this study presents an automatic rapeseed mapping framework that integrates multi-source remote sensing data fusion, automated sample generation, and deep learning models. The framework was applied in Santai County, Sichuan Province, Southwest China, which has typical topographical and climatic characteristics. First, MODIS and Landsat data were used to fill the gaps in Sentinel-2 imagery, creating time-series images through the object-level processing version of the spatial and temporal adaptive reflectance fusion model (OL-STARFM). In addition, a novel spectral phenology approach was developed to automatically generate training samples, which were then input into the improved TS-ConvNeXt ECAPA-TDNN (NeXt-TDNN) deep learning model for accurate rapeseed mapping. The results demonstrated that the OL-STARFM approach was effective in rapeseed mapping. The proposed automated sample generation method proved effective in producing reliable rapeseed samples, achieving a low Dynamic Time Warping (DTW) distance (<0.81) when compared to field samples. The NeXt-TDNN model showed an overall accuracy (OA) of 90.12% and a mean Intersection over Union (mIoU) of 81.96% in Santai County, outperforming other models such as random forest, XGBoost, and UNet-LSTM. These results highlight the effectiveness of the proposed automatic rapeseed mapping framework in accurately identifying rapeseed. This framework offers a valuable reference for monitoring other crops in similar environments.
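The second abstract validates its auto-generated samples against field samples using a Dynamic Time Warping (DTW) distance (<0.81). DTW itself is a standard dynamic-programming algorithm; a minimal sketch for two 1-D time series follows. The paper's exact feature curves, normalization, and distance threshold derivation are not given, so this is only illustrative of the metric, not of the paper's pipeline.

```python
import numpy as np

def dtw_distance(x, y):
    """Classic DTW between two 1-D sequences using absolute difference.

    Fills an (n+1) x (m+1) accumulated-cost matrix; each cell extends the
    cheapest of the three allowed alignments (match, insertion, deletion).
    """
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]
```

Identical phenology curves give a distance of zero, and the distance grows with shape mismatch even when the curves are time-shifted, which is why DTW (rather than a pointwise metric) suits comparing generated and field-observed seasonal profiles.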
