Search Results (2)

Search Parameters:
Keywords = NeXt-TDNN

25 pages, 4385 KB  
Article
Robust DeepFake Audio Detection via an Improved NeXt-TDNN with Multi-Fused Self-Supervised Learning Features
by Gul Tahaoglu
Appl. Sci. 2025, 15(17), 9685; https://doi.org/10.3390/app15179685 - 3 Sep 2025
Viewed by 1000
Abstract
Deepfake audio refers to speech that has been synthetically generated or altered through advanced neural network techniques, often with a degree of realism sufficient to convincingly imitate genuine human voices. As these manipulations become increasingly indistinguishable from authentic recordings, they present significant threats to security, undermine media integrity, and challenge the reliability of digital authentication systems. In this study, a robust detection framework is proposed, which leverages the power of self-supervised learning (SSL) and attention-based modeling to identify deepfake audio samples. Specifically, audio features are extracted from input speech using two powerful pretrained SSL models: HuBERT-Large and WavLM-Large. These distinctive features are then integrated through an Attentional Multi-Feature Fusion (AMFF) mechanism. The fused features are subsequently classified using a NeXt-Time Delay Neural Network (NeXt-TDNN) model enhanced with Efficient Channel Attention (ECA), enabling improved temporal and channel-wise feature discrimination. Experimental results show that the proposed method achieves a 0.42% EER and 0.01 min-tDCF on ASVspoof 2019 LA, a 1.01% EER on ASVspoof 2019 PA, and a pooled 6.56% EER on the cross-channel ASVspoof 2021 LA evaluation, thus highlighting its effectiveness for real-world deepfake detection scenarios. Furthermore, on the ASVspoof 5 dataset, the method achieved a 7.23% EER, outperforming strong baselines and demonstrating strong generalization ability. Moreover, the macro-averaged F1-score of 96.01% and balanced accuracy of 99.06% were obtained on the ASVspoof 2019 LA dataset, while the proposed method achieved a macro-averaged F1-score of 98.70% and balanced accuracy of 98.90% on the ASVspoof 2019 PA dataset. 
On the highly challenging ASVspoof 5 dataset, which includes crowdsourced, non-studio-quality audio and novel adversarial attacks, the proposed method achieves macro-averaged metrics exceeding 92%: a precision of 92.07%, a recall of 92.63%, an F1-measure of 92.35%, and a balanced accuracy of 92.63%.
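The abstract does not spell out the Attentional Multi-Feature Fusion (AMFF) step, only that two SSL feature streams (HuBERT-Large and WavLM-Large) are blended before classification. A minimal sketch of the general idea, a learned per-channel gate that mixes two frame-level feature streams, is shown below. The function name `attentional_fusion`, the sigmoid gate, and the weight shapes are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def attentional_fusion(feat_a, feat_b, w, b):
    """Blend two (T, D) feature streams with a learned per-channel gate.

    The gate is computed from the concatenated streams and squashed with a
    sigmoid, so the output is an element-wise convex combination of the inputs.
    """
    # Concatenate along the channel axis: (T, 2D), then project to (T, D).
    logits = np.concatenate([feat_a, feat_b], axis=-1) @ w + b
    gate = 1.0 / (1.0 + np.exp(-logits))  # values in (0, 1)
    return gate * feat_a + (1.0 - gate) * feat_b

# Toy usage with random frame-level features of shape (T=4, D=3).
rng = np.random.default_rng(0)
fa = rng.normal(size=(4, 3))
fb = rng.normal(size=(4, 3))
w = rng.normal(size=(6, 3))   # projection from concatenated 2D channels to D
b = np.zeros(3)
fused = attentional_fusion(fa, fb, w, b)
```

In the paper's setting the two streams would be the pretrained SSL features; the gate decides, per channel and per frame, how much of each stream to keep before the NeXt-TDNN classifier.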

20 pages, 32621 KB  
Article
A Novel Rapeseed Mapping Framework Integrating Image Fusion, Automated Sample Generation, and Deep Learning in Southwest China
by Ruolan Jiang, Xingyin Duan, Song Liao, Ziyi Tang and Hao Li
Land 2025, 14(1), 200; https://doi.org/10.3390/land14010200 - 19 Jan 2025
Cited by 2 | Viewed by 1785
Abstract
Rapeseed mapping is crucial for refined agricultural management and food security. However, existing remote sensing-based methods for rapeseed mapping in Southwest China are severely limited by insufficient training samples and persistent cloud cover. To address the above challenges, this study presents an automatic rapeseed mapping framework that integrates multi-source remote sensing data fusion, automated sample generation, and deep learning models. The framework was applied in Santai County, Sichuan Province, Southwest China, which has typical topographical and climatic characteristics. First, MODIS and Landsat data were used to fill the gaps in Sentinel-2 imagery, creating time-series images through the object-level processing version of the spatial and temporal adaptive reflectance fusion model (OL-STARFM). In addition, a novel spectral phenology approach was developed to automatically generate training samples, which were then input into the improved TS-ConvNeXt ECAPA-TDNN (NeXt-TDNN) deep learning model for accurate rapeseed mapping. The results demonstrated that the OL-STARFM approach was effective in rapeseed mapping. The proposed automated sample generation method proved effective in producing reliable rapeseed samples, achieving a low Dynamic Time Warping (DTW) distance (<0.81) when compared to field samples. The NeXt-TDNN model showed an overall accuracy (OA) of 90.12% and a mean Intersection over Union (mIoU) of 81.96% in Santai County, outperforming other models such as random forest, XGBoost, and UNet-LSTM. These results highlight the effectiveness of the proposed automatic rapeseed mapping framework in accurately identifying rapeseed. This framework offers a valuable reference for monitoring other crops in similar environments.
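The second abstract validates its auto-generated samples against field samples using a Dynamic Time Warping (DTW) distance (<0.81). DTW itself is a standard dynamic-programming algorithm; a minimal sketch for two 1-D time series follows. The paper's exact feature curves, normalization, and distance threshold derivation are not given, so this is only illustrative of the metric, not of the paper's pipeline.

```python
import numpy as np

def dtw_distance(x, y):
    """Classic DTW between two 1-D sequences using absolute difference.

    Fills an (n+1) x (m+1) accumulated-cost matrix; each cell extends the
    cheapest of the three allowed alignments (match, insertion, deletion).
    """
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]
```

Identical phenology curves give a distance of zero, and the distance grows with shape mismatch even when the curves are time-shifted, which is why DTW (rather than a pointwise metric) suits comparing generated and field-observed seasonal profiles.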
