Article

Pyramid Feature Attention Network for Speech Resampling Detection

1 School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
2 School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(11), 4803; https://doi.org/10.3390/app14114803
Submission received: 6 May 2024 / Revised: 28 May 2024 / Accepted: 29 May 2024 / Published: 1 June 2024
(This article belongs to the Special Issue Deep Learning for Speech, Image and Language Processing)

Abstract

Speech forgery and tampering, made easier by advanced audio editing software, pose significant threats to the integrity and privacy of digital speech avatars. Resampling is a post-processing operation used in various speech-tampering methods, so its forensic detection is of great significance. Most previous work on speech resampling detection relied on traditional feature extraction and classification to distinguish original speech from forged speech. To exploit the feature-extraction ability of deep learning, this paper converts the speech signal into a spectrogram that carries time-frequency characteristics and uses a feature pyramid network (FPN) with the Squeeze-and-Excitation (SE) attention mechanism to learn speech resampling features. The proposed method combines low-level location information with high-level semantic information, which substantially improves resampling detection performance. Experiments were carried out on a resampling corpus built from the TIMIT dataset. The results indicate that the proposed method significantly improves detection accuracy for various kinds of resampled speech. For tampered speech with a resampling factor of 0.9, detection accuracy increases by nearly 20%. In addition, a robustness test demonstrates that the proposed model is strongly resistant to MP3 compression and that its overall performance is better than that of existing methods.
Keywords: speech resampling detection; feature pyramid network; SE attention; robustness
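
The Squeeze-and-Excitation step named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' network: the function name `se_attention`, the two weight matrices, the channel count, and the reduction ratio are all assumptions chosen for illustration, and the weights are random rather than trained. The sketch only shows the general pattern of squeezing each channel of a feature map to a scalar, passing the result through a small bottleneck, and rescaling the channels with the resulting gates.

```python
import numpy as np


def se_attention(features, w_reduce, w_expand):
    """Reweight the channels of a (C, H, W) feature map, SE-style.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Squeeze: global average pooling collapses each channel to one scalar.
    squeezed = features.mean(axis=(1, 2))                # shape (C,)
    # Excitation: bottleneck with ReLU, then sigmoid gates in (0, 1).
    hidden = np.maximum(w_reduce @ squeezed, 0.0)        # shape (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w_expand @ hidden)))   # shape (C,)
    # Scale: broadcast the per-channel gates over the spatial dimensions.
    return features * gates[:, None, None], gates


rng = np.random.default_rng(0)
channels, reduction = 8, 4  # illustrative sizes, assumed for this sketch
feature_map = rng.standard_normal((channels, 16, 16))
w_reduce = rng.standard_normal((channels // reduction, channels)) * 0.1
w_expand = rng.standard_normal((channels, channels // reduction)) * 0.1

reweighted, gates = se_attention(feature_map, w_reduce, w_expand)
print(reweighted.shape)
```

In a feature pyramid setting, a block of this kind would typically be applied to the feature map at each pyramid level, so that informative channels are emphasized before the levels are fused; where exactly the paper places it is not stated in the abstract.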

Share and Cite

MDPI and ACS Style

Zhou, X.; Zhang, Y.; Wang, Y.; Tian, J.; Xu, S. Pyramid Feature Attention Network for Speech Resampling Detection. Appl. Sci. 2024, 14, 4803. https://doi.org/10.3390/app14114803

AMA Style

Zhou X, Zhang Y, Wang Y, Tian J, Xu S. Pyramid Feature Attention Network for Speech Resampling Detection. Applied Sciences. 2024; 14(11):4803. https://doi.org/10.3390/app14114803

Chicago/Turabian Style

Zhou, Xinyu, Yujin Zhang, Yongqi Wang, Jin Tian, and Shaolun Xu. 2024. "Pyramid Feature Attention Network for Speech Resampling Detection" Applied Sciences 14, no. 11: 4803. https://doi.org/10.3390/app14114803

APA Style

Zhou, X., Zhang, Y., Wang, Y., Tian, J., & Xu, S. (2024). Pyramid Feature Attention Network for Speech Resampling Detection. Applied Sciences, 14(11), 4803. https://doi.org/10.3390/app14114803

