Article

Pyramid Feature Attention Network for Speech Resampling Detection

1 School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
2 School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(11), 4803; https://doi.org/10.3390/app14114803
Submission received: 6 May 2024 / Revised: 28 May 2024 / Accepted: 29 May 2024 / Published: 1 June 2024
(This article belongs to the Special Issue Deep Learning for Speech, Image and Language Processing)

Abstract

Speech forgery and tampering, increasingly facilitated by advanced audio editing software, pose significant threats to the integrity and privacy of digital speech avatars. Speech resampling is a common post-processing operation in many speech-tampering procedures, so its forensic detection is of great significance. Most previous work on speech resampling detection relied on hand-crafted feature extraction and traditional classifiers to distinguish original speech from forged speech. Exploiting the powerful feature-extraction ability of deep learning, this paper converts the speech signal into a spectrogram with time-frequency characteristics and uses a feature pyramid network (FPN) equipped with the Squeeze-and-Excitation (SE) attention mechanism to learn speech resampling features. The proposed method combines low-level location information with high-level semantic information, which markedly improves the detection of speech resampling. Experiments were carried out on a resampling corpus built from the TIMIT dataset. The results indicate that the proposed method significantly improves the detection accuracy for various resampled speech; for tampered speech with a resampling factor of 0.9, accuracy increases by nearly 20%. In addition, the robustness test demonstrates that the proposed model strongly resists MP3 compression, and its overall performance exceeds that of existing methods.

1. Introduction

In the rapidly evolving field of digital speech, particularly in the context of digital avatars, the ease of audio editing and modification through software (such as Audacity 3.4.2 [1], developed by the Audacity Team, sourced from Carnegie Mellon University, Pittsburgh, PA, USA; Adobe Audition 23.5 [2], developed by Adobe Systems, San Jose, CA, USA; and GoldWave 6.80 [3], developed by GoldWave Inc., St. John's, Newfoundland and Labrador, Canada) raises critical concerns for security and privacy. The undetectable manipulation of digital speech avatars poses substantial threats to judicial processes, political affairs, and social security. Contemporary speech forensics techniques are pivotal in ensuring the integrity of digital avatars and focus on detecting tampering facilitated by audio editing software, such as the deletion, insertion, copy-move, splicing, resampling, and recompression of audio clips [4,5,6,7]. It is worth noting that, in the field of speech content forensics, there are many forensic methods for deletion, copy-move, splicing, and other tampering approaches [8,9,10], but relatively few for speech resampling, even though these tampering operations are often accompanied by resampling. The study of speech resampling detection is therefore significant in complex forensic environments, providing more comprehensive tools and methods to ensure the authenticity and integrity of audio.
Speech resampling refers to converting the original sampling frequency of speech into a new sampling frequency. The bulk of the existing detection methods for speech resampling are inspired by detection algorithms for digital image resampling [11,12]. Based on the fact that resampling introduces periodic information into the original speech, Yao et al. [13] proposed an audio resampling detection method based on expectation maximization (EM), which uses the spectral statistical moments of fixed sample points as input features; the audio amplitude histogram is also introduced to approximate the audio signal distribution more accurately. However, this algorithm can only detect resampled speech produced with linear interpolation. For this reason, Chen et al. [14] improved the algorithm of Yao et al. [13]: the EM algorithm is used to simultaneously estimate a set of periodic sample points with neighborhood correlation and the corresponding correlation coefficients, and the normalized probability vector of all sample points is examined for periodicity. This method can detect resampled speech produced with different interpolation methods. The algorithms in [13,14] require highly accurate selection of the initial parameters to reduce the impact of noise, and factors such as noise and signal distortion can lower the detection accuracy for down-sampled speech. Shi et al. [15] detected resampling by analyzing the interpolation characteristics of resampled speech and studying the singular values of the signal formed by the sample points. Specifically, after singular value decomposition of the speech signal, a set of singular values and the corresponding left and right singular vectors are obtained, and the change in the speech signal is indicated by counting the average number of zero singular values. Hou et al. [16] drew on the image forensics method of Popescu and Farid [17], in which a resampled image exhibits periodic neighborhood correlation. They found that, after taking the second-order difference of a resampled audio signal, the intervals between its zero values are periodic, and this feature also applies to nonlinear interpolation. However, the method requires different thresholds for different resampling factors and interpolation kernels, which limits its practical application. Wang et al. [18] exploited the fact that the bandwidth of resampled speech is inconsistent with its sampling rate and calculated the logarithmic ratio of band energies between original and resampled speech to distinguish genuine from forged speech. The algorithm can detect resampled speech in a variety of situations and offers some resistance to compression, but it is ineffective for resampled speech with a small tampering amplitude. Zhang et al. [19] proposed an algorithm that uses the local binary pattern (LBP) operator to model the statistical changes of spectrograms and uses support vector machines (SVMs) for resampling detection, achieving better detection performance and robustness against MP3 compression, although the detection accuracy for resampling factors between 0.8 and 1.2 still needs improvement. Although the methods discussed above can achieve good accuracy, most require manual feature engineering, which demands extensive domain expertise and is usually very time-consuming.
In recent years, deep learning has made significant progress in machine learning and has been widely applied in various fields such as digital image recognition, speech recognition, and steganography analysis [20,21,22,23,24]. Among them, Xu et al. [20] proposed a multiscale attention network for splicing tampering forensics in image recognition, which utilizes the integration of residual attention and multiscale information in order to improve the detection accuracy. Lang et al. [21] applied deep learning technology to the field of industrial defect detection, aiming to improve the accuracy of magnetic flux leakage (MFL) image recognition of pipeline corrosion defects, and achieved remarkable results. Taiba et al. [22] proposed a stride-based convolutional neural network (SCNN) model for speech emotion recognition. Banerjee et al. [23] applied deep learning in bio-signal steganography to provide robust, undetectable and trustworthy information security techniques. Huang et al. [24] built an attention-guided robust image watermarking model based on a generative adversarial network (GAN) to solve the problem of insufficient features and failure to highlight essential features in the existing deep learning watermarking models.
Traditional hand-crafted features have advantages in certain settings, but their detection performance may degrade when the application scenario changes [25,26,27]. In contrast, deep learning has great advantages in feature extraction and also offers impressive classification capability. However, relatively little work has applied deep learning to speech-tampering forensics. As can be seen in Refs. [28,29,30], some studies address speech copy-move detection, speech splicing detection, and speech deletion detection with deep learning. In these studies, a two-dimensional image generated from the speech signal is used as the input of a neural network, and its graphical characteristics are exploited for different speech tasks. The two-dimensional image in Figure 1 is also known as the spectrogram, a simple but informative speech visualization. In addition to being easier to interpret than the raw speech signal, it preserves time, frequency, and amplitude information in a single plot. Therefore, using deep learning to extract the time-frequency features of spectrograms for speech resampling detection can, to some extent, help improve the detection accuracy of forged speech.
Based on the existing research, this paper conducts experiments using a resampling-tampered corpus with multiple factors produced by Zhang et al. [19] utilizing the TIMIT dataset. Combined with the deep learning method, this paper constructs a feature pyramid network (FPN) structure to extract the features of the spectrogram, carrying out feature fusion, and adds the Squeeze and Excitation (SE) attention mechanism [31] to obtain the feature map containing low-level location information and high-level semantic information for resampling-tamper detection. In summary, the contributions of this paper mainly include the following three points:
  • We propose to use a deep learning method to extract resampling features from spectrograms containing time–frequency information, and apply various models of classical image classification for speech resampling detection;
  • We apply the SE attention mechanism to the feature pyramid fusion network, which can focus on those channels that are more efficient for the resampling classification task and help the network to make resampling decisions;
  • Thorough experimental evaluations show that the proposed method is suitable for speech detection tampered with multiple resampling factors, especially for tampering close to the original speech, and the detection performance is significantly improved. Moreover, the effect of MP3 compression on speech resampling detection is evaluated, and the results show that the proposed model is robust to MP3 compression.
The remainder of this paper is organized as follows: Section 2 describes speech preprocessing and introduces the structure of the proposed network model; Section 3 presents the experimental setup and an analysis of the experimental results; finally, Section 4 summarizes the paper and outlines future work.

2. Proposed Method

The overall framework of the proposed approach is shown in Figure 2. It consists of three modules: speech signal extraction and preprocessing, feature extraction and fusion, and classification. The modules are described in detail in the following sections.

2.1. Speech Preprocessing

2.1.1. Audio Signal to Spectrogram

Since a one-dimensional speech signal cannot adequately represent the features of resampled speech, this paper converts it into a two-dimensional spectrogram that conveys three kinds of information [32,33]: the x-axis represents time, the y-axis represents frequency, and the color of each point in the spectrogram represents the energy level. A speech signal changes rapidly, whereas the Fourier transform is suited to analyzing stationary signals, so the variable-length audio must be sliced into fixed-length segments, i.e., framed. Adjacent frames overlap to ensure a smooth transition and maintain continuity. Before the Fourier transform of each frame, the frame is multiplied by a smooth window function so that both ends decay smoothly to zero; this avoids spectral leakage and produces a higher-quality spectrum [34]. Subsequently, a fast Fourier transform (FFT) is applied to each frame of the speech signal:
$$X(k, t) = \sum_{n=0}^{N-1} w_n\, x_t(n)\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, 2, \ldots, N-1 \qquad (1)$$
where $x_t(n)$ $(t = 1, 2, \ldots)$ is the signal of the $t$-th frame after framing, $w_n$ is the selected Hamming window function, and $N$ is the frame length in samples (the FFT length). The transforms of all frames are concatenated along the frequency axis to form a linear spectrum. Finally, to reduce the dynamic range of the amplitude component, the linear spectrum is converted into a logarithmic spectrum:
$$S(k, t) = \log \left| X(k, t) \right| \qquad (2)$$
The final spectrogram is calculated by the above equation. In this paper, the frame length is 50 ms and the frame shift is 25 ms in the process of converting speech signals from the time domain to the frequency domain.
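For illustration, the following minimal Python sketch (our own, using only NumPy; the frame length, frame shift, and Hamming window follow the values given above, while the FFT length of 1024 is an assumption chosen to match the 513 frequency bins reported in Section 2.2) computes the log-magnitude spectrogram of Equations (1) and (2).

```python
import numpy as np

def log_spectrogram(x, fs=16000, frame_ms=50, shift_ms=25, n_fft=1024):
    """Log-magnitude spectrogram, Eqs. (1)-(2): Hamming window, framewise FFT, log of |X(k,t)|."""
    frame_len = int(fs * frame_ms / 1000)        # samples per frame (800 at 16 kHz)
    shift = int(fs * shift_ms / 1000)            # frame shift (50% overlap)
    window = np.hamming(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, shift):
        frame = x[start:start + frame_len] * window
        spectrum = np.fft.rfft(frame, n=n_fft)   # 513 frequency bins for n_fft = 1024
        frames.append(np.log(np.abs(spectrum) + 1e-10))   # S(k,t) = log|X(k,t)|
    return np.stack(frames, axis=1)              # shape: (frequency bins, time frames)
```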

2.1.2. The Performance of Resampling on the Spectrogram

Depending on the desired resampling factor, resampling is realized by a combination of up-sampling, interpolation, and down-sampling. Assume an audio signal $x(n)$ of length $m$ samples, where $p$ is the interpolation factor and $q$ is the decimation factor; after resampling by the factor $p/q$, the number of samples of the signal is increased or decreased. The process is as follows:
  • Up-sampling: $p-1$ zeros are inserted between every two sample points of the signal, so that the up-sampled signal $x_u(n)$ has $p(m-1)+1$ sample points and $x(n)$ is expanded by a factor of $p$ in the time domain. For $n = 1, 2, \ldots, m$, $x_u(p(n-1)+1) = x(n)$; otherwise, $x_u(n) = 0$.
  • Interpolation: The signal $x_u(n)$ is convolved with a low-pass interpolation filter $h(n)$ to obtain the interpolated signal $x_c(n) = x_u(n) \ast h(n)$. Depending on the filter used [35], the interpolation can be linear, spline, or cubic.
  • Down-sampling: One sample is kept out of every $q$ (i.e., $q-1$ samples are discarded), so that $x_c(n)$ is shortened by a factor of $q$ in the time domain: $x_d(n) = x_c(1 + q(n-1))$. For $n = 1, 2, \ldots, \lfloor p(m-1)/q \rfloor + 1$, the resampled signal is $y(n) = x_d(n)$.
Figure 3 shows the spectrogram of an original speech signal resampled by a factor of 2, with different results depending on the interpolation filter $h(n)$. It can be observed that the signal length is doubled in the time domain, the low-frequency region gains resolution while the bandwidth is narrowed, and the signal is affected by a certain degree of noise and distortion. With the linear interpolation filter, jagged changes appear in the spectrogram, prominently in the high-frequency region. Spline interpolation is an interpolation method based on local polynomial approximation, which is more accurate and complex than linear interpolation; the spectrogram shows that it alleviates the jagged changes in the high-frequency region to a certain extent. Cubic interpolation, based on cubic polynomial approximation, is more accurate than the previous two: after cubic interpolation, a smooth spectral envelope is obtained, which facilitates subsequent analysis and processing of the speech signal.
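A minimal NumPy sketch of the three-step p/q resampling described above is given below. It is our own illustration with a triangular kernel realizing linear interpolation, not the exact pipeline used to build the corpus; scipy.signal.resample_poly provides an equivalent polyphase implementation.

```python
import numpy as np

def resample_p_over_q(x, p, q):
    """Resample x by p/q: zero insertion, low-pass interpolation filtering, decimation."""
    m = len(x)
    # 1) Up-sampling: insert p-1 zeros between consecutive samples -> length p(m-1)+1.
    x_u = np.zeros(p * (m - 1) + 1)
    x_u[::p] = x
    # 2) Interpolation: convolve with a low-pass kernel h(n); the triangular kernel below
    #    realizes linear interpolation (spline/cubic interpolation would use other kernels).
    h = np.concatenate([np.arange(1, p + 1), np.arange(p - 1, 0, -1)]) / p
    x_c = np.convolve(x_u, h, mode='same')
    # 3) Down-sampling: keep one sample out of every q.
    return x_c[::q]
```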

2.2. Image Preprocessing

After the speech preprocessing operation, the spectrogram $I$ is obtained, where $I_{:,\theta} = (I_{1,\theta}, \ldots, I_{F,\theta})$ is the $\theta$-th frame, $\theta \in (0, T)$, $F$ is the number of frequency bins, and $T$ is the number of time frames. The spectrogram of size $F \times T$ is saved in PNG format. For speech of 3 s duration processed with a frame length of 50 ms and a frame shift of 25 ms, the number of time frames is $T_1 = 188$ and the number of frequency bins is $F_1 = 513$. Since the average duration of the TIMIT speech database is 3 s, in order to preserve the information of the resampled spectrogram along the frequency direction, we random-crop a spectrogram $I$ of different size to $F_1 \times T_1$ along the time direction and use it as the network input. When $T < T_1$, the missing pixels are zero-padded, i.e., $I_{1:F_1,\,1:t} = I_{1:F_1,\,t+T:T_1} = 0$, where $t$ is a randomly selected offset, $t \in (0, T_1 - T)$. In contrast to resizing, which can distort the temporal and spectral characteristics of the speech signal, random cropping preserves the original resolution and structure of the spectrogram, ensuring that important features are not lost or altered. Additionally, random cropping helps to prevent overfitting during model training, so that the model generalizes better to new data [36]. Before training and testing, the spectrogram is standardized to prevent vanishing or exploding gradients and to speed up training convergence.
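The cropping and padding rule above can be sketched as follows (our own NumPy illustration, using the target size F1 × T1 = 513 × 188 stated above):

```python
import numpy as np

def crop_or_pad(spec, T1=188, rng=None):
    """Random-crop along time if T > T1, or zero-pad at a random offset t if T < T1."""
    rng = rng or np.random.default_rng()
    F, T = spec.shape
    if T >= T1:
        start = rng.integers(0, T - T1 + 1)      # random crop position along the time axis
        return spec[:, start:start + T1]
    out = np.zeros((F, T1), dtype=spec.dtype)
    t = rng.integers(0, T1 - T + 1)              # random offset t in (0, T1 - T)
    out[:, t:t + T] = spec                       # pixels outside this range stay zero
    return out

def standardize(spec):
    """Zero-mean, unit-variance normalization applied before training and testing."""
    return (spec - spec.mean()) / (spec.std() + 1e-8)
```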

2.3. Architecture

The proposed model contains a feature extraction module, a feature fusion module, and a classification module. The feature extractor is the bottom-up pathway of the feature pyramid, built by stacking the Bottleneck units of ResNet. The feature fusion module fuses features at four different scales, so that the resulting feature map contains information from different levels. The SE attention mechanism then assigns greater weight to the channels that are more effective for classification. After the feature map is up-sampled, the lateral connection of the feature pyramid is applied to achieve cross-scale information transfer and improve the performance of the resampling detection task [37]. The classification module uses several fully connected layers to reduce the feature dimension, producing more abstract and discriminative feature vectors for the resampling decision.

2.3.1. Feature Extraction

(a) Bottleneck: A residual unit of ResNet [38]. Unlike the other residual unit, BasicBlock, a Bottleneck consists of three steps: a 1 × 1 convolution first reduces the data dimension, a conventional 3 × 3 convolution follows, and a final 1 × 1 convolution restores the dimension. Its structure resembles an hourglass, hence the name bottleneck. For example, if the number of input channels is 256, the first 1 × 1 convolutional layer reduces the number of channels to 64, and after the 3 × 3 convolutional layer, the channel count is increased back to 256 (see the sketch after this list).
(b) FPN: The feature pyramid is composed of features at different scales extracted by a deep convolutional neural network [39]. In this paper, four Bottleneck modules form the backbone for extracting features at different scales; the high-level features with low resolution but strong semantics and the low-level features with high resolution but weak semantics are combined by up-sampling and lateral connections, so that the features at all scales carry rich semantic information. Finally, feature maps of different scales are used for the classification task.
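The sketch below is our own PyTorch rendering of the Bottleneck unit described in (a); the batch-normalization placement follows the standard ResNet design, and the 256 → 64 → 256 channel example matches the description above.

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """ResNet bottleneck: 1x1 reduce -> 3x3 conv -> 1x1 expand, plus a residual shortcut."""
    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False), nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
        )
        # Projection shortcut when the spatial size or channel count changes.
        self.shortcut = nn.Identity() if stride == 1 and in_ch == out_ch else nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False), nn.BatchNorm2d(out_ch))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))

# Example: 256 input channels reduced to 64, then expanded back to 256.
block = Bottleneck(in_ch=256, mid_ch=64, out_ch=256)
```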

2.3.2. Feature Fusion

(a) Concat: The scales of the feature maps are unified by max pooling, the four levels of features are adjusted to the same number of channels by 1 × 1 convolutions, and feature fusion is completed with the Concat operation.
(b) SENet: The channel attention of the fused feature map is adjusted by the SE attention module to enhance its expressive ability [40]. SENet, shown in Figure 4, is a plug-and-play channel attention module that is widely used in visual models. It strengthens the channel features of the input feature map, and its output has the same size as the input. SENet mainly comprises two operations, squeeze and excitation (a code sketch is given after this list).
  • Squeeze $F_{sq}(\cdot)$: Squeeze compresses the spatial information of the input feature map $U$, whose dimensions are $H \times W \times C$. Global average pooling over the spatial dimensions yields a $1 \times 1 \times C$ descriptor:
    $$z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j) \qquad (3)$$
  • Excitation $F_{ex}(\cdot)$: Excitation converts the squeezed vector $z$ into channel attention weights. The vector obtained in the previous step is passed through two fully connected layers $W_1$ and $W_2$ to obtain the channel weight vector $s$:
    $$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma\big(W_2\, \delta(W_1 z)\big) \qquad (4)$$
    The activation function of the first fully connected layer is ReLU ($\delta$), and that of the second is the sigmoid ($\sigma$).
  • Scale $F_{scale}(\cdot)$: The generated weight vector $s$ is applied channel-wise to the feature map $U$ to obtain the feature map $\tilde{X}$ with channel attention:
    $$\tilde{X}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c \qquad (5)$$
    The size of $\tilde{X}$ is the same as that of $U$, so the operation re-weights $U$. The SE attention module enhances the resampling time-frequency characteristics expressed in the feature map by focusing on channels that are effective for the resampling classification task and suppressing those that are not.
(c) Lateral connection: A lateral connection accompanies each top-down up-sampling step of the network. The up-sampled result is fused with the bottom-up feature map of the same scale by pixel-wise addition [40]. In the experiments, bilinear interpolation is selected for up-sampling. $\tilde{X}$, obtained from the SE attention module, is first reduced to 256 channels by a $1 \times 1$ convolution and then up-sampled to $\tilde{X}'$ via Equation (6):
$$\tilde{X}'(i, j) = \sum_{i'=0}^{1} \sum_{j'=0}^{1} \omega_{i'j'}\, \tilde{X}(i + i', j + j') \qquad (6)$$
where $\omega_{i'j'}$ are the weight coefficients of the four pixels $\tilde{X}(i, j)$, $\tilde{X}(i+1, j)$, $\tilde{X}(i, j+1)$, $\tilde{X}(i+1, j+1)$ closest to the target pixel, determined by the distances between the target pixel and these four pixels. The feature map extracted from Bottleneck3 is also processed by a $1 \times 1$ convolution that reduces its channels to 256, yielding $I'$. The final feature map $F$ is then calculated by Equation (7):
$$F = \tilde{X}' + I' \qquad (7)$$
The main purpose of this is to exploit detailed information such as edges and textures of shallow resampled speech, which fuses semantic information of deep feature maps to achieve cross-level information transfer.
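As referenced in items (b) and (c), the following PyTorch sketch (our own; the reduction ratio r = 16 and the separate 1 × 1 reduction layers are assumptions consistent with the description above) illustrates the SE module of Equations (3)-(5) and the lateral-connection fusion of Equations (6) and (7).

```python
import torch.nn as nn
import torch.nn.functional as F

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention, Eqs. (3)-(5)."""
    def __init__(self, channels, reduction=16):          # reduction ratio r = 16 is an assumption
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, u):                                 # u: (B, C, H, W)
        z = u.mean(dim=(2, 3))                            # squeeze: global average pooling -> (B, C)
        s = self.fc(z).unsqueeze(-1).unsqueeze(-1)        # excitation: channel weights s -> (B, C, 1, 1)
        return u * s                                      # scale: X~_c = s_c * u_c

def lateral_fuse(x_tilde, skip, reduce_top, reduce_skip):
    """Top-down fusion, Eqs. (6)-(7): 1x1 reduction, bilinear up-sampling, pixel-wise addition."""
    top = F.interpolate(reduce_top(x_tilde), size=skip.shape[-2:],
                        mode='bilinear', align_corners=False)
    return top + reduce_skip(skip)                        # final feature map = X~' + I'

# reduce_top and reduce_skip would be 1x1 convolutions mapping their inputs to 256 channels,
# e.g. nn.Conv2d(c_in, 256, kernel_size=1).
```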

2.3.3. Classification and Evaluation Metrics

In the experiments, the obtained feature map $F$ is flattened to remove the spatial arrangement of pixels and to combine the spatial information with the channel information. Fully connected layers then compress the high-dimensional feature vector into a low-dimensional one, yielding more abstract and discriminative features while reducing the computational complexity and the number of parameters. The ReLU activation function increases the nonlinear fitting ability of the model, and the probabilities of the two classes are output after the final linear layer. Finally, the model loss is calculated with the cross-entropy loss function.
Cross-entropy: Cross-entropy is mainly used to measure the difference between two probability distributions, which is a loss function often used in classification problems. Firstly, log-softmax normalization is performed on the obtained vector, and the formula is as follows:
$$\log p_i = \log \mathrm{Softmax}(z_i) = \log \frac{e^{z_i}}{\sum_{c=1}^{C} e^{z_c}} \qquad (8)$$
For output node $i$, $p_i$ is the predicted probability of the correct category. Next, the negative log-likelihood (NLL) loss is calculated as in Equation (9):
$$\mathrm{NLLLoss} = \frac{1}{N} \sum_{i=1}^{N} y_i \left(-\log p_i\right) \qquad (9)$$
where $y_i$ is the one-hot encoded label, 1 for the positive class and 0 for the negative class. Finally, combining the two gives the cross-entropy loss:
$$C = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right] \qquad (10)$$
The predicted probabilities of the two categories are $p_i$ and $1 - p_i$. $C$ is one of the evaluation metrics used in this paper.
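A minimal PyTorch sketch of the classification module and loss described above follows (our own; the hidden width of 256 is an illustrative assumption, and nn.CrossEntropyLoss combines the log-softmax and NLL steps of Equations (8)-(10)):

```python
import torch.nn as nn

class Classifier(nn.Module):
    """Flatten the fused feature map, reduce its dimension with FC layers, output two class scores."""
    def __init__(self, in_features, hidden=256):          # hidden width is an illustrative choice
        super().__init__()
        self.head = nn.Sequential(
            nn.Flatten(),                                  # remove the spatial arrangement of pixels
            nn.Linear(in_features, hidden),
            nn.ReLU(inplace=True),                         # nonlinear fitting ability
            nn.Linear(hidden, 2),                          # logits for original (0) vs. resampled (1)
        )

    def forward(self, feat):
        return self.head(feat)

criterion = nn.CrossEntropyLoss()                          # log-softmax + NLL, i.e. Eqs. (8)-(10)
```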

2.4. Changes in Spectrogram Features in Different Network Layers

To visually observe the ability of the proposed model to extract features and capture resampling changes, we analyzed the feature variations in the spectrogram at different network layers. Figure 5 shows the feature visualization process of the spectrogram through the proposed model, which includes down-sampled speech, original speech, and up-sampled speech. To observe the changes more intuitively, the resampling is performed by the zero-complementing interpolation method. When p/q = 0.6, the time-domain length decreases, the frequency-band length stretches, and the energy of the high-frequency component in the spectrogram increases. And zero-padding is carried out before inputting the model for training. When p/q = 2, the length doubles in the time domain. As the frequency resolution increases in the low-frequency region, the frequency bandwidth decreases. Therefore, the frequency domain exhibits the characteristics of compression, resulting in the energy reduction of the high-frequency components in the spectrogram.
When the spectrogram enters the first convolution layer, the proposed model can capture the resampling characteristics of the above analysis. After the maximum pooling layer, the model focuses on the pixels with large energy and reduces the dimensions of the feature map. The output of Bottleneck2 layer indicates that the model is still able to capture the characteristics of up-sampling, original speech, and down-sampling in the time and frequency domains. The model extracts semantic information of the spectrogram through Bottleneck4, and the features of the down-sampling frequency domain stretching and up-sampling frequency domain compression can still be observed. It is noted in the figure that the SE layer helps to assign greater weight to channels that are more effective for model detection tasks, and the pixel values of the feature map corresponding to channels are enlarged. Finally, after the lateral connection operation, due to the fusion of the edge texture information captured by the upper layer, it can be seen that the final feature map restores the characteristics of the resampled spectrogram in the time and frequency domains and contains rich semantic information. Thus, the subsequent resampling classification task has better performance.

3. Experiments and Analysis

3.1. Experimental Setup

3.1.1. Dataset

Since there is currently no public speech database for speech resampling detection, the resampling corpus constructed by Zhang et al. [19] is used in this paper, with the original speech selected from the TIMIT dataset. The TIMIT database contains 6300 utterances, with an average duration of 3 s, from 630 speakers covering eight major dialect regions of the United States. The original speech files are mono, PCM-encoded, with a 16 kHz sampling rate and 16-bit quantization [18]. The resampling corpus selects 3000 original speech samples from TIMIT; these samples were resampled using linear, spline, and cubic interpolation, with resampling factors ranging from 0.6 to 1.6 in steps of 0.1. This resampled corpus is used to comprehensively evaluate the effectiveness of the proposed method for detecting resampled speech.
As shown in Table 1, the corpus also includes MP3-compressed speech databases using three different bit rates of 32 kbps, 64 kbps, and 128 kbps for the original speech and resampled speech, as well as the mixed corpus constructed by randomly selecting resampled speech from different interpolation databases.

3.1.2. Implementation Details

The experiments used the PyTorch framework, the development environment was Python 3.7, and the hardware platform was an NVIDIA RTX 3070. The study was carried out on the corpus built by Zhang et al. [19] from the TIMIT database. The detection experiment for each sampling factor uses a total of 6000 images. The training and test sets are split 1:1, and the variable-length images are randomly cropped and then standardized. For supervised learning, instances of the forged speech class are labeled 1 (negative) and instances of the natural speech class are labeled 0 (positive). Model training uses cross-entropy as the loss function and the SGD optimizer; the initial learning rate is 0.01, the batch size is 16, and the number of training epochs is set to 12. In the experiments, we focus on the mixed corpus to verify the effectiveness of the model, and to verify its robustness we select speech with different compression bit rates from the mixed corpus for comparative experiments.
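For concreteness, a minimal training-loop sketch with the hyperparameters listed above (cross-entropy loss, SGD, learning rate 0.01, batch size 16, 12 epochs) is given below; the model and dataset objects are placeholders for the network of Section 2 and the spectrogram corpus.

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_set, device='cuda'):
    loader = DataLoader(train_set, batch_size=16, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # initial learning rate 0.01
    criterion = torch.nn.CrossEntropyLoss()
    model.to(device).train()
    for epoch in range(12):                                    # 12 training epochs
        for spec, label in loader:                             # spec: standardized spectrogram; label: 0/1
            spec, label = spec.to(device), label.to(device)
            optimizer.zero_grad()
            loss = criterion(model(spec), label)
            loss.backward()
            optimizer.step()
```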

3.2. Results and Analysis

3.2.1. Different Interpolation Methods

The resampling process introduced earlier comprises up-sampling, interpolation, and down-sampling, and the interpolation methods mainly include linear, spline, and cubic interpolation. Different interpolation methods affect the accuracy of resampling-tamper detection differently. When the resampling factor is 0.8, 0.9, 1.1, or 1.2, the tampered speech is closer to the original speech, with higher reconstruction quality and less information loss, so existing methods achieve low detection accuracy for these factors. In this section, the proposed model is compared with a traditional speech-signal feature extraction algorithm [16] and manual spectrogram feature extraction algorithms [18,19] on the corpora of the three interpolation methods and four representative resampling factors. The results are shown in Table 2.
  • Different interpolation methods have a certain impact on the detection model, with the detection accuracy varying by 0.17% to 2.16% across interpolation methods. In general, the detection accuracy of the model is only slightly affected by the interpolation method.
  • Regardless of the detection under different sampling factors or different interpolation methods, the accuracy of the method proposed in this paper is higher than the methods of Hou [16], Wang [18] and Zhang [19]. The results show that the performance of using the deep learning method to extract spectrogram features for resampling detection is significantly better than the existing research results.

3.2.2. Architecture Comparison

With the continuing development of deep learning, the feature extraction ability of neural networks keeps improving. In this paper, one-dimensional speech signals are converted into two-dimensional spectrograms, and neural networks are used to extract the spectrogram features, which greatly improves speech resampling detection. In this section, after training different network models, we construct a deep attention network that combines low-level location features with high-level semantic features and adds SE attention modules. Classic classification networks are compared with the proposed model to test the accuracy of resampling detection. ResNet14 is a feature extractor formed by stacking four Bottlenecks, and FPN is a model structure that fuses two levels of top-down feature maps on top of ResNet14. The speech for the four sampling factors in the table is taken from the mixed corpus.
The data in Table 3 and Figure 6 show the impact of the four network models on the accuracy of different resampling factors. It is worth noting that when p/q = 0.8 and p/q = 1.2, AlexNet, ResNet14, and FPN all have good detection capabilities. When the resampling factor is 1.1, it is observed that the classification ability of the AlexNet model decreases, and the detection performance of ResNet14 also slightly weakens. When p/q = 0.9, the detection accuracy of the four models has a different degree of decline, among which AlexNet and ResNet14 decrease significantly, indicating that the two comparison models could not adapt to tamper detection of multiple resampling factors. The accuracy rate of the proposed model is as high as 89.09%, which is about three percentage points higher than the high-performance FPN. In addition, under other sampling factors, the proposed model achieved optimal results, which verifies its effectiveness.
Figure 7 shows the training loss curves of different network models when the resampling factor is 1.1. Its variation indicates that the loss value of the proposed model can converge quickly and maintain stability with the increase in the number of iterations.
To validate the stability of the proposed model, we evaluated its performance with different backbone networks in the speech resampling detection model. The choice of backbone network is crucial for the model's feature extraction efficiency and overall performance. In this study, we used ResNet14 as the backbone of the proposed model. Additionally, we integrated other backbone networks, EfficientNet and Vision Transformer (ViT), into our model framework and analyzed their detection accuracy. Table 4 shows the impact of different backbone networks on the detection accuracy under small-scale resampling-tampering.
As can be seen from Table 4, the ResNet14 backbone has better detection performance than the other two backbone networks. Additionally, we experimented with different learning rates and found that ResNet14 could adapt to various learning rates. This adaptability effectively helped us complete the speech resampling detection task.
It is worth noting that the results for ViT were achieved using a learning rate ten times lower than those used for ResNet14 and EfficientNet-B0, as ViT could not be trained properly with the same learning rate. Therefore, although ViT showed high accuracy when p/q = 0.9, its requirement for a lower learning rate indicates potential training instability issues under different learning rates.

3.2.3. Different Compression Sampling Factors

In this section, our main objective is to compare the detection capability of the proposed model with traditional methods under different sampling factors. Among the traditional methods, Hou et al. [16] calculated the second-order difference of the speech signal and exploited the periodicity of the zero-value intervals to detect resampling. Wang et al. [18] found that an incomplete interpolation process results in inconsistency between the bandwidth and the sampling frequency of the spectrogram and used this to detect resampling. Zhang et al. [19] used LBP operators to model the time-frequency statistical changes of the spectrogram and used an SVM to classify the LBP histogram features to distinguish original from resampled speech. For large-scale speech tampering, the above methods can determine speech resampling to some extent. However, for small-scale tampering, that is, resampling factors from 0.8 to 1.2, they do not achieve high detection accuracy. Therefore, the experiment in this section compares the detection accuracy of the proposed model and the other methods for 10 resampling factors on the mixed resampling corpus, as shown in Table 5.
The results in Table 5 show that the detection accuracy of the proposed model is better than that of the other methods for resampling factors from 0.6 to 1.6. Moreover, the accuracy for the hard-to-detect resampling factors of 0.8, 0.9, 1.1, and 1.2 is greatly improved; in particular, the accuracy of resampling-tampering detection with p/q = 0.9 is increased by nearly 20%. As the resampling factor moves further above or below 1, the accuracy gradually improves, because the larger the scale of down-sampling or up-sampling, the more obvious the time-frequency variation characteristics displayed on the spectrogram. The model can effectively learn these changes and make better classification decisions, which verifies that the proposed model has a strong ability to extract resampling features from the spectrogram and generalizes well.

3.2.4. The Impact of MP3 Compression

Speech compression is necessary in the process of voice transmission. In real-world scenarios, MP3 is the most commonly used speech format, obtained by compressing WAV speech [41]. However, the MP3 compression operation may obscure previous resampling traces in speech, so we test robustness to MP3 compression in this section.
The experiment uses compression bit rates of 128 kbps, 64 kbps, and 32 kbps to compress speech signals, as shown in Figure 8. As the compression bit rate decreases, the frequency resolution of the high-frequency region in the spectrogram decreases, resulting in a more ambiguous representation of high-frequency signals and a relatively clearer representation of low-frequency signals [42]. A low compression bit rate leads to distortion of the speech signal, which makes abnormal spectral components appear in the spectrogram.
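As an example of how such compressed test data can be produced, a hypothetical helper that transcodes WAV files with ffmpeg is sketched below (ffmpeg must be installed; the exact encoder and settings used by the authors are not stated in the paper):

```python
import subprocess

def compress_to_mp3(wav_path, mp3_path, bitrate='64k'):
    """Transcode a WAV file to MP3 at a given bit rate (e.g., '32k', '64k', '128k') via ffmpeg."""
    subprocess.run(['ffmpeg', '-y', '-i', wav_path, '-b:a', bitrate, mp3_path], check=True)
```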
To analyze whether the changes in the spectrogram caused by MP3 compression have an influence on the resampling detection model in this paper, we train and test on original speech and resampled speech at different compression bit rates. The experiments analyze the changes in the model's detection performance after the original speech and the resampled speech of each factor are compressed, as shown in Table 6.
It can be observed from Table 6 and Figure 9 that the performance of the model is disturbed after MP3 compression and the detection performance degrades to some extent, but the proposed method still achieves high detection accuracy for compressed resampled speech. By observing the feature maps captured by the model, we find that the smaller the compression bit rate, the more significant the distortion and noise effects on the signal, yet the classification ability of the model improves. This is because the original speech undergoes no sampling rate conversion before compression, so its information loss is relatively light, whereas the resampled speech has already undergone a sampling rate conversion, so compression causes more information loss; the difference between the two therefore becomes more pronounced.
However, at a compression bit rate of 32 kbps, the two resampling factors closest to the original speech (0.9 and 1.1) are detected less accurately than by the method of Zhang et al. [19], because the LBP operator is a local feature descriptor that better represents the loss of high-frequency detail, while the proposed model focuses on learning more complex global time-frequency variation features. Compared with the other methods, whose resampling detection performance degrades at high compression bit rates, the detection accuracy of the proposed model is less affected by the degree of compression, indicating that the proposed method is strongly resistant to compression.

4. Conclusions

The application of deep learning to speech-resampling-tamper detection is a new task. Building on our previous work [19], this paper uses a feature pyramid network integrated with an attention mechanism to extract and classify the spectrogram features of original and tampered speech. The experimental results show that the proposed method can effectively distinguish original speech from resampled speech under tampering with multiple resampling factors; in particular, the detection accuracy is significantly improved when the tampering amplitude is small. In addition, in tests under different compression bit rates, the proposed model shows relatively good robustness and can still accurately detect and recognize compressed speech. To enhance the robustness and generalizability of our findings, we plan to extend our experiments to more recent and larger datasets such as DidiSpeech [43] and AiShell-4 [44]. We also plan to make our resampled and compressed speech corpus publicly available for further validation. In future work, we will consider using other deep networks to further improve the accuracy of small-scale resampling-tamper detection, and estimating the resampling factor is also part of our future plans.

Author Contributions

Conceptualization, X.Z. and Y.Z.; methodology, X.Z. and Y.Z.; software, X.Z.; validation, X.Z.; formal analysis, Y.Z.; investigation, X.Z. and Y.Z.; resources, Y.Z.; data curation, X.Z. and Y.Z.; writing—original draft preparation, X.Z.; writing—review and editing, X.Z., Y.Z., Y.W., J.T. and S.X.; visualization, X.Z.; supervision, Y.Z.; project administration, X.Z. and Y.Z.; funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant No. 62072057, in part by the Natural Science Foundation of Shanghai under Grant No. 17ZR1411900, in part by the Opening Project of Shanghai Key Laboratory of Integrated Administration Technologies for Information Security under Grant No. AGK20190005 and in part by the Innovation Fund for Industry-University-Research of Chinese Universities under Grant No. 2021ZYB01003.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset and code used in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Audacity: Free Audio Editor and Recorder. Available online: http://www.audacityteam.org/ (accessed on 17 December 2023).
  2. Cool Edit Pro Is Now Adobe Audition. Available online: http://www.adobe.com/products/audition.html (accessed on 6 August 2023).
  3. Gold Wave-Audio Editor, Recorder, Converter, Restoration, and Analysis Software. Available online: http://www.goldwave.ca/ (accessed on 3 February 2024).
  4. Yan, Q.; Yang, R.; Huang, J. Detection of speech smoothing on very short clips. IEEE Trans. Inf. Forensics Secur. 2019, 14, 2441–2453. [Google Scholar] [CrossRef]
  5. Bevinamarad, P.R.; Shirldonkar, M. Audio forgery detection techniques: Present and past review. In Proceedings of the 2020 4th International Conference on Trends in Electronics and Informatics (ICOEI)(48184), Tirunelveli, India, 15–17 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 613–618. [Google Scholar]
  6. Mubeen, Z.; Afzal, M.; Ali, Z.; Khan, S.; Imran, M. Detection of impostor and tampered segments in audio by using an intelligent system. Comput. Electr. Eng. 2021, 91, 107122. [Google Scholar] [CrossRef]
  7. Saleem, S.; Dilawari, A.; Khan, U.G. Spoofed voice detection using dense features of stft and mdct spectrograms. In Proceedings of the 2021 International Conference on Artificial Intelligence (ICAI), Islamabad, Pakistan, 5–7 April 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 56–61. [Google Scholar]
  8. Capoferri, D.; Borrelli, C.; Bestagini, P.; Antonacci, F.; Sarti, A.; Tubaro, S. Speech audio splicing detection and localization exploiting reverberation cues. In Proceedings of the 2020 IEEE International Workshop on Information Forensics and Security (WIFS), New York, NY, USA, 6–11 December 2020; IEEE: Piscataway, NJ, USA; 2020; pp. 1–6. [Google Scholar]
  9. Huang, X.; Liu, Z.; Lu, W.; Liu, H.; Xiang, S. Fast and effective copy-move detection of digital audio based on auto segment. In Digital Forensics and Forensic Investigations: Breakthroughs in Research and Practice; IGI Global: Hershey, PA, USA, 2020; pp. 127–142. [Google Scholar]
  10. Zhao, J.; Lu, B.; Huang, L.; Huang, M.; Huang, J. Digital audio tampering detection using ENF feature and LST-MInception net. In Proceedings of the AIIPCC 2022; The Third International Conference on Artificial Intelligence, Information Processing and Cloud Computing, Online, 21–22 June 2022; VDE: Berlin, Germany, 2022; pp. 1–4. [Google Scholar]
  11. Gallagher, A.C. Detection of linear and cubic interpolation in JPEG compressed images. In Proceedings of the 2nd Canadian Conference on Computer and Robot Vision (CRV’05), Victoria, BC, Canada, 9–11 May 2005; IEEE: Piscataway, NJ, USA, 2005; pp. 65–72. [Google Scholar]
  12. Mahdian, B.; Saic, S. Blind authentication using periodic properties of interpolation. IEEE Trans. Inf. Forensics Secur. 2008, 3, 529–538. [Google Scholar] [CrossRef]
  13. Yao, Q.; Chai, P.; Xuan, G.; Yang, Z.; Shi, Y. Audio re-sampling detection in audio forensics based on EM algorithm. J. Comput. Appl. 2006, 26, 2598–2601. [Google Scholar]
  14. Chen, Y.X.; Xi, W.U. A method of detecting re-sampling based on expectation maximization applied in audio blind forensics. J. Circuits Syst. 2012, 17, 118–123. [Google Scholar]
  15. Shi, Q.; Ma, X. Detection of audio interpolation based on singular value decomposition. In Proceedings of the 2011 3rd International Conference on Awareness Science and Technology (iCAST), Dalian, China, 27–30 September 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 287–290. [Google Scholar]
  16. Hou, L.; Wu, W.; Zhang, X. Audio re-sampling detection in audio forensics based on second-order derivative. J. Shanghai Univ. 2014, 20, 304–312. [Google Scholar]
  17. Popescu, A.C.; Farid, H. Exposing digital forgeries by detecting traces of resampling. IEEE Trans. Signal Process. 2005, 53, 758–767. [Google Scholar] [CrossRef]
  18. Wang, Z.; Yan, D.; Wang, R.; Xiang, L.; Wu, T. Speech resampling detection based on inconsistency of band energy. Comput. Mater. Contin. 2018, 56, 247–259. [Google Scholar]
  19. Zhang, Y.; Dai, S.; Song, W.; Zhang, L.; Li, D. Exposing speech resampling manipulation by local texture analysis on spectrogram images. Electronics 2019, 9, 23. [Google Scholar] [CrossRef]
  20. Xu, Y.; Irfan, M.; Fang, A.; Zheng, J. Multiscale Attention Network for Detection and Localization of Image Splicing Forgery. IEEE Trans. Instrum. Meas. 2023, 72, 1–15. [Google Scholar] [CrossRef]
  21. Lang, X.; Han, F. MFL Image Recognition Method of Pipeline Corrosion Defects Based on Multilayer Feature Fusion Multiscale GhostNet. IEEE Trans. Instrum. Meas. 2022, 71, 1–8. [Google Scholar] [CrossRef]
  22. Wani, T.M.; Gunawan, T.S.; Qadri, S.A.A.; Mansor, H.; Arifin, F.; Ahmad, Y.A. Stride Based Convolutional Neural Network for Speech Emotion Recognition. In Proceedings of the 2021 IEEE 7th International Conference on Smart Instrumentation, Measurement and Applications (ICSIMA), Bandung, Indonesia, 23–25 August 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 41–46. [Google Scholar]
  23. Banerjee, S.; Singh, G.K. A Robust Bio-Signal Steganography With Lost-Data Recovery Architecture Using Deep Learning. IEEE Trans. Instrum. Meas. 2022, 71, 1–10. [Google Scholar] [CrossRef]
  24. Huang, J.; Luo, T.; Li, L.; Yang, G.; Xu, H.; Chang, C.C. ARWGAN: Attention-Guided Robust Image Watermarking Model Based on GAN. IEEE Trans. Instrum. Meas. 2023, 72, 1–17. [Google Scholar] [CrossRef]
  25. Küçükuğurlu, B.; Ustubioglu, B.; Ulutas, G. Duplicated Audio Segment Detection with Local Binary Pattern. In Proceedings of the 2020 43rd International Conference on Telecommunications and Signal Processing (TSP), Milan, Italy, 7–9 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 350–353. [Google Scholar]
  26. Yan, D.; Dong, M.; Gao, J. Exposing speech transsplicing forgery with noise level inconsistency. Secur. Commun. Networks 2021, 2021, 6659371. [Google Scholar] [CrossRef]
  27. Ulutas, G.; Tahaoglu, G.; Ustubioglu, B. Forge Audio Detection Using Keypoint Features on Mel Spectrograms. In Proceedings of the 2022 45th International Conference on Telecommunications and Signal Processing (TSP), Virtual, 13–15 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 413–416. [Google Scholar]
  28. Jadhav, S.; Patole, R.; Rege, P. Audio splicing detection using convolutional neural network. In Proceedings of the 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kanpur, India, 6–8 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–5. [Google Scholar]
  29. Ustubioglu, A.; Ustubioglu, B.; Ulutas, G. Mel spectrogram-based audio forgery detection using CNN. Signal Image Video Process. 2023, 17, 2211–2219. [Google Scholar] [CrossRef]
  30. Chuchra, A.; Kaur, M.; Gupta, S. A Deep Learning Approach for Splicing Detection in Digital Audios. In Proceedings of the Congress on Intelligent Systems: Proceedings of CIS 2021; Springer: Berlin/Heidelberg, Germany, 2022; Volume 1, pp. 543–558. [Google Scholar]
  31. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  32. Özseven, T. Investigation of the effect of spectrogram images and different texture analysis methods on speech emotion recognition. Appl. Acoust. 2018, 142, 70–77. [Google Scholar] [CrossRef]
  33. Zeng, Y.; Mao, H.; Peng, D. Spectrogram based multi-task audio classification. Multimed. Tools Appl. 2019, 78, 3705–3722. [Google Scholar] [CrossRef]
  34. Pyrovolakis, K.; Tzouveli, P.; Stamou, G. Multi-modal song mood detection with deep learning. Sensors 2022, 22, 1065. [Google Scholar] [CrossRef] [PubMed]
  35. Savić, N.; Milivojević, Z.; Prlinčević, B.; Kostić, D. Septic-convolution Kernel-Comparative Analysis of the Interpolation Error. In Proceedings of the 2022 International Conference on Development and Application Systems (DAS), Suceava, Romania, 26–28 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 36–41. [Google Scholar]
  36. Pereira, E.; Carneiro, G.; Cordeiro, F.R. A Study on the Impact of Data Augmentation for Training Convolutional Neural Networks in the Presence of Noisy Labels. In Proceedings of the 2022 35th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Natal, Brazil, 24–27 October 2022; IEEE: Piscataway, NJ, USA, 2022; Volume 1, pp. 25–30. [Google Scholar]
  37. Yue, G.; Li, S.; Cong, R.; Zhou, T.; Lei, B.; Wang, T. Attention-Guided Pyramid Context Network for Polyp Segmentation in Colonoscopy Images. IEEE Trans. Instrum. Meas. 2023, 72, 1–13. [Google Scholar] [CrossRef]
  38. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  39. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  40. Zhang, H.; Zu, K.; Lu, J.; Zou, Y.; Meng, D. EPSANet: An efficient pyramid squeeze attention block on convolutional neural network. In Proceedings of the Asian Conference on Computer Vision, Macao, China, 4–8 December 2022; pp. 1161–1177. [Google Scholar]
  41. Xiang, Z.; Bestagini, P.; Tubaro, S.; Delp, E.J. Forensic Analysis and Localization of Multiply Compressed MP3 Audio Using Transformers. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 22–27 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 2929–2933. [Google Scholar]
  42. Hailu, N.; Siegert, I.; Nürnberger, A. Improving automatic speech recognition utilizing audio-codecs for data augmentation. In Proceedings of the 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP), Tampere, Finland, 21–24 September 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–5. [Google Scholar]
  43. Guo, T.; Wen, C.; Jiang, D. Didispeech: A large scale mandarin speech corpus. In Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 6968–6972. [Google Scholar]
  44. Fu, Y.; Cheng, L.; Lv, S. Aishell-4: An open source dataset for speech enhancement, separation, recognition and speaker diarization in conference scenario. arXiv 2021, arXiv:2104.03603. [Google Scholar]
Figure 1. Visualization of speech signals. (a) Speech signal waveforms; (b) spectrogram.
Figure 2. Structure of the proposed scheme.
Figure 3. Spectrogram: (a) Original, (b) linear interpolation, (c) spline interpolation, and (d) cubic interpolation.
Figure 4. SENet block.
Figure 5. Spectrogram features at different network layers.
Figure 6. Detection accuracy of different network models.
Figure 7. Loss curves of different network models.
Figure 8. Spectrogram at three compression bit rates: (a) 32 kbps, (b) 64 kbps, and (c) 128 kbps.
Figure 9. Resampling detection results with different compression bit rates: (a) 32 kbps, (b) 64 kbps, and (c) 128 kbps [16,18,19].
Table 1. Corpus used in experiments.

Dataset | Original Speech (Un-Interpolated) | Resampled Speech (0.6∼1.6): Linear, Spline, Cubic | Resampled Speech (0.6∼1.6): Mixed
Uncompressed | 3000 samples | 3000 × 3 × 10 = 90,000 samples | 3000 × 10 = 30,000 samples
Compressed (32 kbps, 64 kbps, 128 kbps) | 3000 × 3 = 9000 samples | 3000 × 3 × 10 × 3 = 270,000 samples | 3000 × 3 × 10 = 90,000 samples
Table 2. Resampling detection results under different interpolation methods.

p/q | Method | Mixed Interpolation | Linear Interpolation | Spline Interpolation | Cubic Interpolation
0.8 | Hou et al. [16] | 0.7061 | 0.7237 | 0.7061 | 0.6841
0.8 | Wang et al. [18] | 0.4943 | 0.4820 | 0.5310 | 0.4950
0.8 | Zhang et al. [19] | 0.8553 | 0.8360 | 0.8340 | 0.8270
0.8 | Proposed Method | 0.9657 | 0.9573 | 0.9610 | 0.9656
0.9 | Hou et al. [16] | 0.6876 | 0.6771 | 0.6619 | 0.6673
0.9 | Wang et al. [18] | 0.4490 | 0.4140 | 0.4840 | 0.4550
0.9 | Zhang et al. [19] | 0.6960 | 0.7030 | 0.6600 | 0.6950
0.9 | Proposed Method | 0.8908 | 0.8982 | 0.8883 | 0.8900
1.1 | Hou et al. [16] | 0.6902 | 0.6514 | 0.6806 | 0.6817
1.1 | Wang et al. [18] | 0.6247 | 0.6300 | 0.6690 | 0.6700
1.1 | Zhang et al. [19] | 0.7413 | 0.7530 | 0.7440 | 0.7430
1.1 | Proposed Method | 0.9767 | 0.9810 | 0.9910 | 0.9720
1.2 | Hou et al. [16] | 0.6657 | 0.6745 | 0.6201 | 0.6675
1.2 | Wang et al. [18] | 0.7020 | 0.6620 | 0.7490 | 0.7030
1.2 | Zhang et al. [19] | 0.8127 | 0.8160 | 0.8460 | 0.8000
1.2 | Proposed Method | 0.9930 | 0.9770 | 0.9986 | 0.9834
Table 3. Detection results of different network models.

p/q | AlexNet | ResNet14 | FPN | The Proposed Model
0.8 | 0.9495 | 0.9470 | 0.9595 | 0.9657
0.9 | 0.5758 | 0.7650 | 0.8614 | 0.8908
1.1 | 0.7504 | 0.9003 | 0.9664 | 0.9767
1.2 | 0.9742 | 0.9714 | 0.9909 | 0.9930
Table 4. Comparison of detection results for different backbone networks.

p/q | EfficientNet-B0 | ViT | ResNet14
0.8 | 0.9270 | 0.9533 | 0.9657
0.9 | 0.8510 | 0.8867 | 0.8908
1.1 | 0.9240 | 0.9397 | 0.9767
1.2 | 0.9620 | 0.9826 | 0.9930
Table 5. Detection results of different methods under various resampling factors.

p/q | Hou et al. [16] | Wang et al. [18] | Zhang et al. [19] | The Proposed Model
0.6 | 0.7120 | 0.6593 | 0.9813 | 0.9988
0.7 | 0.6884 | 0.5573 | 0.9407 | 0.9916
0.8 | 0.7061 | 0.4943 | 0.8553 | 0.9657
0.9 | 0.6876 | 0.4490 | 0.6960 | 0.8908
1.1 | 0.6902 | 0.6247 | 0.7413 | 0.9767
1.2 | 0.6657 | 0.7027 | 0.8127 | 0.9930
1.3 | 0.6603 | 0.7090 | 0.9093 | 0.9961
1.4 | 0.6523 | 0.7617 | 0.9433 | 0.9971
1.5 | 0.6346 | 0.7873 | 0.9649 | 0.9949
1.6 | 0.6804 | 0.8373 | 0.9791 | 0.9986
Table 6. Comparison of detection accuracy under different compression bit rates.

Bit Rate | Method | p/q = 0.6 | 0.7 | 0.8 | 0.9 | 1.1 | 1.2 | 1.3 | 1.4 | 1.5 | 1.6
32 kbps | Hou et al. [16] | 0.6984 | 0.6977 | 0.6968 | 0.7222 | 0.7303 | 0.7485 | 0.6549 | 0.6596 | 0.6510 | 0.6393
32 kbps | Wang et al. [18] | 0.6593 | 0.5447 | 0.4977 | 0.4387 | 0.6683 | 0.7187 | 0.7433 | 0.7733 | 0.8110 | 0.8660
32 kbps | Zhang et al. [19] | 0.9980 | 0.9787 | 0.9627 | 0.9500 | 0.9540 | 0.9580 | 0.9633 | 0.9760 | 0.9813 | 0.9913
32 kbps | The Proposed Model | 0.9992 | 0.9950 | 0.9785 | 0.8961 | 0.9002 | 0.9642 | 0.9808 | 0.9947 | 0.9951 | 0.9978
64 kbps | Hou et al. [16] | 0.7059 | 0.7219 | 0.7061 | 0.6702 | 0.6800 | 0.6752 | 0.6482 | 0.6596 | 0.6407 | 0.6739
64 kbps | Wang et al. [18] | 0.6330 | 0.5297 | 0.4983 | 0.4677 | 0.6367 | 0.6833 | 0.7097 | 0.7587 | 0.7943 | 0.8400
64 kbps | Zhang et al. [19] | 0.9860 | 0.9407 | 0.8473 | 0.7040 | 0.7380 | 0.8273 | 0.9047 | 0.9480 | 0.9733 | 0.9880
64 kbps | The Proposed Model | 0.9991 | 0.9910 | 0.9574 | 0.8676 | 0.8857 | 0.9504 | 0.9801 | 0.9934 | 0.9974 | 0.9983
128 kbps | Hou et al. [16] | 0.7179 | 0.7346 | 0.6963 | 0.6795 | 0.6809 | 0.6884 | 0.6473 | 0.6464 | 0.6299 | 0.6543
128 kbps | Wang et al. [18] | 0.6370 | 0.5273 | 0.4993 | 0.4587 | 0.6397 | 0.6840 | 0.7097 | 0.7513 | 0.7973 | 0.8403
128 kbps | Zhang et al. [19] | 0.9860 | 0.9373 | 0.8440 | 0.6847 | 0.7140 | 0.8187 | 0.8980 | 0.9407 | 0.9727 | 0.9840
128 kbps | The Proposed Model | 0.9989 | 0.9912 | 0.9572 | 0.8293 | 0.8843 | 0.9486 | 0.9756 | 0.9920 | 0.9972 | 0.9975
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
