Article

Cross-Domain Conv-TasNet Speech Enhancement Model with Two-Level Bi-Projection Fusion of Discrete Wavelet Transform †

Department of Electrical Engineering, National Chi Nan University, Nantou 54561, Taiwan
*
Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in the Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING), Taipei, Taiwan, 21–22 November 2022.
Appl. Sci. 2023, 13(10), 5992; https://doi.org/10.3390/app13105992
Submission received: 20 March 2023 / Revised: 4 May 2023 / Accepted: 10 May 2023 / Published: 12 May 2023

Abstract

Nowadays, time-domain features are widely used in speech enhancement (SE) networks, alongside frequency-domain features, to achieve excellent performance in removing noise from input utterances. This study primarily investigates how to extract information from time-domain utterances to create more effective features for SE. We extend our recent work by employing sub-band signals that reside in multiple acoustic frequency bands in the time domain and integrating them into a unified time-domain feature set. The discrete wavelet transform (DWT) is applied to decompose each input frame signal into sub-band signals, and a projection fusion process is performed on these signals to create the final features. The fusion strategy is either bi-projection fusion (BPF) or multiple projection fusion (MPF); in short, MPF replaces the sigmoid function in BPF with the softmax function so that ratio masks can be created for more than two feature sources. The concatenation of the fused DWT features and the time features serves as the encoder output of two well-known SE frameworks, the fully convolutional time-domain audio separation network (Conv-TasNet) and the dual-path transformer network (DPTNet), which estimate the mask and then produce the enhanced time-domain utterances. Evaluation experiments are conducted on the VoiceBank-DEMAND and VoiceBank-QUT tasks, and the results reveal that the proposed method achieves higher speech quality and intelligibility than the original Conv-TasNet, which uses time features only. This indicates that fusing DWT features derived from the input utterances helps the time features learn a superior Conv-TasNet/DPTNet network for SE.
Keywords: speech enhancement; discrete wavelet transform; cross-domain; Conv-TasNet; bi-projection fusion; multiple projection fusion
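To make the abstract's pipeline concrete, the following is a minimal NumPy sketch of its two core ingredients: a one-level Haar DWT that splits a time-domain frame into low- and high-frequency sub-band signals, and a softmax-based fusion in the spirit of MPF, where each source receives an element-wise ratio mask and the masked sources are summed. The fixed random projection matrices here are hypothetical stand-ins for the learned projections in the actual model; this is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_dwt(frame):
    """One-level Haar DWT: split a frame into low- and high-band sub-signals."""
    even, odd = frame[0::2], frame[1::2]
    approx = (even + odd) / np.sqrt(2)   # low-frequency sub-band
    detail = (even - odd) / np.sqrt(2)   # high-frequency sub-band
    return approx, detail

def mpf_fuse(features, proj_mats):
    """Softmax-style projection fusion over N feature sources (MPF sketch).

    features:  list of N arrays, each of shape (D,)
    proj_mats: list of N (D, D) projection matrices (learned in the real model)
    """
    logits = np.stack([P @ f for P, f in zip(proj_mats, features)])      # (N, D)
    masks = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)   # softmax over sources
    return (masks * np.stack(features)).sum(axis=0)                      # (D,) fused feature

frame = rng.standard_normal(64)               # one time-domain frame
lo, hi = haar_dwt(frame)                      # two 32-dim sub-band signals
P = [rng.standard_normal((32, 32)) * 0.1 for _ in range(2)]
fused = mpf_fuse([lo, hi], P)
print(fused.shape)                            # (32,)
```

Because the masks are a softmax across sources, they sum to one element-wise, so the fused vector is a convex combination of the sub-band features at every dimension; with sigmoid gates instead of softmax, the same scaffold reduces to the two-source BPF case.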

Share and Cite

MDPI and ACS Style

Chen, Y.-T.; Wu, Z.-T.; Hung, J.-W. Cross-Domain Conv-TasNet Speech Enhancement Model with Two-Level Bi-Projection Fusion of Discrete Wavelet Transform. Appl. Sci. 2023, 13, 5992. https://doi.org/10.3390/app13105992
