Noise Reduction Combining a General Microphone and a Throat Microphone
Abstract
1. Introduction
2. Binary Mask Using Two Microphones
2.1. Problem Formulation
2.2. Noise Reduction Using Binary Mask
3. Proposed Approach
3.1. Overview of the Proposed Method
3.2. Formulation by Equation
4. Experiment
4.1. Experimental Method
4.2. Quantitative Evaluation of Experiments
5. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
Target speech | Male voice
---|---
Noise signal | Intersection noise, white noise (0 dB, −15 dB)
Noise reduction threshold | −90 to −30 dB (in 10 dB steps)
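The threshold-based binary masking evaluated over the sweep above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the frame-wise FFT without overlap, the reference-relative level definition, and the function name `binary_mask_reduction` are all assumptions made for the example.

```python
import numpy as np

def binary_mask_reduction(mic, ref, frame=512, threshold_db=-60.0):
    """Zero out time-frequency bins of `mic` whose level in the reference
    channel `ref`, relative to the strongest reference bin of the frame,
    falls below `threshold_db`. Sketch only: no overlap or windowing."""
    out = np.zeros_like(mic, dtype=float)
    for start in range(0, len(mic) - frame + 1, frame):
        X = np.fft.rfft(mic[start:start + frame])
        R = np.fft.rfft(ref[start:start + frame])
        # Bin level in dB relative to the frame's strongest reference bin;
        # the 1e-12 terms floor the ratio to avoid log(0).
        level_db = 20 * np.log10(np.abs(R) / (np.abs(R).max() + 1e-12) + 1e-12)
        mask = (level_db >= threshold_db).astype(float)  # hard 0/1 mask
        out[start:start + frame] = np.fft.irfft(X * mask, n=frame)
    return out
```

Raising the threshold suppresses more bins, which trades distortion for interference rejection; this matches the tables below, where SIR rises monotonically with the threshold while SDR peaks at an intermediate value.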
Threshold (dB) | SDR (dB) | SIR (dB)
---|---|---
−90 | 0.704 | −3.347 |
−80 | 2.632 | 0.008068 |
−70 | 8.384 | 7.795 |
−60 | 9.593 | 12.97 |
−50 | 7.393 | 15.50 |
−40 | 4.351 | 17.10 |
−30 | 1.527 | 20.83 |
Threshold (dB) | SDR (dB) | SIR (dB)
---|---|---
−90 | 1.665 | −9.767 |
−80 | 3.199 | −7.361 |
−70 | 5.007 | −4.756 |
−60 | 6.838 | −1.369 |
−50 | 6.646 | 2.083 |
−40 | 4.236 | 5.033 |
−30 | 1.522 | 8.821 |
Threshold (dB) | SDR (dB) | SIR (dB)
---|---|---
−90 | 7.510 | 2.233 |
−80 | 9.579 | 4.639 |
−70 | 10.66 | 7.244 |
−60 | 9.764 | 10.63 |
−50 | 7.534 | 14.08 |
−40 | 4.415 | 17.03 |
−30 | 1.540 | 20.82 |
Noise signal | SNR (dB)
---|---
Intersection noise | −4.158 |
White noise (0 dB) | −22.96 |
White noise (−15 dB) | −10.96 |
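The SDR, SIR, and SNR figures in the tables above can be computed with standard energy-ratio definitions. The paper's actual evaluation code is not shown here, so the following is a hedged sketch: the function names are illustrative, and the SDR/SIR variant assumes a simple BSS-Eval-style projection of the estimate onto the target and interference signals.

```python
import numpy as np

def snr_db(clean, noisy):
    """SNR in dB: target energy over residual (noisy - clean) energy."""
    noise = noisy - clean
    return 10 * np.log10(np.sum(clean**2) / (np.sum(noise**2) + 1e-12))

def sdr_sir_db(estimate, target, interference):
    """Simplified SDR/SIR: project the estimate onto the target and the
    interference signals, then form energy ratios (BSS-Eval-style sketch)."""
    s = target * np.dot(estimate, target) / np.dot(target, target)
    i = interference * np.dot(estimate, interference) / np.dot(interference, interference)
    e = estimate - s - i  # residual artifacts/noise
    sdr = 10 * np.log10(np.sum(s**2) / (np.sum((i + e)**2) + 1e-12))
    sir = 10 * np.log10(np.sum(s**2) / (np.sum(i**2) + 1e-12))
    return sdr, sir
```

With these definitions, an estimate containing the target plus a small interference leak yields an SIR equal to the (negated) leak power in dB, which is the kind of figure the result tables report.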
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kawaguchi, J.; Matsumoto, M. Noise Reduction Combining a General Microphone and a Throat Microphone. Sensors 2022, 22, 4473. https://doi.org/10.3390/s22124473