Effect of Face Masks on Automatic Speech Recognition Accuracy for Mandarin
Abstract
1. Introduction
2. Methods
2.1. Apparatus
2.2. Stimuli
2.3. Method of Speech Recognition
2.4. Procedure
2.5. Data Analysis
3. Results
3.1. Recording Distance: 0.2 m
3.2. Recording Distance: 0.6 m
3.3. Summary
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Full Name |
---|---|
ASR | Automatic speech recognition |
pp | Percentage points |
SNR | Signal-to-noise ratio |
ASRD | Baidu Cloud automatic speech recognition |
ASRT | Tencent Cloud automatic speech recognition |
M0 | Without a mask |
M1 | Surgical mask |
M2 | Activated-carbon mask |
M3 | Hanging-ear medical protective mask |
M4 | Headwear medical protective mask |
M5 | Anti-particulate mask (with a breather valve) |
M6 | Cloth mask |
ACC | Word accuracy |
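ACC is listed above only as "word accuracy"; its formula is not given in this copy. A common definition, assumed here and not confirmed as the authors' exact procedure, aligns the recognizer output with the reference transcript by minimum edit distance and computes ACC = (N - S - D - I) / N × 100, where N is the number of reference tokens and S, D, and I are the substitutions, deletions, and insertions. For Mandarin, the sketch treats each Chinese character as one token. The function names `align_counts` and `word_accuracy` are hypothetical.

```python
from typing import Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of a minimum-edit-distance alignment."""
    n, m = len(ref), len(hyp)
    # dp[i][j] holds (total_cost, S, D, I) for aligning ref[:i] with hyp[:j].
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # reference tokens with no hypothesis: all deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # hypothesis tokens with no reference: all insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, d, ins)  # match, no cost
            else:
                diag = (c + 1, s + 1, d, ins)  # substitution
            c, s, d, ins = dp[i - 1][j]
            delete = (c + 1, s, d + 1, ins)
            c, s, d, ins = dp[i][j - 1]
            insert = (c + 1, s, d, ins + 1)
            # Lowest total cost wins; ties are broken by comparing S, then D, then I.
            dp[i][j] = min(diag, delete, insert)
    _, s, d, ins = dp[n][m]
    return s, d, ins


def word_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """ACC in percent, assuming ACC = (N - S - D - I) / N."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    s, d, ins = align_counts(ref, hyp)
    return 100.0 * (len(ref) - s - d - ins) / len(ref)


if __name__ == "__main__":
    reference = list("今天天气很好")
    hypothesis = list("今天天气好")  # one character dropped
    print(align_counts(reference, hypothesis))  # (0, 1, 0)
    print(round(word_accuracy(reference, hypothesis), 2))  # 83.33
```

Because insertions are subtracted, ACC defined this way can fall below zero when a recognizer outputs many extra characters.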
Mask Condition | ASRD, Restaurant Noise | ASRD, Speech-Shaped Noise | ASRT, Restaurant Noise | ASRT, Speech-Shaped Noise |
---|---|---|---|---|
M0 | 80.63 | 66.67 | 92.58 | 88.56 |
M1 | 76.92 | 60.00 | 93.33 | 81.80 |
M2 | 72.47 | 58.96 | 91.67 | 80.48 |
M3 | 86.19 | 59.55 | 93.54 | 81.82 |
M4 | 75.96 | 61.32 | 87.08 | 72.73 |
M5 | 77.35 | 66.67 | 96.88 | 80.63 |
M6 | 84.41 | 70.59 | 88.89 | 84.62 |
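Since the abbreviation list defines pp (percentage points), a natural way to read the table above is the change of each mask relative to the no-mask baseline M0. The sketch below hand-transcribes the table and, assuming its cells are ACC values in percent, computes each mask's change in percentage points (negative means lower accuracy than without a mask). The names `TABLE`, `COLUMNS`, and `pp_change` are hypothetical.

```python
from typing import Dict, Tuple

Row = Tuple[float, ...]

# Column order used for every row below.
COLUMNS = ("ASRD/restaurant", "ASRD/speech-shaped", "ASRT/restaurant", "ASRT/speech-shaped")

# Hand transcription of the table above (assumed to be ACC, %).
TABLE: Dict[str, Row] = {
    "M0": (80.63, 66.67, 92.58, 88.56),
    "M1": (76.92, 60.00, 93.33, 81.80),
    "M2": (72.47, 58.96, 91.67, 80.48),
    "M3": (86.19, 59.55, 93.54, 81.82),
    "M4": (75.96, 61.32, 87.08, 72.73),
    "M5": (77.35, 66.67, 96.88, 80.63),
    "M6": (84.41, 70.59, 88.89, 84.62),
}


def pp_change(table: Dict[str, Row], baseline: str = "M0") -> Dict[str, Row]:
    """Change of each masked condition relative to the baseline, in percentage points."""
    base = table[baseline]
    return {
        mask: tuple(round(value - ref, 2) for value, ref in zip(row, base))
        for mask, row in table.items()
        if mask != baseline
    }


if __name__ == "__main__":
    for mask, deltas in pp_change(TABLE).items():
        print(mask, dict(zip(COLUMNS, deltas)))
```

For this table, the largest drop is M4 with ASRT in speech-shaped noise (-15.83 pp), while M3 and M6 score above M0 for ASRD in restaurant noise (+5.56 pp and +3.78 pp).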
Mask Condition | ASRD, Restaurant Noise | ASRD, Speech-Shaped Noise | ASRT, Restaurant Noise | ASRT, Speech-Shaped Noise |
---|---|---|---|---|
M0 | 74.34 | 42.86 | 92.58 | 80.91 |
M1 | 55.05 | 34.17 | 90.45 | 74.46 |
M2 | 52.94 | 30.38 | 83.33 | 50.00 |
M3 | 60.00 | 38.46 | 90.00 | 66.67 |
M4 | 59.55 | 29.41 | 84.62 | 63.64 |
M5 | 63.07 | 39.23 | 84.62 | 61.32 |
M6 | 72.08 | 43.93 | 90.91 | 80.00 |
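One way to summarize the table above across masks (an assumed analysis, not one reported in this copy) is to average ACC over the six masked conditions M1 to M6 in each column, then compare the two noise types for each engine. Values are hand-transcribed and assumed to be ACC in percent; `masked_means` is a hypothetical helper built on the standard-library `statistics.mean`.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Column order: ASRD restaurant, ASRD speech-shaped, ASRT restaurant, ASRT speech-shaped.
TABLE: Dict[str, Tuple[float, float, float, float]] = {
    "M0": (74.34, 42.86, 92.58, 80.91),
    "M1": (55.05, 34.17, 90.45, 74.46),
    "M2": (52.94, 30.38, 83.33, 50.00),
    "M3": (60.00, 38.46, 90.00, 66.67),
    "M4": (59.55, 29.41, 84.62, 63.64),
    "M5": (63.07, 39.23, 84.62, 61.32),
    "M6": (72.08, 43.93, 90.91, 80.00),
}


def masked_means(table: Dict[str, Tuple[float, ...]], baseline: str = "M0") -> List[float]:
    """Mean ACC per column over every condition except the no-mask baseline."""
    rows = [row for mask, row in table.items() if mask != baseline]
    return [mean(column) for column in zip(*rows)]


if __name__ == "__main__":
    asrd_rest, asrd_ssn, asrt_rest, asrt_ssn = masked_means(TABLE)
    # Restaurant-minus-speech-shaped gap per engine, in percentage points.
    print(f"ASRD gap: {asrd_rest - asrd_ssn:.2f} pp")
    print(f"ASRT gap: {asrt_rest - asrt_ssn:.2f} pp")
```

With these values the masked-condition means are higher in restaurant noise than in speech-shaped noise for both engines, by about 24.52 pp for ASRD and 21.31 pp for ASRT.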
Mask Condition | ASRD, Restaurant Noise | ASRD, Speech-Shaped Noise | ASRT, Restaurant Noise | ASRT, Speech-Shaped Noise |
---|---|---|---|---|
M0 | 58.52 | 38.18 | 89.74 | 65.15 |
M1 | 44.16 | 25.24 | 84.41 | 60.15 |
M2 | 38.19 | 30.77 | 75.00 | 42.26 |
M3 | 53.82 | 37.09 | 85.16 | 48.53 |
M4 | 44.95 | 20.71 | 75.96 | 55.28 |
M5 | 62.50 | 33.33 | 80.00 | 44.57 |
M6 | 69.38 | 42.86 | 92.33 | 76.33 |
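A quick column-by-column reading of the table above is to pick out the conditions with the lowest and highest ACC. The sketch below does this with Python's built-in `min` and `max` over a hand transcription of the table (values assumed to be ACC in percent); the helper name `extremes` is hypothetical.

```python
from typing import Dict, List, Tuple

COLUMNS = ("ASRD/restaurant", "ASRD/speech-shaped", "ASRT/restaurant", "ASRT/speech-shaped")

# Hand transcription of the table above (assumed to be ACC, %).
TABLE: Dict[str, Tuple[float, float, float, float]] = {
    "M0": (58.52, 38.18, 89.74, 65.15),
    "M1": (44.16, 25.24, 84.41, 60.15),
    "M2": (38.19, 30.77, 75.00, 42.26),
    "M3": (53.82, 37.09, 85.16, 48.53),
    "M4": (44.95, 20.71, 75.96, 55.28),
    "M5": (62.50, 33.33, 80.00, 44.57),
    "M6": (69.38, 42.86, 92.33, 76.33),
}


def extremes(table: Dict[str, Tuple[float, ...]]) -> List[Tuple[str, str]]:
    """(lowest-ACC condition, highest-ACC condition) per column; ties go to the first key."""
    result = []
    for c in range(len(COLUMNS)):
        lowest = min(table, key=lambda mask: table[mask][c])
        highest = max(table, key=lambda mask: table[mask][c])
        result.append((lowest, highest))
    return result


if __name__ == "__main__":
    for name, (low, high) in zip(COLUMNS, extremes(TABLE)):
        print(f"{name}: lowest {low}, highest {high}")
```

In this table the cloth mask M6 has the highest ACC in every column, above the no-mask condition M0, while the lowest value belongs to M2 in three columns and to M4 for ASRD in speech-shaped noise.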
Mask Condition | ASRD, Restaurant Noise | ASRD, Speech-Shaped Noise | ASRT, Restaurant Noise | ASRT, Speech-Shaped Noise |
---|---|---|---|---|
M0 | 78.95 | 50.00 | 100.00 | 90.00 |
M1 | 66.67 | 42.50 | 91.67 | 80.65 |
M2 | 66.67 | 30.00 | 92.12 | 74.47 |
M3 | 67.54 | 40.06 | 97.50 | 74.34 |
M4 | 66.67 | 41.43 | 89.18 | 75.00 |
M5 | 63.96 | 45.56 | 100.00 | 78.89 |
M6 | 82.35 | 46.06 | 90.45 | 83.61 |
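The table above also supports a direct engine comparison. Assuming the cells are ACC in percent, the sketch below computes ASRT minus ASRD in percentage points for each mask condition and noise type. `TABLE` and `engine_gap` are hypothetical names for a hand transcription and a helper.

```python
from typing import Dict, Tuple

# Column order: ASRD restaurant, ASRD speech-shaped, ASRT restaurant, ASRT speech-shaped.
TABLE: Dict[str, Tuple[float, float, float, float]] = {
    "M0": (78.95, 50.00, 100.00, 90.00),
    "M1": (66.67, 42.50, 91.67, 80.65),
    "M2": (66.67, 30.00, 92.12, 74.47),
    "M3": (67.54, 40.06, 97.50, 74.34),
    "M4": (66.67, 41.43, 89.18, 75.00),
    "M5": (63.96, 45.56, 100.00, 78.89),
    "M6": (82.35, 46.06, 90.45, 83.61),
}


def engine_gap(table: Dict[str, Tuple[float, float, float, float]]) -> Dict[str, Tuple[float, float]]:
    """ASRT minus ASRD (pp) for restaurant noise and for speech-shaped noise, per condition."""
    return {
        mask: (round(rt - rd, 2), round(st - sd, 2))
        for mask, (rd, sd, rt, st) in table.items()
    }


if __name__ == "__main__":
    for mask, (rest, ssn) in engine_gap(TABLE).items():
        print(f"{mask}: restaurant {rest:+.2f} pp, speech-shaped {ssn:+.2f} pp")
```

Every gap in this table is positive, so ASRT scores higher than ASRD in all 14 comparisons; the smallest margin is M6 in restaurant noise (8.10 pp).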
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, X.; Ni, K.; Huang, Y. Effect of Face Masks on Automatic Speech Recognition Accuracy for Mandarin. Appl. Sci. 2024, 14, 3273. https://doi.org/10.3390/app14083273
Li X, Ni K, Huang Y. Effect of Face Masks on Automatic Speech Recognition Accuracy for Mandarin. Applied Sciences. 2024; 14(8):3273. https://doi.org/10.3390/app14083273
Chicago/Turabian Style: Li, Xiaoya, Ke Ni, and Yu Huang. 2024. "Effect of Face Masks on Automatic Speech Recognition Accuracy for Mandarin" Applied Sciences 14, no. 8: 3273. https://doi.org/10.3390/app14083273
APA Style: Li, X., Ni, K., & Huang, Y. (2024). Effect of Face Masks on Automatic Speech Recognition Accuracy for Mandarin. Applied Sciences, 14(8), 3273. https://doi.org/10.3390/app14083273