Electronics
  • Review
  • Open Access

16 October 2023

A Survey of Automatic Speech Recognition for Dysarthric Speech

1 School of Computer and Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China
2 School of Information Engineering, Beijing Institute of Graphic Communication, Beijing 102600, China
* Author to whom correspondence should be addressed.

Abstract

Dysarthric speech has several pathological characteristics that distinguish it from healthy speech, such as discontinuous pronunciation, uncontrolled volume, slow speech, explosive pronunciation, improper pauses, excessive nasal sounds, and air-flow noise during pronunciation. Automatic speech recognition (ASR) can be very helpful for speakers with dysarthria. Our research aims to provide a scoping review of ASR for dysarthric speech, covering papers in this field from 1990 to 2022. Our survey found that research on the acoustic features and the acoustic models of dysarthric speech has developed nearly in parallel. During the 2010s, deep learning technologies were widely applied to improve the performance of ASR systems. In the era of deep learning, many advanced methods (such as convolutional neural networks, deep neural networks, and recurrent neural networks) are being applied to design acoustic models and lexical and language models for dysarthric-speech-recognition tasks. Deep learning methods are also used to extract acoustic features from dysarthric speech. Additionally, this scoping review found that speaker dependence seriously limits the generalizability of acoustic models, and that the available speech data are too scarce to meet the amounts required to train models with big-data methods.

1. Introduction

Speech is generated via the coordinated movements of articulators, which are regulated via neural activities in the speech-related functional areas of the brain [1]. Speech plays an important role in people’s daily communication and is a crucial medium for social interaction [2,3]. Dysarthria, a speech disorder, refers to neuromuscular impairments affecting the strength, speed, tone, steadiness, or accuracy of the speech-production muscles. Cortical lesions can cause a series of neuropathological characteristics in dysarthric speakers, and the severity of dysarthria varies depending on the location and severity of the neuropathies [1]. Unfortunately, dysarthric speakers often have difficulty pronouncing words correctly, making communication with others extremely challenging. This can not only cause significant physical and psychological distress to patients but may also impede their ability to participate fully in society [4]. Therefore, it is essential to investigate effective rehabilitation strategies for dysarthric speakers to improve their communication skills and help them resume productively participating in society.
Dysarthria, also known as motor dysarthria, is a speech disorder caused by neuropathy, muscle paralysis, decreased contractility, or uncoordinated movement of the muscles involved in speech production [5]. By severity, dysarthria is generally categorized as "mild, moderate and severe (or additionally, extremely severe)". Speakers with severe dysarthria suffer a serious impairment of speech function and have difficulty recovering communication even with speech rehabilitation [6]. Dysarthria can also be categorized as "spastic, atonic, ataxic, hyperkinetic, hypokinetic, and mixed" [1]. Spastic dysarthria, the form most commonly seen in clinical practice, results from damage to the bilateral upper motor neurons. The speech of speakers with spastic dysarthria is discontinuous, and its volume is uncontrolled: as the tension of the articulation muscles increases and muscle strength decreases, elevation of the soft palate is reduced, and the lips and tongue show poor movement. Atonic dysarthria and ataxic dysarthria are less common. In ataxic dysarthria, the cerebellum is damaged, so the articulatory muscles lack control over the direction and scale of movement; the tongue is easily raised, and the direction of alternating movements is poorly regulated. The resulting speech shows defects such as slow speed, abnormal volume, explosive pronunciation, improper pauses, and excessive nasal phonemes. In atonic dysarthria, the lower motor neurons are damaged, which leads to paralysis of the pharyngeal muscles and soft palate. Due to impaired lip muscles, the speech of speakers with atonic dysarthria shows defects such as poor pronunciation continuity, air-flow noise, and low volume [1].
Intensive therapy is an effective means of speech rehabilitation for dysarthric speakers [7,8,9,10]. Speech rehabilitation can be readily delivered with the help of a computer-aided training system [11]. Moreover, the direct application of speech signal processing to improve the intelligibility of dysarthric speech is also an effective and powerful research route; in particular, automatic speech recognition (ASR) is now one of the most popular tools [12,13,14,15,16,17,18,19,20,21,22]. Dysarthric speakers tire easily, are less able to express emotions, and are prone to drooling and dysphagia. As a result, collecting dysarthric speech is extremely difficult [23]. This difficulty leads to a scarcity of dysarthric speech data, which compounds the difficulty of implementing ASR for dysarthric speech. In addition, because the pathogeneses of their dysarthria differ, dysarthric speakers vary significantly in their pronunciation, which also results in larger and more complex variation in the acoustic space of dysarthric speech compared with normal speech [24,25].
Many researchers have made efforts to improve the performance of ASR for dysarthric speech. The trend in the development of ASR for dysarthric speech is shown in Figure 1: the studies above the dashed line were conducted before deep learning, and those below the dashed line were conducted in the era of deep learning.
Figure 1. The trend of ASR for dysarthric speech.

1.1. ASR Technologies for Dysarthric Speech before Deep Learning

Before deep learning, technologies relating to ASR for dysarthric speech were limited by the computational abilities of devices. Machine learning methods were widely used to design acoustic models for dysarthric speech recognition. For example, Jayaram and Abdelhamied [26] used an artificial neural network (ANN) and analysed its experimental results on dysarthric speech. Polur and Miller [27] used the Hidden Markov Model (HMM) to build ASR for dysarthric speech and compared the results of different acoustic features, such as fast Fourier transform (FFT), linear predictive, and cepstral coefficients. The most notable characteristic of this period is that research progressed slowly.
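To make the feature comparison in [27] concrete, the sketch below computes real cepstral coefficients from one speech frame via the classic recipe (inverse DFT of the log magnitude spectrum). This is a minimal illustration of the feature type, not the authors' implementation; the frame length, coefficient count, and synthetic signal are arbitrary choices for demonstration.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def real_cepstrum(frame, n_coeffs):
    """Real cepstral coefficients: inverse DFT of the log magnitude spectrum.

    Features of this kind (alongside raw FFT magnitudes and linear
    predictive coefficients) were among those compared as inputs to
    HMM-based dysarthric speech recognizers.
    """
    spectrum = dft(frame)
    log_mag = [math.log(abs(s) + 1e-12) for s in spectrum]  # guard against log(0)
    n = len(log_mag)
    cep = [sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n)
               for k in range(n)).real / n
           for q in range(n)]
    return cep[:n_coeffs]

# Toy example: one 32-sample frame of a synthetic voiced-like signal.
frame = [math.sin(2 * math.pi * 4 * t / 32) + 0.3 * math.sin(2 * math.pi * 9 * t / 32)
         for t in range(32)]
coeffs = real_cepstrum(frame, 8)
print(len(coeffs))  # 8 coefficients per frame feed the recognizer's feature vector
```

In a full front end, such per-frame coefficients would be computed over a sliding window and stacked into the observation sequence that the HMM models.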

1.2. ASR Technologies for Dysarthric Speech in the Era of Deep Learning

With the development of deep learning methods for ASR and the great improvement in computational power, a large amount of research has been carried out to improve the performance of ASR for dysarthric speech, and representative work has come to the fore. For example, Takashima et al. [20] used a convolutional Restricted Boltzmann Machine (CRBM) to address the local overfitting problem that arises when a pre-trained convolutional bottleneck neural network is used to obtain acoustic features. Yilmaz et al. [28] proposed using bottleneck features and articulatory features to reduce the acoustic-space variation caused by the poor pronunciation ability of dysarthric speakers, aiming to improve recognition accuracy. More broadly, researchers have improved acoustic feature extraction to strengthen acoustic representations, used speaker-adaptive models to reduce the differences among the acoustic spaces of dysarthric speakers, and improved the mapping from acoustic features to phonemes. Kim et al. used a Kullback–Leibler divergence HMM (KL-HMM) [29] and a convolutional long short-term memory recurrent neural network (CLSTM-RNN) [30] to automatically recognize dysarthric speech. Overall, these advancements have significant potential to improve communication outcomes for dysarthric individuals, helping them express themselves more effectively and enhancing their quality of life.
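The bottleneck-feature idea above can be sketched in a few lines: a frame is passed through a wide–narrow–wide network, and the activations of the narrow middle layer are taken as a compact feature vector. This toy uses random, untrained weights purely to show the data flow; a real bottleneck extractor (e.g., a convolutional bottleneck net or CRBM-pretrained network) is trained on speech, and all dimensions here are illustrative assumptions.

```python
import math
import random

random.seed(0)

def layer(inputs, n_out):
    """One fully connected layer with tanh activation and random weights.

    Illustrative only: in practice these weights are learned, not sampled.
    """
    weights = [[random.uniform(-0.5, 0.5) for _ in inputs] for _ in range(n_out)]
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def bottleneck_features(acoustic_frame, bottleneck_dim=4):
    """Run a frame through a wide-narrow-wide network; return the narrow
    layer's activations as the bottleneck feature vector."""
    h1 = layer(acoustic_frame, 16)           # wide encoding layer
    bottleneck = layer(h1, bottleneck_dim)   # narrow layer -> compact features
    _ = layer(bottleneck, 16)                # decoding side, used only in training
    return bottleneck

frame = [random.gauss(0.0, 1.0) for _ in range(13)]  # e.g. 13 cepstral coefficients
feats = bottleneck_features(frame)
print(len(feats))  # low-dimensional features for the downstream recognizer
```

Because the narrow layer must preserve enough information for the network's training objective, its activations form a compressed representation that can smooth over some of the speaker-specific variation in dysarthric speech.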
To date, previous reviews of ASR for dysarthric speech have mainly discussed the difficulties of ASR applications for elderly people with dysarthria [31] and explored the general and specific factors that influence the accuracy of ASR for dysarthric speech [32]. Meanwhile, ASR technologies for dysarthric speech have developed greatly, especially in the era of deep learning. The main objective of our survey is to discuss the trends in ASR technologies for dysarthric speech, covering research on dysarthric speech databases, acoustic features, acoustic models, language–lexical models, and end-to-end ASR models. Our survey provides a more comprehensive and systematic review of the development of this field, highlighting the latest advancements and future directions.
Toward this objective, this review follows, where possible, well-established practices for conducting and reporting scoping reviews, as suggested by the PRISMA statement [33]. Of the 27 items on the PRISMA checklist, we were able to follow 13, covering the title, introduction, methods, results, discussion, and funding sections.

1.3. Retrieval Strategy of Papers

During paper retrieval for this scoping review, the following databases were searched: Web of Science, Engineering Village, and IEEE Xplore. The time range was set from 1990 to 2022. The keywords “Dysarthric Speech Recognition”, “Dysarthria Speech Recognition”, “Automatic Speech Recognition of Dysarthric Speech” and “Automatic Speech Recognition of Dysarthria Speech” were used as the retrieval conditions.

1.4. Selection Strategy of Papers

All authors jointly agreed on the following selection criteria to reduce possible bias during selection. First, we excluded papers that had not been cited by other researchers, as their contribution may be insufficient. Second, we excluded papers of low relevance; at the screening stage, all authors jointly decided whether each paper was relevant to our research. We also excluded duplicate papers. Finally, we selected 63 representative papers fitting the theme of our research. The papers fall into the five categories of “dysarthric speech databases”, “acoustic feature extraction”, “acoustic models”, “language–lexical models” and “end-to-end ASR”. The whole selection process is shown in Figure 2.
Figure 2. PRISMA flow diagram of search methods.
In this paper, we summarize the development of ASR for dysarthric speech and compare traditional approaches based on acoustic feature extraction, acoustic models and language–lexical models, aiming to provide a reference for future research. The main contributions of this paper include the following four aspects:
(1)
the influence of different acoustic feature parameters on the performance of ASR for dysarthric speech was analysed;
(2)
the construction and improvement of acoustic models for dysarthric speech recognition based on different machine learning (or deep learning) methods were introduced;
(3)
the effects of different approaches on improving the performance of different language–lexical models for dysarthric speech recognition were introduced;
(4)
several advanced approaches of end-to-end ASR were introduced in the field of dysarthric speech recognition.
The rest of this paper is organized as follows: Section 2 compares and analyses the effects of different approaches to acoustic feature extraction, acoustic models, language–lexical models, and end-to-end ASR. Section 3 discusses the challenges and future prospects of dysarthric speech recognition. Section 4 concludes the paper.

3. Discussion

This scoping review aimed to summarize the development and trends in ASR technologies for dysarthric speech. Different from the previous surveys [31,32], our survey systematically reviewed ASR technologies for dysarthric speech, including “dysarthric speech databases”, “acoustic features”, “acoustic models”, “language-lexical models” and “end-to-end models”. Our survey discussed the trend of ASR technologies for dysarthric speech along with the development of machine learning and deep learning methods.
Before deep learning, ASR technologies for dysarthric speech developed slowly. In addition, the commercial application of ASR for speakers with dysarthria is not yet mature, and several challenges remain: accuracy is not high enough for practical applications, and the performance of trained ASR models is not stable enough. In the era of deep learning, high-performance computation and powerful deep learning methods can effectively improve the accuracy of ASR for dysarthric speech. Nevertheless, the scarcity of data poses a significant limitation to further progress. An unavoidable problem is that collecting data from speakers with dysarthria is extremely difficult. In the future, considering cost and limited resources, researchers could pursue two directions: building larger corpora of dysarthric speech, and further improving the performance of ASR trained with few-resource or zero-resource data. In particular, fusing additional modality signals could help alleviate the problem of scarce resource data.

4. Conclusions

This scoping survey analysed 63 papers selected from 139 papers in the field of ASR for dysarthric speech. We summarized the development of the field from the four aspects of “acoustic features”, “acoustic models”, “language–lexical models” and “end-to-end ASR for dysarthric speech”. The poor generalizability of acoustic models, caused by the large variation among dysarthric speakers, is a major challenge for reducing speaker dependence in ASR. In addition, the limited availability of speech data makes it difficult to train ASR models with enough data to achieve high accuracy. These challenges pose significant obstacles to the commercialization and widespread adoption of ASR systems for dysarthric speech. This scoping survey provides technical references for researchers in this field, highlighting the need for continued research to address these challenges. Future research should focus on developing more robust and adaptive acoustic models that can account for the diverse vocal characteristics of dysarthric speakers, as well as exploring alternative approaches to data acquisition and representation that can overcome data scarcity. By addressing these challenges, ASR systems for dysarthric speech can become more accurate, reliable, and widely accessible.

Author Contributions

Conceptualization, Z.Q.; methodology, Z.Q. and K.X.; writing, original draft preparation, Z.Q.; writing, review and editing, Z.Q. and K.X.; supervision, Z.Q.; project administration, Z.Q.; funding acquisition, Z.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Humanity and Social Science Youth Foundation of the Ministry of Education of China, grant number 21YJCZH117.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Rampello, L.; Rampello, L.; Patti, F.; Zappia, M. When the word doesn’t come out: A synthetic overview of dysarthria. J. Neurol. Sci. 2016, 369, 354–360. [Google Scholar] [CrossRef]
  2. Rauschecker, J.P.; Scott, S.K. Maps and streams in the auditory cortex: Nonhuman primates illuminate human speech processing. Nat. Neurosci. 2009, 12, 718–724. [Google Scholar] [CrossRef] [PubMed]
  3. Hauser, M.D.; Chomsky, N.; Fitch, W.T. The faculty of language: What is it, who has it, and how did it evolve? Science 2002, 298, 1569–1579. [Google Scholar] [CrossRef] [PubMed]
  4. Sapir, S.; Aronson, A.E. The relationship between psychopathology and speech and language disorders in neurologic patients. J. Speech Hear. Disord. 1990, 55, 503–509. [Google Scholar] [CrossRef] [PubMed]
  5. Kent, R.D. Research on speech motor control and its disorders: A review and prospective. J. Commun. Disord. 2000, 33, 391–428. [Google Scholar] [CrossRef] [PubMed]
  6. Li, M.; Lyden, P.; Brady, M. Aphasia and dysarthria in acute stroke: Recovery and functional outcome. Int. J. Stroke Off. J. Int. Stroke Soc. 2005, 10, 400–406. [Google Scholar] [CrossRef]
  7. Ramig, L.O.; Sapir, S.; Fox, C.; Countryman, S. Changes in vocal loudness following intensive voice treatment (LSVT®) in individuals with Parkinson’s disease: A comparison with untreated patients and normal age-matched controls. Mov. Disord. 2001, 16, 79–83. [Google Scholar] [CrossRef] [PubMed]
  8. Bhogal, S.K.; Teasell, R.; Speechley, M. Intensity of Aphasia Therapy, Impact on Recovery. Stroke 2003, 34, 987–993. [Google Scholar] [CrossRef]
  9. Kwakkel, G. Impact of intensity of practice after stroke: Issues for consideration. Disabil. Rehabil. 2006, 28, 823–830. [Google Scholar] [CrossRef] [PubMed]
  10. Rijntjes, M.; Haevernick, K.; Barzel, A.; Van Den Bussche, H.; Ketels, G.; Weiller, C. Repeat therapy for chronic motor stroke: A pilot study for feasibility and efficacy. Neuro Rehabil. Neural Repair 2009, 23, 275–280. [Google Scholar] [CrossRef]
  11. Beijer, L.J.; Rietveld, T. Potentials of Telehealth Devices for Speech Therapy in Parkinson’s Disease, Diagnostics and Rehabilitation of Parkinson’s Disease; pp. 379–402. 2011. Available online: https://api.semanticscholar.org/CorpusID:220770421 (accessed on 9 September 2023).
  12. Sanders, E.; Ruiter, M.B.; Beijer, L.; Strik, H. Automatic Recognition of Dutch Dysarthric Speech: A Pilot Study. In Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002–INTERSPEECH, Denver, CO, USA, 16–20 September 2002; pp. 661–664. [Google Scholar] [CrossRef]
  13. Hasegawa-Johnson, M.; Gunderson, J.; Penman, A.; Huang, T. HMM-Based and SVM-Based Recognition of the Speech of Talkers with Spastic Dysarthria. In Proceedings of the IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, Toulouse, France, 14–19 May 2006; pp. III.1060–III.1063. [Google Scholar] [CrossRef]
  14. Rudzicz, F. Comparing speaker-dependent and speaker-adaptive acoustic models for recognizing dysarthric speech. In Proceedings of the Assets 07: 9th International ACM SIGACCESS Conference on Computers and Accessibility, New York, NY, USA, 15–17 October 2007; pp. 255–256. [Google Scholar] [CrossRef]
  15. Morales, S.O.C.; Cox, S.J. Modelling Errors in Automatic Speech Recognition for Dysarthric Speakers. EURASIP J. Adv. Signal Process. 2009, 2009, 308340. [Google Scholar] [CrossRef]
  16. Mengistu, K.; Rudzicz, F. Adapting acoustic and lexical models to dysarthric speech. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May 2011; pp. 4924–4927. [Google Scholar] [CrossRef]
  17. Seong, W.K.; Park, J.H.; Kim, H.K. Multiple pronunciation lexical modeling based on phoneme confusion matrix for dysarthric speech recognition. Adv. Sci. Technol. Lett. 2012, 14, 57–60. [Google Scholar]
  18. Christensen, H.; Cunningham, S.; Fox, C.; Green, P.; Hain, T. A comparative study of adaptive, automatic recognition of disordered speech. In Proceedings of the Interspeech’12: 13th Annual Conference of the International Speech Communication Association, Portland, OR, USA, 9–13 September 2012; pp. 1776–1779. [Google Scholar]
  19. Shahamiri, S.R.; Salim, S.S.B. Artificial neural networks as speech recognisers for dysarthric speech: Identifying the best-performing set of MFCC parameters and studying a speaker-independent approach. Adv. Eng. Inform. 2014, 28, 102–110. [Google Scholar] [CrossRef]
  20. Takashima, Y.; Nakashika, T.; Takiguchi, T.; Ariki, Y. Feature extraction using pre-trained convolutive bottleneck nets for dysarthric speech recognition. In Proceedings of the 23rd European Signal Processing Conference (EUSIPCO), Nice, France, 31 August–4 September 2015; pp. 1411–1415. [Google Scholar] [CrossRef]
  21. Lee, T.; Liu, Y.Y.; Huang, P.W.; Chien, J.T.; Lam, W.K.; Yeung, Y.T.; Law, T.K.T.; Lee, K.Y.S.; Kong, A.P.H.; Law, S.P. Automatic speech recognition for acoustical analysis and assessment of Cantonese pathological voice and speech. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; pp. 6475–6479. [Google Scholar] [CrossRef]
  22. Joy, N.M.; Umesh, S.; Abraham, B. On improving acoustic models for Torgo dysarthric speech database. In Proceedings of the 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Stockholm, Sweden, 20–24 August 2017; pp. 2695–2699. [Google Scholar] [CrossRef]
  23. Joy, N.M.; Umesh, S. Improving Acoustic Models in TORGO Dysarthric Speech Database. IEEE Trans. Neural Syst. Rehabil. Eng. 2018, 26, 637–645. [Google Scholar] [CrossRef] [PubMed]
  24. Sharma, H.V.; Hasegawa-Johnson, M. Acoustic model adaptation using in-domain background models for dysarthric speech recognition. Comput. Speech Lang. 2013, 27, 1147–1162. [Google Scholar] [CrossRef]
  25. Tu, M.; Wisler, A.; Berisha, V.; Liss, J.M. The relationship between perceptual disturbances in dysarthric speech and automatic speech recognition performance. J. Acoust. Soc. Am. 2016, 140, EL416–EL422. [Google Scholar] [CrossRef]
  26. Jayaram, G.; Abdelhamied, K. Experiments in dysarthric speech recognition using artificial neural networks. J. Rehabil. Res. Dev. 1995, 32, 162. [Google Scholar]
  27. Polur, P.D.; Miller, G.E. Experiments with fast Fourier transform, linear predictive and cepstral coefficients in dysarthric speech recognition algorithms using hidden Markov model. IEEE Trans. Neural Syst. Rehabil. Eng. 2005, 13, 558–561. [Google Scholar] [CrossRef]
  28. Yilmaz, E.; Mitra, V.; Sivaraman, G.; Franco, H. Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech. Comput. Speech Lang. 2019, 58, 319–334. [Google Scholar] [CrossRef]
  29. Kim, M.; Wang, J.; Kim, H. Dysarthric Speech Recognition Using Kullback-Leibler Divergence-based Hidden Markov Model. In Proceedings of the 17th Annual Conference of the International Speech Communication Association (INTERSPEECH 2016), San Francisco, CA, USA, 8–12 September 2016; pp. 2671–2675. [Google Scholar] [CrossRef]
  30. Kim, M.; Cao, B.; An, K.; Wang, J. Dysarthric Speech Recognition Using Convolutional LSTM Neural Network. In Proceedings of the 19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), Hyderabad, India, 2–6 September 2018. [Google Scholar] [CrossRef]
  31. Young, V.; Mihailidis, A. Difficulties in automatic speech recognition of dysarthric speakers and implications for speech-based applications used by the elderly: A literature review. Assist. Technol. 2010, 22, 99–112. [Google Scholar] [CrossRef]
  32. Mustafa, M.B.; Rosdi, F.; Salim, S.S.; Mughal, M.U. Exploring the influence of general and specific factors on the recognition accuracy of an ASR system for dysarthric speaker. Expert Syst. Appl. 2015, 42, 3924–3932. [Google Scholar] [CrossRef]
  33. Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G.; The PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. PLoS Med. 2009, 6, e1000097. [Google Scholar] [CrossRef] [PubMed]
  34. Deller, J.R., Jr.; Liu, M.S.; Ferrier, L.J.; Robichaud, P. The Whitaker database of dysarthric (cerebral palsy) speech. J. Acoust. Soc. Am. 1993, 93, 3516–3518. [Google Scholar] [CrossRef] [PubMed]
  35. Doddington, G.R.; Schalk, T.B. Speech Recognition: Turning Theory to Practice. IEEE Spectr. 1981, 18, 26–32. [Google Scholar] [CrossRef]
  36. Johnson, W.; Darley, F.; Spriestersbach, D. Diagnostic Methods in Speech Pathology; Harper & Row: New York, NY, USA, 1963. [Google Scholar]
  37. Kim, H.; Hasegawa-Johnson, M.; Perlman, A.; Gunderson, J.; Huang, T.; Watkin, K.; Frame, S. Dysarthric speech database for universal access research. In Proceedings of the Ninth Annual Conference of the International Speech Communication Association (Interspeech 2008), Brisbane, Australia, 22–26 September 2008; pp. 1741–1744. [Google Scholar]
  38. Chongchong, Y.; Xiaosu, S.; Zhaopeng, Q. Multi-Stage Audio-Visual Fusion for Dysarthric Speech Recognition with Pre-Trained Models. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 1912–1921. [Google Scholar] [CrossRef]
  39. Rudzicz, F.; Namasivayam, A.K.; Wolff, T. The TORGO database of acoustic and articulatory speech from speakers with dysarthria. Lang. Resour. Eval. 2012, 46, 523–541. [Google Scholar] [CrossRef]
  40. Enderby, P. Frenchay dysarthria assessment. Br. J. Disord. Commun. 1980, 15, 165–173. [Google Scholar] [CrossRef]
  41. Yorkston, K.M.; Beukelman, D.R.; Traynor, C. Assessment of Intelligibility of Dysarthric Speech; Pro-ed.: Austin, TX, USA, 1984. [Google Scholar]
  42. Clear, J.H. The British National Corpus. In The Digital Word: Text-Based Computing in the Humanities; MIT: Cambridge, MA, USA, 1993; pp. 163–187. [Google Scholar]
  43. Menendez-Pidal, X.; Polikoff, J.B.; Peters, S.M.; Leonzio, J.E.; Bunnell, H.T. The Nemours database of dysarthric speech. In Proceedings of the Fourth International Conference on Spoken Language Processing, ICSLP’96, Philadelphia, PA, USA, 3–6 October 1996; pp. 1962–1965. [Google Scholar] [CrossRef]
  44. Wrench, A. The MOCHA-TIMIT Articulatory Database. 1999. Available online: https://data.cstr.ed.ac.uk/mocha/ (accessed on 9 September 2023).
  45. Zue, V.; Seneff, S.; Glass, J. Speech database development at MIT: TIMIT and beyond. Speech Commun. 1990, 9, 351–356. [Google Scholar] [CrossRef]
  46. Bennett, J.W.; van Lieshout, P.H.H.M.; Steele, C.M. Tongue control for speech and swallowing in healthy younger and older subjects. Int. J. Facial Myol. Off. Publ. Int. Assoc. Orofac. Myol. 2007, 33, 5–18. [Google Scholar] [CrossRef]
  47. Patel, R. Prosodic Control in Severe Dysarthria: Preserved Ability to Mark the Question-Statement Contrast. J. Speech Lang. Hear. Res. 2002, 45, 858–870. [Google Scholar] [CrossRef] [PubMed]
  48. Roy, N.; Leeper, H.A.; Blomgren, M.; Cameron, R.M. A Description of Phonetic, Acoustic, and Physiological Changes Associated with Improved intelligibility in a speaker With Spastic Dysarthria. Am. J. Speech-Lang. Pathol. 2001, 10, 274–290. [Google Scholar] [CrossRef]
  49. Webber, S.G. Webber Photo Cards: Story Starters. 2005. Available online: https://www.superduperinc.com/webber-photo-cards-story-starters.html (accessed on 9 September 2023).
  50. Rudzicz, F. Applying discretized articulatory knowledge to dysarthric speech. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, China, 19–24 April 2009; pp. 4501–4504. [Google Scholar] [CrossRef]
  51. Blaney, B.; Wilson, J. Acoustic variability in dysarthria and computer speech recognition. Clin. Linguist. Phon. 2000, 14, 27. [Google Scholar] [CrossRef]
  52. Fager, S.K. Duration and Variability in Dysarthric Speakers with Traumatic Brain Injury; The University of Nebraska-Lincol: Lincoln, NE, USA, 2008. [Google Scholar]
  53. Rudzicz, F. Articulatory knowledge in the recognition of dysarthric speech. IEEE Trans. Audio Speech Lang. Process. 2010, 19, 947–960. [Google Scholar] [CrossRef]
  54. Christensen, H.; Aniol, M.B.; Bell, P.; Green, P.; Hain, T.; King, S.; Swietojanski, P. Combining in-domain and out-of-domain speech data for automatic recognition of disordered speech. In Proceedings of the 14th Annual Conference of International Speech Communication Association (INTERSPEECH 2013), Lyon, France, 25–29 August 2013; pp. 3642–3645. [Google Scholar]
  55. Walter, O.; Despotovic, V.; Haeb-Umbach, R.; Gemnzeke, J.F.; Ons, B.; Van Hamme, H. An evaluation of unsupervised acoustic model training for a dysarthric speech interface. In Proceedings of the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014), Singapore, 14–18 September 2014; pp. 3–1017. [Google Scholar]
  56. Hahm, S.; Heitzman, D.; Wang, J. Recognizing dysarthric speech due to amyotrophic lateral sclerosis with across-speaker articulatory normalization. In Proceedings of the 6th Workshop on Speech and Language Processing for Assistive Technologies, Dresden, Germany, 11 September 2015; pp. 47–54. [Google Scholar]
  57. Bhat, C.; Vachhani, B.; Kopparapu, S. Recognition of Dysarthric Speech Using Voice Parameters for Speaker Adaptation and Multi-Taper Spectral Estimation. In Proceedings of the 17th Annual Conference of the International Speech Communication Association (INTERSPEECH 2016), San Francisco, CA, USA, 8–12 September 2016; pp. 228–232. [Google Scholar] [CrossRef]
  58. Vachhani, B.; Bhat, C.; Das, B.; Kopparapu, S.K. Deep Autoencoder Based Speech Features for Improved Dysarthric Speech Recognition. In Proceedings of the 18th Annual Conference of International-Speech-Communication-Association (INTERSPEECH 2017), Stockholm, Sweden, 20–24 August 2017; pp. 1854–1858. [Google Scholar] [CrossRef]
  59. Xiong, F.; Barker, J.; Christensen, H. Deep learning of articulatory-based representations and applications for improving dysarthric speech recognition. In Proceedings of Speech Communication; 13th ITG-Symposium, VDE, Oldenburg, Germany, 16 December 2018; pp. 1–5. [Google Scholar]
  60. Zaidi, B.F.; Selouani, S.A.; Boudraa, M.; Yakoub, M.S. Deep neural network architectures for dysarthric speech analysis and recognition. Neural Comput. Appl. 2021, 33, 9089–9108. [Google Scholar] [CrossRef]
  61. Revathi, A.; Nagakrishnan, R.; Sasikaladevi, N. Comparative analysis of Dysarthric speech recognition: Multiple features and robust templates. Multimed. Tools Appl. 2022, 81, 31245–31259. [Google Scholar] [CrossRef]
  62. Rajeswari, R.; Devi, T.; Shalini, S. Dysarthric Speech Recognition Using Variational Mode Decomposition and Convolutional Neural Networks. Wirel. Pers. Commun. 2022, 122, 293–307. [Google Scholar] [CrossRef]
  63. Green, P.D.; Carmichael, J.; Hatzis, A.; Enderby, P.; Hawley, M.S.; Parker, M. Automatic speech recognition with sparse training data for dysarthric speakers. In Proceedings of the European Conference on Speech Communication and Technology, (EUROSPEECH 2003–INTERSPEECH 2003), ISCA, Geneva, Switzerland, 1–4 September 2003. [Google Scholar]
  64. Hain, T. Implicit modelling of pronunciation variation in automatic speech recognition. Speech Commun. 2005, 46, 171–188. [Google Scholar] [CrossRef]
  65. Hawley, M.S.; Enderby, P.; Green, P.; Cunningham, S.; Brownsell, S.; Carmichael, J.; Parker, M.; Hatzis, A.; Peter, O.; Palmer, R. A speech-controlled environmental control system for people with severe dysarthria. Med. Eng. Phys. 2007, 29, 586–593. [Google Scholar] [CrossRef]
  66. Morales, S.O.C.; Cox, S.J. Application of weighted finite-state transducers to improve recognition accuracy for dysarthric speech. In Proceedings of the 9th Annual Conference of the International Speech Communication Association (INTERSPEECH 2008), Brisbane, Australia, 22–26 September 2008. [Google Scholar]
  67. Selouani, S.A.; Yakoub, M.S.; O’Shaughnessy, D. Alternative speech communication system for persons with severe speech disorders. EURASIP J. Adv. Signal Process. 2009, 2009, 540409. [Google Scholar] [CrossRef]
  68. Sharma, H.V.; Hasegawa-Johnson, M. State-transition interpolation and MAP adaptation for HMM-based dysarthric speech recognition. In Proceedings of the NAACL HLT 2010 Workshop Speech and Language Processing for Assistive Technologies, Los Angeles, CA, USA, 5 June 2010; pp. 72–79. [Google Scholar]
  69. Seong, W.K.; Park, J.H.; Kim, H.K. Dysarthric speech recognition error correction using weighted finite state transducers based on context-dependent pronunciation variation. In Proceedings of the ICCHP’12: 13th International Conference on Computers Helping People with Special Needs, Linz, Austria, 11–13 July 2012; Part II. pp. 475–482. [Google Scholar] [CrossRef]
  70. Shahamiri, S.R.; Salim, S.S.B. A Multi-Views Multi-Learners Approach Towards Dysarthric Speech Recognition Using Multi-Nets Artificial Neural Networks. IEEE Trans. Neural Syst. Rehabil. Eng. 2014, 22, 1053–1063. [Google Scholar] [CrossRef]
  71. Caballero-Morales, S.O.; Trujillo-Romero, F. Evolutionary approach for integration of multiple pronunciation patterns for enhancement of dysarthric speech recognition. Expert Syst. Appl. 2014, 41, 841–852. [Google Scholar] [CrossRef]
  72. Mustafa, M.B.; Salim, S.S.; Mohamed, N.; Al-Qatab, B.; Siong, C.E. Severity-based adaptation with limited data for ASR to aid dysarthric speakers. PLoS ONE 2014, 9, e86285. [Google Scholar] [CrossRef] [PubMed]
  73. Sehgal, S.; Cunningham, S. Model adaptation and adaptive training for the recognition of dysarthric speech. In Proceedings of the 6th Workshop on Speech and Language Processing for Assistive Technologies (SLPAT 2015), Dresden, Germany, 11 September 2015; pp. 65–71. [Google Scholar] [CrossRef]
  74. Yilmaz, E.; Ganzeboom, M.S.; Cucchiarini, C.; Strik, H. Combining non-pathological data of different language varieties to improve DNN-HMM performance on pathological speech. In Proceedings of the 17th Annual Conference of the International Speech Communication Association (INTERSPEECH 2016), San Francisco, CA, USA, 8–12 September 2016; pp. 218–222. [Google Scholar] [CrossRef]
  75. Yilmaz, E.; Ganzeboom, M.S.; Cucchiarini, C.; Strik, H. Multi-stage DNN training for Automatic Recognition of Dysarthric Speech. In Proceedings of the 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Stockholm, Sweden, 20–24 August 2017; pp. 2685–2689. [Google Scholar] [CrossRef]
  76. Kim, M.; Kim, Y.; Yoo, J.; Wang, J.; Kim, H. Regularized speaker adaptation of KL-HMM for dysarthric speech recognition. IEEE Trans. Neural Syst. Rehabil. Eng. 2017, 25, 1581–1591. [Google Scholar] [CrossRef]
  77. Yu, J.W.; Xie, X.R.; Liu, S.S.; Hu, S.K.; Lam, M.W.Y.; Wu, X.X.; Wong, K.H.; Liu, X.Y.; Meng, H. Development of the CUHK Dysarthric Speech Recognition System for the UASpeech Corpus. In Proceedings of the 19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), Hyderabad, India, 2–6 September 2018; pp. 2938–2942. [Google Scholar]
  78. Takashima, Y.; Takashima, R.; Takiguchi, T.; Ariki, Y. Knowledge transferability between the speech data of persons with dysarthria speaking different languages for dysarthric speech recognition. IEEE Access 2019, 7, 164320–164326. [Google Scholar] [CrossRef]
  79. Xiong, F.; Barker, J.; Christensen, H. Phonetic analysis of dysarthric speech tempo and applications to robust personalised dysarthric speech recognition. In Proceedings of the 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 5836–5840. [Google Scholar] [CrossRef]
  80. Hermann, E.; Doss, M.M. Dysarthric speech recognition with lattice-free MMI. In Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 6109–6113. [Google Scholar] [CrossRef]
  81. Yakoub, M.S.; Selouani, S.A.; Zaidi, B.F.; Bouchair, A. Improving dysarthric speech recognition using empirical mode decomposition and convolutional neural network. EURASIP J. Audio Speech Music Process. 2020, 1, 1–7. [Google Scholar] [CrossRef]
  82. Xiong, F.F.; Barker, J.; Yue, Z.J.; Christensen, H. Source domain data selection for improved transfer learning targeting dysarthric speech recognition. In Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 7424–7428. [Google Scholar] [CrossRef]
  83. Wu, L.D.; Zong, D.M.; Sun, S.L.; Zhao, J. A Sequential Contrastive Learning Framework for Robust Dysarthric Speech Recognition. In Proceedings of the ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 7303–7307. [Google Scholar] [CrossRef]
  84. Wang, D.; Yu, J.; Wu, X.; Sun, L.F.; Liu, X.Y.; Meng, H. Improved end-to-end dysarthric speech recognition via meta-learning based model re-initialization. In Proceedings of the 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), Hong Kong, China, 24–27 January 2021; pp. 1–5. [Google Scholar] [CrossRef]
  85. Shahamiri, S.R. Speech vision: An end-to-end deep learning-based dysarthric automatic speech recognition system. IEEE Trans. Neural Syst. Rehabil. Eng. 2021, 29, 852–861. [Google Scholar] [CrossRef]
  86. Hu, S.K.; Xie, X.R.; Cui, M.Y.; Deng, J.J.; Liu, S.S.; Yu, J.W.; Geng, M.Z.; Liu, X.Y.; Meng, H. Neural architecture search for LF-MMI trained time delay neural networks. IEEE/ACM Trans. Audio Speech Lang. Process. 2022, 30, 1093–1107. [Google Scholar] [CrossRef]
  87. Sriranjani, R.; Umesh, S.; Reddy, M.R. Pronunciation adaptation for disordered speech recognition using state-specific vectors of phone-cluster adaptive training. In Proceedings of the 6th Workshop on Speech and Language Processing for Assistive Technologies (SLPAT 2015), Dresden, Germany, 11 September 2015; pp. 72–78. [Google Scholar]
  88. Yue, Z.; Xiong, F.; Christensen, H.; Barker, J. Exploring appropriate acoustic and language modelling choices for continuous dysarthric speech recognition. In Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 6094–6098. [Google Scholar] [CrossRef]
  89. Takashima, Y.; Takiguchi, T.; Ariki, Y. End-to-end dysarthric speech recognition using multiple databases. In Proceedings of the ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 6395–6399. [Google Scholar]
  90. Lin, Y.; Wang, L.; Dang, J.; Li, S.; Ding, C. End-to-End articulatory modeling for dysarthric articulatory attribute detection. In Proceedings of the ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; p. 7353. [Google Scholar]
  91. Lin, Y.; Wang, L.; Li, S.; Dang, J.; Ding, C. Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription. In Proceedings of the INTERSPEECH, Shanghai, China, 25–29 October 2020; pp. 4791–4795. [Google Scholar]
  92. Soleymanpour, M.; Johnson, M.T.; Berry, J. Dysarthric Speech Augmentation Using Prosodic Transformation and Masking for Subword End-to-end ASR. In Proceedings of the 2021 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), Bucharest, Romania, 13–15 October 2021; pp. 42–46. [Google Scholar]
  93. Almadhor, A.; Irfan, R.; Gao, J.; Saleem, N.; Rauf, H.T.; Kadry, S. E2E-DASR: End-to-end deep learning-based dysarthric automatic speech recognition. Expert Syst. Appl. 2023, 222, 119797. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.