Deep Learning and Artificial Intelligence Applied to Model Speech and Language in Parkinson’s Disease
Abstract
1. Introduction
1.1. Motivation
1.2. Literature Review in Speech Analysis for PD Evaluation
1.3. Literature Review in Language Analysis for PD Evaluation
1.4. Contribution of This Study
- (i) Speech recordings were modeled with different deep learning approaches: representations extracted from the pretrained Wav2vec 2.0 model, and 1-dimensional and 2-dimensional CNNs whose architectures were originally proposed in this paper.
- (ii) Transcriptions of the recordings were modeled with different strategies to represent language patterns, including W2V, BERT, and BETO. We also introduced an original method based on CNNs adapted to NLP that captures n-gram relationships (contexts) of different lengths among the words.
- (iii) The best representations from each modality were combined using three fusion strategies: early, joint, and late fusion.
2. Materials and Methods
2.1. Data
2.2. Methods
2.3. Speech
2.3.1. 1D-CNN
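As a minimal illustration of the building block that a 1D-CNN stacks, the following plain-Python sketch implements one convolution, ReLU and max-pooling unit. It assumes a one-dimensional input such as raw waveform samples or a frame-level feature track; the kernel, stride and pooling size are hypothetical values, not the architecture proposed in this paper, and a real model would learn its kernels with a deep learning framework.

```python
from typing import List, Sequence


def conv1d(signal: Sequence[float], kernel: Sequence[float],
           stride: int = 1, bias: float = 0.0) -> List[float]:
    """Valid (unpadded) 1-D cross-correlation, the operation computed by CNN layers."""
    k = len(kernel)
    if k == 0 or stride < 1:
        raise ValueError("kernel must be non-empty and stride must be >= 1")
    return [
        sum(w * x for w, x in zip(kernel, signal[start:start + k])) + bias
        for start in range(0, len(signal) - k + 1, stride)
    ]


def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def max_pool1d(values: Sequence[float], size: int) -> List[float]:
    """Non-overlapping max pooling; an incomplete trailing window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def conv_block(signal: Sequence[float], kernel: Sequence[float],
               stride: int = 1, pool: int = 2) -> List[float]:
    """conv -> ReLU -> max-pool: the basic unit repeated in a 1D-CNN."""
    return max_pool1d(relu(conv1d(signal, kernel, stride)), pool)


# Toy signal and a hypothetical slope-detecting kernel (learned in practice).
x = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0, 0.0]
print(conv1d(x, [-1.0, 0.0, 1.0]))      # [2.0, 0.0, -2.0, -2.0, -2.0, 0.0, 2.0]
print(conv_block(x, [-1.0, 0.0, 1.0]))  # [2.0, 0.0, 0.0]
```

Stacking several such blocks, each with many learned kernels, lets deeper layers respond to longer temporal patterns of the input.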
2.3.2. 2D-CNN
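A 2D-CNN applies the same idea over two axes. The sketch below assumes a time-frequency input such as a Mel spectrogram (the representation used by several of the speech studies in the literature table), stored with frequency bins as rows and frames as columns. The 2 × 2 onset kernel and the toy spectrogram are hypothetical, and the code is a plain-Python illustration rather than the 2D-CNN evaluated in this work.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def conv2d(image: Matrix, kernel: Matrix) -> List[List[float]]:
    """Valid (unpadded, stride-1) 2-D cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(w - kw + 1)
        ]
        for i in range(h - kh + 1)
    ]


def relu2d(m: Matrix) -> List[List[float]]:
    return [[max(0.0, v) for v in row] for row in m]


def max_pool2d(m: Matrix, size: int = 2) -> List[List[float]]:
    """Non-overlapping size x size max pooling; incomplete borders are dropped."""
    return [
        [
            max(m[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(m[0]) - size + 1, size)
        ]
        for i in range(0, len(m) - size + 1, size)
    ]


# Toy "spectrogram": rows are frequency bins, columns are frames.
# Energy appears in the first three rows at frame 2 (an onset).
spec = [
    [0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
onset = [[-1.0, 1.0], [-1.0, 1.0]]  # hypothetical kernel: energy rise over time
feature_map = max_pool2d(relu2d(conv2d(spec, onset)))
print(feature_map)  # [[2.0, 0.0], [1.0, 0.0]]
```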
2.3.3. Wav2vec 2.0
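Wav2vec 2.0 produces one embedding per short audio frame, so a fixed-length vector per recording has to be obtained by pooling over time before classification. The sketch below assumes the frame-level embeddings have already been extracted (for instance, as hidden states of a pretrained `Wav2Vec2Model` from the Hugging Face `transformers` library) and summarizes them with the per-dimension mean and standard deviation. The layer used and the pooling functions shown here are illustrative assumptions, not details taken from this work.

```python
import statistics
from typing import List, Sequence


def mean_std_pooling(frames: Sequence[Sequence[float]]) -> List[float]:
    """Collapse a T x D matrix of frame embeddings into one 2*D vector.

    The first D values are per-dimension means over the T frames and the last
    D values are per-dimension (population) standard deviations.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    dims = list(zip(*frames))  # D tuples, each holding the T values of one dimension
    return [statistics.fmean(d) for d in dims] + [statistics.pstdev(d) for d in dims]


# Toy input: 2 frames with 2-dimensional embeddings (the wav2vec 2.0 base
# model produces 768-dimensional embeddings per frame).
frames = [[1.0, 2.0], [3.0, 2.0]]
print(mean_std_pooling(frames))  # [2.0, 2.0, 1.0, 0.0]
```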
2.4. Language
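The CNN strategy for language mentioned among the contributions treats each transcription as a sequence of word embeddings and lets filters of different widths respond to n-grams of consecutive words. A minimal sketch of that general idea, in the style of common text CNNs, is shown below: each filter spans n words and the full embedding width, a ReLU is applied, and max-over-time pooling keeps the strongest n-gram response. The filter values, n-gram sizes and toy embeddings are hypothetical, and the architecture proposed in this paper may differ.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def ngram_filter_response(words: Sequence[Vector], filt: Sequence[Vector]) -> float:
    """Convolve one n-gram filter (n = len(filt)) along the word axis.

    The filter covers the whole embedding dimension, so it only slides over
    word positions; ReLU then max-over-time pooling yields one feature.
    """
    n = len(filt)
    if len(words) < n:
        raise ValueError("text is shorter than the n-gram window")
    responses = []
    for start in range(len(words) - n + 1):
        window = words[start:start + n]
        score = sum(f * x for f_row, w_row in zip(filt, window)
                    for f, x in zip(f_row, w_row))
        responses.append(max(0.0, score))  # ReLU
    return max(responses)  # max-over-time pooling


def text_cnn_features(words: Sequence[Vector],
                      filters_by_n: Dict[int, List[Sequence[Vector]]]) -> List[float]:
    """Concatenate the pooled responses of all filters, grouped by n-gram size."""
    features = []
    for n in sorted(filters_by_n):
        for filt in filters_by_n[n]:
            if len(filt) != n:
                raise ValueError(f"filter registered for n={n} spans {len(filt)} words")
            features.append(ngram_filter_response(words, filt))
    return features


# Toy 2-dimensional embeddings for a 3-word transcription.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = {
    1: [[[-1.0, -1.0]]],                         # unigram filter
    2: [[[1.0, 0.0], [0.0, 1.0]]],               # bigram filter
    3: [[[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]],   # trigram filter
}
print(text_cnn_features(words, filters))  # [0.0, 2.0, 2.0]
```

Using several window sizes in parallel is what allows the representation to capture word contexts of different lengths; in a trained model the filter weights are learned.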
2.4.1. W2V
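With static embeddings such as W2V, each word of a transcription is mapped to a vector and the text can be summarized by statistical functionals computed over its words, which corresponds to the "Statistical Functionals" configuration in the language results. In the sketch below, the toy embedding table, the choice of functionals (mean, standard deviation, minimum and maximum per dimension) and the policy of skipping out-of-vocabulary words are assumptions made for illustration.

```python
import statistics
from typing import Dict, List, Sequence


def text_functionals(tokens: Sequence[str],
                     table: Dict[str, Sequence[float]]) -> List[float]:
    """Summarize a transcription by per-dimension functionals of its word vectors.

    Returns [means..., standard deviations..., minima..., maxima...].
    Out-of-vocabulary tokens are skipped (an assumption of this sketch).
    """
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        raise ValueError("none of the tokens has an embedding")
    dims = list(zip(*vectors))  # one tuple per embedding dimension
    return ([statistics.fmean(d) for d in dims]
            + [statistics.pstdev(d) for d in dims]
            + [min(d) for d in dims]
            + [max(d) for d in dims])


# Hypothetical 2-dimensional embedding table; real W2V vectors are much larger.
toy_table = {"perro": [1.0, 0.0], "corre": [3.0, 2.0]}
print(text_functionals("el perro corre".split(), toy_table))
# [2.0, 1.0, 1.0, 1.0, 1.0, 0.0, 3.0, 2.0]
```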
2.4.2. BERT
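BERT builds contextual word representations with a stack of Transformer encoder layers (Vaswani et al.), whose core operation is scaled dot-product attention, softmax(QKᵀ/√d_k)V. The single-head sketch below omits the learned query, key and value projections, masking, multiple heads and the rest of a full BERT layer; it only illustrates how each output vector becomes an attention-weighted average of the value vectors.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q: Matrix, K: Matrix, V: Matrix) -> List[List[float]]:
    """softmax(Q K^T / sqrt(d_k)) V for one head, without masking."""
    d_k = len(K[0])
    outputs = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, V))
                        for j in range(len(V[0]))])
    return outputs


# Two tokens whose keys are identical receive equal attention weights.
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]],
                                   [[1.0, 2.0], [3.0, 4.0]]))  # [[2.0, 3.0]]
```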
2.4.3. BETO
2.5. Fusion Strategies
2.5.1. Early Fusion
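In early fusion the speech and language representations of each recording are concatenated into one vector before a single classifier is trained. A minimal sketch is given below; the per-modality z-score standardization (which keeps one modality's scale from dominating the other) is an assumption rather than a step stated in this work, and the toy feature values are hypothetical.

```python
import statistics
from typing import List, Sequence

Rows = Sequence[Sequence[float]]


def standardize(rows: Rows) -> List[List[float]]:
    """Z-score each feature column (population std; constant columns use std 1).

    For brevity the statistics are computed on all rows; under cross-validation
    they should be estimated on the training fold only.
    """
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def early_fusion(speech: Rows, text: Rows) -> List[List[float]]:
    """Concatenate the standardized speech and text vectors of each recording."""
    if len(speech) != len(text):
        raise ValueError("one speech vector and one text vector per recording are needed")
    return [s + t for s, t in zip(standardize(speech), standardize(text))]


# Two recordings: 1 hypothetical speech feature and 2 hypothetical text features each.
fused = early_fusion([[1.0], [3.0]], [[2.0, 5.0], [2.0, 7.0]])
print(fused)  # [[-1.0, 0.0, -1.0], [1.0, 0.0, 1.0]]
```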
2.5.2. Joint Fusion
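Joint fusion, as commonly understood, differs from early fusion in that the modality-specific layers and the fusion layers are trained together, so the classification loss also updates the speech and language branches. The sketch below only shows the forward pass of such a network with hypothetical hand-set weights: one dense ReLU branch per modality, concatenation of the branch outputs, and a sigmoid output unit. Training (for example, with the Adam optimizer cited in the references) is omitted, and the layer sizes are not those of this work.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence


def dense(x: Sequence[float], weights: Sequence[Sequence[float]], bias: Sequence[float],
          activation: Optional[Callable[[float], float]] = None) -> List[float]:
    """Fully connected layer; `weights` has shape out_dim x in_dim."""
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [activation(o) for o in out] if activation else out


def relu(v: float) -> float:
    return max(0.0, v)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def joint_fusion_forward(speech_x: Sequence[float], text_x: Sequence[float],
                         params: Dict[str, list]) -> float:
    """Forward pass: modality branches -> concatenation -> sigmoid output.

    The returned value is read as the probability of the PD class. In joint
    fusion all entries of `params` would be learned together by backpropagation.
    """
    h_speech = dense(speech_x, params["W_s"], params["b_s"], relu)
    h_text = dense(text_x, params["W_t"], params["b_t"], relu)
    fused = h_speech + h_text  # list concatenation of the two branch outputs
    return dense(fused, params["W_o"], params["b_o"], sigmoid)[0]


# Hypothetical hand-set weights: each branch maps a 2-D input to 1 unit.
params = {
    "W_s": [[1.0, 0.0]], "b_s": [0.0],
    "W_t": [[0.0, 1.0]], "b_t": [0.0],
    "W_o": [[1.0, -1.0]], "b_o": [0.0],
}
print(joint_fusion_forward([2.0, 5.0], [7.0, 2.0], params))  # 0.5
```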
2.5.3. Late Fusion
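Late fusion instead combines the outputs of classifiers trained separately on each modality. The sketch below assumes that each classifier returns a probability for the PD class and fuses them with a weighted average followed by a 0.5 threshold; the equal weighting, the threshold and the toy probabilities are assumptions, and other combination rules (for example, a second-level classifier trained on the scores) are also common.

```python
from typing import List, Sequence, Tuple


def late_fusion(p_speech: Sequence[float], p_text: Sequence[float],
                w_speech: float = 0.5,
                threshold: float = 0.5) -> Tuple[List[float], List[int]]:
    """Weighted average of per-modality PD probabilities, then thresholding.

    Returns the fused probabilities and the predicted labels (1 = PD, 0 = HC).
    """
    if not 0.0 <= w_speech <= 1.0:
        raise ValueError("w_speech must lie in [0, 1]")
    if len(p_speech) != len(p_text):
        raise ValueError("both classifiers must score the same recordings")
    fused = [w_speech * ps + (1.0 - w_speech) * pt for ps, pt in zip(p_speech, p_text)]
    return fused, [int(p >= threshold) for p in fused]


# Hypothetical classifier outputs for three recordings.
probs, labels = late_fusion([0.75, 0.25, 0.625], [0.5, 0.5, 0.125])
print(probs, labels)  # [0.625, 0.375, 0.375] [1, 0, 0]
```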
3. Results
3.1. Speech
3.2. Language
3.3. Multi-Modal
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. De Rijk, M.D.; Launer, L.J.; Berger, K.; Breteler, M.M.; Dartigues, J.F.; Baldereschi, M.; Fratiglioni, L.; Lobo, A.; Martinez-Lage, J.; Trenkwalder, C.; et al. Prevalence of Parkinson’s disease in Europe: A collaborative study of population-based cohorts. Neurologic Diseases in the Elderly Research Group. Neurology 2000, 54, S21–S23.
2. Marinus, J.; Zhu, K.; Marras, C.; Aarsland, D.; van Hilten, J.J. Risk factors for non-motor symptoms in Parkinson’s disease. Lancet Neurol. 2018, 17, 559–568.
3. Nissar, I.; Mir, W.A.; Shaikh, T.A. Machine Learning Approaches for Detection and Diagnosis of Parkinson’s Disease – A Review. In Proceedings of the 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 19–20 March 2021; Volume 1, pp. 898–905.
4. Lowit, A.; Marchetti, A.; Corson, S.; Kuschmann, A. Rhythmic performance in hypokinetic dysarthria: Relationship between reading, spontaneous speech and diadochokinetic tasks. J. Commun. Disord. 2018, 72, 26.
5. Orozco-Arroyave, J.; Vásquez-Correa, J.; Nöth, E. Current methods and new trends in signal processing and pattern recognition for the automatic assessment of motor impairments: The case of Parkinson’s disease. In Neurological Disorders and Imaging Physics; IOP Publishing Ltd.: Bristol, UK, 2020; Volume 5, pp. 8-1–8-57.
6. Logemann, J.A.; Fisher, H.B.; Boshes, B.; Blonsky, E.R. Frequency and cooccurrence of vocal tract dysfunctions in the speech of a large sample of Parkinson patients. J. Speech Hear. Disord. 1978, 43, 47–57.
7. Yunusova, Y.; Weismer, G.G.; Lindstrom, M.J. Classifications of vocalic segments from articulatory kinematics: Healthy controls and speakers with dysarthria. J. Speech Lang. Hear. Res. 2011, 54, 1302–1311.
8. Pérez-Toro, P.A.; Arias-Vergara, T.; Klumpp, P.; Vásquez-Correa, J.C.; Schuster, M.; Nöth, E.; Orozco-Arroyave, J.R. Depression Assessment in People with Parkinson’s Disease: The Combination of Acoustic Features and Natural Language Processing. Speech Commun. 2022, 145, 10–20.
9. Gomez-Gomez, L.F.; Morales, A.; Fierrez, J.; Orozco-Arroyave, J.R. Exploring facial expressions and affective domains for Parkinson detection. arXiv 2020, arXiv:2012.06563.
10. García, A.M.; Arias-Vergara, T.; Vásquez-Correa, J.C.; Nöth, E.; Schuster, M.; Welch, A.E.; Orozco-Arroyave, J.R. Cognitive Determinants of Dysarthria in Parkinson’s Disease: An Automated Machine Learning Approach. Mov. Disord. 2021, 36, 2862–2873.
11. Birba, A.; García-Cordero, I.; Kozono, G.; Legaz, A.; Ibáñez, A.; Sedeño, L.; García, A.M. Losing ground: Frontostriatal atrophy disrupts language embodiment in Parkinson’s and Huntington’s disease. Neurosci. Biobehav. Rev. 2017, 80, 673–687.
12. Birba, A.; Fittipaldi, S.; Cediel Escobar, J.C.; Gonzalez Campo, C.; Legaz, A.; Galiani, A.; García, A.M. Multimodal neurocognitive markers of naturalistic discourse typify diverse neurodegenerative diseases. Cereb. Cortex 2022, 32, 3377–3391.
13. Grossman, M.; Carvell, S.; Stern, M.B.; Gollomp, S.; Hurtig, H.I. Sentence comprehension in Parkinson’s disease: The role of attention and memory. Brain Lang. 1992, 42, 347–384.
14. Obeso, I.; Casabona, E.; Bringas, M.L.; Álvarez, L.; Jahanshahi, M. Semantic and phonemic verbal fluency in Parkinson’s disease: Influence of clinical and demographic variables. Behav. Neurol. 2012, 25, 111–118.
15. García, A.M.; Carrillo, F.; Orozco-Arroyave, J.R.; Trujillo, N.; Bonilla, J.F.V.; Fittipaldi, S.; Cecchi, G.A. How language flows when movements don’t: An automated analysis of spontaneous discourse in Parkinson’s disease. Brain Lang. 2016, 162, 19–28.
16. Crescentini, C.; Mondolo, F.; Biasutti, E.; Shallice, T. Supervisory and routine processes in noun and verb generation in nondemented patients with Parkinson’s disease. Neuropsychologia 2008, 46, 434–447.
17. Altmann, L.J.; Troche, M.S. High-level language production in Parkinson’s disease: A review. Park. Dis. 2011, 2011, 238956.
18. Eyigoz, E.; Courson, M.; Sedeño, L.; Rogg, K.; Orozco-Arroyave, J.R.; Nöth, E.; Skodda, S.; Trujillo, N.; Rodríguez, M.; Rusz, J.; et al. From discourse to pathology: Automatic identification of Parkinson’s disease patients via morphological measures across three languages. Cortex 2020, 132, 191–205.
19. Gunduz, H. Deep learning-based Parkinson’s disease classification using vocal feature sets. IEEE Access 2019, 7, 115540–115551.
20. Er, M.B.; Isik, E.; Isik, I. Parkinson’s detection based on combined CNN and LSTM using enhanced speech signals with variational mode decomposition. Biomed. Signal Process. Control 2021, 70, 103006.
21. Orozco-Arroyave, J.R.; Arias-Londoño, J.D.; Vargas-Bonilla, J.F.; Gonzalez-Rátiva, M.C.; Nöth, E. New Spanish speech corpus database for the analysis of people suffering from Parkinson’s disease. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland, 26–31 May 2014; pp. 342–347.
22. Quan, C.; Ren, K.; Luo, Z. A deep learning based method for Parkinson’s disease detection using dynamic features of speech. IEEE Access 2021, 9, 10239–10252.
23. Orozco-Arroyave, J.R. Analysis of Speech of People with Parkinson’s Disease; Logos Verlag Berlin GmbH: Berlin, Germany, 2016; p. 138.
24. Quan, C.; Ren, K.; Luo, Z.; Chen, Z.; Ling, Y. End-to-end deep learning approach for Parkinson’s disease detection from speech signals. Biocybern. Biomed. Eng. 2022, 42, 556–574.
25. Goetz, C.G.; Tilley, B.C.; Shaftman, S.R.; Stebbins, G.T.; Fahn, S.; Martinez-Martin, P.; LaPelle, N. Movement Disorder Society-sponsored revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS): Scale presentation and clinimetric testing results. Mov. Disord. 2008, 23, 2129–2170.
26. Pérez-Toro, P.A.; Vásquez-Correa, J.C.; Strauss, M.; Orozco-Arroyave, J.R.; Nöth, E. Natural language analysis to detect Parkinson’s disease. In Proceedings of the International Conference on Text, Speech, and Dialogue, Ljubljana, Slovenia, 11–13 September 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 82–90.
27. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
28. Zhang, Y.; Jin, R.; Zhou, Z.-H. Understanding bag-of-words model: A statistical framework. Int. J. Mach. Learn. Cybern. 2010, 1, 43–52.
29. Salton, G.; Buckley, C. Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 1988, 24, 513–523.
30. Dhir, N.; Edman, M.; Sanchez Ferro, A.; Stafford, T.; Bannard, C. Identifying robust markers of Parkinson’s disease in typing behaviour using a CNN-LSTM network. In Proceedings of the Conference on Computational Natural Language Learning (CoNLL), Online, 19–20 November 2020; Association for Computational Linguistics; pp. 578–595.
31. García, A.M.; Escobar-Grisales, D.; Vásquez-Correa, J.C.; Bocanegra, Y.; Moreno, L.; Carmona, J. Detecting Parkinson’s disease and its cognitive phenotypes via automated semantic analyses of action stories. NPJ Park. Dis. 2022, 8, 163.
32. Poorjam, A.H.; Kavalekalam, M.S.; Shi, L.; Raykov, J.P.; Jensen, J.R.; Little, M.A.; Christensen, M.G. Automatic quality control and enhancement for voice-based remote Parkinson’s disease detection. Speech Commun. 2021, 127, 1–16.
33. Eyigöz, E.; Polosecki, P.; García, A.M.; Rogg, K.; Orozco-Arroyave, J.R.; Skodda, S.; Cecchi, G.A. Unsupervised Morphological Segmentation for Detecting Parkinson’s Disease. In Proceedings of the Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
34. Huerta, J.M.; Stern, R.M. Speech recognition from GSM codec parameters. In Proceedings of the ICSLP, Sydney, Australia, 30 November–4 December 1998; pp. 1463–1466.
35. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
36. Rios-Urrego, C.D.; Vásquez-Correa, J.C.; Orozco-Arroyave, J.R.; Nöth, E. Is There Any Additional Information in a Neural Network Trained for Pathological Speech Classification? In Proceedings of the International Conference on Text, Speech, and Dialogue: 24th International Conference, TSD 2021, Olomouc, Czech Republic, 6–9 September 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 435–447.
37. Baevski, A.; Zhou, Y.; Mohamed, A.; Auli, M. Wav2vec 2.0: A framework for self-supervised learning of speech representations. Adv. Neural Inf. Process. Syst. 2020, 33, 12449–12460.
38. Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013.
39. Reese, S.; Boleda Torrent, G.; Cuadros Oller, M.; Padró, L.; Rigau Claramunt, G. Word-sense disambiguated multilingual wikipedia corpus. In Proceedings of the 7th International Conference on Language Resources and Evaluation, Valletta, Malta, 19–21 May 2010.
40. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
41. Cañete, J.; Chaperon, G.; Fuentes, R.; Ho, J.-H.; Kang, H.; Pérez, J. Spanish pre-trained BERT model and evaluation data. PML4DC at ICLR 2020, 2020, 1–10.
42. Tiedemann, J. Parallel data, tools and interfaces in OPUS. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey, 21–27 May 2012; pp. 2214–2218.
43. Perez-Toro, P.A. PauPerezT/WEBERT: Word Embeddings Using BERT, 2020. Available online: https://github.com/PauPerezT/WEBERT/blob/master/utils.py (accessed on 21 June 2023).
44. Escobar-Grisales, D.; Vásquez-Correa, J.C.; Orozco-Arroyave, J.R. Author Profiling in Informal and Formal Language Scenarios Via Transfer Learning. Tecnológicas 2021, 24, 212–225.
45. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015; pp. 1–15.
Ref. | Datasets | Features/Representation | ML/DL Models | Validation | Results |
---|---|---|---|---|---|
Speech analysis | | | | | |
[20] | 50 PD–50 HC | Mel spectrograms | 2D-CNNs and LSTMs | 10-fold cross-validation | Acc: 98.6% |
[24] | DB1: 30 PD–15 HC; DB2: 50 PD–50 HC | Mel spectrograms | 1D-CNNs and 2D-CNNs | Training/validation/test split | DB1 Acc: 81.6%; DB2 Acc: 92.0% |
[22] | 30 PD–15 HC | MFCCs and Bark band energies | Bidirectional LSTMs | 10-fold cross-validation | Acc: 84.3% |
[19] | 188 PD–64 HC | Tunable Q-factor and time-frequency wavelet transforms; MFCCs and vocal fold features | Parallel 1D-CNNs | Leave-one-person-out cross-validation | Acc: 85.7% |
[32] | 400 PD–400 HC | Perceptual linear predictive coefficients | Gaussian mixture models | 5-fold cross-validation | AUC: 0.95 |
Language analysis | | | | | |
[31] | 40 PD–40 HC | P-RSF metric | SVM | 10-fold cross-validation | Acc: 85.0% |
[26] | 50 PD–50 HC | W2V; BoW TF-IDF | SVM | 10-fold cross-validation | Acc: 72.0% |
[33] | 88 PD–88 HC | Morphological features | SVM | Leave-one-out cross-validation | Acc: 81.0% |
[18] | Spanish: 91 PD–57 HC; German: 88 PD–88 HC; Czech: 20 PD–16 HC | Morphological features | LR, SVM, SGD | Leave-one-out cross-validation | Spanish Acc: 71.0%; German Acc: 71.0%; Czech Acc: 80.0% |
[15] | 51 PD–50 HC | Semantic fields; grammatical word-level repetitions | KNN | Leave-one-out cross-validation | Pearson’s correlation: 0.77 |
[30] | Spanish: 11 PD–9 HC; English: 16 PD–25 HC | Continuous BoW | 1D-CNN, bidirectional LSTM | 5-fold cross-validation | Spanish AUC: 0.7; English AUC: 0.8 |
Characteristic | PD patients | HC subjects | PD vs. HC |
---|---|---|---|
Gender [F/M] | 38/42 | 43/42 | * p = 0.81 |
Age (years) [F/M] | 63.7 ± 7.3/64.5 ± 10.2 | 60.9 ± 8.2/64.8 ± 10.5 | ** p = 0.38 |
Age range (years) [F/M] | 51–81/45–86 | 49–83/42–86 | |
MDS-UPDRS-III [F/M] | 34.6 ± 19.9/38.5 ± 19.6 | | |
MDS-UPDRS-III range [F/M] | 9–106/7–92 | | |
Metric (%) | 2D-CNN | 1D-CNN | Wav2vec 2.0 |
---|---|---|---|
Accuracy | 84.4 ± 8.8 | 72.6 ± 7.9 | 88.5 ± 8.3 |
Sensitivity | 81.3 ± 15.1 | 53.8 ± 16.8 | 82.5 ± 16.9 |
Specificity | 87.6 ± 15.8 | 92.5 ± 9.7 | 94.0 ± 6.3 |
F1-Score | 84.3 ± 9.2 | 68.0 ± 7.7 | 88.3 ± 8.7 |
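The results tables report accuracy, sensitivity, specificity and F1-score as mean ± standard deviation, presumably over the cross-validation folds. A sketch of how these figures can be computed is given below, assuming PD is the positive class (label 1), so that sensitivity = TP/(TP + FN), specificity = TN/(TN + FP) and F1 = 2TP/(2TP + FP + FN); the use of the sample standard deviation across folds is also an assumption.

```python
import statistics
from typing import Dict, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Percent accuracy, sensitivity, specificity and F1 with PD (= 1) as positive."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def pct(num: int, den: int) -> float:
        return 100.0 * num / den if den else 0.0

    return {
        "Accuracy": pct(tp + tn, len(pairs)),
        "Sensitivity": pct(tp, tp + fn),
        "Specificity": pct(tn, tn + fp),
        "F1-Score": pct(2 * tp, 2 * tp + fp + fn),
    }


def summarize_folds(folds: Sequence[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Mean and sample standard deviation of each metric (needs >= 2 folds)."""
    return {
        name: (statistics.fmean([f[name] for f in folds]),
               statistics.stdev([f[name] for f in folds]))
        for name in folds[0]
    }


# Two hypothetical folds with 4 PD and 4 HC recordings each.
fold1 = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])
fold2 = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 0])
print(summarize_folds([fold1, fold2]))
```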
Metric (%) | Statistical functionals: W2V | Statistical functionals: BERT | Statistical functionals: BETO | CNN: W2V | CNN: BERT | CNN: BETO |
---|---|---|---|---|---|---|
Accuracy | 69.7 ± 6.1 | 61.8 ± 7.4 | 62.5 ± 6.9 | 73.8 ± 10.3 | 74.2 ± 10.2 | 77.9 ± 8.4 |
Sensitivity | 71.3 ± 17.7 | 47.5 ± 19.2 | 56.3 ± 23.9 | 75.0 ± 22.4 | 76.3 ± 11.8 | 76.4 ± 12.4 |
Specificity | 68.2 ± 17.4 | 75.7 ± 22.3 | 70.0 ± 21.0 | 72.6 ± 20.2 | 72.1 ± 17.8 | 79.2 ± 15.7 |
F1-Score | 68.6 ± 8.5 | 53.1 ± 11.2 | 56.9 ± 13.3 | 72.2 ± 14.8 | 74.2 ± 9.2 | 76.9 ± 8.4 |
Metric (%) | Early | Joint | Late |
---|---|---|---|
Accuracy | 73.9 ± 12.8 | 77.2 ± 2.0 | 77.6 ± 8.3 |
Sensitivity | 86.3 ± 15.3 | 77.3 ± 5.3 | 75.0 ± 11.2 |
Specificity | 62.4 ± 26.9 | 77.0 ± 4.4 | 80.1 ± 11.7 |
F1-Score | 76.5 ± 11.0 | 76.3 ± 2.3 | 76.4 ± 9.3 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Escobar-Grisales, D.; Ríos-Urrego, C.D.; Orozco-Arroyave, J.R. Deep Learning and Artificial Intelligence Applied to Model Speech and Language in Parkinson’s Disease. Diagnostics 2023, 13, 2163. https://doi.org/10.3390/diagnostics13132163