RETRACTED: Designing an ASR Corpus for the Albanian Language †
Abstract
1. Introduction
2. Corpus Construction
2.1. Source Data
2.2. Creation of Audio and Text Files
2.3. Corpus Description and Organization
3. Speech Recognition Architectures
4. Results and Discussion
4.1. Evaluation of the AlbanianCorpus through the Training Set
4.2. Evaluation of the AlbanianCorpus through the Testing Set
4.3. Evaluation of AlbanianCorpus in Comparison to LibriSpeech
4.4. Evaluation of Transformers Architecture Using AlbanianCorpus
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Corpus | Size | Number of Utterances | Total Words | Avg. Length per Utterance |
---|---|---|---|---|
AlbanianCorpus | 100 h | 37,758 | 848,553 | 9.53 s |
AlbanianCorpus_TrainingSet | 80 h | 29,215 | 686,971 | 9.85 s |
AlbanianCorpus_TestingSet | 20 h | 8,543 | 161,582 | 8.42 s |
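The corpus statistics can be cross-checked with simple arithmetic: the whole-corpus figures should equal the sum of the two splits, and the average utterance length should be roughly the total duration divided by the number of utterances. The following is a minimal sketch of that check, using only the training and testing rows of the table as input. The names `SPLITS`, `avg_utterance_seconds`, and `summarize` are illustrative and do not come from the paper, and because the sizes are reported in whole hours the results are approximate.

```python
# Split sizes and counts copied from the training and testing rows of the table.
# Sizes are rounded to whole hours in the source, so derived values are approximate.
SPLITS = {
    "AlbanianCorpus_TrainingSet": {"hours": 80, "utterances": 29_215, "words": 686_971},
    "AlbanianCorpus_TestingSet": {"hours": 20, "utterances": 8_543, "words": 161_582},
}


def avg_utterance_seconds(hours, utterances):
    """Mean utterance duration in seconds for a given total duration in hours."""
    return hours * 3600 / utterances


def summarize(splits):
    """Return per-split averages plus combined totals obtained by summing the splits."""
    summary = {}
    for name, s in splits.items():
        summary[name] = {
            "avg_seconds": avg_utterance_seconds(s["hours"], s["utterances"]),
            "words_per_utterance": s["words"] / s["utterances"],
        }
    # Combined figures are derived from the splits, not read from the table.
    total = {key: sum(s[key] for s in splits.values())
             for key in ("hours", "utterances", "words")}
    total["avg_seconds"] = avg_utterance_seconds(total["hours"], total["utterances"])
    summary["combined"] = total
    return summary


if __name__ == "__main__":
    for name, stats in summarize(SPLITS).items():
        print(name, stats)
```

Comparing the `combined` entry with the AlbanianCorpus row is a quick way to spot transcription errors in either place.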
Architecture | WER % | CER % | Training Time |
---|---|---|---|
3 RNN and 5 GRU | 5 | 1 | 1216 h |
1 RNN and 4 GRU | 6 | 3 | 967 h |
1 RNN and 3 GRU | 7 | 5 | 722 h |
1 RNN and 2 GRU | 9 | 8 | 502 h |
2 RNN and 2 GRU | 11 | 10 | 915 h |
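For readers unfamiliar with the two metrics in the table, word error rate (WER) and character error rate (CER) are conventionally defined as the edit distance between the reference transcript and the recognizer output, divided by the reference length in words or characters. The sketch below implements that standard definition in plain Python. It is not the authors' evaluation code, which this section does not describe; the function names and the Albanian example sentence are illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, computed over whitespace-separated tokens."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent, counting spaces as characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcript pair, chosen only to illustrate the calculation.
    reference = "unë jam student"
    hypothesis = "unë jam studente"
    print(f"WER = {wer(reference, hypothesis):.2f} %")
    print(f"CER = {cer(reference, hypothesis):.2f} %")
```

One wrong word out of three gives a high WER while the CER stays low, which is why the two columns in the table can differ substantially for the same architecture.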
Training Loss | Epoch | Step | Validation Loss | WER |
---|---|---|---|---|
5.16 | 0.1 | 200 | 29.123 | 0.9707 |
0.6853 | 1 | 2000 | 0.3244 | 0.2906 |
0.6154 | 2 | 4000 | 0.2861 | 0.2424 |
0.5299 | 4 | 8000 | 0.2632 | 0.2081 |
0.4563 | 6 | 12,000 | 0.2604 | 0.1997 |
0.419 | 8 | 16,000 | 0.2789 | 0.1909 |
0.3758 | 10 | 20,000 | 0.2893 | 0.1863 |
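In this log the validation loss and the WER do not reach their minimum at the same step, so the choice of checkpoint depends on which metric is used for selection. The sketch below copies the rows of the table and reports the best step under each criterion. Which checkpoint the authors actually kept is not stated here, and the names `Row`, `LOG`, and `best_by` are illustrative.

```python
from collections import namedtuple

Row = namedtuple("Row", "training_loss epoch step validation_loss wer")

# Rows copied from the table above; WER is a fraction here, not a percentage.
LOG = [
    Row(5.16, 0.1, 200, 29.123, 0.9707),
    Row(0.6853, 1, 2000, 0.3244, 0.2906),
    Row(0.6154, 2, 4000, 0.2861, 0.2424),
    Row(0.5299, 4, 8000, 0.2632, 0.2081),
    Row(0.4563, 6, 12000, 0.2604, 0.1997),
    Row(0.419, 8, 16000, 0.2789, 0.1909),
    Row(0.3758, 10, 20000, 0.2893, 0.1863),
]


def best_by(log, metric):
    """Return the logged row with the lowest value of the named metric."""
    return min(log, key=lambda row: getattr(row, metric))


if __name__ == "__main__":
    for metric in ("validation_loss", "wer"):
        row = best_by(LOG, metric)
        print(f"lowest {metric}: step {row.step}, "
              f"validation loss {row.validation_loss}, WER {row.wer * 100:.2f} %")
```

After step 12,000 the training loss keeps falling while the validation loss rises from 0.2604 to 0.2893, a pattern usually read as the onset of overfitting, even though the WER still improves slightly through step 20,000.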
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Rista, A.; Kadriu, A. RETRACTED: Designing an ASR Corpus for the Albanian Language. Eng. Proc. 2023, 56, 207. https://doi.org/10.3390/ASEC2023-16601