
Computational Linguistics: From Text to Speech Technologies

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 November 2024 | Viewed by 1640

Special Issue Editors


Guest Editor
Research Institute on Multilingual Language Technologies, Department of Translation and Interpreting, University of Malaga, 29016 Málaga, Spain
Interests: corpus linguistics; machine interpreting; speech-to-text; translation and interpreting technologies; computational phraseology

Guest Editor
UCREL, Lancaster University, Lancaster LA1 4WA, UK
Interests: computational linguistics; natural language processing; machine translation; quality estimation

Special Issue Information

Dear Colleagues,

In recent years, advancements in machine learning, natural language processing, artificial intelligence, and speech synthesis have revolutionized how we communicate with other humans and language-based systems. From virtual assistants to language translation tools, the capabilities of these technologies continue to expand, offering new possibilities for communication, accessibility, and innovation.

This Special Issue serves as a platform to explore the latest research, methodologies, and applications driving the development of text and speech technologies, such as automatic speech recognition, machine interpreting, speech translation, and speech-to-text software, among others. It is intended for researchers, practitioners, and enthusiasts in the fields of computational linguistics, corpus linguistics, natural language processing, and machine learning. We invite research studies based on neural network architectures, large language models, linguistic modeling, AI-driven systems, and the intersection of linguistics and computer science (including multilingual communication). We also invite authors to address the challenges of applying these technologies in practical settings, low-resource languages, and specific domains.

Prof. Dr. Gloria Corpas Pastor
Dr. Tharindu Ranasinghe
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for the submission of manuscripts are available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence (AI)
  • automatic speech recognition (ASR)
  • machine interpreting (MI)
  • cascaded models
  • end-to-end models
  • speech-to-text (STT) modelling
  • speech translation
  • quality estimation
  • large language models (LLMs)

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (2 papers)


Research

14 pages, 433 KiB  
Article
Automatic Speech Recognition Advancements for Indigenous Languages of the Americas
by Monica Romero, Sandra Gómez-Canaval and Ivan G. Torre
Appl. Sci. 2024, 14(15), 6497; https://doi.org/10.3390/app14156497 - 25 Jul 2024
Viewed by 677
Abstract
Indigenous languages are a fundamental legacy in the development of human communication, embodying the unique identity and culture of local communities in America. The Second AmericasNLP Competition Track 1 of NeurIPS 2022 proposed the task of training automatic speech recognition (ASR) systems for five Indigenous languages: Quechua, Guarani, Bribri, Kotiria, and Wa’ikhana. In this paper, we describe the fine-tuning of a state-of-the-art ASR model for each target language, using approximately 36.65 h of transcribed speech data from diverse sources enriched with data augmentation methods. We systematically investigate, using a Bayesian search, the impact of the different hyperparameters on the Wav2vec2.0 XLS-R variants of 300 M and 1 B parameters. Our findings indicate that data and detailed hyperparameter tuning significantly affect ASR accuracy, but language complexity determines the final result. The Quechua model achieved the lowest character error rate (CER) (12.14), while the Kotiria model, despite having the most extensive dataset during the fine-tuning phase, showed the highest CER (36.59). Conversely, with the smallest dataset, the Guarani model achieved a CER of 15.59, while Bribri and Wa’ikhana obtained, respectively, CERs of 34.70 and 35.23. Additionally, Sobol’ sensitivity analysis highlighted the crucial roles of freeze fine-tuning updates and dropout rates. We release our best models for each language, marking the first open ASR models for Wa’ikhana and Kotiria. This work opens avenues for future research to advance ASR techniques in preserving minority Indigenous languages.
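The character error rate (CER) reported in this abstract is the standard edit-distance metric for ASR output. As a minimal illustration only (a pure-Python sketch, not the authors' implementation; the function names are hypothetical), CER is the Levenshtein distance between hypothesis and reference transcripts, normalized by the reference length:

```python
def levenshtein(ref, hyp):
    # Classic dynamic-programming edit distance over characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    # Character error rate: total edits normalized by reference length.
    return levenshtein(ref, hyp) / len(ref)

print(cer("quechua", "qechua"))  # one deletion over seven characters
```

Lower CER means the recognized text is closer to the reference; a CER of 12.14 in the paper is reported on a 0–100 scale.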
(This article belongs to the Special Issue Computational Linguistics: From Text to Speech Technologies)

17 pages, 791 KiB  
Article
Using Transfer Learning to Realize Low Resource Dungan Language Speech Synthesis
by Mengrui Liu, Rui Jiang and Hongwu Yang
Appl. Sci. 2024, 14(14), 6336; https://doi.org/10.3390/app14146336 - 20 Jul 2024
Viewed by 639
Abstract
This article presents a transfer-learning-based method to improve the synthesized speech quality of the low-resource Dungan language. This improvement is accomplished by fine-tuning a pre-trained Mandarin acoustic model to a Dungan language acoustic model using a limited Dungan corpus within the Tacotron2+WaveRNN framework. Our method begins with developing a transformer-based Dungan text analyzer capable of generating unit sequences with embedded prosodic information from Dungan sentences. These unit sequences, along with the speech features, provide <unit sequence with prosodic labels, Mel spectrograms> pairs as the input of Tacotron2 to train the acoustic model. Concurrently, we pre-trained a Tacotron2-based Mandarin acoustic model using a large-scale Mandarin corpus. The model is then fine-tuned with a small-scale Dungan speech corpus to derive a Dungan acoustic model that autonomously learns the alignment and mapping of the units to the spectrograms. The resulting spectrograms are converted into waveforms via the WaveRNN vocoder, facilitating the synthesis of high-quality Mandarin or Dungan speech. Both subjective and objective experiments suggest that the proposed transfer learning-based Dungan speech synthesis achieves superior scores compared to models trained only with the Dungan corpus and other methods. Consequently, our method offers a strategy to achieve speech synthesis for low-resource languages by adding prosodic information and leveraging a similar, high-resource language corpus through transfer learning.
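The freeze-then-fine-tune pattern this abstract describes — keeping pretrained high-resource (Mandarin) weights fixed while updating the rest on the small Dungan corpus — can be sketched in toy form. This is a generic illustration, not the Tacotron2 API; the `Layer` class and the module names are hypothetical:

```python
class Layer:
    """Toy stand-in for a network module with a trainable flag."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

def freeze_pretrained(layers, prefix="encoder."):
    # Transfer-learning pattern: fix pretrained encoder weights and
    # fine-tune only the remaining modules on the low-resource corpus.
    for layer in layers:
        if layer.name.startswith(prefix):
            layer.trainable = False
    return [layer.name for layer in layers if layer.trainable]

model = [Layer("encoder.0"), Layer("encoder.1"),
         Layer("decoder.attention"), Layer("decoder.postnet")]
print(freeze_pretrained(model))  # only the decoder modules stay trainable
```

In practice (e.g., in PyTorch) the same effect is achieved by setting `requires_grad = False` on the pretrained parameters before fine-tuning.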
(This article belongs to the Special Issue Computational Linguistics: From Text to Speech Technologies)
