Speech Recognition and Natural Language Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 31 May 2025 | Viewed by 3085

Special Issue Editors


Dr. Asad Abdi
Guest Editor
Department of Computing and Mathematics, Faculty of Science and Engineering, University of Derby, Derby DE22 1GB, UK
Interests: artificial intelligence (AI); natural language processing (NLP)

Prof. Dr. Farid Meziane
Guest Editor
Department of Computing and Mathematics, Faculty of Science and Engineering, University of Derby, Derby DE22 1GB, UK
Interests: artificial intelligence (AI); natural language processing (NLP)

Special Issue Information

Dear Colleagues,

Speech Recognition (SR) and Natural Language Processing (NLP) have emerged as two of the most transformative fields in artificial intelligence. This Special Issue explores the latest advances and challenges in these interdisciplinary fields. As the demand for intelligent systems capable of understanding and processing human language continues to rise, researchers are increasingly focused on developing innovative algorithms, models, and applications in these domains. This Special Issue provides a platform for scholars and practitioners to disseminate their cutting-edge research findings, methodologies, and insights, fostering collaboration and driving progress in this rapidly evolving field.

Topics of interest include, but are not limited to, the following:

  • Automatic Speech Recognition (ASR) systems;
  • Natural Language Understanding (NLU) and interpretation;
  • Speech synthesis and generation;
  • Sentiment analysis and opinion mining;
  • Dialogue systems and conversational interfaces;
  • Machine translation and cross-lingual NLP;
  • Voice user interfaces (VUIs) and intelligent assistants;
  • Language modeling and representation learning;
  • End-to-end speech-to-text and text-to-speech systems;
  • Speech and language applications.

We invite original research contributions, review articles, case studies, and surveys that advance the state of the art in Speech Recognition and Natural Language Processing. Submissions should present novel methodologies, experimental results, theoretical insights, or practical applications that contribute to the development and understanding of these critical areas.

Dr. Asad Abdi
Prof. Dr. Farid Meziane
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • automatic speech recognition
  • natural language understanding
  • sentiment analysis
  • machine translation
  • voice user interfaces
  • speech-to-text
  • text-to-speech
  • dialogue systems
  • conversational AI
  • spoken language understanding
  • language modeling

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (3 papers)


Research

20 pages, 5718 KiB  
Article
Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering
by Jovan Galić, Branko Marković, Đorđe Grozdić, Branislav Popović and Slavko Šajić
Appl. Sci. 2024, 14(18), 8223; https://doi.org/10.3390/app14188223 - 12 Sep 2024
Viewed by 832
Abstract
Modern Automatic Speech Recognition (ASR) systems are primarily designed to recognize normal speech. Due to the considerable acoustic mismatch between normal speech and whisper, ASR systems suffer a significant loss of performance in whisper recognition. Creating large databases of whispered speech is expensive and time-consuming, so research studies explore synthetic generation using pre-existing normal or whispered speech databases. This study examines the impact of standard audio data augmentation techniques on the accuracy of isolated-word recognizers based on Hidden Markov Models (HMM) and Convolutional Neural Networks (CNN). Furthermore, it explores the potential of inverse filtering as an augmentation strategy for producing pseudo-whisper speech. The Whi-Spe speech database, containing recordings in normal and whisper phonation, is utilized for data augmentation, while an internally recorded speech database, developed specifically for this study, is employed for testing. Experimental results demonstrate statistically significant improvements in performance when employing data augmentation strategies and inverse filtering.
(This article belongs to the Special Issue Speech Recognition and Natural Language Processing)
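To make the augmentation ideas above concrete, here is a minimal sketch of LPC-based inverse filtering alongside two standard audio augmentations. This is an illustration only, not the authors' pipeline: the library choices (librosa, SciPy), the LPC order, the file name, and all parameter values are assumptions.

```python
# Illustrative sketch (not the paper's exact pipeline): deriving a
# pseudo-whisper signal via LPC inverse filtering, plus two standard
# augmentations. Assumes librosa and scipy; the LPC order and all
# parameters below are arbitrary illustrative choices.
import librosa
import numpy as np
from scipy.signal import lfilter

y, sr = librosa.load("normal_speech.wav", sr=16000)  # hypothetical input file

# Inverse filtering: estimate an all-pole (LPC) model of the vocal tract and
# pass the signal through the corresponding prediction-error filter. The
# residual lacks the strong voiced-excitation structure, which is one way to
# approximate the noise-like excitation of whispered speech.
a = librosa.lpc(y, order=16)             # prediction polynomial, a[0] == 1
pseudo_whisper = lfilter(a, [1.0], y)    # prediction-error (residual) signal
pseudo_whisper /= np.max(np.abs(pseudo_whisper)) + 1e-9  # peak-normalize

# Standard audio augmentations of the kind evaluated in such studies:
stretched = librosa.effects.time_stretch(y, rate=0.9)        # tempo change
shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)   # pitch change
noisy = y + 0.005 * np.random.randn(len(y))                  # additive noise
```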

17 pages, 543 KiB  
Article
Speaker-Attributed Training for Multi-Speaker Speech Recognition Using Multi-Stage Encoders and Attention-Weighted Speaker Embedding
by Minsoo Kim and Gil-Jin Jang
Appl. Sci. 2024, 14(18), 8138; https://doi.org/10.3390/app14188138 - 10 Sep 2024
Viewed by 738
Abstract
Automatic speech recognition (ASR) aims to convert naturally spoken human speech into text inputs for machines. In multi-speaker environments, where multiple speakers talk simultaneously with a large amount of overlap, conventional ASR systems trained on recordings of single talkers may suffer significant performance degradation. This paper proposes a multi-speaker ASR method that incorporates speaker embedding information as an additional input. The embedding for each speaker in the training set is extracted as a numeric vector, and all of the embedding vectors are stacked to construct a speaker profile matrix. At test time, the profile matrix enables finding embedding vectors that are close to the speakers of the input recordings, which helps to recognize the individual speakers' voices mixed in the input. Furthermore, the proposed method efficiently reuses the decoder of an existing speaker-independent ASR model, eliminating the need to retrain the entire system. Various speaker embedding methods, such as i-vector, d-vector, and x-vector, were adopted, and the experimental results show 0.33% and 0.95% absolute (3.9% and 11.5% relative) improvements in word error rate (WER) without and with the speaker profile, respectively.
(This article belongs to the Special Issue Speech Recognition and Natural Language Processing)
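The speaker-profile lookup described in the abstract can be sketched in a few lines. The following is an illustrative reconstruction rather than the authors' code: the function names, embedding dimension, and softmax weighting over cosine similarities are assumptions standing in for the paper's attention-weighted embedding selection.

```python
# Minimal sketch of the speaker-profile idea (not the authors' implementation):
# one embedding per training speaker is stacked into a profile matrix, and at
# test time a query embedding is matched to its closest profile row.
import numpy as np

def build_profile_matrix(speaker_embeddings: list[np.ndarray]) -> np.ndarray:
    """Stack one embedding per training speaker into an (S, D) matrix,
    L2-normalized row-wise so that dot products are cosine similarities."""
    profile = np.stack(speaker_embeddings)               # shape (S, D)
    return profile / np.linalg.norm(profile, axis=1, keepdims=True)

def match_speaker(profile: np.ndarray, query: np.ndarray):
    """Return the index of the closest profile speaker and softmax weights
    over similarities (a stand-in for attention-weighted embedding)."""
    q = query / np.linalg.norm(query)
    sims = profile @ q                                   # cosine similarities
    weights = np.exp(sims) / np.exp(sims).sum()          # attention-style weights
    return int(np.argmax(sims)), weights

# Hypothetical usage with random stand-ins for i-/d-/x-vector embeddings:
rng = np.random.default_rng(0)
profile = build_profile_matrix([rng.normal(size=192) for _ in range(10)])
idx, w = match_speaker(profile, rng.normal(size=192))
```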

13 pages, 4133 KiB  
Article
Gender Recognition Based on the Stacking of Different Acoustic Features
by Ergün Yücesoy
Appl. Sci. 2024, 14(15), 6564; https://doi.org/10.3390/app14156564 - 27 Jul 2024
Viewed by 641
Abstract
A speech signal can provide various information about a speaker, such as their gender, age, accent, and emotional state. The speaker's gender is the most salient piece of information contained in the speech signal and is used, directly or indirectly, in many applications. In this study, a new approach is proposed for recognizing the gender of a speaker based on hybrid features created by stacking different types of features. For this purpose, five base features, namely Mel frequency cepstral coefficients (MFCC), Mel scaled power spectrogram (Mel Spectrogram), Chroma, Spectral contrast (Contrast), and Tonal Centroid (Tonnetz), and twelve hybrid features created by stacking them were used. These features were applied to four different classifiers, two based on traditional machine learning (KNN and LDA) and two based on deep learning (CNN and MLP), and the performance of each was evaluated separately. In experiments conducted on the Turkish subset of the Common Voice dataset, hybrid features created by stacking different acoustic features improved gender recognition accuracy by 0.3% to 1.73%.
(This article belongs to the Special Issue Speech Recognition and Natural Language Processing)
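The feature-stacking step lends itself to a short sketch. The code below is an assumption-laden illustration, not the author's implementation: it assumes librosa for extracting the five base features and uses scikit-learn's KNN (one of the four classifiers compared) on time-averaged, concatenated feature vectors.

```python
# Minimal sketch of hybrid feature stacking (illustrative; library choices
# and parameters are assumptions). Each base feature is averaged over time
# and the resulting vectors are concatenated into one hybrid vector.
import librosa
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def hybrid_features(path: str) -> np.ndarray:
    """Extract the five base features and stack their time-averaged vectors."""
    y, sr = librosa.load(path, sr=None)
    feats = [
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13),           # MFCC
        librosa.feature.melspectrogram(y=y, sr=sr),            # Mel Spectrogram
        librosa.feature.chroma_stft(y=y, sr=sr),               # Chroma
        librosa.feature.spectral_contrast(y=y, sr=sr),         # Contrast
        librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr),  # Tonnetz
    ]
    return np.concatenate([f.mean(axis=1) for f in feats])     # hybrid vector

# Hypothetical usage; wav_paths and genders are stand-in variables:
# X = np.stack([hybrid_features(p) for p in wav_paths])
# clf = KNeighborsClassifier(n_neighbors=5).fit(X, genders)
```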
