
Intelligent Sensors Based on Signal Processing for Speech Enhancement and Recognition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (25 March 2024)

Special Issue Editors


Dr. Gongping Huang
Guest Editor
Lehrstuhl für Multimediakommunikation und Signalverarbeitung (LMS), University of Erlangen–Nuremberg, 91058 Erlangen, Germany
Interests: microphone arrays; speech enhancement; speech processing; acoustic processing

Dr. Jianbo Ma
Guest Editor
Dolby Laboratories, Level 3/35 Mitchell St., McMahons Point, Sydney, NSW 2060, Australia
Interests: deep learning; machine learning; speech processing; automatic speech recognition; automatic speaker recognition; speech enhancement

Special Issue Information

Dear Colleagues,

Intelligent sensors based on signal processing for speech enhancement and recognition are very important in many areas, such as voice communication and human–machine speech interfaces. However, in these applications, the signal of interest picked up by microphone sensors is inevitably contaminated by unwanted effects such as additive noise, reverberation, and interference, which impair not only the fidelity and quality of the signal of interest but also the reliability of automatic speech recognition. Speech enhancement algorithms based on intelligent sensors and intelligent sensor arrays have therefore been widely studied to counter these adverse effects and recover the signal of interest from its corrupted observations. Moreover, a growing body of research at the intersection of signal processing and machine learning, especially deep learning, has demonstrated the advantages of combining these methods for speech enhancement and recognition. This Special Issue aims to bring together leading researchers in speech enhancement and recognition, thereby presenting advances in signal-processing and deep-learning methods.
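As a loose illustration of how signal processing and deep learning are typically combined for speech enhancement (not drawn from any particular submission to this issue), the sketch below applies a small, untrained neural mask estimator to the STFT magnitude of a noisy waveform and resynthesizes the signal. All names, layer sizes, and parameters are assumptions made for the example.

```python
# Minimal sketch, assuming an STFT-domain mask-based enhancement pipeline:
# compute an STFT, estimate a time-frequency mask with a small network,
# apply it, and resynthesize the waveform. Sizes are illustrative only.
import torch
import torch.nn as nn

N_FFT, HOP = 512, 128
N_BINS = N_FFT // 2 + 1

class MaskEstimator(nn.Module):
    """Tiny per-frame mask estimator operating on magnitude spectra."""
    def __init__(self, n_bins: int = N_BINS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, 256), nn.ReLU(),
            nn.Linear(256, n_bins), nn.Sigmoid(),  # mask values in [0, 1]
        )

    def forward(self, mag: torch.Tensor) -> torch.Tensor:
        # mag: (frames, bins) -> mask: (frames, bins)
        return self.net(mag)

def enhance(noisy: torch.Tensor, model: MaskEstimator) -> torch.Tensor:
    """Enhance a mono waveform by masking its STFT magnitude."""
    window = torch.hann_window(N_FFT)
    spec = torch.stft(noisy, N_FFT, HOP, window=window, return_complex=True)
    mag = spec.abs().T                      # (frames, bins)
    mask = model(mag).T                     # back to (bins, frames)
    return torch.istft(spec * mask, N_FFT, HOP, window=window,
                       length=noisy.shape[-1])

if __name__ == "__main__":
    model = MaskEstimator()                 # untrained; for illustration only
    noisy = torch.randn(16000)              # 1 s of "audio" at 16 kHz
    print(enhance(noisy, model).shape)      # torch.Size([16000])
```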

Dr. Gongping Huang
Dr. Jianbo Ma
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • speech enhancement
  • automatic speech recognition
  • intelligent sensors
  • deep learning
  • signal processing

Published Papers (2 papers)


Research

17 pages, 2077 KiB  
Article
Nonlinear Regularization Decoding Method for Speech Recognition
by Jiang Zhang, Liejun Wang, Yinfeng Yu and Miaomiao Xu
Sensors 2024, 24(12), 3846; https://doi.org/10.3390/s24123846 - 14 Jun 2024
Abstract
Existing end-to-end speech recognition methods typically employ hybrid decoders based on CTC and Transformer. However, the issue of error accumulation in these hybrid decoders hinders further improvements in accuracy. Additionally, most existing models are built upon the Transformer architecture, which tends to be complex and unfriendly to small datasets. Hence, we propose a nonlinear regularization decoding method for speech recognition. First, we introduce a nonlinear Transformer decoder that breaks away from traditional left-to-right or right-to-left decoding orders and enables associations between any characters, mitigating the limitations of Transformer architectures on small datasets. Second, we propose a novel regularization attention module to optimize the attention score matrix, reducing the impact of early errors on later outputs. Finally, we introduce a tiny model to address the challenge of overly large model parameters. The experimental results indicate that our model performs well: compared to the baseline, it achieves recognition improvements of 0.12%, 0.54%, 0.51%, and 1.2% on the Aishell1, Primewords, and Free ST Chinese Corpus datasets and on the Uyghur portion of Common Voice 16.1, respectively.
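As background for the hybrid CTC/Transformer decoders this abstract refers to, the sketch below shows the conventional weighted combination of a CTC loss and an attention-decoder cross-entropy loss used to train such models. It is a generic illustration under assumed tensor shapes, not the authors' method or code, and `ctc_weight` is an assumed hyperparameter.

```python
# Illustrative sketch only (not code from this paper): hybrid decoders are
# commonly trained with a weighted sum of a CTC loss on the encoder output
# and a cross-entropy loss on an attention (Transformer) decoder.
import torch
import torch.nn as nn

ctc_loss_fn = nn.CTCLoss(blank=0, zero_infinity=True)
ce_loss_fn = nn.CrossEntropyLoss(ignore_index=-1)

def hybrid_loss(ctc_logits, dec_logits, targets, input_lengths,
                target_lengths, ctc_weight=0.3):
    """Weighted CTC + attention cross-entropy, as in common hybrid ASR recipes.

    ctc_logits:  (T, B, V) frame-level encoder outputs for the CTC branch
    dec_logits:  (B, L, V) attention-decoder outputs
    targets:     (B, L) token ids (use -1 to mark padding for the CE branch)
    """
    log_probs = ctc_logits.log_softmax(dim=-1)
    ctc_targets = targets.clamp(min=0)  # padded positions are skipped via target_lengths
    ctc = ctc_loss_fn(log_probs, ctc_targets, input_lengths, target_lengths)
    ce = ce_loss_fn(dec_logits.transpose(1, 2), targets)  # (B, V, L) vs (B, L)
    return ctc_weight * ctc + (1.0 - ctc_weight) * ce

# Toy usage with random tensors: batch of 2, 50 encoder frames, vocabulary of 30.
T, B, V, L = 50, 2, 30, 12
loss = hybrid_loss(torch.randn(T, B, V), torch.randn(B, L, V),
                   torch.randint(1, V, (B, L)),
                   torch.full((B,), T), torch.full((B,), L))
print(loss.item())
```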

19 pages, 882 KiB  
Article
Characterization of Deep Learning-Based Speech-Enhancement Techniques in Online Audio Processing Applications
by Caleb Rascon
Sensors 2023, 23(9), 4394; https://doi.org/10.3390/s23094394 - 29 Apr 2023
Cited by 7
Abstract
Deep learning-based speech-enhancement techniques have recently been an area of growing interest, since their impressive performance can potentially benefit a wide variety of digital voice communication systems. However, such performance has been evaluated mostly in offline audio-processing scenarios (i.e., feeding the model, in one go, a complete audio recording, which may extend several seconds). It is of significant interest to evaluate and characterize the current state of the art in applications that process audio online (i.e., feeding the model a sequence of segments of audio data and concatenating the results at the output end). Although evaluations and comparisons between speech-enhancement techniques have been carried out before, as far as the author knows, the work presented here is the first to evaluate the performance of such techniques in relation to their online applicability. That is, this work measures how the output signal-to-interference ratio (as a separation metric), the response time, and the memory usage (as online metrics) are affected by the input length (the size of the audio segments), in addition to the amount of noise, the amount and number of interferences, and the amount of reverberation. Three popular models were evaluated, given their availability in public repositories and their online viability: MetricGAN+, Spectral Feature Mapping with Mimic Loss, and Demucs-Denoiser. The characterization was carried out using a systematic evaluation protocol based on the SpeechBrain framework. Several intuitions are presented and discussed, and some recommendations for future work are proposed.
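To make the "online" setting concrete, the sketch below segments a waveform, enhances each segment with a placeholder function, concatenates the outputs, and records per-segment response time. It is independent of the paper's actual evaluation protocol; `enhance_segment` is a hypothetical stand-in for any of the evaluated models.

```python
# Minimal sketch, assuming online processing means feeding a model fixed-size
# segments and concatenating the outputs, so response time can be measured
# per segment. `enhance_segment` is a placeholder (identity) function.
import time
import numpy as np

def enhance_segment(segment: np.ndarray) -> np.ndarray:
    """Placeholder for a speech-enhancement model applied to one segment."""
    return segment

def process_online(audio: np.ndarray, segment_len: int):
    """Split `audio` into segments, enhance each, and concatenate the results."""
    outputs, latencies = [], []
    for start in range(0, len(audio), segment_len):
        segment = audio[start:start + segment_len]
        t0 = time.perf_counter()
        outputs.append(enhance_segment(segment))
        latencies.append(time.perf_counter() - t0)
    return np.concatenate(outputs), latencies

if __name__ == "__main__":
    sr = 16000
    audio = np.random.randn(5 * sr).astype(np.float32)        # 5 s of noise
    enhanced, latencies = process_online(audio, segment_len=sr // 10)  # 100 ms segments
    # Real-time operation requires mean latency below the segment duration.
    print(enhanced.shape, f"mean latency: {np.mean(latencies) * 1e3:.3f} ms")
```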
