Audio Signal Processing: Analysis and Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Networks".

Deadline for manuscript submissions: closed (30 November 2020) | Viewed by 9093

Special Issue Editor


Prof. Dr. Hyoung Joong Kim
Guest Editor
Department of Information Security, Korea University, Seoul, Korea
Interests: audio watermarking; multimedia security; reversible data hiding

Special Issue Information

Audio signal processing plays an important role in consumer electronics, broadcasting, telecommunications, multimedia communications, and entertainment systems. Compared with image and video signal processing, however, the audio market is relatively small, with significantly fewer dedicated engineers and scientists; as a result, few journals publish technical papers on audio signal processing. This Special Issue provides a venue for audio signal processing engineers and scientists to publish valuable papers on audio analysis and applications. Work on audio-like signals, such as medical signals (EEG, ECG, etc.) and speech signals, is also welcome. The Special Issue focuses mainly on emerging applications related to big data analysis, machine and deep learning, forensics, steganography, and related areas.

We invite papers addressing, but not limited to, the following topics:

  • Audio and speech analytics;
  • Audio and speech annotation;
  • Audio and speech classification;
  • Audio and speech clustering;
  • Audio and speech compression;
  • Audio and speech emotion recognition;
  • Audio and speech forensics;
  • Audio and speech recognition;
  • Audio and speech segmentation;
  • Audio and speech tagging;
  • Audio and speech watermarking;
  • Machine learning for audio signal processing;
  • Reversible data hiding for audio and speech;
  • Steganography for audio and speech.

Prof. Dr. Hyoung Joong Kim
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (2 papers)


Research

32 pages, 13548 KiB  
Article
An Experimental Analysis of Deep Learning Architectures for Supervised Speech Enhancement
by Soha A. Nossier, Julie Wall, Mansour Moniri, Cornelius Glackin and Nigel Cannings
Electronics 2021, 10(1), 17; https://doi.org/10.3390/electronics10010017 - 24 Dec 2020
Cited by 23 | Viewed by 4933
Abstract
Recent speech enhancement research has shown that deep learning techniques are very effective in removing background noise. Many deep neural networks have been proposed, showing promising results for improving overall speech perception. The Deep Multilayer Perceptron, Convolutional Neural Networks, and the Denoising Autoencoder are well-established architectures for speech enhancement; however, choosing between different deep learning models has been mainly empirical. Consequently, a comparative analysis of these three architecture types is needed to show the factors affecting their performance. This paper presents such an analysis by comparing seven deep learning models that belong to these three categories. The comparison evaluates the overall quality of the output speech using five objective evaluation metrics and a subjective evaluation with 23 listeners, as well as the ability to deal with challenging noise conditions, generalization ability, complexity, and processing time. Further analysis is then provided using two approaches. The first investigates how performance is affected by changing network hyperparameters and the structure of the data, including the Lombard effect. The second interprets the results by visualizing the spectrogram of the output layer of all investigated models and the spectrograms of the hidden layers of the convolutional neural network architecture. Finally, a general evaluation of supervised deep learning-based speech enhancement is performed using SWOC analysis to discuss the technique's Strengths, Weaknesses, Opportunities, and Challenges. The results of this paper contribute to the understanding of how different deep neural networks perform the speech enhancement task, highlight the strengths and weaknesses of each architecture, and provide recommendations for achieving better performance, facilitating the development of better deep neural networks for speech enhancement in the future.
(This article belongs to the Special Issue Audio Signal Processing: Analysis and Applications)
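
As a concrete illustration of one of the three architecture families compared in this article, the following is a minimal sketch of a denoising autoencoder that maps noisy magnitude-spectrogram frames to clean ones. The PyTorch dependency, layer sizes, and dummy training data are illustrative assumptions, not the configurations evaluated in the paper.

```python
# Minimal denoising-autoencoder sketch for magnitude-spectrogram speech
# enhancement (illustrative assumptions; not the architectures from the paper).
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, n_bins: int = 257, hidden: int = 128):
        super().__init__()
        # Encoder compresses each noisy spectrogram frame into a bottleneck.
        self.encoder = nn.Sequential(
            nn.Linear(n_bins, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden // 2), nn.ReLU(),
        )
        # Decoder reconstructs the clean magnitude frame from the bottleneck.
        self.decoder = nn.Sequential(
            nn.Linear(hidden // 2, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bins), nn.ReLU(),  # magnitudes are non-negative
        )

    def forward(self, noisy_frames: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(noisy_frames))

if __name__ == "__main__":
    model = DenoisingAutoencoder()
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Dummy batch: 32 noisy/clean frame pairs with 257 STFT magnitude bins each.
    noisy = torch.rand(32, 257)
    clean = torch.rand(32, 257)

    for _ in range(5):  # a few toy training steps on the dummy batch
        optimizer.zero_grad()
        loss = criterion(model(noisy), clean)
        loss.backward()
        optimizer.step()
    print(f"final training loss: {loss.item():.4f}")
```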

20 pages, 7623 KiB  
Article
Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations
by Ioannis Papadimitriou, Anastasios Vafeiadis, Antonios Lalas, Konstantinos Votis and Dimitrios Tzovaras
Electronics 2020, 9(10), 1593; https://doi.org/10.3390/electronics9101593 - 29 Sep 2020
Cited by 20 | Viewed by 3599
Abstract
Audio-based event detection poses a number of challenges that are not encountered in other fields, such as image detection. Challenges such as ambient noise, low Signal-to-Noise Ratio (SNR), and microphone distance are not yet fully understood. If multimodal approaches are to improve across a range of fields of interest, audio analysis will have to play an integral part. Event recognition in autonomous vehicles (AVs) is one such field at a nascent stage that can rely solely on audio or form part of a multimodal approach. In this manuscript, an extensive analysis focused on the comparison of different magnitude representations of the raw audio is presented. The analysis is carried out on part of the publicly available MIVIA Audio Events dataset. Single-channel Short-Time Fourier Transform (STFT), mel-scale, and Mel-Frequency Cepstral Coefficient (MFCC) spectrogram representations are used. Furthermore, aggregation methods for the aforementioned spectrogram representations are examined: feature concatenation is compared to stacking features as separate channels. The effect of the SNR on recognition accuracy and the generalization of the proposed methods to datasets both seen and not seen during training are studied and reported.
(This article belongs to the Special Issue Audio Signal Processing: Analysis and Applications)
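
For readers who want to see the representations described in this abstract in code form, below is a small sketch computing STFT, mel-scale, and MFCC magnitude spectrograms and the two aggregation strategies (feature concatenation versus channel stacking). The librosa calls, parameter values (n_fft, hop_length, n_mels, n_mfcc), and the synthetic test tone are illustrative assumptions, not the settings or MIVIA audio used in the paper.

```python
# Sketch of the three spectrogram magnitude representations (STFT, mel-scale,
# MFCC) and the two aggregation strategies discussed in the abstract.
# All parameter values are illustrative assumptions; a synthetic tone stands
# in for MIVIA Audio Events clips.
import numpy as np
import librosa

sr = 22050
y = librosa.tone(440.0, sr=sr, duration=2.0)  # placeholder audio clip

n_fft, hop = 512, 256
stft_mag = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))           # (257, T)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                     hop_length=hop, n_mels=64)           # (64, T)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_fft=n_fft, hop_length=hop,
                            n_mfcc=64, n_mels=64)                         # (64, T)

# Aggregation 1: concatenate representations along the feature (frequency) axis.
concatenated = np.concatenate([stft_mag, mel, mfcc], axis=0)              # (385, T)

# Aggregation 2: stack equally shaped representations as separate channels,
# as a CNN would consume them (the STFT magnitude would first need to be
# resampled to the same number of bins to join this stack).
stacked = np.stack([mel, mfcc], axis=0)                                   # (2, 64, T)

print(concatenated.shape, stacked.shape)
```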
