Advances in Audio and Video Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Acoustics and Vibrations".

Deadline for manuscript submissions: 20 May 2024

Special Issue Editors


Guest Editor
Institute of Multimedia Telecommunications, Poznan University of Technology, 60-965 Poznań, Poland
Interests: video coding; immersive video; virtual view synthesis; visual quality assessment

Guest Editor
Institute of Multimedia Telecommunications, Poznan University of Technology, 60-965 Poznań, Poland
Interests: immersive video; depth map estimation; multicamera system calibration

Special Issue Information

Dear Colleagues,

Given the significant share of audiovisual data in internet traffic and the development of new services such as immersive video and audio systems, there is a continuing need for more efficient and sophisticated techniques for processing and coding video and audio data. Moreover, such services should deliver the highest possible quality of content to the end user. Therefore, this Special Issue is intended to present new ideas and experimental results in audiovisual data processing and coding, including immersive media processing focused on a high quality of experience.

This Special Issue will be dedicated to novel audiovisual data processing and coding techniques, emphasizing immersive media and delivering high-quality content to the user of an immersive media system.

Dr. Adrian Dziembowski
Dr. Dawid Mieloch
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • immersive media: immersive video and audio processing and coding
  • real-time immersive video applications
  • high QoE video and audio systems
  • objective quality assessment
  • neural networks in audiovisual data processing and coding
  • new audio and video compression standards

Published Papers (1 paper)


Research

17 pages, 541 KiB  
Article
Target Selection Strategies for Demucs-Based Speech Enhancement
by Caleb Rascon and Gibran Fuentes-Pineda
Appl. Sci. 2023, 13(13), 7820; https://doi.org/10.3390/app13137820 - 03 Jul 2023
Abstract
The Demucs-Denoiser model has recently been shown to achieve a high level of performance for online speech enhancement, but it assumes that only one speech source is present in the input mixture. In real-life multiple-speech-source scenarios, it is not certain which speech source will be enhanced. To correct this issue, two target selection strategies for the Demucs-Denoiser model are proposed and evaluated: (1) an embedding-based strategy, using a codified sample of the target speech, and (2) a location-based strategy, using a beamforming-based prefilter to select the target that is in front of a two-microphone array. In this work, it is shown that while both strategies improve the performance of the Demucs-Denoiser model when one or more speech interferences are present, each has its pros and cons. Specifically, the beamforming-based strategy achieves better overall performance (increasing the output SIR by between 5 and 10 dB) than the embedding-based strategy (which increases the output SIR by only 2 dB, and only in low-input-SIR scenarios). However, the beamforming-based strategy is sensitive to variation in the location of the target speech source (decreasing the output SIR by 10 dB if the target speech source is located only 0.1 m from its expected position), a weakness that the embedding-based strategy does not suffer from.
(This article belongs to the Special Issue Advances in Audio and Video Processing)
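
The abstract above reports its results as changes in output SIR (signal-to-interference ratio) in dB. As a reading aid, the following minimal sketch shows how an SIR value and an SIR improvement can be computed from a target signal and an interference signal. It is not the authors' evaluation code: the function names `mean_power` and `sir_db`, the toy sine-wave signals, and the 5 dB attenuation are all assumptions chosen for illustration.

```python
import math


def mean_power(signal):
    """Mean power (average of squared samples) of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def sir_db(target, interference):
    """SIR in dB: 10 * log10(target power / interference power)."""
    return 10.0 * math.log10(mean_power(target) / mean_power(interference))


# Hypothetical toy signals (assumptions, not data from the paper):
# a unit-amplitude target and a half-amplitude interference.
n_samples = 100
target = [math.sin(2 * math.pi * 5 * n / n_samples) for n in range(n_samples)]
interference_in = [
    0.5 * math.sin(2 * math.pi * 13 * n / n_samples) for n in range(n_samples)
]

# Pretend an enhancement stage attenuates the interference by 5 dB
# while leaving the target untouched (an idealized assumption).
gain = 10 ** (-5.0 / 20.0)
interference_out = [gain * x for x in interference_in]

sir_in = sir_db(target, interference_in)
sir_out = sir_db(target, interference_out)
sir_improvement = sir_out - sir_in

print(f"input SIR:  {sir_in:.2f} dB")
print(f"output SIR: {sir_out:.2f} dB")
print(f"improvement: {sir_improvement:.2f} dB")
```

In this toy setup the improvement equals the attenuation applied to the interference, which is the sense in which an "increase in output SIR" of a few dB can be read.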