Audio–Visual Sensor Fusion Strategies for Video Content Analytics

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Physical Sensors".

Deadline for manuscript submissions: closed (15 April 2019) | Viewed by 3994

Special Issue Editor


Prof. Dr. Hanseok Ko
Guest Editor
Intelligent Signal Processing Center, School of Electrical Engineering, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, South Korea
Interests: computer vision; acoustic signal processing; multi-sensor fusion; deep learning; big data analytics

Special Issue Information

Dear Colleagues,

A two-hour movie, or a short clip taken from one, is meant to capture and present a meaningful story in video that a human audience can recognize and understand. What if we replace the human audience with an intelligent machine or robot capable of capturing and processing the semantic information carried by the audio and video cues in that video? Using both auditory and visual means, the human brain processes the audio modality (sound, speech) and the video modality (background scene, moving objects, and written characters) to extract spatial and temporal semantic information that is contextually complementary and robust. Smart machines equipped with audio-visual multisensors (e.g., CCTV units equipped with cameras and microphones) should be capable of the same task. An appropriate fusion strategy combining the audio and visual information is key to developing such artificial general intelligence (AGI) systems.

This Special Issue calls for papers on sensor fusion techniques that combine audio-visual information cues for video content analytics. Fusion strategies can operate at various information levels (e.g., feature, decision, and semantic) and extract meaningful information by providing an attention mechanism that weights the significance of each cue in representing the intended world; a minimal sketch of two such levels follows this letter. In light of recent advances in deep learning, this Special Issue will provide an important forum for new fusion strategies that address the relevant research issues toward solving the many applications requiring artificial general intelligence.

Prof. Dr. Hanseok Ko
Guest Editor
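
To make the fusion levels mentioned above concrete, here is a minimal, hypothetical sketch of feature-level and decision-level audio-visual fusion with attention weights over the two cues. The embeddings, dimensions, and scoring projection are illustrative assumptions, not a prescribed method.

```python
# A minimal, hypothetical sketch of feature-level and decision-level
# audio-visual fusion with attention weights over the two cues.
# All embeddings, dimensions, and the scoring projection are illustrative
# assumptions, not a prescribed method.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Per-frame embeddings, e.g. produced by an audio network and a visual network.
audio_feat = rng.standard_normal(128)
visual_feat = rng.standard_normal(128)
feats = np.stack([audio_feat, visual_feat])           # (2 modalities, 128)

# Attention: score how significant each cue is for the current frame.
w_att = rng.standard_normal((2, 128))                 # one scorer per modality
alpha = softmax(np.einsum("md,md->m", w_att, feats))  # (2,) cue weights

# Feature-level fusion: weighted combination of the modality embeddings.
fused_feat = alpha @ feats                            # (128,)

# Decision-level fusion: weight per-modality class posteriors instead.
n_classes = 5
audio_post = softmax(rng.standard_normal(n_classes))
visual_post = softmax(rng.standard_normal(n_classes))
fused_post = alpha[0] * audio_post + alpha[1] * visual_post

print("cue weights:", alpha)
print("fused decision:", int(fused_post.argmax()))
```

In practice the attention scorer and the modality encoders would be trained jointly, but the pattern of weighting each cue and then combining them is the same at either fusion level.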

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • camera
  • microphone
  • multimodal
  • auditory
  • visual
  • fusion
  • semantic
  • deep learning
  • artificial general intelligence

Published Papers (1 paper)


Research

19 pages, 3887 KiB  
Article
Visual Object Tracking Using Structured Sparse PCA-Based Appearance Representation and Online Learning
by Gang-Joon Yoon, Hyeong Jae Hwang and Sang Min Yoon
Sensors 2018, 18(10), 3513; https://doi.org/10.3390/s18103513 - 18 Oct 2018
Cited by 3 | Viewed by 3278
Abstract
Visual object tracking is a fundamental research area in the field of computer vision and pattern recognition because it can be utilized by various intelligent systems. However, visual object tracking faces various challenging issues because tracking is influenced by illumination change, pose change, partial occlusion and background clutter. Sparse representation-based appearance modeling and dictionary learning that optimize tracking history have been proposed as one possible solution to overcome the problems of visual object tracking. However, there are limitations in representing high-dimensional descriptors using the standard sparse representation approach. Therefore, this study proposes a structured sparse principal component analysis to represent the complex appearance descriptors of the target object effectively with a linear combination of a small number of elementary atoms chosen from an over-complete dictionary. Using an online dictionary, learned and updated by selecting similar dictionary elements with high probability, makes it possible to track the target object in a variety of environments. Qualitative and quantitative experimental results, including comparison with current state-of-the-art visual object tracking algorithms, validate that the proposed tracking algorithm performs favorably with changes in the target object and environment for benchmark video sequences.
(This article belongs to the Special Issue Audio–Visual Sensor Fusion Strategies for Video Content Analytics)
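
As a reading aid only, the following numpy sketch illustrates the core idea the abstract describes: encoding an appearance descriptor as a sparse linear combination of atoms from an over-complete dictionary. It uses plain ISTA (iterative soft-thresholding) for the sparse code; the dictionary, descriptor, and hyperparameters are illustrative assumptions, and this is not the authors' structured sparse PCA implementation.

```python
# Minimal sketch (not the paper's implementation): encode an appearance
# descriptor as a sparse linear combination of atoms from an over-complete
# dictionary, using a few ISTA (iterative soft-thresholding) steps.
# Dictionary, descriptor, and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

d, k = 64, 256                        # descriptor dim, number of atoms (k > d)
D = rng.standard_normal((d, k))
D /= np.linalg.norm(D, axis=0)        # unit-norm dictionary atoms
y = rng.standard_normal(d)            # appearance descriptor of a target patch

lam = 0.1                             # sparsity weight (l1 penalty)
L = np.linalg.norm(D, 2) ** 2         # Lipschitz constant of the data term
x = np.zeros(k)                       # sparse code to be estimated

for _ in range(200):                  # ISTA: gradient step, then soft-threshold
    grad = D.T @ (D @ x - y)
    z = x - grad / L
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)

active = np.flatnonzero(x)
print(f"{active.size} of {k} atoms active, "
      f"residual {np.linalg.norm(y - D @ x):.3f}")
```

Per the abstract, the proposed tracker builds on this kind of representation but imposes structure on the sparsity pattern via structured sparse PCA and updates the dictionary online during tracking.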