Multimodal Emotion Recognition

A special issue of Multimodal Technologies and Interaction (ISSN 2414-4088).

Deadline for manuscript submissions: closed (10 January 2021) | Viewed by 8296

Special Issue Editors


Dr. Stylianos (Stelios) Asteriadis
Guest Editor
Department of Data Science and Knowledge Engineering, Maastricht University, Maastricht, The Netherlands
Interests: affective computing; human-computer interaction; human activity recognition; emotion recognition; computer vision

Dr. Enrique Hortal
Guest Editor
Department of Data Science and Knowledge Engineering, Maastricht University, 6229 Maastricht, The Netherlands
Interests: machine learning; deep learning; (bio)signal processing and analysis; medical imaging; electroencephalography

Special Issue Information

Dear Colleagues,

The goal of automatic emotion recognition is to detect and recognize affect from low-level sensory cues. The processing of behavioral and emotional signals usually involves facial analysis, body posture, and speech/vocalization, as well as biomeasurements and the analysis of brain signals. In most cases, emotion recognition relies on an explicit emotion model. These models can be coarsely divided into two categories: discrete and continuous. One of the most widely recognized and used discrete emotion models is based on the six basic emotions (sadness, happiness, fear, anger, surprise, and disgust), as proposed by Paul Ekman. More recently, the concept of compound emotions (e.g., surprisingly happy or happily disgusted) has also been explored in AI. Many studies also use dimensional spaces (e.g., valence, arousal), mainly because discrete models, although easily applicable to classification problems, do not always fully describe every emotion-rich experience, and their label-based character limits their applicability in domains where the intensity of the emotion is important.

For more accurate emotion recognition, many works have proposed multimodal fusion approaches that combine various cues (e.g., visual, audio, wearable-device, and brain signals). Efforts to develop such methods, however, often face challenges due to a significant lack of suitable datasets for training and testing. As a result, one of the most typical problems in affective computing is that a system trained on one dataset achieves accurate results on it but fails to reach high accuracy when the same model is applied to a different dataset. This becomes even more challenging when data are captured in uncontrolled, spontaneous conditions. Regarding AI methods for emotion recognition, various techniques have been proposed, with the most recent studies focusing on end-to-end deep-learning topologies as a way to take advantage of as much training data as possible, aiming at high accuracy on datasets captured in the wild.

This Special Issue is looking for high-quality research contributions in one or more of the following domains:

  • New computational models in multimodal emotion recognition, using deep-learning topologies
  • Personalized emotional models and multimodality
  • Domain adaptation across datasets in emotion recognition
  • Domain adaptation across modalities in emotion recognition
  • New multimodal datasets for emotion recognition

Dr. Stylianos (Stelios) Asteriadis
Dr. Enrique Hortal
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Multimodal Technologies and Interaction is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Multimodal emotion recognition
  • Facial expression recognition
  • Audio analysis for emotion recognition
  • Brain signals analysis for emotion recognition
  • Wearable devices for emotion recognition
  • Multimodal datasets for affect analysis

Published Papers (1 paper)

Research

21 pages, 7660 KiB  
Article
A Multimodal Facial Emotion Recognition Framework through the Fusion of Speech with Visible and Infrared Images
by Mohammad Faridul Haque Siddiqui and Ahmad Y. Javaid
Multimodal Technol. Interact. 2020, 4(3), 46; https://doi.org/10.3390/mti4030046 - 06 Aug 2020
Cited by 34 | Viewed by 7626
Abstract
The exigency of emotion recognition is pushing the envelope for meticulous strategies of discerning actual emotions through the use of superior multimodal techniques. This work presents a multimodal automatic emotion recognition (AER) framework capable of differentiating between expressed emotions with high accuracy. The contribution involves implementing an ensemble-based approach for the AER through the fusion of visible images and infrared (IR) images with speech. The framework is implemented in two layers, where the first layer detects emotions using single modalities while the second layer combines the modalities and classifies emotions. Convolutional Neural Networks (CNN) have been used for feature extraction and classification. A hybrid fusion approach comprising early (feature-level) and late (decision-level) fusion was applied to combine the features and the decisions at different stages. The output of the CNN trained with voice samples of the RAVDESS database was combined with the image classifier’s output using decision-level fusion to obtain the final decision. An accuracy of 86.36% and similar recall (0.86), precision (0.88), and f-measure (0.87) scores were obtained. A comparison with contemporary work endorsed the competitiveness of the framework with the rationale for exclusivity in attaining this accuracy in wild backgrounds and light-invariant conditions.
(This article belongs to the Special Issue Multimodal Emotion Recognition)
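
As an illustration of the decision-level (late) fusion mentioned in the abstract above, the following minimal sketch combines per-modality class probabilities by weighted averaging. The abstract does not state which fusion rule the authors used, so weighted averaging is an assumption here, and the label set, the function names (fuse_decisions, predict), the weights, and the probability values are all hypothetical.

```python
from typing import Dict, List

# Hypothetical label set; the classes used in the paper may differ.
EMOTIONS = ["angry", "happy", "neutral", "sad", "surprised"]


def fuse_decisions(
    probs_by_modality: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of per-modality class probabilities (late fusion)."""
    if not probs_by_modality:
        raise ValueError("at least one modality is required")
    n_classes = len(next(iter(probs_by_modality.values())))
    total_weight = sum(weights[m] for m in probs_by_modality)
    fused = [0.0] * n_classes
    for modality, probs in probs_by_modality.items():
        if len(probs) != n_classes:
            raise ValueError(f"{modality}: expected {n_classes} probabilities")
        for i, p in enumerate(probs):
            # Normalise by the total weight so the result stays a distribution.
            fused[i] += weights[modality] * p / total_weight
    return fused


def predict(fused: List[float], labels: List[str]) -> str:
    """Return the label with the highest fused probability."""
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


# Made-up classifier outputs for a single sample (illustrative only).
outputs = {
    "speech": [0.10, 0.50, 0.20, 0.10, 0.10],
    "image": [0.05, 0.30, 0.05, 0.05, 0.55],
}

equal = fuse_decisions(outputs, {"speech": 0.5, "image": 0.5})
image_heavy = fuse_decisions(outputs, {"speech": 0.3, "image": 0.7})

print(predict(equal, EMOTIONS))        # speech and image weighted equally
print(predict(image_heavy, EMOTIONS))  # image classifier trusted more
```

With these made-up numbers, the two modalities disagree on the most likely class, so the fused decision depends on the weights. In practice such weights would typically be chosen on held-out validation data.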