Article

Speech Emotion Recognition Using Convolutional Neural Networks with Attention Mechanism

by Konstantinos Mountzouris 1, Isidoros Perikos 1,2,* and Ioannis Hatzilygeroudis 1,*

1 Department of Computer Engineering and Informatics, University of Patras, 26504 Patras, Greece
2 Computer Technology Institute and Press “Diophantus”, 26504 Patras, Greece
* Authors to whom correspondence should be addressed.
Electronics 2023, 12(20), 4376; https://doi.org/10.3390/electronics12204376
Submission received: 17 September 2023 / Revised: 9 October 2023 / Accepted: 13 October 2023 / Published: 23 October 2023
(This article belongs to the Special Issue Feature Papers in Computer Science & Engineering)

Abstract

Speech emotion recognition (SER) is an interesting and difficult problem. In this paper, we address it with deep learning networks. We designed and implemented six networks: a deep belief network (DBN), a simple deep neural network (SDNN), an LSTM network (LSTM), an LSTM network with an attention mechanism (LSTM-ATN), a convolutional neural network (CNN), and a convolutional neural network with an attention mechanism (CNN-ATN). Beyond solving the SER problem, our aim was to test the impact of the attention mechanism on the results. Dropout and batch normalization are also used to improve the generalization ability of the models (that is, to prevent overfitting) and to speed up training. The Surrey Audio–Visual Expressed Emotion (SAVEE) database and the Ryerson Audio–Visual Database (RAVDESS) were used to train and evaluate the models. The results showed that the networks with an attention mechanism outperformed the others. Furthermore, the CNN-ATN was the best among the tested networks, achieving an accuracy of 74% on SAVEE and 77% on RAVDESS and exceeding existing state-of-the-art systems on the same datasets.
Keywords: speech emotion recognition; deep learning; deep belief network; deep neural network; convolutional neural network; LSTM; attention mechanism
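
To make the idea of an attention mechanism concrete, the following is a minimal NumPy sketch of attention pooling over frame-level features. It is an illustration under stated assumptions and not the architecture used in this paper: the function names `softmax` and `attention_pool`, the scoring vector `w`, and the feature dimensions are all hypothetical. Each time frame receives a score, the scores are normalized with a softmax, and the frames are averaged using those weights, so that more informative frames can contribute more to the utterance-level representation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def attention_pool(features, w):
    """Weighted average of frame-level features.

    features: array of shape (T, D), one D-dimensional vector per time frame
              (for example, the output of a convolutional layer).
    w:        array of shape (D,), a hypothetical scoring vector that would
              normally be learned during training.
    Returns the pooled vector of shape (D,) and the attention weights (T,).
    """
    scores = features @ w          # one scalar score per frame, shape (T,)
    weights = softmax(scores)      # non-negative weights that sum to 1
    context = weights @ features   # weighted sum over frames, shape (D,)
    return context, weights


# Random stand-in data: 50 frames with 16 features each (assumed sizes).
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 16))
w = rng.normal(size=16)

context, weights = attention_pool(features, w)
print(context.shape, weights.shape)
```

When all scores are equal (for example, `w` set to zeros), the weights become uniform and the pooled vector reduces to the plain mean over frames, which is the baseline that attention generalizes.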

