Article

Semi-Natural and Spontaneous Speech Recognition Using Deep Neural Networks with Hybrid Features Unification

1 Department of Computer Science and Information Engineering, Chang Gung University, Guishan District, Taoyuan City 33302, Taiwan
2 Department of Physical Medicine and Rehabilitation, Chang Gung Memorial Hospital, Guishan District, Taoyuan City 33302, Taiwan
3 Artificial Intelligence Research Center, Chang Gung University, Guishan District, Taoyuan City 33302, Taiwan
4 Bachelor Program in Artificial Intelligence, Chang Gung University, Guishan District, Taoyuan City 33302, Taiwan
* Author to whom correspondence should be addressed.
Processes 2021, 9(12), 2286; https://doi.org/10.3390/pr9122286
Submission received: 2 December 2021 / Revised: 11 December 2021 / Accepted: 13 December 2021 / Published: 20 December 2021
(This article belongs to the Special Issue Recent Advances in Machine Learning and Applications)

Abstract

Recently, identifying speech emotions in spontaneous databases has become a complex and demanding research area. This research presents a new approach for recognizing semi-natural and spontaneous speech emotions with multiple feature fusion and deep neural networks (DNN). The proposed framework extracts the most discriminative features from hybrid acoustic feature sets. However, these feature sets may contain duplicate and irrelevant information, which can degrade emotion identification. Therefore, a support vector machine (SVM) algorithm is used to identify the most discriminative audio feature map after the fusion approach has learned the relevant features. We evaluated our approach on the eNTERFACE05 and BAUM-1s benchmark databases and observed an identification accuracy of 76% for a speaker-independent experiment with SVM and 59% accuracy with, respectively. Furthermore, experiments on the eNTERFACE05 and BAUM-1s datasets indicate that the proposed framework outperforms current state-of-the-art techniques on the semi-natural and spontaneous datasets.
Keywords: spontaneous database; semi-natural database; speech emotion recognition; multiple feature fusion; support vector machine
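The abstract outlines a pipeline of feature-level fusion followed by SVM classification, evaluated in a speaker-independent setting. This page does not give the paper's exact feature sets, SVM kernel, or hyperparameters, so the following is only a minimal pure-Python sketch under stated assumptions. The two input feature sets are hypothetical precomputed per-utterance vectors, fusion is plain concatenation followed by z-score normalization, and the classifier is a linear one-vs-rest SVM trained with Pegasos-style subgradient descent, a swapped-in solver rather than the authors' own. Folds hold out one speaker at a time to mimic a speaker-independent protocol. The names `fuse_features`, `train_one_vs_rest`, `leave_one_speaker_out`, and `demo_loso_accuracy` are illustrative only.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple

Vector = List[float]


def fuse_features(*feature_sets: Sequence[Vector]) -> List[Vector]:
    """Feature-level fusion (assumed here to be concatenation) per utterance."""
    n = len(feature_sets[0])
    if any(len(fs) != n for fs in feature_sets):
        raise ValueError("all feature sets must cover the same utterances")
    return [[x for fs in feature_sets for x in fs[i]] for i in range(n)]


def zscore_fit(X: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Per-dimension mean and standard deviation; a zero std is replaced by 1."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5 or 1.0
            for j in range(d)]
    return means, stds


def zscore_apply(X: Sequence[Vector], means: Vector, stds: Vector) -> List[Vector]:
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def train_binary_svm(X: Sequence[Vector], y: Sequence[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> Vector:
    """Linear SVM via Pegasos-style stochastic subgradient descent on the
    L2-regularized hinge loss. y holds +1/-1 labels."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink from the L2 term
            if margin < 1.0:  # hinge loss active: step toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def train_one_vs_rest(X: Sequence[Vector], labels: Sequence[str], **kw) -> Dict[str, Vector]:
    Xa = [list(row) + [1.0] for row in X]  # constant feature acts as a bias term
    return {c: train_binary_svm(Xa, [1 if lab == c else -1 for lab in labels], **kw)
            for c in sorted(set(labels))}


def predict(models: Dict[str, Vector], x: Vector) -> str:
    xa = list(x) + [1.0]
    return max(models, key=lambda c: sum(w * v for w, v in zip(models[c], xa)))


def leave_one_speaker_out(speakers: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Speaker-independent folds: every speaker is held out exactly once."""
    for s in sorted(set(speakers)):
        train = [i for i, sp in enumerate(speakers) if sp != s]
        test = [i for i, sp in enumerate(speakers) if sp == s]
        yield s, train, test


def demo_loso_accuracy(seed: int = 42) -> float:
    """Synthetic, hypothetical data: 4 speakers x 3 emotions x 2 utterances."""
    rng = random.Random(seed)
    emotions = ["angry", "happy", "sad"]
    set_a, set_b, labels, speakers = [], [], [], []
    for spk in ["s1", "s2", "s3", "s4"]:
        for k, emo in enumerate(emotions):
            for _ in range(2):
                sig = [(5.0 if j == k else 0.0) + rng.uniform(-0.3, 0.3) for j in range(3)]
                set_a.append(sig[:2])                        # stand-in for feature set A
                set_b.append([sig[2], rng.uniform(-1, 1)])  # set B, incl. a noise dim
                labels.append(emo)
                speakers.append(spk)
    X = fuse_features(set_a, set_b)
    correct = 0
    for _, tr, te in leave_one_speaker_out(speakers):
        means, stds = zscore_fit([X[i] for i in tr])  # fit on training speakers only
        models = train_one_vs_rest(zscore_apply([X[i] for i in tr], means, stds),
                                   [labels[i] for i in tr])
        for i, x in zip(te, zscore_apply([X[i] for i in te], means, stds)):
            correct += predict(models, x) == labels[i]
    return correct / len(X)


if __name__ == "__main__":
    print(f"speaker-independent accuracy on synthetic data: {demo_loso_accuracy():.2f}")
```

Because the normalization statistics are fitted on the training speakers only, nothing from the held-out speaker leaks into the model. A real pipeline would replace the synthetic vectors with extracted acoustic features and could swap in a library SVM with a tuned kernel.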

Share and Cite

MDPI and ACS Style

Amjad, A.; Khan, L.; Chang, H.-T. Semi-Natural and Spontaneous Speech Recognition Using Deep Neural Networks with Hybrid Features Unification. Processes 2021, 9, 2286. https://doi.org/10.3390/pr9122286

AMA Style

Amjad A, Khan L, Chang H-T. Semi-Natural and Spontaneous Speech Recognition Using Deep Neural Networks with Hybrid Features Unification. Processes. 2021; 9(12):2286. https://doi.org/10.3390/pr9122286

Chicago/Turabian Style

Amjad, Ammar, Lal Khan, and Hsien-Tsung Chang. 2021. "Semi-Natural and Spontaneous Speech Recognition Using Deep Neural Networks with Hybrid Features Unification" Processes 9, no. 12: 2286. https://doi.org/10.3390/pr9122286

APA Style

Amjad, A., Khan, L., & Chang, H.-T. (2021). Semi-Natural and Spontaneous Speech Recognition Using Deep Neural Networks with Hybrid Features Unification. Processes, 9(12), 2286. https://doi.org/10.3390/pr9122286

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
