Emotion recognition from speech with StarGAN and Dense-DCNN
Open Access
- 14 September 2021
- Research article (journal)
- Published by Institution of Engineering and Technology (IET) in IET Signal Processing
- Vol. 16 (1), 62-79
- https://doi.org/10.1049/sil2.12078
Abstract
Both traditional and state-of-the-art speech emotion recognition methods face the same problem: the lack of standard emotional speech data sets. With limited data, a network cannot learn emotion features comprehensively. Moreover, these methods require very long training times, which makes efficient classification difficult to guarantee. The proposed network, Dense-DCNN combined with StarGAN, addresses these issues. StarGAN is used to generate numerous Log-Mel spectrograms with the corresponding emotions, and the Dense-DCNN extracts high-dimensional features from them to achieve high-precision classification; accuracy exceeded 90% on all data sets. At the same time, DenseNet's layer-skipping connections speed up classification, improving efficiency. Experimental verification shows that the model not only generalises well but also remains robust in multi-scene and multi-noise environments, showing potential for application in the medical and social-education industries.
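The Log-Mel spectrogram mentioned in the abstract is the standard input representation here: the signal is framed, a power spectrum is taken per frame, the spectrum is pooled through a triangular mel filterbank, and the result is log-compressed. The paper does not give its extraction parameters, so the following is only a minimal NumPy sketch; the sample rate, FFT size, hop length, and number of mel bands are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Illustrative log-Mel extraction; parameters are assumed, not from the paper."""
    # Frame the signal and apply a Hann window to each frame.
    frames = np.array([
        signal[start:start + n_fft] * np.hanning(n_fft)
        for start in range(0, len(signal) - n_fft + 1, hop)
    ])
    # Per-frame power spectrum.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular mel filterbank: mel-spaced centre frequencies mapped to FFT bins.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)
    # Pool spectral power into mel bands, then log-compress.
    return np.log(power @ fbank.T + 1e-10)
```

In the paper's pipeline such spectrograms serve both as the image-like domain StarGAN translates between emotions and as the input the Dense-DCNN classifies.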