Emotion recognition from spontaneous speech using Hidden Markov models with deep belief networks
- 1 December 2013
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2013 IEEE Workshop on Automatic Speech Recognition and Understanding
Abstract
Research in emotion recognition seeks to develop insights into the temporal properties of emotion. However, automatic emotion recognition from spontaneous speech is challenging due to non-ideal recording conditions and highly ambiguous ground truth labels. Further, emotion recognition systems typically work with noisy high-dimensional data, rendering it difficult to find representative features and train an effective classifier. We tackle this problem by using Deep Belief Networks, which can model complex and non-linear high-level relationships between low-level features. We propose and evaluate a suite of hybrid classifiers based on Hidden Markov Models and Deep Belief Networks. We achieve state-of-the-art results on FAU Aibo, a benchmark dataset in emotion recognition [1]. Our work provides insights into important similarities and differences between speech and emotion.
This publication has 20 references indexed in Scilit:
- Multiple windowed spectral features for emotion recognition. Published by Institute of Electrical and Electronics Engineers (IEEE), 2013
- Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 2012
- Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge. Speech Communication, 2011
- Learning emotion-based acoustic features with deep belief networks. Published by Institute of Electrical and Electronics Engineers (IEEE), 2011
- Deep and Wide: Multiple Layers in Automatic Speech Recognition. IEEE Transactions on Audio, Speech, and Language Processing, 2011
- Acoustic Modeling Using Deep Belief Networks. IEEE Transactions on Audio, Speech, and Language Processing, 2011
- An overview of text-independent speaker recognition: From features to supervectors. Speech Communication, 2010
- Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. Published by Association for Computing Machinery (ACM), 2009
- A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 2006
- Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 2002