Speech emotion recognition based on listener-dependent emotion perception models
Open Access
- 20 April 2021
- journal article (research article)
- Published by Now Publishers in APSIPA Transactions on Signal and Information Processing
- Vol. 10 (1)
- https://doi.org/10.1017/atsip.2021.7
Abstract
This paper presents a novel speech emotion recognition scheme that leverages the individuality of emotion perception. Most conventional methods simply poll multiple listeners and directly model the majority decision as the perceived emotion. However, emotion perception varies with the listener, which forces the conventional methods with their single models to create complex mixtures of emotion perception criteria. In order to mitigate this problem, we propose a majority-voted emotion recognition framework that constructs listener-dependent (LD) emotion recognition models. The LD model can estimate not only listener-wise perceived emotion, but also the majority decision by averaging the outputs of the multiple LD models. Three LD models, fine-tuning, auxiliary input, and sub-layer weighting, are introduced, all of which are inspired by successful domain-adaptation frameworks in various speech processing tasks. Experiments on two emotional speech datasets demonstrate that the proposed approach outperforms conventional emotion recognition frameworks in not only majority-voted but also listener-wise perceived emotion recognition.
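The core aggregation step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the LD models (fine-tuned, auxiliary-input, or sub-layer-weighted networks) are abstracted away and represented only by their per-listener class posteriors, and the function name and class labels are hypothetical.

```python
import numpy as np

def majority_decision(ld_posteriors):
    """Estimate the majority-voted emotion by averaging LD model outputs.

    ld_posteriors: array of shape (num_listeners, num_emotions), one
    softmax output per listener-dependent model for the same utterance.
    Returns the predicted majority class and the averaged posterior.
    """
    avg = np.mean(ld_posteriors, axis=0)  # average over listeners
    return int(np.argmax(avg)), avg       # class with highest mean score

# Example: 3 listeners, 4 emotion classes (e.g., neutral/happy/sad/angry)
posteriors = np.array([
    [0.6, 0.2, 0.1, 0.1],  # listener A's LD model output
    [0.3, 0.5, 0.1, 0.1],  # listener B's LD model output
    [0.5, 0.3, 0.1, 0.1],  # listener C's LD model output
])
label, avg = majority_decision(posteriors)
# Each row also serves directly as that listener's perceived-emotion estimate.
```

Averaging posteriors rather than taking a hard vote per listener keeps the estimate differentiable and preserves each model's confidence, which is what allows the same set of LD models to serve both the listener-wise and majority-voted prediction tasks.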