CREMA-D: Crowd-Sourced Emotional Multimodal Actors Dataset
- 8 July 2014
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Affective Computing
- Vol. 5 (4), 377-390
- https://doi.org/10.1109/taffc.2014.2336244
Abstract
People convey their emotional state in their face and voice. We present an audio-visual dataset uniquely suited for the study of multi-modal emotion expression and perception. The dataset consists of facial and vocal emotional expressions in sentences spoken in a range of basic emotional states (happy, sad, anger, fear, disgust, and neutral). 7,442 clips of 91 actors with diverse ethnic backgrounds were rated by multiple raters in three modalities: audio, visual, and audio-visual. Categorical emotion labels and real-value intensity values for the perceived emotion were collected through crowd-sourcing from 2,443 raters. Human recognition of the intended emotion for the audio-only, visual-only, and audio-visual data is 40.9, 58.2, and 63.6 percent, respectively. Recognition rates are highest for neutral, followed by happy, anger, disgust, fear, and sad. Average intensity levels of emotion are rated highest for visual-only perception. Accurate recognition of disgust and fear requires simultaneous audio-visual cues, while anger and happiness can be well recognized from a single modality. The large dataset we introduce can be used to probe other questions concerning the audio-visual perception of emotion.
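The per-modality recognition rates reported above amount to a simple accuracy tally over crowd-sourced ratings. A minimal sketch of that computation, assuming a hypothetical list of (modality, intended emotion, perceived emotion) records rather than the actual CREMA-D label files:

```python
from collections import defaultdict

# Hypothetical rating records: (modality, intended_emotion, perceived_emotion).
# Real CREMA-D ratings come from the crowd-sourced label files; this stub
# only illustrates the accuracy computation.
ratings = [
    ("audio", "anger", "anger"),
    ("audio", "sad", "neutral"),
    ("visual", "happy", "happy"),
    ("audio-visual", "fear", "fear"),
    ("audio-visual", "disgust", "anger"),
]

totals = defaultdict(int)   # ratings seen per modality
correct = defaultdict(int)  # ratings where perceived matches intended

for modality, intended, perceived in ratings:
    totals[modality] += 1
    correct[modality] += intended == perceived

# Percent of ratings in each modality that matched the intended emotion.
recognition = {m: 100.0 * correct[m] / totals[m] for m in totals}
```

On the full dataset, the same tally over all 2,443 raters' responses yields the 40.9 / 58.2 / 63.6 percent figures quoted in the abstract.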
Funding Information
- NIH (R01-MH060722)
- NIH (R01 MH084856)