A Bayesian framework for video affective representation

Abstract
Emotions elicited in response to a video scene contain valuable information for multimedia tagging and indexing. The novelty of this paper is a Bayesian classification framework for affective video tagging that takes contextual information into account. A set of 21 full-length movies was first segmented, and informative content-based features were extracted from each shot and scene. Shots were then emotionally annotated, providing ground-truth affect. The arousal of shots was computed using a linear regression on the content-based features. Bayesian classification based on the shots' arousal and content-based features allowed tagging scenes into three affective classes, namely calm, positive excited, and negative excited. To improve classification accuracy, two contextual priors were proposed: a movie genre prior, and a temporal prior consisting of the probabilities of transition between emotions in consecutive scenes. The F1 measure of 54.9% obtained on the three emotional classes with a naïve Bayes classifier improved to 63.4% after utilizing both priors.
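The pipeline the abstract describes lends itself to a brief sketch. The snippet below is a minimal illustration, not the authors' implementation: a Gaussian naïve Bayes posterior over the three affective classes is reweighted first by a genre prior and then by a scene-to-scene transition prior, with a greedy sequential decoding pass. All feature values, prior probabilities, and the specific decoding strategy are synthetic, assumed stand-ins.

```python
# Illustrative sketch (not the paper's code): naive Bayes scene tagging with
# a genre prior and a transition prior between consecutive scenes.
import numpy as np
from sklearn.naive_bayes import GaussianNB

CLASSES = ["calm", "positive_excited", "negative_excited"]

rng = np.random.default_rng(0)
# Synthetic content-based features (e.g. the regressed arousal value plus
# audiovisual descriptors); real features would come from shot/scene analysis.
X_train = rng.normal(size=(300, 5))
y_train = rng.integers(0, 3, size=300)
X_scenes = rng.normal(size=(20, 5))      # one movie's scenes, in temporal order

nb = GaussianNB().fit(X_train, y_train)
posterior = nb.predict_proba(X_scenes)   # per-scene P(class | features)

# Hypothetical contextual priors: P(class | genre) for this movie's genre,
# and P(class_t | class_{t-1}) estimated from annotated consecutive scenes.
genre_prior = np.array([0.5, 0.3, 0.2])
transition = np.array([[0.6, 0.25, 0.15],
                       [0.3, 0.5,  0.2 ],
                       [0.3, 0.2,  0.5 ]])

# Greedy decoding: reweight each scene's naive-Bayes posterior by the genre
# prior and, from the second scene on, by the transition row of the previous
# scene's label; normalize and take the argmax.
labels, prev = [], None
for p in posterior:
    post = p * genre_prior
    if prev is not None:
        post = post * transition[prev]
    post /= post.sum()
    prev = int(np.argmax(post))
    labels.append(CLASSES[prev])

print(labels)
```

A greedy pass is the simplest way to apply a transition prior; a Viterbi-style decoding over the whole scene sequence would be a natural alternative under the same model.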
