Joint Audio-Visual Words for Violent Scenes Detection in Movies
- 1 April 2014
- conference paper
- Published by Association for Computing Machinery (ACM) in Proceedings of International Conference on Multimedia Retrieval
Abstract
This paper presents an audio-visual data representation for violent scenes detection in movies. Existing works in this field consider either the audio or the visual information, or a shallow fusion of the two. None has yet explored their joint dependence for violent scenes detection. We propose a feature that provides strong multi-modal audio and visual cues by first joining the audio and the visual features and then statistically revealing the joint multi-modal patterns. Experimental validation was conducted in the context of the Violent Scenes Detection task of the MediaEval 2013 Multimedia benchmark. The obtained results show the potential of the proposed approach in comparison to methods using audio and visual features separately and to other fusion methods.
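The abstract describes the feature only at a high level: audio and visual features are joined, and joint patterns are then found statistically. One common way to realize that idea is a bag-of-words over concatenated descriptors, sketched below. This is an illustrative assumption, not the authors' pipeline: the k-means codebook, the descriptor dimensions (13 audio, 32 visual), the codebook size, and the names `kmeans` and `joint_word_histogram` are all hypothetical, and the data is random.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k cluster centers (the 'joint words')."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct training points.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave a center unchanged if its cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def joint_word_histogram(audio, visual, centers):
    """Describe one shot as a normalized histogram of joint audio-visual words."""
    # Join the modalities first: one row per time-aligned segment.
    joint = np.hstack([audio, visual])
    dists = ((joint[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)  # nearest joint word for each segment
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()

# Synthetic stand-ins for time-aligned audio and visual descriptors.
rng = np.random.default_rng(0)
audio_train = rng.normal(size=(200, 13))   # assumed audio descriptor size
visual_train = rng.normal(size=(200, 32))  # assumed visual descriptor size

codebook = kmeans(np.hstack([audio_train, visual_train]), k=8)
shot_hist = joint_word_histogram(audio_train[:25], visual_train[:25], codebook)
```

In this sketch, `shot_hist` would be the per-shot feature passed to a classifier. Because clustering happens after the modalities are joined, each word captures a co-occurring audio and visual pattern, which is what distinguishes it from building separate audio and visual codebooks and fusing their outputs later.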
This publication has 14 references indexed in Scilit:
- Joint audio-visual bi-modal codewords for video event detection. Published by Association for Computing Machinery (ACM), 2012
- Multimodal Video Concept Detection via Bag of Auditory Words and Multiple Kernel Learning. Lecture Notes in Computer Science, 2012
- Audio-Visual Fusion for Detecting Violent Scenes in Videos. Lecture Notes in Computer Science, 2010
- Weakly-Supervised Violence Detection in Movies with Audio and Video Based Co-training. Lecture Notes in Computer Science, 2009
- Detecting Violent Scenes in Movies by Auditory and Visual Cues. Lecture Notes in Computer Science, 2008
- Audio-Visual Event Recognition in Surveillance Video Sequences. IEEE Transactions on Multimedia, 2007
- Violence Content Classification Using Audio Features. Lecture Notes in Computer Science, 2006
- Early versus late fusion in semantic video analysis. Published by Association for Computing Machinery (ACM), 2005
- On Space-Time Interest Points. International Journal of Computer Vision, 2005
- A graphical model for audiovisual object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003