CASSANDRA: audio-video sensor fusion for aggression detection

1 September 2007

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 200-205
https://doi.org/10.1109/avss.2007.4425310

Abstract

This paper presents a smart surveillance system named CASSANDRA, aimed at detecting instances of aggressive human behavior in public environments. A distinguishing aspect of CASSANDRA is the exploitation of the complementary nature of audio and video sensing to disambiguate scene activity in real-life, noisy and dynamic environments. At the lower level, independent analysis of the audio and video streams yields intermediate descriptors of a scene, such as "scream", "passing train" or "articulation energy". At the higher level, a Dynamic Bayesian Network is used as a fusion mechanism that produces an aggregate aggression indication for the current scene. Our prototype system is validated on a set of scenarios performed by professional actors at an actual train station to ensure a realistic audio and video noise setting.
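
The two-level design described in the abstract can be illustrated with a very small sketch. The code below is not the network from the paper: it assumes the simplest possible dynamic Bayesian network (a single hidden binary "aggression" state with two binary cues, one from audio and one from video, treated as conditionally independent given that state), and every probability, name and threshold in it is a hypothetical value chosen only for illustration.

```python
# Illustrative sketch only: every probability below is an assumed value,
# not a parameter reported in the paper.

# P(aggression_t = 1 | aggression_{t-1}) for previous state 0 and 1.
TRANSITION = {0: 0.05, 1: 0.80}
# P(cue = 1 | aggression) for each hypothetical binary descriptor.
P_SCREAM = {0: 0.02, 1: 0.60}       # audio cue
P_HIGH_MOTION = {0: 0.10, 1: 0.70}  # video cue (e.g. thresholded "articulation energy")


def likelihood(p_cue_given_state, observed):
    """Return P(observation | state) for a binary cue."""
    return p_cue_given_state if observed else 1.0 - p_cue_given_state


def filter_step(belief, scream, high_motion):
    """One forward-filtering step: predict with the transition model,
    then weight by both cue likelihoods and normalize."""
    # Prediction: P(aggression_t = 1) before seeing the new observations.
    prior1 = belief * TRANSITION[1] + (1.0 - belief) * TRANSITION[0]
    prior = {0: 1.0 - prior1, 1: prior1}
    # Update: cues are assumed conditionally independent given the state.
    unnorm = {
        s: prior[s]
        * likelihood(P_SCREAM[s], scream)
        * likelihood(P_HIGH_MOTION[s], high_motion)
        for s in (0, 1)
    }
    return unnorm[1] / (unnorm[0] + unnorm[1])


def run_filter(observations, initial_belief=0.05):
    """Return the aggression belief after each (scream, high_motion) pair."""
    beliefs = []
    belief = initial_belief
    for scream, high_motion in observations:
        belief = filter_step(belief, scream, high_motion)
        beliefs.append(belief)
    return beliefs


if __name__ == "__main__":
    # Quiet scene, then motion only, then scream plus motion twice.
    for b in run_filter([(0, 0), (0, 1), (1, 1), (1, 1)]):
        print(round(b, 3))
```

Under these assumed numbers, a single cue leaves the belief ambiguous, while agreement between the audio and video cues pushes it sharply upward, which is the kind of disambiguation the abstract attributes to fusing the two modalities.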

Cited by 80 articles