An i-Vector Representation of Acoustic Environments for Audio-Based Video Event Detection on User Generated Content

1 December 2013

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 114-117
https://doi.org/10.1109/ism.2013.27

Abstract

Audio-based video event detection (VED) on user-generated content (UGC) aims to find videos that show an observable event, such as a wedding ceremony or birthday party, rather than a sound such as music, clapping or singing. The difficulty of video content analysis on UGC lies in the acoustic variability and lack of structure of the data. The UGC task has been explored mainly by computer vision, but can benefit from the use of audio. The i-vector system is state-of-the-art in speaker verification, where it outperforms a conventional Gaussian Mixture Model (GMM)-based approach. The system compensates for undesired acoustic variability and extracts information from the acoustic environment, making it a meaningful choice for detection on UGC. This paper employs the i-vector-based system for audio-based VED on UGC and expands the understanding of the system on the task. It also includes a performance comparison with the conventional GMM-based and state-of-the-art Random Forest (RF)-based systems. The i-vector system aids audio-based event detection by addressing UGC audio characteristics. It outperforms the GMM-based system and is competitive with the RF-based system in terms of the Missed Detection (MD) rate at False Alarm (FA) rates of 4% and 2.8%. It also complements the RF-based system: combining the two yields a slight improvement over either standalone system.
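The abstract reports results as the Missed Detection rate at a fixed False Alarm rate. As a rough illustration of how such a figure can be read off a set of detection scores, here is a minimal Python sketch. It is not the paper's evaluation code: the function name `md_at_fa`, the thresholding rule (a video is detected when its score is at or above the threshold) and the toy scores are all assumptions made for the example.

```python
import numpy as np

def md_at_fa(scores, labels, target_fa):
    """Lowest missed-detection rate among thresholds whose FA rate <= target_fa.

    Hypothetical helper: a trial counts as a detection when score >= threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]    # scores of videos that do contain the event
    neg = scores[~labels]   # scores of videos that do not

    # Candidate thresholds: every observed score, plus +inf (reject everything).
    thresholds = np.append(np.unique(scores), np.inf)

    best_md = 1.0
    for t in thresholds:
        fa = np.mean(neg >= t)       # fraction of non-event videos falsely detected
        if fa <= target_fa:
            md = np.mean(pos < t)    # fraction of event videos missed
            best_md = min(best_md, md)
    return best_md

# Toy data, invented for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

print(md_at_fa(scores, labels, target_fa=0.25))
print(md_at_fa(scores, labels, target_fa=0.0))
```

Fixing the FA rate first and then reporting MD makes systems comparable at the same operating point, which is how the abstract states its comparison at 4% and 2.8% FA.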

