The bag-of-frames approach to audio pattern recognition: A sufficient model for urban soundscapes but not for polyphonic music

1 August 2007

journal article
Published by Acoustical Society of America (ASA) in The Journal of the Acoustical Society of America

Vol. 122 (2), 881-891
https://doi.org/10.1121/1.2750160

Abstract

The "bag-of-frames" approach (BOF) to audio pattern recognition represents signals as the long-term statistical distribution of their local spectral features. This approach has proved nearly optimal for simulating the auditory perception of natural and human environments (or soundscapes), and is also the predominant paradigm for extracting high-level descriptions from music signals. However, recent studies show that, in contrast to its application to soundscape signals, BOF provides only limited performance when applied to polyphonic music signals. This paper explicitly examines the difference between urban soundscapes and polyphonic music with respect to their modeling with the BOF approach. First, applying the same measure of acoustic similarity to both soundscape and music data sets confirms that the BOF approach can model soundscapes to near-perfect precision, and exhibits none of the limitations observed on the music data set. Second, modifying this measure with two custom homogeneity transforms reveals critical differences in the temporal and statistical structure of the typical frame distribution of each type of signal. Such differences may explain the uneven performance of BOF algorithms on soundscapes and music signals, and suggest that their human perception relies on cognitive processes of a different nature.
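To make the idea of a "long-term statistical distribution of local spectral features" concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not specify the features, the statistical model, or the similarity measure, so everything here is an assumption chosen for brevity: log band energies as frame features, a single Gaussian as the frame distribution, and a symmetrised Kullback-Leibler divergence as the distance. The function names (`frame_features`, `bag_of_frames`, `bof_distance`) are hypothetical.

```python
import numpy as np

def frame_features(signal, frame_len=512, hop=256, n_bands=8):
    """Cut a 1-D signal into frames and return one log band-energy
    vector per frame (assumed features, not those of the paper)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    feats = np.empty((n_frames, n_bands))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Pool the power spectrum into n_bands contiguous bands.
        bands = np.array_split(power, n_bands)
        feats[i] = np.log([b.sum() + 1e-10 for b in bands])
    return feats

def bag_of_frames(signal, **kw):
    """Summarise a signal by the mean and covariance of its frame
    features. The order of the frames is discarded (the "bag")."""
    feats = frame_features(signal, **kw)
    mean = feats.mean(axis=0)
    # Small diagonal term keeps the covariance invertible.
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return mean, cov

def kl_gauss(m0, c0, m1, c1):
    """KL divergence between two multivariate Gaussians."""
    d = len(m0)
    c1_inv = np.linalg.inv(c1)
    diff = m1 - m0
    _, logdet0 = np.linalg.slogdet(c0)
    _, logdet1 = np.linalg.slogdet(c1)
    return 0.5 * (np.trace(c1_inv @ c0) + diff @ c1_inv @ diff
                  - d + logdet1 - logdet0)

def bof_distance(a, b):
    """Symmetrised KL divergence between two bag-of-frames models."""
    return kl_gauss(*a, *b) + kl_gauss(*b, *a)
```

Because only the set of frames matters, reordering non-overlapping frames of a signal leaves its model essentially unchanged under this sketch. That insensitivity to temporal structure is the property the abstract's homogeneity transforms are designed to probe.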

Cited by 159 articles