The bag-of-frames approach to audio pattern recognition: A sufficient model for urban soundscapes but not for polyphonic music
- 1 August 2007
- journal article
- Published by Acoustical Society of America (ASA) in The Journal of the Acoustical Society of America
- Vol. 122 (2), 881-891
- https://doi.org/10.1121/1.2750160
Abstract
The "bag-of-frames" (BOF) approach to audio pattern recognition represents signals as the long-term statistical distribution of their local spectral features. This approach has proved nearly optimal for simulating the auditory perception of natural and human environments (or soundscapes), and is also the predominant paradigm for extracting high-level descriptions from music signals. However, recent studies show that, in contrast to its performance on soundscape signals, BOF provides only limited performance when applied to polyphonic music signals. This paper explicitly examines the difference between urban soundscapes and polyphonic music with respect to their modeling with the BOF approach. First, applying the same measure of acoustic similarity to both soundscape and music data sets confirms that the BOF approach can model soundscapes to near-perfect precision, and exhibits none of the limitations observed on the music data set. Second, modifying this measure with two custom homogeneity transforms reveals critical differences in the temporal and statistical structure of the typical frame distribution of each type of signal. Such differences may explain the uneven performance of BOF algorithms on soundscapes and music signals, and suggest that their human perception relies on cognitive processes of a different nature.
This publication has 15 references indexed in Scilit:
- A scale-free distribution of false positives for a large class of audio similarity measures. Pattern Recognition, 2008
- The influence of polyphony on the dynamical modelling of musical timbre. Pattern Recognition Letters, 2007
- Mechanisms for Allocating Auditory Attention: An Auditory Saliency Map. Current Biology, 2005
- Popular music access: The Sony music browser. Journal of the American Society for Information Science and Technology, 2004
- Comparison of techniques for environmental sound recognition. Pattern Recognition Letters, 2003
- Machine learning in automated text categorization. ACM Computing Surveys, 2002
- MPEG-7 sound-recognition tools. IEEE Transactions on Circuits and Systems for Video Technology, 2001
- Automatic Classification of Environmental Noise Events by Hidden Markov Models. Applied Acoustics, 1998
- Automatic noise source recognition. The Journal of the Acoustical Society of America, 1998
- Common factors in the identification of an assortment of brief everyday sounds. Journal of Experimental Psychology: Human Perception and Performance, 1993
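Illustration of the BOF idea described in the abstract: a signal is cut into short frames, each frame is reduced to a local spectral feature vector, and the signal is then represented only by the statistical distribution of those vectors, with frame order discarded. The sketch below is a minimal illustration under stated assumptions, not the paper's method. The abstract does not specify the features, the statistical model, or the similarity measure, so the log band energies, the single Gaussian model, the symmetrized Kullback-Leibler distance, and all function names and parameter values here are hypothetical choices made for the example.

```python
import numpy as np


def frame_features(signal, frame_len=512, hop=256, n_bands=8):
    """Cut the signal into overlapping frames; return one log band-energy vector per frame.

    Assumption: log band energies stand in for whatever local spectral
    features a real BOF system would use.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    feats = np.empty((n_frames, n_bands))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        bands = np.array_split(power, n_bands)
        # Small constant avoids log(0) on silent bands.
        feats[i] = np.log([band.sum() + 1e-10 for band in bands])
    return feats


def bag_of_frames(feats):
    """Summarise the frames by their mean and covariance, ignoring frame order.

    Assumption: a single Gaussian is used as the long-term distribution.
    """
    mean = feats.mean(axis=0)
    # Small diagonal term keeps the covariance invertible.
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return mean, cov


def kl_gauss(p, q):
    """Closed-form KL divergence between two Gaussians given as (mean, cov)."""
    m0, s0 = p
    m1, s1 = q
    d = len(m0)
    s1_inv = np.linalg.inv(s1)
    diff = m1 - m0
    _, logdet0 = np.linalg.slogdet(s0)
    _, logdet1 = np.linalg.slogdet(s1)
    return 0.5 * (np.trace(s1_inv @ s0) + diff @ s1_inv @ diff - d + logdet1 - logdet0)


def bof_distance(x, y):
    """Symmetrized KL distance between the frame distributions of two signals."""
    p = bag_of_frames(frame_features(x))
    q = bag_of_frames(frame_features(y))
    return kl_gauss(p, q) + kl_gauss(q, p)
```

Because only the mean and covariance of the frames are kept, shuffling the frames of a signal leaves its representation unchanged. That order-blindness is the defining property of a bag-of-frames model, and it is the kind of temporal structure the abstract says matters differently for soundscapes and for polyphonic music.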