Deep Learning and Music Adversaries

10 September 2015

journal article
research article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Multimedia

Vol. 17 (11), 2059-2071
https://doi.org/10.1109/tmm.2015.2478068

Abstract

An adversary is an agent designed to make a classification system perform in some particular way, e.g., increase the probability of a false negative. Recent work builds adversaries for deep learning systems applied to image object recognition, exploiting the parameters of the system to find the minimal perturbation of the input image such that the system misclassifies it with high confidence. We adapt this approach to construct and deploy an adversary of deep learning systems applied to music content analysis. In our case, however, the system inputs are magnitude spectral frames, which require special care in order to produce valid input audio signals from network-derived perturbations. For two different train-test partitionings of two benchmark datasets, and two different architectures, we find that this adversary is very effective. We find that convolutional architectures are more robust compared to systems based on a majority vote over individually classified audio frames. Furthermore, we experiment with a new system that integrates an adversary into the training loop, but do not find that this improves the resilience of the system to new adversaries.
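
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: a toy linear softmax classifier stands in for the deep network, and clipping to non-negative values is a simplified stand-in for the "special care" that magnitude spectral frames require. All names (`frame_probs`, `majority_vote`, `adversarial_frames`) and all parameter values are hypothetical. Resynthesis of an audio signal from the perturbed magnitudes is omitted.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def frame_probs(W, b, X):
    # Toy frame classifier (assumption): one linear layer plus softmax.
    # X: (n_frames, n_bins) magnitudes, W: (n_bins, n_classes), b: (n_classes,).
    return softmax(X @ W + b)

def majority_vote(W, b, X):
    # Excerpt-level label: most frequent per-frame label.
    labels = frame_probs(W, b, X).argmax(axis=1)
    return int(np.bincount(labels, minlength=W.shape[1]).argmax())

def adversarial_frames(W, b, X, target, step=0.05, max_iter=200):
    # Take small gradient steps on the cross-entropy toward `target`,
    # stopping as soon as the vote flips so the perturbation stays small.
    X_adv = X.copy()
    onehot = np.zeros(W.shape[1])
    onehot[target] = 1.0
    for _ in range(max_iter):
        if majority_vote(W, b, X_adv) == target:
            break
        P = frame_probs(W, b, X_adv)
        grad = (P - onehot) @ W.T  # d(cross-entropy)/dX for the linear model
        # Magnitude spectra cannot be negative, so project after each step.
        X_adv = np.maximum(X_adv - step * grad, 0.0)
    return X_adv

# Hypothetical example: 5 identical frames, 3 frequency bins, 3 classes.
W = np.eye(3)
b = np.zeros(3)
X = np.tile(np.array([2.0, 0.5, 0.5]), (5, 1))
X_adv = adversarial_frames(W, b, X, target=2)
original_label = majority_vote(W, b, X)
adversarial_label = majority_vote(W, b, X_adv)
perturbation_norm = float(np.linalg.norm(X_adv - X))
```

Stopping at the first flip of the vote is one simple way to keep the perturbation small; it does not guarantee the minimal perturbation described in the abstract.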

Funding Information

Danish Council for Strategic Research of the Danish Agency for Science Technology and Innovation (11-115328)

Cited by 74 articles