DeepEar

7 September 2015

conference paper
Published by Association for Computing Machinery (ACM)

p. 283-294
https://doi.org/10.1145/2750858.2804262

Abstract

Microphones are remarkably powerful sensors of human behavior and context. However, audio sensing is highly susceptible to wild fluctuations in accuracy when used in the diverse acoustic environments (such as bedrooms, vehicles, or cafes) that users encounter on a daily basis. To address this challenge, we turn to the field of deep learning, an area of machine learning that has radically changed related audio modeling domains like speech recognition. In this paper, we present DeepEar, the first mobile audio sensing framework built from coupled Deep Neural Networks (DNNs) that simultaneously perform common audio sensing tasks. We train DeepEar with a large-scale dataset including unlabeled data from 168 place visits. The resulting learned model, involving 2.3M parameters, enables DeepEar to significantly increase inference robustness to background noise beyond conventional approaches present in mobile devices. Finally, we show DeepEar is feasible for smartphones by building a cloud-free DSP-based prototype that runs continuously, using only 6% of the smartphone's battery daily.


Cited by 235 articles