DeepEar
- 7 September 2015
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 283-294
- https://doi.org/10.1145/2750858.2804262
Abstract
Microphones are remarkably powerful sensors of human behavior and context. However, audio sensing is highly susceptible to wild fluctuations in accuracy when used in the diverse acoustic environments (such as bedrooms, vehicles, or cafes) that users encounter on a daily basis. Towards addressing this challenge, we turn to the field of deep learning, an area of machine learning that has radically changed related audio modeling domains like speech recognition. In this paper, we present DeepEar -- the first mobile audio sensing framework built from coupled Deep Neural Networks (DNNs) that simultaneously perform common audio sensing tasks. We train DeepEar with a large-scale dataset including unlabeled data from 168 place visits. The resulting learned model, involving 2.3M parameters, enables DeepEar to significantly increase inference robustness to background noise beyond conventional approaches present in mobile devices. Finally, we show DeepEar is feasible for smartphones by building a cloud-free DSP-based prototype that runs continuously, using only 6% of the smartphone's battery daily.