Librispeech: An ASR corpus based on public domain audio books
- 1 April 2015
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 5206-5210
- https://doi.org/10.1109/icassp.2015.7178964
Abstract
This paper introduces a new corpus of read English speech, suitable for training and evaluating speech recognition systems. The LibriSpeech corpus is derived from audiobooks that are part of the LibriVox project, and contains 1000 hours of speech sampled at 16 kHz. We have made the corpus freely available for download, along with separately prepared language-model training data and pre-built language models. We show that acoustic models trained on LibriSpeech give lower error rate on the Wall Street Journal (WSJ) test sets than models trained on WSJ itself. We are also releasing Kaldi scripts that make it easy to build these systems.
This publication has 12 references indexed in Scilit:
- Improving deep neural network acoustic models using generalized maxout networks. Published by Institute of Electrical and Electronics Engineers (IEEE), 2014
- Joint-sequence models for grapheme-to-phoneme conversion. Speech Communication, 2008
- A compact model for speaker-adaptive training. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Normalization of non-standard words. Computer Speech & Language, 2001
- Semi-tied covariance matrices for hidden Markov models. IEEE Transactions on Speech and Audio Processing, 1999
- An empirical study of smoothing techniques for language modeling. Published by Association for Computational Linguistics (ACL), 1996
- The design for the Wall Street Journal-based CSR corpus. Published by Association for Computational Linguistics (ACL), 1992
- The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, 1991
- Identification of common molecular subsequences. Journal of Molecular Biology, 1981
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980