Librispeech: An ASR corpus based on public domain audio books

1 April 2015

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 5206-5210
https://doi.org/10.1109/icassp.2015.7178964

Abstract

This paper introduces a new corpus of read English speech, suitable for training and evaluating speech recognition systems. The LibriSpeech corpus is derived from audiobooks that are part of the LibriVox project, and contains 1000 hours of speech sampled at 16 kHz. We have made the corpus freely available for download, along with separately prepared language-model training data and pre-built language models. We show that acoustic models trained on LibriSpeech give lower error rates on the Wall Street Journal (WSJ) test sets than models trained on WSJ itself. We are also releasing Kaldi scripts that make it easy to build these systems.
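
The abstract compares systems by error rate. For speech recognition this is conventionally the word error rate (WER): the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal illustrative sketch of that metric, not the scoring tool used in the paper; the function name `word_error_rate` and the whitespace tokenization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration: tokens are split on whitespace,
    with no text normalization (casing, punctuation) applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

For instance, a six-word reference whose hypothesis has one substituted word scores 1/6. Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.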


Cited by 1937 articles