Dynamic Vocabulary Adaptation for a Daily and Real-Time Broadcast News Transcription System

The daily, real-time transcription of broadcast news (BN) is a challenging task for both acoustic and language modeling, and several problems must be overcome to achieve optimal performance. In particular, when transcribing BN data in highly inflected languages, vocabulary growth leads to high out-of-vocabulary (OOV) word rates. To address this problem, we propose a daily vocabulary and language model (LM) adaptation framework that extracts new words directly from contemporary written news available on the Internet, using linguistic knowledge about the words found in those news texts. Experiments were carried out with a European Portuguese BN transcription system. Preliminary results on 7 shows yield relative reductions of 61% in the OOV rate and 2.1% in the word error rate (WER).
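To make the idea of daily vocabulary adaptation and the OOV metric concrete, here is a minimal sketch. It assumes a purely frequency-based selection: words are ranked by their count in a base corpus plus a weighted count in the day's news, and the top `vocab_size` words form the new vocabulary. This is a simplification and not the authors' method, which also relies on linguistic knowledge about the candidate words. All function names, the `daily_weight` parameter, and the toy data are hypothetical. The relative reduction is computed as (before - after) / before, which is the usual reading of figures such as the 61% OOV reduction quoted above.

```python
from collections import Counter


def adapt_vocabulary(base_counts, daily_texts, vocab_size, daily_weight=1.0):
    """Hypothetical frequency-only sketch: score each word by its base-corpus
    count plus `daily_weight` times its count in today's news texts, then keep
    the `vocab_size` highest-scoring words."""
    daily_counts = Counter(w for text in daily_texts for w in text.lower().split())
    scores = Counter()
    for w, c in base_counts.items():
        scores[w] += c
    for w, c in daily_counts.items():
        scores[w] += daily_weight * c
    # Sort by descending score, breaking ties alphabetically for determinism
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return set(ranked[:vocab_size])


def oov_rate(vocabulary, transcript):
    """Fraction of running words in `transcript` that are not in `vocabulary`."""
    tokens = transcript.lower().split()
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in vocabulary) / len(tokens)


def relative_reduction(before, after):
    """Relative reduction of a rate, e.g. OOV or WER: (before - after) / before."""
    return (before - after) / before


# Toy example with made-up counts (not data from the paper)
base = Counter({"o": 50, "governo": 10, "ontem": 8, "futebol": 3})
daily_news = ["o governo aprovou o orçamento", "orçamento aprovado"]
reference = "o governo aprovou o orçamento ontem"

static_vocab = adapt_vocabulary(base, [], vocab_size=4)
adapted_vocab = adapt_vocabulary(base, daily_news, vocab_size=4, daily_weight=5.0)

oov_static = oov_rate(static_vocab, reference)    # 2 of 6 words are OOV
oov_adapted = oov_rate(adapted_vocab, reference)  # 1 of 6 words is OOV
print(f"static OOV {oov_static:.3f}, adapted OOV {oov_adapted:.3f}, "
      f"relative reduction {relative_reduction(oov_static, oov_adapted):.0%}")
```

In this toy run, the adapted vocabulary swaps the rarely used "futebol" for "orçamento", which appears often in the day's news, halving the OOV rate on the sample sentence. Keeping the vocabulary size fixed reflects the usual constraint that the recognizer's lexicon cannot grow without bound.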
