Abstract
Summary form only given. A typical text-based natural language application (eg. machine translation, summarization, information extraction) consists of a pipeline of preprocessing steps such as tokenization, stemming, part-of-speech tagging, named entity detection, chunking, parsing. Information flows downstream through the preprocessing steps along a narrow pipe: each step transforms a single input string into a single best solution string. However, this narrow pipe is limiting for two reasons: First, since each of the preprocessing steps are erroneous, producing a single best solution could magnify the error propogation down the pipeline. Second, the preprocessing steps are forced to resolve genuine ambiguity prematurely. While the widening of the pipeline can potentially benefit text-based language applications, it is imperative for spoken language processing where the output from the speech recognizer is typically a word lattice/graph. In this talk, we review how such a goal has been accomplished in tasks such as spoken language understanding, speech translation and multimodal language processing. We will also sketch methods that encode the preprocessing steps as finite-state transductions in order to exploit composition of finite-state transducers as a general constraint propogation method.