Automated extraction of Tree-Adjoining Grammars from treebanks

22 December 2005

journal article
research article
Published by Cambridge University Press (CUP) in Natural Language Engineering

Vol. 12 (3), 251-299
https://doi.org/10.1017/s1351324905003943

Abstract

There has been a recent surge of interest in applying stochastic models to parsing. The use of tree-adjoining grammar (TAG) in this domain has been relatively limited, due in part to the unavailability, until recently, of large-scale corpora hand-annotated with TAG structures. Our goals are to develop inexpensive means of generating such corpora and to demonstrate their applicability to stochastic modeling. We present a method for automatically extracting a linguistically plausible TAG from the Penn Treebank. We also introduce low-labor methods for inducing higher-level organization of TAGs. Empirically, we evaluate various automatically extracted TAGs and demonstrate how our induced higher-level organization of TAGs can be used to smooth stochastic TAG models.
