An automatic part-of-speech tagger for Middle Low German
- 21 July 2017
- journal article
- Published by John Benjamins Publishing Company in Corpus Studies of Language Through Time
- Vol. 22 (1), 107-140
- https://doi.org/10.1075/ijcl.22.1.05kol
Abstract
Syntactically annotated corpora are highly important for enabling large-scale diachronic and diatopic language research. Such corpora have recently been developed, or are still being developed, for a variety of historical languages. One of those under development is the fully tagged and parsed Corpus of Historical Low German (CHLG), which aims to facilitate research into the diachronic syntax of Low German, a field that remains highly under-researched. The present paper reports on a crucial step in creating the corpus, viz. the creation of a part-of-speech (POS) tagger for Middle Low German (MLG). Having been transmitted in several non-standardised written varieties, MLG poses a challenge to standard POS taggers, which usually rely on normalised spelling. We outline the major issues faced in the creation of the tagger and present our solutions to them.
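The spelling problem mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' tagger. It is a lookup-based tagger keyed on a single spelling, which misses variant spellings unless they are first mapped to a common form. The lexicon, the STTS-style tags and the rewrite rules are all hypothetical and chosen only for illustration. Real normalisation of MLG is considerably harder.

```python
import re

# Hypothetical toy lexicon keyed by one assumed "normalised" spelling.
LEXICON = {
    "unde": "KON",   # 'and'
    "ik": "PPER",    # 'I'
    "dat": "ART",    # simplified: 'dat' is ambiguous in real MLG
}

# Hypothetical rewrite rules collapsing a few spelling variants.
RULES = [
    (re.compile(r"^v(?=[^aeiouy])"), "u"),  # initial v before a consonant: 'vnde' -> 'unde'
    (re.compile(r"ck"), "k"),               # 'ick' -> 'ik'
    (re.compile(r"th$"), "t"),              # word-final th: 'dath' -> 'dat'
]


def normalise(token: str) -> str:
    """Lower-case the token and apply each rewrite rule in order."""
    form = token.lower()
    for pattern, replacement in RULES:
        form = pattern.sub(replacement, form)
    return form


def tag(tokens, use_normalisation=True):
    """Tag tokens by lexicon lookup; unknown forms get 'UNK'."""
    tagged = []
    for tok in tokens:
        key = normalise(tok) if use_normalisation else tok.lower()
        tagged.append((tok, LEXICON.get(key, "UNK")))
    return tagged


sentence = ["Vnde", "ick", "dath"]
print(tag(sentence, use_normalisation=False))  # every token is unknown
print(tag(sentence))                           # variants now match the lexicon
```

Without normalisation, none of the variant spellings is found in the lexicon. With the rules applied, all three are. Hand-written rules like these do not scale to the full range of MLG variation, which is why the issue is a central one for tagging the language.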