Iwona Kraska-Szlenk,
International Journal of Corpus Linguistics;

The article focuses on the polysemy and usage patterns of the Polish lexeme głowa “head” and its diminutive główka. Based on corpus methodology and cognitive linguistics analysis, it is argued that the two lexemes are more autonomous in their meanings than predicted by their morphological relatedness. As the two words cover different semantic domains, we observe that the diminutive suffix has developed a new function which signals lexicalization of meaning toward a non-human semantic domain, for example, material objects, plants, etc. Our research contributes to studies on Polish morphology and lexical semantics and to theoretical research on the polysemy of body part terms.
International Journal of Corpus Linguistics;

Corpus research on questions as reader engagement markers in academic writing typically focuses on direct questions. Such questions are signalled by question marks and are relatively easily searchable in a corpus. However, indirect questions can be more challenging to identify, as they can be introduced by a range of forms. Based on a contrastive analysis of a corpus of English, French, and Spanish economics research articles, this paper provides pertinent evidence on direct and indirect questions as reader engagement markers. Firstly, it shows that direct and indirect questions as reader engagement markers are a rhetorical and generic feature of academic writing in the economics research article and, secondly, it presents a comprehensive list of indirect question illocutionary force indicating devices, valuable for future studies of indirect questions. Methodologically, this paper illustrates a replicable process for functional analysis and discusses the value of theoretically merging corpus and contrastive linguistic approaches.
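The point that direct questions are easy to retrieve because they end in a question mark can be illustrated with a short search routine. This is only a minimal sketch, not the procedure used in the paper: the function name `direct_questions`, the punctuation-based sentence splitting, and the sample text are all assumptions made for illustration. Indirect questions would not be found this way, which is the difficulty the abstract describes.

```python
import re


def direct_questions(text):
    """Return the sentences in `text` that end with a question mark.

    Sentence splitting is a rough heuristic (an assumption of this sketch):
    a boundary is any whitespace that follows '.', '!' or '?'.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if s.endswith("?")]


# Invented sample text, not taken from the corpus described above.
sample = "Why do prices rise? We examine this issue. What drives demand?"
questions = direct_questions(sample)
print(questions)
```

The same search handles French spacing before the question mark (as in "Pourquoi ?") because the split only happens after sentence-final punctuation, not before it.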
Olav Hackstein, Ryan Sandell
International Journal of Corpus Linguistics;

This article examines the lexically parallel English and German constructions can’t stand somebody/something and jemanden/etwas nicht ausstehen können “not tolerate (someone or something)”, from synchronic, diachronic, and quantitative perspectives. Syntactic and semantic restrictions suggest that the usage of stand and ausstehen in the relevant sense is older than that of other semantically similar verbs (e.g. English tolerate, German leiden), while quantitative evidence from corpora shows that the can’t stand and nicht ausstehen können constructions are both colligationally stronger than lexical competitors. Evidence from the history of stand indicates that the lexeme stand in the Germanic and other Indo-European languages has a long history of being employed in the relevant sense. The restrictions on usage and the colligational strength of the respective English and German constructions are thus argued to result from the antiquity of the construction and functional competition from other lexemes.
Paul Baker, Rachelle Vessey
International Journal of Corpus Linguistics, Volume 23, pp 255-278;

Using corpus linguistics and qualitative, manual discourse analysis, this paper compares English and French extremist texts to determine how messages in different languages draw upon similar and distinct discursive themes and linguistic strategies. Findings show that both corpora focus on religion and rewards (i.e. for faith) and strongly rely on othering strategies. However, the English texts are concerned with world events whereas the French texts focus on issues specific to France. Also, while the English texts use Arabic code-switching as a form of legitimation, the French texts use a formal register and quotation from scripture in discussions of permissions, rights, obligations and laws. Finally, the English texts refer to and justify violence to a greater extent than the French texts. This paper contributes to the field of terrorism studies and the field of corpus linguistics by presenting a new approach to corpus-driven studies of discourse across more than one language.
, Cedric Krummes,
International Journal of Corpus Linguistics, Volume 20, pp 500-525;

The aim of this paper is to contribute to learner corpus research into formulaic language in native and non-native German. To this effect, a corpus of argumentative essays written by advanced British students of German (WHiG) was compared with a corpus of argumentative essays written by German native speakers (Falko-L1). A corpus-driven analysis reveals a larger number of 3-grams in WHiG than in Falko-L1, which suggests that British advanced learners of German are more likely to use formulaic language in argumentative writing than their native-speaker counterparts. Secondly, by classifying the formulaic sequences according to their functions, this study finds that native speakers of German prefer discourse-structuring devices to stance expressions, whilst British advanced learners display the opposite preferences. Thirdly, the results show that learners of German make greater use of macro-discourse-structuring devices and cautious language, whereas native speakers favour micro-discourse structuring devices and tend to use more direct language.
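The 3-gram comparison described above can be pictured as counting recurrent three-word sequences in each corpus and comparing the totals. The following is a minimal sketch under stated assumptions, not the study's actual pipeline: the helper names `ngrams` and `recurrent_ngrams`, the whitespace tokenisation, the frequency threshold of 2, and the toy sentences are all hypothetical and are not drawn from WHiG or Falko-L1.

```python
from collections import Counter


def ngrams(tokens, n=3):
    """Return all contiguous n-grams in a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def recurrent_ngrams(texts, n=3, min_freq=2):
    """Count n-grams across texts; keep those seen at least min_freq times."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()  # naive whitespace tokenisation (assumption)
        counts.update(ngrams(tokens, n))
    return {gram: freq for gram, freq in counts.items() if freq >= min_freq}


# Toy stand-ins for a learner corpus and a native-speaker corpus (invented data).
learner_texts = [
    "auf der einen seite ist das gut",
    "auf der einen seite ist das schlecht",
]
native_texts = [
    "das ist gut",
    "das ist nicht schlecht",
]

learner = recurrent_ngrams(learner_texts)
native = recurrent_ngrams(native_texts)
print(len(learner), len(native))
```

A real comparison would also need to normalise for corpus size and apply a dispersion criterion across texts; neither is attempted here.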
International Journal of Corpus Linguistics, Volume 22, pp 319-344;

This paper introduces the Spoken British National Corpus 2014, an 11.5-million-word corpus of orthographically transcribed conversations among L1 speakers of British English from across the UK, recorded in the years 2012–2016. After showing that a survey of the recent history of corpora of spoken British English justifies the compilation of this new corpus, we describe the main stages of the Spoken BNC2014’s creation: design, data and metadata collection, transcription, XML encoding, and annotation. In doing so we aim to (i) encourage users of the corpus to approach the data with sensitivity to the many methodological issues we identified and attempted to overcome while compiling the Spoken BNC2014, and (ii) inform (future) compilers of spoken corpora of the innovations we implemented to attempt to make the construction of corpora representing spontaneous speech in informal contexts more tractable, both logistically and practically, than in the past.
, Geraldine Mark
International Journal of Corpus Linguistics, Volume 22, pp 457-489;

English Profile (EP) is an ongoing empirical exploration of learner English initiated by Cambridge University Press and Cambridge English, among others. EP aims to create a set of empirically-based descriptions of language competencies for English. ‘Reference Level Descriptors’ already exist as part of the Common European Framework of Reference (CEFR) but are intuitively derived and not designed for one specific language. The English Grammar Profile (EGP) is a sub-project of EP which aims to profile learner competence in grammar. This paper details the rationale for the study and the methodology that was developed to investigate the Cambridge Learner Corpus to arrive at over 1,200 grammatical competence statements. Key findings which link to existing corpus-based second language acquisition work are also presented.
, , Helen Baker
International Journal of Corpus Linguistics, Volume 24, pp 413-444;

This article introduces a methodology for the diachronic analysis of large historical corpora, Usage Fluctuation Analysis (UFA). UFA looks at the fluctuation of the usage of a word as observed through collocation. It presupposes neither a commitment to a specific semantic theory, nor that the results will focus solely on semantics. We focus, rather, upon a word’s usage. UFA considers large amounts of evidence about usage, through time, as made available by historical corpora, displaying fluctuation in word usage in the form of a graph. The paper provides guidelines for the interpretation of UFA graphs and presents three short case studies applying the technique to (i) the analysis of the word its and (ii) two words related to social actors, whore and harlot. These case studies relate UFA to prior, labour-intensive corpus and historical analyses. They also highlight the novel observations that the technique affords.
International Journal of Corpus Linguistics, Volume 27, pp 259-290;

This paper presents evidence from both corpora and agent-based simulation for the effect of lectal contamination. By doing so, it shows how agent-based simulation can be used as a complementary technique to corpus research in the study of language variation. Lectal contamination is an effect whereby the words that are typical of a language variety more often appear in a morphosyntactic variant typical of that same variety, even among language use from a different variety. This study looks at the Dutch partitive genitive construction, which exhibits variation between a “Netherlandic” variant with -s ending and a “Belgian” variant without -s ending. It is shown that the probability of the Belgian variant without -s increases among more “Belgian” words, in the language use of both Belgians and people from the Netherlands. Meanwhile, an agent-based simulation reveals the crucial theoretical preconditions that lead to this effect.
