International Journal of Corpus Linguistics
ISSN / EISSN: 1384-6655 / 1569-9811
Published by: John Benjamins Publishing Company
Total articles ≅ 708
Latest articles in this journal
Published: 20 September 2022
International Journal of Corpus Linguistics; https://doi.org/10.1075/ijcl.22018.sti
This article introduces Corpus PalaeoHibernicum (CorPH), a corpus currently consisting of 78 texts in Early Irish (c. 7th–10th cent.) created by the ERC-funded Chronologicon Hibernicum (ChronHib) project by bringing together pre-existing lexical and syntactic databases and adding further crucial texts from the period. In addition to the annotation for POS, morphological and syntactic information, a further layer of annotation has been developed for CorPH – ‘Variation Tagging’, i.e. a tagset that numerically encodes synchronic language variation during the Early Irish period, thus allowing for much improved research on the chronological variation among the material. A further new pillar of the study of linguistic variation is Bayesian Language Variation Analysis (BLaVA), which addresses the challenge that “not-so-big data” poses to statistical corpus methods. Instead of reflecting feature frequencies, BLaVA models language variation as probabilities of variation.
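To illustrate the general idea of treating variation as a probability rather than a raw frequency, the following is a minimal sketch using a Beta-Binomial model. It is not the BLaVA implementation, whose details the abstract does not give; the names `BetaPosterior` and `variant_posterior`, the uniform prior, and the counts are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Beta distribution over the probability that a text uses the innovative variant."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        total = self.alpha + self.beta
        return (self.alpha * self.beta) / (total ** 2 * (total + 1))


def variant_posterior(innovative: int, conservative: int,
                      prior_alpha: float = 1.0,
                      prior_beta: float = 1.0) -> BetaPosterior:
    """Update a Beta prior with observed counts of two competing variants.

    Hypothetical helper: the uniform Beta(1, 1) prior is an assumption,
    not a choice documented in the abstract.
    """
    if innovative < 0 or conservative < 0:
        raise ValueError("counts must be non-negative")
    return BetaPosterior(prior_alpha + innovative, prior_beta + conservative)


# Two invented texts with the same raw proportion (75% innovative forms)
# but very different amounts of evidence.
small_text = variant_posterior(innovative=3, conservative=1)
large_text = variant_posterior(innovative=300, conservative=100)
```

Under these assumptions, both texts show the same raw frequency, but the posterior for the small text is far wider (larger `variance`) than the one for the large text. That is the kind of uncertainty a frequency count alone hides when data are sparse.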
Published: 9 September 2022
International Journal of Corpus Linguistics; https://doi.org/10.1075/ijcl.22016.ale
The ways in which politicians have discussed who, what, and where was considered “uncivilized” across the past two centuries give an insight into how speakers in a position of authority classified and constructed the world around them, and how those in power in Britain see the country and themselves. This article uses the Hansard Corpus 1803–2003 of speeches in the UK Parliament alongside data from the Historical Thesaurus of English to analyse diachronic variation in usage of words for persons, places and practices considered uncivil. It proposes new methods and offers quantitative data to describe the period’s shift in political attitudes towards not just the so-called “uncivil” but also the country as a whole.
Published: 6 September 2022
International Journal of Corpus Linguistics; https://doi.org/10.1075/ijcl.20055.kim
This study examines usage changes of English-based loanwords and Korean replacement words promoted by the National Institute of Korean Language in a six-year span, using two corpora. It focuses on 18 Korean and anglicized word pairs appearing on the National Institute of Korean Language’s website that purportedly showcase the Institute’s successful efforts to curtail the usage of English words by promoting Korean replacement words. The results indicate that promoting Korean does not necessarily decrease the usage of English, and that the usage of English-based words seems to increase in conjunction with the Korean words. Several Korean words promoted by the National Institute of Korean Language have extremely low frequencies, and some loanwords are being used with various meanings. Commentaries are provided to explain various patterns of observed usage change.
Published: 6 September 2022
International Journal of Corpus Linguistics; https://doi.org/10.1075/ijcl.22005.fit
This paper demonstrates the value of studying co-occurrence ‘quads’ – constellations of four non-adjacent lemmas that consistently co-occur across spans of up to 100 tokens – for understanding discursive change. We map meaning onto quads as ‘discursive concepts’, which encompass encyclopaedic semantics, pragmatics, and context. We investigate a high-frequency quad with high co-occurrence strength in EEBO-TCP: world-heaven-earth-power. We conduct semantic and pragmatic analysis to generate hypotheses regarding discursive change. The quad’s components are semantically underspecified; thus, although the quad indicates a discursive concept, each instantiation of the quad is variable, contingent, and dependent upon context and pragmatic processes for interpretation. We observe how the vague lexemes that constitute building blocks of religious discourse are employed to generate new, timely secular discourses; and we argue that semantic underspecification is the site and source of discursive change. Indeed, the volatile, unstable nature of the component lexical meanings renders them indispensable to early modern debate.
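The notion of a quad co-occurring within a span of up to 100 tokens can be sketched as a simple search over a lemmatised token list. This is an illustrative reading of the definition above, not the authors' procedure: the function name `find_quad_spans` and the toy token list are hypothetical, and the sketch does not compute co-occurrence strength.

```python
from typing import Iterable, List, Sequence, Tuple


def find_quad_spans(lemmas: Sequence[str],
                    quad: Iterable[str],
                    max_span: int = 100) -> List[Tuple[int, int]]:
    """Return (start, end) token indices of spans containing all four lemmas.

    A span is reported each time a target lemma is read and the most recent
    occurrences of all four targets fit within `max_span` tokens (inclusive).
    Overlapping spans may therefore be reported more than once.
    """
    targets = set(quad)
    if len(targets) != 4:
        raise ValueError("a quad needs exactly four distinct lemmas")

    last_seen = {}  # lemma -> index of its most recent occurrence
    spans = []
    for i, lemma in enumerate(lemmas):
        if lemma not in targets:
            continue
        last_seen[lemma] = i
        if len(last_seen) == 4:
            start = min(last_seen.values())
            if i - start + 1 <= max_span:
                spans.append((start, i))
    return spans


# Invented toy input: "x" stands for any intervening lemma.
tokens = ["world", "x", "heaven", "x", "earth", "x", "power"]
hits = find_quad_spans(tokens, ("world", "heaven", "earth", "power"))
```

In this toy input the four lemmas are never adjacent, yet they fall within one seven-token span, so a single hit covering indices 0 to 6 is returned.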
Published: 29 August 2022
International Journal of Corpus Linguistics; https://doi.org/10.1075/ijcl.22011.cla
This paper applies a new approach to the identification of discourses, based on Multiple Correspondence Analysis (MCA), to the study of discourse variation over time. The MCA approach to keywords deals with a major issue with the use of keywords to identify discourses: the allocation of individual keywords to multiple discourses. Yet, as this paper demonstrates, the approach also allows us to observe variation in the prevalence of discourses over time. The MCA approach to keywords allows the allocation of individual texts to multiple discourses based on patterns of keyword co-occurrence. Metadata in the corpus data analysed (here, UK newspaper articles about Islam) can then be used to map those discourses over time, resulting in a clear view of how the discourses vary relative to one another as time progresses. The paper argues that the drivers for these fluctuations are language-external: the real-world events reported on in the newspapers.
Published: 23 August 2022
International Journal of Corpus Linguistics; https://doi.org/10.1075/ijcl.20177.lii
This paper explores variation in lexico-grammatical register features across text lengths in a large-scale sample of Reddit comments. Very short texts are known to be problematic for many statistical methods, so understanding their nature is important for the corpus-linguistic study of social media, where most contributions are short. I show that the frequencies of linguistic features change with comment length, even between longer comments, although longer texts are often considered similar in statistical terms. Moreover, I classify the variation found between short comments of different lengths into two main patterns, although other patterns can also be found, and there is variation even within these patterns. Furthermore, I interpret the observed differences in terms of register variation. For example, shorter comments appear to be more casual and less edited in terms of their feature makeup, whereas narrative and informational registers seem to favor longer comments.
Published: 19 August 2022
International Journal of Corpus Linguistics; https://doi.org/10.1075/ijcl.22014.rod
This paper tracks stylistic variation in the use of two roughly synonymous suffixes, the Romance -ity and the native -ness, during the Early Modern English period. We seek to verify from a statistical viewpoint the claims of Rodríguez-Puente (2020), who reports on a decrease of -ness in favour of -ity in registers representative of the speech-written and formal-informal continua at that time. To this end, we develop new methods of statistical and visual analysis that enable diachronic comparisons of competing processes across subcorpora, building upon an earlier method by Säily and Suomela (2009). Our results confirm that -ity gained ground first in written registers and then spread towards speech-related registers, and we are able to time this change more accurately thanks to a novel periodisation. We also provide strong statistical support indicating that the proportion of -ity was significantly higher in legal registers than in other registers.
Published: 8 August 2022
International Journal of Corpus Linguistics; https://doi.org/10.1075/ijcl.20165.ver
The aims of this paper are to detect the most problematic issues related to dialogue act annotation in speech corpora and to define basic categories of dialogue acts. I critically examine and test generic schemes that represent different lines of dialogue act annotation: AMI, DART, ISO 24617-2 and SWBD-DAMSL. It is found that the most problematic issues regarding dialogue act annotation are related to the distinction between the semantic and pragmatic meanings of utterances, the annotation of metadiscourse, and the adequacy and informativeness of the tagset. The identified basic dialogue act categories are information providing, information seeking, actions, social acts and metadiscourse. The findings help improve dialogue act annotation.
Published: 21 July 2022
International Journal of Corpus Linguistics; https://doi.org/10.1075/ijcl.20074.kra
The article focuses on the polysemy and usage patterns of the Polish lexeme głowa “head” and its diminutive główka. Based on corpus methodology and cognitive linguistics analysis, it is argued that the two lexemes are more autonomous in their meanings than predicted by their morphological relatedness. As the two words cover different semantic domains, we observe that the diminutive suffix has developed a new function which signals lexicalization of meaning toward a non-human semantic domain, for example, material objects, plants, etc. Our research contributes to studies on Polish morphology and lexical semantics and to theoretical research on the polysemy of body part terms.
Published: 18 July 2022
International Journal of Corpus Linguistics; https://doi.org/10.1075/ijcl.20065.cur
Corpus research on questions as reader engagement markers in academic writing typically focuses on direct questions. Such questions are signalled by question marks and are relatively easily searchable in a corpus. However, indirect questions can be more challenging to identify, as they can be introduced by a range of forms. Based on a contrastive analysis of a corpus of English, French, and Spanish economics research articles, this paper provides pertinent evidence on direct and indirect questions as reader engagement markers. Firstly, it shows that direct and indirect questions as reader engagement markers are a rhetorical and generic feature of academic writing in the economics research article and, secondly, it presents a comprehensive list of indirect question illocutionary force indicating devices, valuable for future studies of indirect questions. Methodologically, this paper illustrates a replicable process for functional analysis and discusses the value of theoretically merging corpus and contrastive linguistic approaches.