SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles
Open Access
- 2 June 2010
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 5 (6), e10729
- https://doi.org/10.1371/journal.pone.0010729
Abstract
Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, there are only a few sources of word frequency measures available to researchers, and the quality is less than what researchers in other languages are used to. Following recent work by New, Brysbaert, and colleagues in English, French and Dutch, we assembled a database of word and character frequencies based on a corpus of film and television subtitles (46.8 million characters, 33.5 million words). In line with what has been found in the other languages, the new word and character frequencies explain significantly more of the variance in Chinese word naming and lexical decision performance than measures based on written texts. Our results confirm that word frequencies based on subtitles are a good estimate of daily language exposure and capture much of the variance in word processing efficiency. In addition, our database is the first to include information about the contextual diversity of the words and to provide good frequency estimates for multi-character words and the different syntactic roles in which the words are used. The word frequencies are freely available for research purposes.
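As a rough illustration of the two kinds of measure the abstract mentions, raw word frequency and contextual diversity, the sketch below counts both over a tiny toy corpus. It is not the authors' pipeline: the function name `frequency_table`, the output field names, and the toy data are hypothetical, and the text is assumed to be already segmented into words, with one token list per film. Contextual diversity is taken here to be the number of films in which a word occurs at least once.

```python
from collections import Counter
import math


def frequency_table(documents):
    """Build per-word frequency measures.

    documents: list of token lists, one list per film (hypothetical input
    format; word segmentation is assumed to have been done beforehand).
    """
    word_counts = Counter()  # total occurrences of each word
    doc_counts = Counter()   # number of films containing each word
    for tokens in documents:
        word_counts.update(tokens)
        doc_counts.update(set(tokens))  # count each word once per film

    total_tokens = sum(word_counts.values())
    n_docs = len(documents)

    table = {}
    for word, count in word_counts.items():
        table[word] = {
            "count": count,
            "per_million": count * 1_000_000 / total_tokens,
            "log10_count": math.log10(count),
            "cd": doc_counts[word],
            "cd_percent": 100.0 * doc_counts[word] / n_docs,
        }
    return table


# Toy corpus: three "films", already segmented into words.
toy_corpus = [
    ["我", "喜欢", "电影"],
    ["我", "看", "电影", "电影"],
    ["你", "好"],
]
table = frequency_table(toy_corpus)
for word, row in sorted(table.items(), key=lambda kv: -kv[1]["count"]):
    print(word, row["count"], round(row["per_million"]), row["cd"])
```

A word that is repeated many times within a single film gets a high count but a low contextual diversity, which is why the two measures are worth reporting separately.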
This publication has 18 references indexed in Scilit:
- SUBTLEX-NL: A new measure for Dutch word frequency based on film subtitles. Behavior Research Methods, 2010
- Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 2009
- Chinese word segmentation and statistical machine translation. ACM Transactions on Speech and Language Processing, 2008
- Reading spaced and unspaced Chinese text: Evidence from eye movements. Journal of Experimental Psychology: Human Perception and Performance, 2008
- The use of film subtitles to estimate word frequencies. Applied Psycholinguistics, 2007
- The English Lexicon Project. Behavior Research Methods, 2007
- Contextual Diversity, Not Word Frequency, Determines Word-Naming and Lexical Decision Times. Psychological Science, 2006
- Evolution and present situation of corpus research in China. Corpus Studies of Language Through Time, 2006
- Frequency effects in the processing of Chinese inflection. Journal of Memory and Language, 2006
- The time course of graphic, phonological, and semantic activation in Chinese character identification. Journal of Experimental Psychology: Learning, Memory, and Cognition, 1998