Abstract
Corpus-based vocabulary research has had a profound impact on English language education, and there is every indication that this will remain the case for the foreseeable future. Perhaps the greatest challenge of such research is determining what constitutes a Word for counting and analysis purposes. Decisions in this regard have important ramifications not only for the lexical findings themselves, but also for the pedagogical theories and practices derived from them. This article draws on several fields of study to examine this problem, focusing in particular on three problematic areas relating to computer-processed corpora: (a) morphological relationships between words, (b) homonymy and polysemy, and (c) multiword items. The article concludes with recommendations for assessing the validity of the Word construct in applied corpus-based vocabulary research.