Understanding Wordscores
- 1 January 2008
- journal article
- Published by Cambridge University Press (CUP) in Political Analysis
- Vol. 16 (4), 356-371
- https://doi.org/10.1093/pan/mpn004
Abstract
Wordscores is a widely used procedure for inferring policy positions, or scores, for new documents on the basis of scores for words derived from documents with known scores. It is computationally straightforward and requires no distributional assumptions, but has unresolved practical and theoretical problems. In applications, estimated document scores are on the wrong scale and the theoretical development does not specify a statistical model, so it is unclear what assumptions the method makes about political text and how to tell whether they fit particular text analysis applications. The first part of the paper demonstrates that badly scaled document score estimates reflect deeper problems with the method. The second part shows how to understand Wordscores as an approximation to correspondence analysis, which itself approximates a statistical ideal point model for words. Problems with the method are identified with the conditions under which these layers of approximation fail to ensure consistent and unbiased estimation of the parameters of the ideal point model.
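The abstract does not give formulas, so the following is only a minimal sketch of the scoring procedure as it is usually described (word scores as weighted averages of reference document scores, new document scores as frequency-weighted averages of word scores). The function names `word_scores` and `score_document`, the toy token lists, and the reference scores of -1 and +1 are all assumptions made for illustration, not material from the paper.

```python
from collections import Counter


def word_scores(ref_texts, ref_scores):
    """Score each word from reference texts with known scores.

    ref_texts:  dict mapping a document name to its list of tokens (hypothetical input shape)
    ref_scores: dict mapping the same names to known document scores
    """
    # Relative frequency of each word within each reference text.
    rel = {}
    for name, tokens in ref_texts.items():
        counts = Counter(tokens)
        total = sum(counts.values())
        rel[name] = {w: c / total for w, c in counts.items()}

    vocab = set().union(*rel.values())
    scores = {}
    for w in vocab:
        # P(reference text | word), obtained by normalising relative frequencies.
        denom = sum(rel[r].get(w, 0.0) for r in rel)
        scores[w] = sum(rel[r].get(w, 0.0) / denom * ref_scores[r] for r in rel)
    return scores


def score_document(tokens, scores):
    """Score a document as the frequency-weighted mean of its word scores.

    Words that never appear in a reference text have no score and are ignored.
    """
    counts = Counter(t for t in tokens if t in scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("document contains no scored words")
    return sum(c / total * scores[w] for w, c in counts.items())


# Toy data, invented for illustration only.
ref_texts = {
    "left": ["tax", "tax", "welfare", "spend"],
    "right": ["tax", "cut", "cut", "market"],
}
ref_scores = {"left": -1.0, "right": 1.0}

scores = word_scores(ref_texts, ref_scores)
print(score_document(ref_texts["left"], scores))
print(score_document(["tax", "welfare", "cut", "market", "unknown"], scores))
```

In this toy case the "left" reference text, whose known score is -1, is re-scored at about -0.67, because the shared word "tax" receives an intermediate word score. This is a small instance of the scale problem the abstract refers to: raw document scores are pulled toward the centre of the reference scores.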
This publication has 16 references indexed in Scilit:
- A Scaling Model for Estimating Time‐Series Party Positions from Texts. American Journal of Political Science, 2008
- Estimating policy positions using political texts: An evaluation of the Wordscores approach. Electoral Studies, 2007
- Compared to What? A Comment on “A Robust Transformation Procedure for Interpreting Political Text” by Martin and Vanberg. Political Analysis, 2007
- Estimating Irish party policy positions using computer wordscoring: the 2002 election – a research note. Irish Political Studies, 2003
- Using Principal Component Analysis and Correspondence Analysis for Estimation in Latent Variable Models. Journal of the American Statistical Association, 2000
- Weighted averaging, logistic regression and the Gaussian response model. Plant Ecology, 1986
- Correspondence Analysis of Incidence and Abundance Data: Properties in Terms of a Unimodal Response Model. Biometrics, 1985
- Correspondence Analysis: A Neglected Multivariate Method. Journal of the Royal Statistical Society Series C: Applied Statistics, 1974
- Reciprocal Averaging: An Eigenvector Method of Ordination. Journal of Ecology, 1973
- Structure Formelle des Textes et Communication. WORD, 1954