Understanding Wordscores

1 January 2008

journal article
Published by Cambridge University Press (CUP) in Political Analysis

Vol. 16 (4), 356-371
https://doi.org/10.1093/pan/mpn004

Abstract

Wordscores is a widely used procedure for inferring policy positions, or scores, for new documents on the basis of scores for words derived from documents with known scores. It is computationally straightforward and requires no distributional assumptions, but it has unresolved practical and theoretical problems. In applications, estimated document scores are on the wrong scale. The theoretical development does not specify a statistical model, so it is unclear what assumptions the method makes about political text and how to tell whether they fit particular text analysis applications. The first part of the paper demonstrates that badly scaled document score estimates reflect deeper problems with the method. The second part shows how to understand Wordscores as an approximation to correspondence analysis, which itself approximates a statistical ideal point model for words. Problems with the method are identified with the conditions under which these layers of approximation fail to ensure consistent and unbiased estimation of the parameters of the ideal point model.
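
The scoring procedure the abstract describes can be illustrated with a minimal sketch. It is not taken from the paper: it assumes whitespace tokenization, the usual Wordscores recipe in which each word's score is the average of the reference scores weighted by the probability of each reference document given the word, and no rescaling step. The function names `word_scores` and `document_score` and the toy texts are hypothetical.

```python
from collections import Counter


def word_scores(ref_texts, ref_scores):
    """Score each word from reference documents with known scores.

    A word's score is the sum over reference documents of
    P(document | word) * document score, where P(document | word) is
    computed from within-document relative frequencies.
    """
    rel_freqs = []
    for text in ref_texts:
        counts = Counter(text.split())  # assumption: whitespace tokens
        total = sum(counts.values())
        rel_freqs.append({w: c / total for w, c in counts.items()})

    vocab = set().union(*rel_freqs)
    scores = {}
    for w in vocab:
        denom = sum(f.get(w, 0.0) for f in rel_freqs)
        scores[w] = sum(
            f.get(w, 0.0) / denom * a for f, a in zip(rel_freqs, ref_scores)
        )
    return scores


def document_score(text, scores):
    """Score a document as the frequency-weighted mean of its word scores.

    Words that never appear in the reference documents are ignored.
    """
    counts = Counter(w for w in text.split() if w in scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("document contains no scored words")
    return sum(c / total * scores[w] for w, c in counts.items())


# Toy reference documents with known positions +1 and -1 (hypothetical data).
refs = ["tax cut tax market", "welfare spend welfare tax"]
known = [1.0, -1.0]
ws = word_scores(refs, known)

# Re-scoring the reference documents themselves.
rescored = [document_score(t, ws) for t in refs]
```

In this toy example the reference documents, which were assigned +1 and -1, are re-scored at roughly +0.67 and -0.67, because the shared word "tax" pulls both toward the middle. This is one small instance of the scale problem the abstract mentions; the sketch makes no claim about how the paper diagnoses or corrects it.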

Cited by 121 articles