Some thoughts on analytical choices in the scaling model for test scores in international large-scale assessment studies

Open Access

3 September 2022

journal article
Advances in methodology
Published by Leibniz Institute for Psychology (ZPID) in Measurement Instruments for the Social Sciences

Vol. 4 (1), 1-20
https://doi.org/10.1186/s42409-022-00039-w

Abstract

International large-scale assessments (LSAs), such as the Programme for International Student Assessment (PISA), provide essential information about the distribution of student proficiencies across a wide range of countries. The repeated assessments of these distributions across cognitive domains offer policymakers important information for evaluating educational reforms and have received considerable attention from the media. Furthermore, the analytical strategies employed in LSAs often define methodological standards for applied researchers in the field. Hence, it is vital to critically reflect on the conceptual foundations of analytical choices in LSA studies. This article discusses the methodological challenges in selecting and specifying the scaling model used to obtain proficiency estimates from the individual student responses in LSA studies. We distinguish design-based inference from model-based inference. We argue that for the official reporting of LSA results, design-based inference should be preferred because it allows for a clear definition of the target of inference (e.g., country mean achievement) and is less sensitive to specific modeling assumptions. More specifically, we discuss five analytical choices in the specification of the scaling model: (1) the specification of the functional form of item response functions, (2) the treatment of local dependencies and multidimensionality, (3) the consideration of test-taking behavior for estimating student ability, and the role of country differential item functioning (DIF) for (4) cross-country comparisons and (5) trend estimation. This article’s primary goal is to stimulate discussion about recently implemented changes and suggested refinements of the scaling models in LSA studies.

Funding Information

IPN – Leibniz-Institut für die Pädagogik der Naturwissenschaften und Mathematik an der Universität Kiel

Cited by 12 articles