How much do students’ scores in PISA reflect general intelligence and how much do they reflect specific abilities?

Abstract
International Large-Scale Assessments (LSAs) allow comparisons of education systems' effectiveness in promoting student learning in specific domains, such as reading, mathematics, and science. However, it has been argued that students' scores in international LSAs mostly reflect general cognitive ability (g). This study examines the extent to which students' scores in reading, mathematics, science, and a Raven's Progressive Matrices test reflect g and domain-specific abilities, using data from 3,472 Polish students who participated in the OECD's 2009 Programme for International Student Assessment (PISA) and who were retested with the same PISA instruments, but with a different item set, in 2010. Variance in students' responses to test items is explained better by a bifactor Item Response Theory (IRT) model than by the multidimensional IRT model routinely used to scale PISA and other LSAs. The bifactor IRT model assumes that the non-g factors (reading, math, science, and the Raven's test) are uncorrelated with g and with each other. The bifactor model also generates specific ability factors with more theoretically credible relationships with criterion variables than the standard multidimensional model. Further analyses of the bifactor model indicate that the domain-specific factors are not reliable enough to be interpreted meaningfully: they lie somewhere between unreliable measures of domain-specific abilities and nuisance factors reflecting measurement error. The finding that PISA achievement scores mostly reflect g may arise because PISA aims to test broad abilities in a variety of contexts, or it may be a general characteristic of LSAs and national achievement tests.

Educational Impact and Implications Statement

This study analyzes Programme for International Student Assessment data from Poland to establish how much the achievement of secondary school students in reading, mathematics, science, and a Raven's Progressive Matrices test reflects general ability and how much it reflects domain-specific abilities. Findings indicate that a scaling model that accounts for general ability fits the data better than the models typically employed in large-scale assessments, which ignore the influence of general ability on student achievement. The finding that students' responses to PISA test items reflect general ability rather than domain-specific abilities, if replicated in other countries, could have important implications for the design of large-scale assessments and the interpretation of analyses of large-scale assessment data.
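For readers unfamiliar with the distinction between the two model families, a minimal sketch follows; the notation is illustrative and not taken from the study itself. In a two-parameter bifactor IRT model, the probability that student $i$ answers item $j$ (belonging to domain $s(j)$) correctly is

$$
P\left(X_{ij} = 1 \mid \theta_{ig}, \theta_{i\,s(j)}\right) = \frac{1}{1 + \exp\!\left[-\left(a_{jg}\,\theta_{ig} + a_{js}\,\theta_{i\,s(j)} + d_j\right)\right]},
$$

where $\theta_{ig}$ is the general factor, the $\theta_{is}$ are domain-specific factors constrained to be uncorrelated with $g$ and with each other, $a_{jg}$ and $a_{js}$ are item discriminations on the general and specific factors, and $d_j$ is an item intercept. Under this sketch, the multidimensional model referred to above instead sets $a_{jg} = 0$, loads each item only on its own domain factor, and allows the domain factors to correlate freely.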