Published: 26 May 2023
Information, Volume 14; https://doi.org/10.3390/info14060306
Item response theory (IRT) models are factor models for dichotomous or polytomous variables (i.e., item responses). Symmetric logistic or probit link functions are most frequently used for modeling dichotomous or polytomous items. In this article, we propose an IRT model for dichotomous and polytomous items using the asymmetric generalized logistic link function, which covers a broad family of symmetric and asymmetric link functions. Compared to IRT modeling based on the logistic or probit link function, the generalized logistic link function additionally estimates two parameters related to the asymmetry of the link function. To stabilize the estimation of item-specific asymmetry parameters, regularized estimation is employed. The usefulness of the proposed model is illustrated through simulations and empirical examples for dichotomous and polytomous item responses.
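One concrete asymmetric generalization of the logistic link is Stukel's (1988) two-parameter family, in which separate shape parameters govern the upper and lower tails of the curve. The sketch below (function names and the exact parameterization are illustrative assumptions; the article's own parameterization may differ) applies the standard logistic to a tail-transformed linear predictor, reducing to the ordinary logistic link when both asymmetry parameters are zero:

```python
import math

def h_stukel(x, eta1, eta2):
    """Stukel-type transform of the linear predictor; eta1 governs the
    upper tail (x >= 0), eta2 the lower tail (x < 0). Both zero gives
    the identity, i.e., the ordinary logistic link."""
    if x >= 0:
        if eta1 > 0:
            return (math.exp(eta1 * x) - 1.0) / eta1
        if eta1 < 0:
            return -math.log(1.0 - eta1 * x) / eta1
        return x
    if eta2 > 0:
        return -(math.exp(-eta2 * x) - 1.0) / eta2
    if eta2 < 0:
        return math.log(1.0 + eta2 * x) / eta2
    return x

def irf_generalized_logistic(theta, a, b, eta1, eta2):
    """Item response probability under an asymmetric generalized
    logistic link: logistic applied to the transformed predictor
    a * (theta - b)."""
    z = h_stukel(a * (theta - b), eta1, eta2)
    return 1.0 / (1.0 + math.exp(-z))
```

Because the transform fixes h(0) = 0, the probability at theta = b stays at 0.5 regardless of the asymmetry parameters; the two extra parameters only reshape the approach to the asymptotes.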
Knowledge, Volume 3, pp 215-231; https://doi.org/10.3390/knowledge3020015
In recent literature, alternative models for handling missing item responses in large-scale assessments have been proposed. Based on simulations and arguments from psychometric test theory, it is argued in this literature that missing item responses should never be scored as incorrect in scaling models but rather treated as ignorable or handled based on a model. The present article shows that these arguments have limited validity and illustrates the consequences in a country comparison using the PIRLS 2011 study. It is argued that students omit (constructed response) items because they do not know the correct item answer. A treatment of missing item responses other than scoring them as incorrect leads to significant changes in country rankings, which induces nonignorable consequences regarding the validity of the results. Additionally, two alternative item response models are proposed based on different assumptions for missing item responses. In the first, pseudo-likelihood approach, missing item responses for a particular student are replaced by a score that ranges between zero and a model-implied probability computed based on the non-missing items. In the second approach, the probability of a missing item response is predicted by a latent response propensity variable and the item response itself. The models were applied to the PIRLS 2011 study, demonstrating that country comparisons change under different modeling assumptions for missing item responses.
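The first, pseudo-likelihood approach can be sketched under the Rasch model: estimate ability from the observed items, then replace each missing response by a fraction rho of the model-implied probability. The function names, the grid-search estimator, and the single weight rho are illustrative assumptions, not the article's implementation; rho = 0 reproduces scoring as incorrect, rho = 1 uses the full model-implied probability:

```python
import math

def p_rasch(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def impute_missing_scores(responses, difficulties, rho=0.5):
    """Replace missing responses (None) by rho times the model-implied
    probability, with theta estimated from observed items by grid-search ML."""
    grid = [g / 10.0 for g in range(-60, 61)]  # theta in [-6, 6]

    def loglik(theta):
        ll = 0.0
        for x, b in zip(responses, difficulties):
            if x is None:
                continue  # only non-missing items enter the likelihood
            p = p_rasch(theta, b)
            ll += math.log(p) if x == 1 else math.log(1.0 - p)
        return ll

    theta_hat = max(grid, key=loglik)
    return [x if x is not None else rho * p_rasch(theta_hat, b)
            for x, b in zip(responses, difficulties)]
```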
Psych, Volume 5, pp 350-375; https://doi.org/10.3390/psych5020024
In this study, we present a package for R that is intended as a professional tool for the management and analysis of data from educational tests, useful both in high-stakes assessment programs and in survey research. Focused on psychometric models based on the sum score as the scoring rule and having sufficient statistics for their parameters, dexter fully exploits the many theoretical and practical advantages of this choice: lack of unnecessary assumptions, stable and fast estimation, and powerful and sensible diagnostic techniques. It includes an easy-to-use data management system tailored to the structure of test data and compatible with the current paradigm of tidy data. Companion packages currently include a graphical user interface and support for multi-stage testing.
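The sufficiency of the sum score, which this design rests on, can be checked numerically under the Rasch model: response patterns with the same sum score yield the same ML ability estimate, whichever items were answered correctly. A minimal demonstration (the grid-search estimator and function name are illustrative, not dexter's implementation):

```python
import math

def rasch_theta_ml(responses, difficulties):
    """Grid-search ML estimate of ability under the Rasch model.
    The log-likelihood depends on the responses only through their
    sum, so equal sum scores give equal estimates."""
    grid = [g / 100.0 for g in range(-400, 401)]

    def loglik(theta):
        ll = 0.0
        for x, b in zip(responses, difficulties):
            p = 1.0 / (1.0 + math.exp(-(theta - b)))
            ll += math.log(p) if x == 1 else math.log(1.0 - p)
        return ll

    return max(grid, key=loglik)
```

For three items with difficulties -1, 0, 1, all three patterns with sum score 2 produce the identical estimate, which is exactly the sufficiency property the package exploits.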
Published: 26 April 2023
Educational and Psychological Measurement; https://doi.org/10.1177/00131644231168398
Rapid guessing (RG) is a form of non-effortful responding that is characterized by short response latencies. This construct-irrelevant behavior has been shown in previous research to bias inferences concerning measurement properties and scores. To mitigate these deleterious effects, a number of response time threshold scoring procedures have been proposed, which recode RG responses (e.g., treat them as incorrect or missing, or impute probable values) and then estimate parameters for the recoded dataset using a unidimensional or multidimensional IRT model. To date, there have been limited attempts to compare these methods under the possibility that RG may be misclassified in practice. To address this shortcoming, the present simulation study compared item and ability parameter recovery for four scoring procedures by manipulating sample size, the linear relationship between RG propensity and ability, the percentage of RG responses, and the type and rate of RG misclassifications. Results demonstrated two general trends. First, across all conditions, treating RG responses as incorrect produced the largest degree of combined systematic and random error (larger than ignoring RG). Second, the remaining scoring approaches generally provided equal accuracy in parameter recovery when RG was perfectly identified; however, the multidimensional IRT approach was susceptible to increased error as misclassification rates grew. Overall, the findings suggest that recoding RG as missing and employing a unidimensional IRT model is a promising approach.
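The recommended procedure, recoding RG responses as missing before unidimensional IRT estimation, amounts to a simple threshold rule on response latencies. A sketch with illustrative names and a per-item time threshold (how thresholds are chosen, e.g., a fixed cutoff or a mixture-based rule, is a separate decision not shown here):

```python
def recode_rapid_guesses(responses, response_times, thresholds):
    """Flag responses faster than the item-specific time threshold as
    rapid guesses and recode them as missing (None); all other
    responses are kept as-is. Returns the recoded vector and the
    number of flagged responses."""
    recoded = []
    n_flagged = 0
    for x, rt, tau in zip(responses, response_times, thresholds):
        if rt < tau:
            recoded.append(None)   # treat the rapid guess as missing
            n_flagged += 1
        else:
            recoded.append(x)
    return recoded, n_flagged
```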
Published: 1 February 2023
Learning and Individual Differences, Volume 102; https://doi.org/10.1016/j.lindif.2023.102267
Stats, Volume 6, pp 192-208; https://doi.org/10.3390/stats6010012
In the social sciences, the performance of two groups is frequently compared based on a cognitive test involving binary items. Item response models are often utilized for comparing the two groups. However, the presence of differential item functioning (DIF) can impact group comparisons. To avoid biased estimation of group differences, appropriate statistical methods for handling differential item functioning are required. This article compares the performance of regularized estimation and several robust linking approaches in three simulation studies that address the one-parameter logistic (1PL) and two-parameter logistic (2PL) models. It turned out that robust linking approaches are at least as effective as the regularized estimation approach in most of the conditions in the simulation studies.
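To convey the flavor of robust linking: the linking constant between two groups can be taken as the median rather than the mean of the item-wise difficulty differences, so that DIF in a minority of items does not bias the shift. The sketch below is an illustrative minimal example, not any specific method from the article:

```python
def linking_constant_mean(b_group1, b_group2):
    """Mean-mean linking: average of item-wise difficulty differences.
    Sensitive to DIF outliers."""
    diffs = [b2 - b1 for b1, b2 in zip(b_group1, b_group2)]
    return sum(diffs) / len(diffs)

def linking_constant_median(b_group1, b_group2):
    """Robust linking: median of item-wise difficulty differences,
    resistant to DIF in a minority of items."""
    diffs = sorted(b2 - b1 for b1, b2 in zip(b_group1, b_group2))
    n = len(diffs)
    mid = n // 2
    return diffs[mid] if n % 2 else 0.5 * (diffs[mid - 1] + diffs[mid])
```

With a true shift of 0.3 and one item carrying an extra DIF effect of 1.0, the median recovers 0.3 while the mean is pulled toward the outlier.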
Psych, Volume 5, pp 38-49; https://doi.org/10.3390/psych5010004
To compute factor score estimates, lavaan version 0.6-12 offers the function lavPredict() that can not only be applied in single-level modeling but also in multilevel modeling, where characteristics of higher-level units such as working environments or team leaders are often assessed by ratings of employees. Surprisingly, the function provides results that deviate from the expected ones. Specifically, whereas the function yields correct EAP estimates of higher-level factors, the ML estimates are counterintuitive and possibly incorrect. Moreover, the function does not provide the expected standard errors. I illustrate these issues using an example from organizational research where team leaders are evaluated by their employees, and I discuss these issues from a measurement perspective.
Published: 17 November 2022
Mathematical and Computational Applications, Volume 27; https://doi.org/10.3390/mca27060095
Guessing effects frequently occur in testing data in educational or psychological applications. Different item response models have been proposed to handle guessing effects in dichotomous test items. However, it has been pointed out in the literature that the often employed three-parameter logistic model poses implausible assumptions regarding the guessing process. The four-parameter guessing model has been proposed as an alternative to circumvent these conceptual issues. In this article, the four-parameter guessing model is compared with alternative item response models for handling guessing effects through a simulation study and an empirical example. It turns out that model selection for item response models should be based on the AIC rather than the BIC. However, the RMSD item fit statistic used with typical cutoff values was found to be ineffective in detecting misspecified item response models. Furthermore, sufficiently large sample sizes are required for precise item parameter estimation. Moreover, it is argued that statistical model fit should not be the sole criterion of model choice. The item response model used in operational practice should be valid with respect to the meaning of the ability variable and the underlying model assumptions. In this sense, the four-parameter guessing model could be the model of choice in educational large-scale assessment studies.
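For reference, the standard four-parameter item response function adds an upper asymptote d (slipping) to the three-parameter model's lower asymptote c (guessing); the article's guessing-model parameterization may differ from this textbook form, which is shown here only as an illustration:

```python
import math

def irf_4pl(theta, a, b, c, d):
    """Four-parameter item response function: discrimination a,
    difficulty b, lower asymptote c (guessing), upper asymptote d
    (slipping). Probability runs from c to d as theta increases."""
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))
```

At theta = b the probability is exactly (c + d) / 2, and the curve approaches c and d in the lower and upper tails respectively, which is what distinguishes the model from the three-parameter logistic (where d = 1).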