Estimating Measurement Errors and Score Dependability of NECO 2019 Mathematics Examination

Abstract
This study estimated measurement error and score dependability in examinations using Generalizability Theory. Scores obtained by the students (the object of measurement) in examinations are affected by multiple sources of error (facets), and these scores are used in making relative and absolute decisions about the students. There is, therefore, a need to estimate measurement error and score dependability to determine the extent to which the facets contribute to error in examination scores. Three research questions and one hypothesis guided the study. The study population comprised 5,085 SS3 students in the 34 government-owned senior secondary schools in Yenagoa LGA of Bayelsa State in the 2019/2020 academic session. Ten schools were selected using the simple random sampling technique, and the 1,525 SS3 students of the selected schools formed the sample. Section A of the 2018 NECO Mathematics main paper and the 2018 NECO Mathematics Marking Scheme were used to collect the data. EduG version 6.0-e, which is based on ANOVA and Generalizability Theory, was used to answer the three research questions. A 95% confidence interval was computed using the standard errors of the variance components to test the hypothesis. The findings revealed that some hidden sources of error were at play in the study. The students’ facet (σ²S) made the highest contribution to measurement error in examination scores, followed by the residual (σ²SIM). Also, the students’ facet differed significantly (p < 0.05) in its contribution to measurement error, while the other facets and their interactions did not differ significantly in their contributions to measurement error. Hence, Ho1 was not accepted for the students’ facet but was accepted for the other facets.
Increasing the number of markers from 1 to 4, with the number of items fixed at 5, yielded values of 0.84 and 0.91, respectively. A generalizability coefficient of 0.94, high enough to rank-order students according to their relative abilities in examinations, was obtained with 2 markers when the number of items was increased to 10. An index of dependability of 0.93, high enough to maximize reliability, was obtained with 2 markers and 10 items.
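
The way the generalizability coefficient and the index of dependability respond to the number of items and markers can be illustrated with a minimal sketch. It assumes a fully crossed, random-effects students × items × markers design and uses the standard textbook formulas for relative and absolute error variance. The variance components in `HYPOTHETICAL` are invented for illustration and are not the estimates obtained in this study; the names `VarianceComponents` and `d_study` are likewise hypothetical and do not reproduce EduG's interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    """Variance components for a crossed students x items x markers design."""
    s: float    # students (object of measurement)
    i: float    # items
    m: float    # markers
    si: float   # students x items
    sm: float   # students x markers
    im: float   # items x markers
    sim: float  # students x items x markers, confounded with residual error


def d_study(vc: VarianceComponents, n_items: int, n_markers: int) -> tuple[float, float]:
    """Return (generalizability coefficient, index of dependability)."""
    # Relative error: only interactions with students affect rank ordering.
    rel_error = (vc.si / n_items
                 + vc.sm / n_markers
                 + vc.sim / (n_items * n_markers))
    # Absolute error: adds the facet main effects and their interaction.
    abs_error = (rel_error
                 + vc.i / n_items
                 + vc.m / n_markers
                 + vc.im / (n_items * n_markers))
    g_coefficient = vc.s / (vc.s + rel_error)
    phi = vc.s / (vc.s + abs_error)
    return g_coefficient, phi


# Hypothetical values for illustration only; NOT the study's estimates.
HYPOTHETICAL = VarianceComponents(s=4.0, i=0.3, m=0.05,
                                  si=0.8, sm=0.1, im=0.02, sim=1.5)

for n_items, n_markers in [(5, 1), (5, 4), (10, 2)]:
    g, phi = d_study(HYPOTHETICAL, n_items, n_markers)
    print(f"items={n_items:2d} markers={n_markers}  G={g:.3f}  Phi={phi:.3f}")
```

Under these assumptions both coefficients rise as either facet is sampled more heavily, and the index of dependability never exceeds the generalizability coefficient because absolute error includes every term of relative error.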