
A Bayesian hierarchical latent trait model for estimating rater bias and reliability in large-scale performance assessment
Author(s) - Kaja Zupanc, Erik Štrumbelj
Publication year - 2018
Publication title - PLOS ONE
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.99
H-Index - 332
ISSN - 1932-6203
DOI - 10.1371/journal.pone.0195297
Subject(s) - bayesian probability, scale (ratio), statistics, inter rater reliability, bayesian hierarchical modeling, reliability (semiconductor), rubric, bayesian inference, computer science, hierarchical database model, econometrics, point estimation, psychology, mathematics, data mining, rating scale, geography, cartography, power (physics), physics, quantum mechanics, mathematics education
We propose a novel approach to modelling rater effects in scoring-based assessment. The approach is based on a Bayesian hierarchical model and simulations from the posterior distribution. We apply it to large-scale essay assessment data collected over a period of 5 years. Empirical results suggest that the model provides a good fit both for the total scores and for the scores on individual rubrics. We estimate the median impact of rater effects on the final grade to be ±2 points on a 50-point scale, while 10% of essays would receive a score that differs by at least 5 points from their actual quality. Most of the impact is due to rater unreliability, not rater bias.
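
The distinction between rater bias and rater unreliability can be illustrated with a small simulation. The sketch below is not the authors' model: it assumes a simple additive decomposition in which each awarded score equals the essay's actual quality plus a systematic rater offset (bias) plus random rater variation (unreliability). The function name `simulate_rater_impact` and the standard deviations `bias_sd` and `noise_sd` are hypothetical and chosen only for illustration; they are not estimates from the paper.

```python
import random
import statistics


def simulate_rater_impact(n_essays=20000, bias_sd=1.0, noise_sd=3.0, seed=0):
    """Simulate score errors under an assumed additive rater-effect model.

    All parameter values are hypothetical placeholders, not published estimates.
    Returns (median absolute error, 90th-percentile absolute error,
    share of error variance attributable to unreliability).
    """
    rng = random.Random(seed)
    abs_errors = []
    for _ in range(n_essays):
        # Bias: systematic severity or leniency of the rater assigned to this essay.
        bias = rng.gauss(0.0, bias_sd)
        # Unreliability: random variation in the same rater's scoring.
        noise = rng.gauss(0.0, noise_sd)
        # Error = awarded score minus actual quality.
        abs_errors.append(abs(bias + noise))

    abs_errors.sort()
    median_err = statistics.median(abs_errors)
    # Absolute error exceeded by roughly 10% of simulated essays.
    p90_err = abs_errors[int(0.9 * n_essays)]
    # Under independence, the error variance splits additively.
    noise_share = noise_sd ** 2 / (bias_sd ** 2 + noise_sd ** 2)
    return median_err, p90_err, noise_share


if __name__ == "__main__":
    median_err, p90_err, noise_share = simulate_rater_impact()
    print(f"median |error|: {median_err:.2f} points")
    print(f"90th percentile |error|: {p90_err:.2f} points")
    print(f"variance share from unreliability: {noise_share:.0%}")
```

In a full Bayesian treatment, the two standard deviations would not be fixed in advance; they would be drawn from the posterior distribution, and the simulation would be repeated for each draw to propagate uncertainty into the reported impact.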