A Framework for Evaluation and Use of Automated Scoring
Author(s) -
Williamson David M.,
Xi Xiaoming,
Breyer F. Jay
Publication year - 2012
Publication title -
Educational Measurement: Issues and Practice
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.158
H-Index - 52
eISSN - 1745-3992
pISSN - 0731-1745
DOI - 10.1111/j.1745-3992.2011.00223.x
Subject(s) - generalizability theory , scoring system , machine learning , data science , data mining , computer science , psychology
A framework for the evaluation and use of automated scoring of constructed‐response tasks is provided that entails both evaluation of automated scoring and guidelines for its implementation and maintenance in the context of constantly evolving technologies. Validity issues and challenges associated with automated scoring are discussed within the framework. The fit between the scoring capability and the assessment purpose, the agreement between human and automated scores, associations with independent measures, the generalizability of automated scores as implemented in operational practice across different tasks and test forms, and the impact and consequences for the population and subgroups are proffered as integral evidence supporting the use of automated scoring. Specific evaluation guidelines are provided for using automated scoring to complement human scoring in tests used for high‐stakes purposes. These guidelines are intended to generalize to new automated scoring systems and to remain applicable as existing systems change over time.