The Generalizability of Scores for a Performance Assessment Scored with a Computer‐Automated Scoring System
Author(s) - Clauser Brian E., Harik Polina, Clyman Stephen G.
Publication year - 2000
Publication title - Journal of Educational Measurement
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.917
H-Index - 47
eISSN - 1745-3984
pISSN - 0022-0655
DOI - 10.1111/j.1745-3984.2000.tb01085.x
Subject(s) - generalizability theory, psychology, test validity, evaluation methods, psychometrics, computer science, applied psychology, clinical psychology, reliability engineering, developmental psychology, engineering
When performance assessments are delivered and scored by computer, the costs of scoring may be substantially lower than those of scoring the same assessment based on expert review of the individual performances. Computerized scoring algorithms also ensure that the scoring rules are implemented precisely and uniformly. Such computerized algorithms represent an effort to encode the scoring policies of experts. This raises the question: would a different group of experts have produced a meaningfully different algorithm? The research reported in this paper uses generalizability theory to assess the impact of using independent, randomly equivalent groups of experts to develop the scoring algorithms for a set of computer‐simulation tasks designed to measure physicians’ patient management skills. The results suggest that the impact of this “expert group” effect may be significant but that it can be controlled with appropriate test development strategies. The appendix presents a multivariate generalizability analysis examining the stability of the assessed proficiency across scores representing the scoring policies of different groups of experts.
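To make the abstract's methodology concrete, the sketch below illustrates the kind of variance-component estimation generalizability theory involves, using a simplified persons × expert-group crossed design with simulated data. It is not the authors' analysis or design (which involves tasks and a multivariate G-study); the sample sizes, simulated variance components, and the single-facet layout are assumptions made purely for illustration.

```python
# A minimal sketch (simulated data, not the authors' design): estimate
# variance components for a fully crossed persons x expert-group design
# via expected mean squares, then compute a generalizability coefficient.
import numpy as np

rng = np.random.default_rng(0)
n_p, n_g = 200, 4                                # examinees, expert-group algorithms (hypothetical)

# Simulate scores: true proficiency + group leniency + interaction/error
person = rng.normal(0, 1.0, size=(n_p, 1))       # sigma^2_p  = 1.00
group = rng.normal(0, 0.3, size=(1, n_g))        # sigma^2_g  = 0.09
error = rng.normal(0, 0.5, size=(n_p, n_g))      # sigma^2_pg,e = 0.25
scores = person + group + error

grand = scores.mean()
p_means = scores.mean(axis=1, keepdims=True)
g_means = scores.mean(axis=0, keepdims=True)

# Sums of squares for the crossed p x g design (one score per cell)
ss_p = n_g * ((p_means - grand) ** 2).sum()
ss_g = n_p * ((g_means - grand) ** 2).sum()
ss_pg = ((scores - p_means - g_means + grand) ** 2).sum()

ms_p = ss_p / (n_p - 1)
ms_g = ss_g / (n_g - 1)
ms_pg = ss_pg / ((n_p - 1) * (n_g - 1))

# Expected-mean-square solutions for the random-effects variance components
var_pg = ms_pg                                   # interaction confounded with error
var_p = (ms_p - ms_pg) / n_g
var_g = (ms_g - ms_pg) / n_p

# Generalizability coefficient when a single scoring algorithm is used (n'_g = 1)
e_rho2 = var_p / (var_p + var_pg / 1)
print(f"var_p={var_p:.3f}  var_g={var_g:.3f}  var_pg={var_pg:.3f}  Erho2={e_rho2:.3f}")
```

In this framing, a large variance component for the expert-group facet (or its interaction with persons) would signal that scores depend meaningfully on which group of experts built the algorithm, which is the "expert group" effect the abstract describes.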