Open Access
EVALUATION OF THE E‐RATER ® SCORING ENGINE FOR THE GRE ® ISSUE AND ARGUMENT PROMPTS
Author(s) - Chaitanya Ramineni, Catherine S. Trapani, David M. Williamson, Tim Davey, Brent Bridgeman
Publication year - 2012
Publication title - ETS Research Report Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.235
H-Index - 5
ISSN - 2330-8516
DOI - 10.1002/j.2333-8504.2012.tb02284.x
Subject(s) - argument (complex analysis) , task (project management) , inter rater reliability , writing assessment , statistics , psychology , scoring system , computer science , cognitive psychology , artificial intelligence , mathematics , mathematics education , medicine , rating scale , management , surgery , economics
Automated scoring models for the e‐rater ® scoring engine were built and evaluated for the GRE ® argument and issue writing tasks. Prompt‐specific, generic, and generic with prompt‐specific intercept scoring models were built, and their performance against human scores was evaluated using statistics such as weighted kappas, Pearson correlations, standardized differences in mean scores, and correlations with external measures. Performance was also evaluated across demographic subgroups. Additional analyses were performed to establish appropriate agreement thresholds between human and e‐rater scores for unusual essays, and to assess the impact of using e‐rater on operational scores. The generic e‐rater scoring model with an operational prompt‐specific intercept was recommended for operational use for the issue writing task, and the prompt‐specific e‐rater scoring model for the argument writing task. The two automated scoring models were implemented to produce check scores at a discrepancy threshold of 0.5 with human scores.
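
For illustration only (not code from the report), the sketch below shows, under stated assumptions, how the kinds of human–machine agreement statistics named in the abstract (quadratic-weighted kappa, Pearson correlation, and a standardized difference in mean scores) and a simple check-score discrepancy rule at a 0.5 threshold might be computed. The function names are hypothetical, and the use of scipy and scikit-learn is an assumption; the report does not specify an implementation.

import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score


def agreement_stats(human, erater):
    """Illustrative human vs. e-rater agreement statistics (assumed formulas)."""
    human = np.asarray(human, dtype=float)
    erater = np.asarray(erater, dtype=float)

    # Quadratic-weighted kappa on rounded integer score points.
    qwk = cohen_kappa_score(
        human.round().astype(int), erater.round().astype(int), weights="quadratic"
    )

    # Pearson correlation between human and machine scores.
    r, _ = pearsonr(human, erater)

    # Standardized difference in mean scores (machine minus human, pooled SD).
    pooled_sd = np.sqrt((human.var(ddof=1) + erater.var(ddof=1)) / 2.0)
    std_diff = (erater.mean() - human.mean()) / pooled_sd

    return {"weighted_kappa": qwk, "pearson_r": r, "standardized_difference": std_diff}


def needs_adjudication(human_score, erater_score, threshold=0.5):
    """Check-score rule: flag an essay when |human - e-rater| exceeds the threshold."""
    return abs(human_score - erater_score) > threshold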
