
STUMPING E‐RATER: CHALLENGING THE VALIDITY OF AUTOMATED ESSAY SCORING
Author(s) -
Powers Donald E.,
Burstein Jill C.,
Chodorow Martin,
Fowles Mary E.,
Kukich Karen
Publication year - 2001
Publication title -
ETS Research Report Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.235
H-Index - 5
ISSN - 2330-8516
DOI - 10.1002/j.2333-8504.2001.tb01845.x
Subject(s) - writing assessment, computer science, inter-rater reliability, psychology, scoring system, artificial intelligence, natural language processing, applied psychology, mathematics education, rating scale, developmental psychology, medicine, surgery
For this study, various writing experts were invited to “challenge” e‐rater – an automated essay scorer that relies on natural language processing techniques – by composing essays in response to Graduate Record Examinations (GRE®) Writing Assessment prompts with the intention of undermining its scoring capability. Specifically, using detailed information about e‐rater's approach to essay scoring, writers tried to “trick” the computer‐based system into assigning scores that were higher or lower than deserved. E‐rater's automated scores on these “problem essays” were compared with scores given by two trained human readers, and the difference between the scores constituted the standard for judging the extent to which e‐rater was fooled. Challengers were differentially successful in writing problematic essays: expert writers were more successful in tricking e‐rater into assigning scores that were too high than in duping it into awarding scores that were too low. The study provides information on ways in which e‐rater, and perhaps other automated essay scoring systems, may fail to provide accurate evaluations if used as the sole method of scoring in high‐stakes assessments. The results suggest possible avenues for improving automated scoring methods.
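The abstract's evaluation criterion, the difference between e‐rater's score and the trained human readers' scores, can be illustrated with a minimal sketch. This is not the authors' analysis code; the 6‐point scale, the use of the two readers' average as the reference, and the example essays are assumptions introduced purely for illustration.

```python
# Illustrative sketch (not the authors' code): quantifying how far e-rater's
# score deviates from the human readers' scores on each "challenge" essay.
# Score scale, reference rule (mean of two readers), and data are assumptions.

from statistics import mean


def human_reference(reader1: float, reader2: float) -> float:
    """Average of the two trained readers' scores serves as the reference."""
    return mean([reader1, reader2])


def discrepancy(erater: float, reader1: float, reader2: float) -> float:
    """Positive values: e-rater scored the essay higher than the readers did;
    negative values: e-rater scored it lower."""
    return erater - human_reference(reader1, reader2)


# Hypothetical challenge essays: (e-rater score, reader 1, reader 2) on a 6-point scale.
essays = {
    "essay written to inflate the automated score": (6.0, 2.0, 3.0),
    "essay written to deflate the automated score": (3.0, 5.0, 5.0),
}

for label, (e, r1, r2) in essays.items():
    d = discrepancy(e, r1, r2)
    direction = "too high" if d > 0 else "too low" if d < 0 else "accurate"
    print(f"{label}: discrepancy {d:+.1f} ({direction})")
```

Under this framing, the study's finding that challengers more often obtained inflated than deflated automated scores corresponds to positive discrepancies being larger and more frequent than negative ones.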