
APPLYING CONTENT SIMILARITY METRICS TO CORPUS DATA: DIFFERENCES BETWEEN NATIVE AND NON‐NATIVE SPEAKER RESPONSES TO A TOEFL® INTEGRATED WRITING PROMPT
Author(s) - Paul Deane, Olga Gurevich
Publication year - 2008
Publication title - ETS Research Report Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.235
H-Index - 5
ISSN - 2330-8516
DOI - 10.1002/j.2333-8504.2008.tb02137.x
Subject(s) - Test of English as a Foreign Language, natural language processing, computer science, population, first language, artificial intelligence, linguistics, psychology, language assessment
For many purposes, it is useful to collect a corpus of texts all produced in response to the same stimulus, whether to measure performance (as on a test) or to test hypotheses about population differences. This paper examines several methods for measuring similarities in phrasing and content, and demonstrates that these methods can identify population differences between native and non‐native speakers of English on a writing task.
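The abstract does not name the specific metrics the report uses. As a hypothetical illustration only, the sketch below shows two generic, widely used measures of the kind described: cosine similarity between term-frequency vectors (a content measure) and Jaccard overlap of word bigrams (a phrasing measure), plus a helper that averages pairwise similarity within a group of responses, so that, for example, a native-speaker group and a non-native-speaker group could be compared. All function names, the simple regex tokenizer, and the choice of metrics are assumptions, not the authors' method.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokens(text: str) -> list[str]:
    # Assumed, simplistic tokenizer: lowercase, keep runs of letters/apostrophes
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    # Content similarity: cosine between raw term-frequency vectors
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ngram_jaccard(a: str, b: str, n: int = 2) -> float:
    # Phrasing similarity: Jaccard overlap of the sets of word n-grams
    def grams(text: str) -> set[tuple[str, ...]]:
        toks = tokens(text)
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

    ga, gb = grams(a), grams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def mean_pairwise(texts: list[str], metric) -> float:
    # Hypothetical group-level summary: average similarity over all unordered pairs
    pairs = list(combinations(texts, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


if __name__ == "__main__":
    # Invented example responses, for illustration only
    group = [
        "The lecture casts doubt on the reading",
        "The lecture challenges the claims in the reading",
        "The professor disagrees with the reading passage",
    ]
    print(round(mean_pairwise(group, cosine_similarity), 3))
    print(round(mean_pairwise(group, ngram_jaccard), 3))
```

Computing the same group averages separately for each population would give one simple way to ask whether responses within a group resemble each other (or the stimulus) more closely than responses in another group do; a real analysis would need a better tokenizer and a statistical test.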