The subjective frequency of word n-grams
Author(s) - Cyrus Shaoul, Chris Westbury, R. Harald Baayen
Publication year - 2013
Publication title - Psihologija
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.222
H-Index - 16
eISSN - 1451-9283
pISSN - 0048-5705
DOI - 10.2298/psi1304497s
Subject(s) - word lists by frequency , n gram , frequency , lexical decision task , psychology , respondent , statistics , probabilistic logic , mathematics , natural language processing , speech recognition , artificial intelligence , computer science , language model , cognition , sentence , neuroscience
When asked to think about the subjective frequency of an n-gram (a group of n words), what properties of the n-gram influence the respondent? It has recently been shown that n-grams that occur more frequently in a large corpus of English are read faster than n-grams that occur less frequently (Arnon & Snider, 2010), an effect analogous to the frequency effects found in word reading and lexical decision. The subjective frequency of words has also been studied extensively and linked to performance on linguistic tasks. We investigated people's capacity to gauge the absolute and relative frequencies of n-grams. Subjective frequency ratings collected for 352 n-grams correlated strongly with corpus frequency, particularly for the n-grams with the highest subjective frequency. These n-grams were then paired and used in a relative frequency decision task (e.g., Is "green hills" more frequent than "weekend trips"?). Accuracy on this task was reliably above chance, and trial-level accuracy was best predicted by a model that included the corpus frequencies of the whole n-grams. A computational model of word recognition (Baayen, Milin, Djurdjevic, Hendrix, & Marelli, 2011) was then used in an attempt to simulate the subjective frequency ratings, with limited success. Our results suggest that human intuitions about n-gram frequency arise from the probabilistic information contained in n-grams.
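The abstract reports two kinds of evidence: ratings that correlate with corpus frequency, and relative frequency judgments that are correct more often than chance. As a minimal sketch, assuming per-item ratings and corpus counts are available as plain Python dictionaries, the code below shows two simple checks one could run on data of this shape: Spearman's rank correlation between ratings and counts, and an exact one-sided binomial test of decision accuracy against 50% chance. This is not the authors' analysis (their trial-level analysis compared statistical models that this abstract does not detail). The function names (`ranks`, `spearman`, `relative_decision_accuracy`, `binomial_p_above_chance`) and all n-grams and numbers are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def relative_decision_accuracy(
    trials: Sequence[Tuple[str, str, bool]], counts: Dict[str, int]
) -> Tuple[int, int]:
    """Score 'Is A more frequent than B?' trials against corpus counts.

    Each trial is (first, second, chose_first). A response is correct when the
    chosen n-gram has the higher corpus count; tied pairs are skipped because
    they have no correct answer. Returns (n_correct, n_scored).
    """
    correct = scored = 0
    for first, second, chose_first in trials:
        if counts[first] == counts[second]:
            continue
        first_is_more_frequent = counts[first] > counts[second]
        correct += int(chose_first == first_is_more_frequent)
        scored += 1
    return correct, scored


def binomial_p_above_chance(k: int, n: int, p: float = 0.5) -> float:
    """One-sided exact binomial test: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Toy values invented for illustration; NOT data from the study.
    counts = {"cup of tea": 5200, "at the end": 91000,
              "red balloon": 310, "quiet harbor": 45}
    ratings = {"cup of tea": 5.1, "at the end": 6.3,
               "red balloon": 2.4, "quiet harbor": 2.9}
    items = list(counts)
    # Spearman is rank-based, so log-transforming the counts would not change rho.
    rho = spearman([counts[w] for w in items], [ratings[w] for w in items])
    print(f"Spearman rho, ratings vs. corpus count: {rho:.2f}")

    trials = [("at the end", "red balloon", True),
              ("quiet harbor", "cup of tea", False),
              ("red balloon", "quiet harbor", False)]
    k, n = relative_decision_accuracy(trials, counts)
    print(f"{k}/{n} correct, one-sided p vs. 50% chance = "
          f"{binomial_p_above_chance(k, n):.3f}")
```

Skipping pairs with identical counts is a design choice assumed here: when two n-grams are equally frequent in the corpus, neither response can be scored as correct. A real analysis of many trials and participants would more likely use a regression model than a single pooled binomial test.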