Premium
Learning Phonemes With a Proto‐Lexicon
Author(s) -
Martin Andrew,
Peperkamp Sharon,
Dupoux Emmanuel
Publication year - 2012
Publication title -
cognitive science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.498
H-Index - 114
eISSN - 1551-6709
pISSN - 0364-0213
DOI - 10.1111/j.1551-6709.2012.01267.x
Subject(s) - lexicon , computer science , linguistics , natural language processing , artificial intelligence , speech recognition , psychology , philosophy
Before the end of the first year of life, infants begin to lose the ability to perceive distinctions between sounds that are not phonemic in their native language. It is typically assumed that this developmental change reflects the construction of language‐specific phoneme categories, but how these categories are learned largely remains a mystery. Peperkamp, Le Calvez, Nadal, and Dupoux (2006) present an algorithm that can discover phonemes using the distributions of allophones as well as the phonetic properties of the allophones and their contexts. We show that a third type of information source, the occurrence of pairs of minimally differing word forms in speech heard by the infant, is also useful for learning phonemic categories and is in fact more reliable than purely distributional information in data containing a large number of allophones. In our model, learners build an approximation of the lexicon consisting of the high‐frequency n ‐grams present in their speech input, allowing them to take advantage of top‐down lexical information without needing to learn words. This may explain how infants have already begun to exhibit sensitivity to phonemic categories before they have a large receptive lexicon.