Open Access

Relating Unsupervised Word Segmentation to Reported Vocabulary Acquisition

Author(s) - Elin Larsen, Alejandrina Cristià, Emmanuel Dupoux

Publication year - 2017

Publication title - Interspeech 2017

Language(s) - English

Resource type - Conference proceedings

DOI - 10.21437/interspeech.2017-937

Subject(s) - computer science, natural language processing, artificial intelligence, vocabulary, text segmentation, word (group theory), segmentation, speech recognition, linguistics, philosophy

A range of computational approaches have been used to model how infants discover word forms in continuous speech. Typically, these algorithms are evaluated against an ideal ‘gold standard’ word segmentation and lexicon. Such metrics assess how well an algorithm matches the adult state, but may not reflect the intermediate states of the child’s lexical development. We set up a new evaluation method based on the correlation between word frequency counts, obtained by applying an algorithm to a corpus of child-directed speech, and the proportion of infants reported by their parents to know those words. We evaluate a representative set of four algorithms, applied to transcriptions of the Brent corpus that have been phonologized using either phonemes or syllables as basic units. Results show remarkable variation in how well these eight algorithm-unit combinations predicted infant vocabulary, with some predictions surpassing those derived from the adult gold standard segmentation. We argue that infant vocabulary prediction provides a useful complement to traditional evaluation: for example, the best predictor model was also one of the worst in terms of segmentation score, and there was no clear relationship between token or boundary F-score and vocabulary prediction.
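The evaluation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which correlation measure or frequency transform was used, so the Pearson coefficient and the log(1 + count) transform below are assumptions, as are the function names and the toy data, which are invented purely for illustration.

```python
import math
from collections import Counter


def word_counts(segmented_utterances):
    """Count word tokens in an algorithm's segmented output.

    Each utterance is a list of the word forms the algorithm proposed.
    """
    counts = Counter()
    for utterance in segmented_utterances:
        counts.update(utterance)
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def vocabulary_prediction_score(segmented_utterances, proportion_known):
    """Correlate segmented word frequencies with parental-report proportions.

    proportion_known maps a word form to the fraction of infants reported
    to know it. Words the algorithm never produced get a count of 0.
    Assumption: counts are log-transformed with log(1 + count).
    """
    counts = word_counts(segmented_utterances)
    words = sorted(proportion_known)
    log_freqs = [math.log1p(counts[w]) for w in words]
    proportions = [proportion_known[w] for w in words]
    return pearson(log_freqs, proportions)


# Toy data, invented for illustration only (not from the Brent corpus
# or from any parental-report instrument).
toy_segmentation = [
    ["look", "at", "the", "doggy"],
    ["the", "doggy", "wants", "the", "ball"],
    ["where", "is", "the", "ball"],
    ["ba", "ll"],  # an over-segmented utterance
]
toy_reports = {"doggy": 0.55, "ball": 0.70, "look": 0.30, "where": 0.10}

score = vocabulary_prediction_score(toy_segmentation, toy_reports)
print(f"vocabulary prediction score: {score:.3f}")
```

Under these assumptions, the same function could be run on the gold standard segmentation and on each algorithm-unit combination, and the resulting scores compared alongside token and boundary F-scores.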
