Learning words from sights and sounds: a computational model
Author(s) - Roy Deb K., Pentland Alex P.
Publication year - 2002
Publication title - Cognitive Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.498
H-Index - 114
eISSN - 1551-6709
pISSN - 0364-0213
DOI - 10.1207/s15516709cog2601_4
Subject(s) - computer science, categorization, lexicon, artificial intelligence, speech recognition, sight, process (computing), segmentation, computational model, natural language processing, set (abstract data type), physics, astronomy, programming language, operating system
Abstract - This paper presents an implemented computational model of word acquisition which learns directly from raw multimodal sensory input. Set in an information theoretic framework, the model acquires a lexicon by finding and statistically modeling consistent cross‐modal structure. The model has been implemented in a system using novel speech processing, computer vision, and machine learning algorithms. In evaluations the model successfully performed speech segmentation, word discovery and visual categorization from spontaneous infant‐directed speech paired with video images of single objects. These results demonstrate the possibility of using state‐of‐the‐art techniques from sensory pattern recognition and machine learning to implement cognitive models which can process raw sensor data without the need for human transcription or labeling.
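Illustrative note - The abstract does not say how "consistent cross‐modal structure" is scored. A common information theoretic choice for this is mutual information, used here as an assumption to show the idea and not as a reproduction of the paper's implementation. The sketch below assumes the audio and visual streams have already been reduced to discrete labels; the function name, labels, and toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information in bits between two discrete variables.

    `pairs` is a list of (sound_label, visual_label) co-occurrences.
    Both label sets are hypothetical stand-ins for whatever units a
    real system would extract from raw speech and video.
    """
    n = len(pairs)
    joint = Counter(pairs)                   # counts of (sound, visual)
    sound = Counter(s for s, _ in pairs)     # marginal counts, sound
    visual = Counter(v for _, v in pairs)    # marginal counts, visual
    mi = 0.0
    for (s, v), count in joint.items():
        p_joint = count / n
        # p(s, v) / (p(s) * p(v)), written with raw counts
        ratio = count * n / (sound[s] * visual[v])
        mi += p_joint * math.log2(ratio)
    return mi


# Toy data (invented): a sound that always accompanies one kind of
# object, versus a sound that accompanies every kind equally often.
consistent = [("ball", "round")] * 4 + [("dog", "furry")] * 4
unrelated = [("the", "round"), ("the", "furry")] * 4

print(mutual_information(consistent))
print(mutual_information(unrelated))
```

In this toy setting the consistently paired sounds score higher than the sound that co-occurs with every object, which is the kind of signal a learner could use to prefer some sound and shape pairings as lexical candidates over others.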