Identifying the addressee in human-human-robot interactions based on head pose and speech
Author(s) -
Michael Katzenmaier,
Rainer Stiefelhagen,
Tanja Schultz
Publication year - 2004
Publication title -
Repository KITopen (Karlsruhe Institute of Technology)
Language(s) - English
Resource type - Conference proceedings
ISBN - 1-58113-995-0
DOI - 10.1145/1027933.1027959
Subject(s) - human–robot interaction , computer science , robot , artificial intelligence , sensory cue , head pose estimation , computer vision , speech recognition
In this work we investigate the power of acoustic and visual cues, and their combination, to identify the addressee in a human-human-robot interaction. Based on eighteen audio-visual recordings of two humans and a (simulated) robot, we discriminate the interaction between the two humans from the interaction of one human with the robot. The paper compares the results of three approaches. The first approach uses purely acoustic cues to identify the addressee; both low-level, feature-based cues and higher-level cues are examined. In the second approach we test whether the human's head pose is a suitable cue. Our results show that visually estimated head pose is a more reliable cue than the acoustic cues for identifying the addressee in this human-human-robot setting. In the third approach we combine the acoustic and visual cues, which yields significant improvements.
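The cue combination described in the abstract can be illustrated with a small sketch. The following Python snippet shows a hypothetical late-fusion rule that weights a per-utterance acoustic score against a score derived from the estimated head pan angle; the feature names, weights, thresholds, and scoring functions are illustrative assumptions, not the classifiers actually used in the paper.

```python
# Hedged sketch: late fusion of an acoustic cue and a head-pose cue to decide
# whether an utterance is addressed to the robot or to the other human.
# All scores, weights, and thresholds below are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class Utterance:
    speech_score: float   # hypothetical acoustic score in [0, 1]; higher = more "robot-directed"
    head_pan_deg: float   # estimated head pan angle; 0 deg = facing the robot


def head_pose_score(pan_deg: float, tolerance_deg: float = 15.0) -> float:
    """Map head pan to a [0, 1] score: 1 when facing the robot, decaying with angle."""
    return max(0.0, 1.0 - abs(pan_deg) / (2.0 * tolerance_deg))


def classify_addressee(utt: Utterance, w_visual: float = 0.7, threshold: float = 0.5) -> str:
    """Late fusion: weighted sum of the visual and acoustic scores, then a threshold decision."""
    fused = w_visual * head_pose_score(utt.head_pan_deg) + (1.0 - w_visual) * utt.speech_score
    return "robot" if fused >= threshold else "human"


if __name__ == "__main__":
    # Speaker looks almost straight at the robot and the speech cue also points to the robot.
    print(classify_addressee(Utterance(speech_score=0.8, head_pan_deg=5.0)))   # -> robot
    # Speaker looks well away from the robot; the visual cue dominates the decision.
    print(classify_addressee(Utterance(speech_score=0.6, head_pan_deg=60.0)))  # -> human
```

Giving the visual cue the larger weight in this sketch mirrors the abstract's finding that head pose is the more reliable single cue, while the acoustic score still contributes to the fused decision.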