Open Access
An Intelligence Architecture for Grounded Language Communication with Field Robots
Author(s) - Thomas Howard, Ethan Stump, Jonathan Fink, Jacob Arkin, Rohan Paul, Daehyung Park, Subhro Roy, Daniel Barber, Rhyse Bendell, Karl Schmeckpeper, Jing Tian, Jean Oh, Maggie Wigness, Long Quang, Brandon Rothrock, Jeremy Nash, Matthew Walter, Florian Jentsch, Nicholas Roy
Publication year - 2022
Publication title - Field Robotics
Language(s) - English
Resource type - Journals
ISSN - 2771-3989
DOI - 10.55417/fr.2022017
Subject(s) - computer science, artificial intelligence, human–computer interaction, human–robot interaction, robot, natural language, semantics (computer science), pipeline (software)
For humans and robots to collaborate effectively as teammates in unstructured environments, robots must be able to construct semantically rich models of the environment, communicate efficiently with teammates, and perform sequences of tasks robustly with minimal human intervention, as direct human guidance may be infrequent and/or intermittent. Contemporary architectures for human-robot interaction often rely on engineered human-interface devices or structured languages that require extensive prior training and inherently limit the kinds of information that humans and robots can communicate. Natural language, particularly when situated with a visual representation of the robot’s environment, allows humans and robots to exchange information about abstract goals, specific actions, and/or properties of the environment quickly and effectively. In addition, it serves as a mechanism to resolve inconsistencies in the mental models of the environment across the human-robot team. This article details a novel intelligence architecture that exploits a centralized representation of the environment to perform complex tasks in unstructured environments. The centralized environment model is informed by a visual perception pipeline, declarative knowledge, deliberate interactive estimation, and a multimodal interface. The language pipeline also exploits proactive symbol grounding to resolve uncertainty in ambiguous statements through inverse semantics. A series of experiments on three different unmanned ground vehicles demonstrates the utility of this architecture through its robust ability to perform language-guided spatial navigation, mobile manipulation, and bidirectional communication with human operators. Experimental results give examples of component-level behaviors and overall system performance that guide a discussion of observed performance and opportunities for future innovation.
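To make the grounding-and-clarification behavior described in the abstract concrete, the short Python sketch below grounds a spoken referent against a shared world model and falls back to a clarifying question when grounding is ambiguous, a rough stand-in for the inverse-semantics step. All names in it (WorldObject, ground_referent, resolve) are hypothetical illustrations, not components of the published architecture, and the trivial label-match score is far simpler than the paper's actual grounding model.

from dataclasses import dataclass

@dataclass
class WorldObject:
    label: str       # semantic class reported by the perception pipeline
    position: tuple  # (x, y) in the team's shared map frame

def ground_referent(noun, world, threshold=0.5):
    """Score each world-model object against the referent and keep matches.
    A toy label-match score stands in for a real grounding model."""
    scored = [(obj, 1.0 if obj.label == noun else 0.0) for obj in world]
    return [obj for obj, score in scored if score >= threshold]

def resolve(noun, world):
    """Ground a referent; when grounding is empty or ambiguous, ask a
    clarifying question instead of acting."""
    matches = ground_referent(noun, world)
    if len(matches) == 1:
        return matches[0]
    if not matches:
        print(f"I don't see a {noun}. Can you describe it differently?")
    else:
        print(f"I see {len(matches)} objects labeled '{noun}'. Which one do you mean?")
    return None

world_model = [
    WorldObject("barrel", (3.0, 1.5)),
    WorldObject("barrel", (8.2, 4.0)),
    WorldObject("building", (10.0, 2.0)),
]
resolve("barrel", world_model)  # two candidates, so the robot asks for clarification

In the architecture the abstract describes, the question posed back to the operator is driven by the robot's own uncertainty over symbol groundings rather than by a fixed template, which is what lets dialogue repair mismatched mental models across the team.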
