Open Access
Generating Multi-sentence Natural Language Descriptions of Indoor Scenes
Author(s) - Dahua Lin, Sanja Fidler, Chen Kong, Raquel Urtasun
Publication year - 2015
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5244/c.29.93
Subject(s) - computer science, artificial intelligence, natural language processing, scene parsing, sentence generation, generative grammar, generative model, inference
This paper proposes a novel framework for generating lingual descriptions of indoor scenes. This is an important problem, as an effective solution can enable many exciting real-world applications, such as human-robot interaction, image/video synopsis, and automatic caption generation. While substantial efforts have been made to tackle this problem, previous approaches have focused primarily on generating a single sentence for each image, which is insufficient for describing complex scenes. We attempt to go beyond this by generating coherent descriptions with multiple sentences. In particular, we are interested in generating multi-sentence descriptions of cluttered indoor scenes. Complex, multi-sentence output requires us to deal with challenging problems such as consistent co-references to visual entities across sentences. Furthermore, the sequence of sentences needs to be as natural as possible, mimicking how humans describe the scene. This is especially important, for example, in the context of social robotics, to enable realistic communication. Towards this goal, we develop a framework with three major components: (1) a holistic visual parser based on [3] that couples the inference of objects, attributes, and relations to produce a semantic representation of a 3D scene (Fig. 1); (2) a generative grammar automatically learned from training text; and (3) a text generation algorithm that takes into account subtle dependencies across sentences, such as logical order, diversity, saliency of objects, and co-reference resolution.
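To make the third component concrete, the following is a minimal, hypothetical sketch of how a generation step might consume a parsed scene representation and enforce two of the dependencies mentioned above: ordering entities by saliency, and substituting a pronoun for an already-mentioned entity as a toy stand-in for co-reference resolution. The `Entity` and `Relation` classes and the `describe_scene` function are illustrative assumptions, not the paper's actual interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A detected object in the parsed scene (hypothetical representation)."""
    name: str
    attributes: list = field(default_factory=list)
    saliency: float = 0.0


@dataclass
class Relation:
    """A spatial relation between two entities, e.g. lamp -- on -- table."""
    subject: Entity
    predicate: str
    obj: Entity


def describe_scene(entities, relations):
    """Generate a multi-sentence description.

    Entities are introduced in decreasing saliency order; an entity that
    has already been mentioned is referred to with a pronoun, a crude
    approximation of cross-sentence co-reference.
    """
    mentioned = set()
    sentences = []
    for ent in sorted(entities, key=lambda e: -e.saliency):
        attrs = " ".join(ent.attributes)
        noun = f"a {attrs} {ent.name}".replace("  ", " ").strip()
        sentences.append(f"There is {noun}.")
        mentioned.add(ent.name)
        # Emit relations whose subject is the entity just introduced.
        for rel in relations:
            if rel.subject is ent:
                obj_ref = "it" if rel.obj.name in mentioned else f"a {rel.obj.name}"
                sentences.append(f"The {ent.name} is {rel.predicate} {obj_ref}.")
                mentioned.add(rel.obj.name)
    return " ".join(sentences)


table = Entity("table", ["wooden"], saliency=0.9)
lamp = Entity("lamp", saliency=0.5)
description = describe_scene([table, lamp], [Relation(lamp, "on", table)])
```

In this toy example the table is described first (higher saliency), and the lamp's relation to it uses "it" rather than repeating "the table". The paper's actual algorithm additionally handles diversity and logical ordering learned from training text.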
