Advances in Multimedia Information Processing – PCM 2018
Author(s) - Richang Hong, Wen-Huang Cheng, Toshihiko Yamasaki, Meng Wang, ChongWah Ngo
Publication year - 2018
Publication title - Lecture Notes in Computer Science
Language(s) - English
Resource type - Book series
SCImago Journal Rank - 0.249
H-Index - 400
eISSN - 1611-3349
pISSN - 0302-9743
DOI - 10.1007/978-3-030-00764-5
Subject(s) - computer science , multimedia , field (mathematics) , mathematics , pure mathematics
In this paper, we study the problem of object detection and segmentation in cluttered indoor scenes based on RGB-D data. The main difficulties of object detection and segmentation in indoor scenes come from severe occlusion, inconspicuous classes, and easily confused categories. To address these problems, we propose a multimodal fusion deep convolutional neural network (MFDCNN) framework for object detection and segmentation, which boosts performance at both levels while keeping the framework trainable end to end. For object detection, we adopt a multimodal region proposal network to solve the problem of object-level detection. For semantic segmentation, we use a multimodal fully convolutional network to predict the class label of each pixel. To learn object detection and segmentation simultaneously, we propose a novel loss function that combines the two kinds of networks. Under this framework, we focus on cluttered indoor scenes with challenging settings and evaluate the performance of our MFDCNN on the NYU-Depth V2 dataset. Our MFDCNN achieves state-of-the-art performance on the object detection task and performance comparable to the state of the art on the semantic segmentation task.
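The abstract does not give the form of the proposed loss, so the following is only a minimal sketch of one common way to train a detection branch and a segmentation branch jointly: a weighted sum of a detection term (classification plus box regression) and a per-pixel segmentation term. The function names, the smooth L1 box term, and the `seg_weight` parameter are all assumptions made for illustration and are not taken from the paper.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under a probability vector."""
    return -math.log(max(probs[label], 1e-12))


def smooth_l1(pred, target):
    """Smooth L1 distance summed over coordinates (assumed box-regression term)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def joint_loss(det_probs, det_label, box_pred, box_target,
               pixel_probs, pixel_labels, seg_weight=1.0):
    """Hypothetical joint objective: detection loss + seg_weight * segmentation loss.

    det_probs / det_label    : class probabilities and true class for one proposal
    box_pred / box_target    : predicted and ground-truth box coordinates
    pixel_probs / pixel_labels: per-pixel class probabilities and true labels
    """
    detection = cross_entropy(det_probs, det_label) + smooth_l1(box_pred, box_target)
    segmentation = sum(
        cross_entropy(p, y) for p, y in zip(pixel_probs, pixel_labels)
    ) / len(pixel_labels)
    return detection + seg_weight * segmentation
```

In this sketch, `seg_weight` trades off the two tasks; a single scalar objective of this kind is what allows both branches to be optimized together in one backward pass.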