A Semi-Supervised Approach to Monocular Depth Estimation, Depth Refinement, and Semantic Segmentation of Driving Scenes using a Siamese Triple Decoder Architecture

John Paul T. Yusiong, Prospero C. Naval

Open Access

Author(s) - John Paul T. Yusiong, Prospero C. Naval

Publication year - 2020

Publication title - Informatica

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.172

H-Index - 34

eISSN - 1854-3871

pISSN - 0350-5596

DOI - 10.31449/inf.v44i4.3018

Subject(s) - computer science, segmentation, artificial intelligence, ground truth, monocular, task (project management), context (archaeology), depth map, computer vision, semantics (computer science), image (mathematics), pattern recognition (psychology), paleontology, management, economics, biology, programming language

Depth estimation and semantic segmentation are two fundamental tasks in scene understanding. They are usually solved separately, even though they are complementary and highly correlated. Solving them jointly benefits real-world applications that require both geometric and semantic information. In this context, the paper presents a unified learning framework that generates a refined depth map and a semantic segmentation map from a single image. Specifically, it proposes a novel architecture called JDSNet, a Siamese triple decoder architecture that simultaneously performs depth estimation, depth refinement, and semantic labeling of a scene from an image by exploiting the interaction between depth and semantic information. JDSNet is trained with a semi-supervised method to learn features for both tasks: the depth estimation task relies on geometry-based image reconstruction instead of ground-truth depth labels, while the semantic segmentation task requires ground-truth semantic labels. The approach is evaluated on the KITTI driving dataset. The experimental results show that it achieves excellent performance on both tasks, indicating that the model can effectively use both geometric and semantic information.
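
The semi-supervised idea in the abstract (a reconstruction signal for depth, labels only for semantics) can be illustrated with a minimal sketch. This is not JDSNet's actual loss, which the abstract does not specify. It assumes a rectified stereo pair, treats predicted disparity as the depth proxy, works on a single image row for brevity, and uses a plain L1 photometric term plus cross-entropy. All function names and the `sem_weight` parameter are hypothetical.

```python
import math


def warp_row(src_row, disparity_row):
    """Reconstruct a left-image row by sampling the right-image row at x - d.

    Uses linear interpolation and clamps sample positions to the row bounds.
    Assumption: rectified stereo, so correspondence is purely horizontal.
    """
    width = len(src_row)
    out = []
    for x, d in enumerate(disparity_row):
        pos = min(max(x - d, 0.0), width - 1.0)
        x0 = int(math.floor(pos))
        x1 = min(x0 + 1, width - 1)
        w = pos - x0
        out.append((1.0 - w) * src_row[x0] + w * src_row[x1])
    return out


def photometric_l1(target_row, recon_row):
    """Mean absolute intensity difference; needs no ground-truth depth."""
    return sum(abs(t - r) for t, r in zip(target_row, recon_row)) / len(target_row)


def cross_entropy(class_probs, label):
    """Negative log-likelihood of the ground-truth semantic label."""
    return -math.log(max(class_probs[label], 1e-12))


def joint_loss(left_row, right_row, disparity_row, pixel_probs, pixel_labels,
               sem_weight=1.0):
    """Hypothetical combined objective: unsupervised depth term plus
    supervised semantic term, weighted by sem_weight (an assumed knob)."""
    recon = warp_row(right_row, disparity_row)
    depth_term = photometric_l1(left_row, recon)
    sem_term = sum(cross_entropy(p, y)
                   for p, y in zip(pixel_probs, pixel_labels)) / len(pixel_labels)
    return depth_term + sem_weight * sem_term
```

In this sketch the depth term falls to zero when the predicted disparity warps the right view exactly onto the left view, which is how a reconstruction objective can stand in for depth labels, while the semantic term still depends on annotated classes.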
