
Improving stereo matching by incorporating geometry prior into ConvNet
Author(s) - Liang Zhengfa, Liu Hengzhu, Qiao Linbo, Feng Yiliu, Chen Wei
Publication year - 2017
Publication title - Electronics Letters
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.375
H-Index - 146
ISSN - 1350-911X
DOI - 10.1049/el.2017.2418
Subject(s) - artificial intelligence, image warping, ground truth, computer science, computer vision, consistency (knowledge bases), matching (statistics), metric (unit), image (mathematics), epipolar geometry, process (computing), pattern recognition (psychology), mathematics, statistics, economics, operating system, operations management
Deep learning‐based methods for stereo matching have shown superior performance over traditional ones. However, most of them ignore the inherent geometry prior of stereo matching during training, i.e. that the reference image can be reconstructed from the second image in the visible regions. The reconstruction can be achieved by backward warping the second image using the disparity map of the reference image, while the visible regions can be identified by a left‐right consistency check. This prior is especially useful when the ground truth disparity is sparse (e.g. in outdoor scenes such as KITTI 2015). The prior is incorporated into a two‐stage end‐to‐end training process, in which both stages try to minimise the end‐point‐error with respect to the sparse ground truth disparity (supervised learning) and the reconstruction error (self‐supervised learning). The predicted disparity and the reconstruction error of the first stage act as additional information and are fed to the second stage, which makes further use of this prior knowledge to improve performance. Experiments on the challenging KITTI 2015 dataset show that the method improves the results in the foreground region and ranks first among all published methods on the D1‐fg metric.
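The geometry prior described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' network or training code: the function names, the L1 form of both error terms, the 1-pixel consistency threshold, and the `weight` parameter are all assumptions chosen for illustration. The sketch assumes rectified grayscale images in which a left-image pixel at column x with disparity d corresponds to the right-image pixel at column x - d.

```python
import numpy as np


def warp_right_to_left(right, disp_left):
    """Backward-warp a right-view map into the left view.

    Each left pixel (y, x) samples the right map at (y, x - disp_left[y, x])
    with linear interpolation along the row. Returns the warped map and a
    boolean mask of samples that fall inside the image.
    """
    h, w = disp_left.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disp_left  # sample columns
    x0 = np.floor(xs).astype(int)
    frac = xs - x0
    x0c = np.clip(x0, 0, w - 1)
    x1c = np.clip(x0 + 1, 0, w - 1)
    rows = np.arange(h)[:, None]
    warped = (1.0 - frac) * right[rows, x0c] + frac * right[rows, x1c]
    in_bounds = (xs >= 0) & (xs <= w - 1)
    return warped, in_bounds


def lr_consistency_mask(disp_left, disp_right, thresh=1.0):
    """Left-right consistency check (threshold is an assumed value).

    A left pixel counts as visible when the right disparity, warped into the
    left view, agrees with the left disparity to within `thresh` pixels.
    """
    disp_right_warped, in_bounds = warp_right_to_left(disp_right, disp_left)
    return in_bounds & (np.abs(disp_left - disp_right_warped) <= thresh)


def combined_loss(left, right, disp_pred, disp_right, disp_gt, gt_valid,
                  weight=1.0):
    """Supervised end-point error on sparse ground truth plus a
    self-supervised reconstruction error on visible pixels (assumed L1 forms).
    """
    recon, _ = warp_right_to_left(right, disp_pred)
    visible = lr_consistency_mask(disp_pred, disp_right)
    epe = np.abs(disp_pred - disp_gt)[gt_valid].mean() if gt_valid.any() else 0.0
    rec = np.abs(left - recon)[visible].mean() if visible.any() else 0.0
    return epe + weight * rec
```

In this sketch `gt_valid` marks the pixels that carry a ground truth disparity, so the supervised term only touches the sparse labels while the reconstruction term covers every pixel that passes the consistency check. In a two-stage setup such as the one described above, the first-stage `disp_pred` and the per-pixel error `np.abs(left - recon)` would be the extra inputs passed to the second stage.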