Deeper Depth Prediction with Fully Convolutional Residual Networks
- 1 October 2016
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 239-248
- https://doi.org/10.1109/3dv.2016.32
Abstract
This paper addresses the problem of estimating the depth map of a scene given a single RGB image. We propose a fully convolutional architecture, encompassing residual learning, to model the ambiguous mapping between monocular images and depth maps. In order to improve the output resolution, we present a novel way to efficiently learn feature map up-sampling within the network. For optimization, we introduce the reverse Huber loss, which is particularly suited for the task at hand and driven by the value distributions commonly present in depth maps. Our model is composed of a single architecture that is trained end-to-end and does not rely on post-processing techniques, such as CRFs or other additional refinement steps. As a result, it runs in real-time on images or videos. In the evaluation, we show that the proposed model contains fewer parameters and requires less training data than the current state of the art, while outperforming all approaches on depth estimation. Code and models are publicly available.
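The abstract names the reverse Huber (berHu) loss but does not define it. What follows is a minimal sketch in plain Python. It assumes the common piecewise definition B(x) = |x| for |x| ≤ c and B(x) = (x² + c²)/(2c) for |x| > c. It also assumes that the threshold c is recomputed per batch as one fifth of the largest absolute residual. Neither detail is stated in the abstract. The helper names `berhu`, `berhu_threshold` and `berhu_loss` are hypothetical, and a real training loop would operate on framework tensors rather than Python lists.

```python
from typing import Sequence


def berhu_threshold(residuals: Sequence[float], fraction: float = 0.2) -> float:
    """Hypothetical helper: c = fraction * max |residual| over the batch.

    The default fraction of 1/5 is an assumption, not taken from the abstract.
    """
    return fraction * max(abs(r) for r in residuals)


def berhu(x: float, c: float) -> float:
    """Reverse Huber (berHu) penalty for one residual x with threshold c > 0.

    L1 for small residuals and a scaled L2 for large ones; both branches give
    the value c and the slope 1 at |x| == c, so the penalty is smooth there.
    """
    ax = abs(x)
    if ax <= c:
        return ax
    return (x * x + c * c) / (2.0 * c)


def berhu_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean berHu loss over the pixels of a flattened batch of depth maps."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    residuals = [p - t for p, t in zip(pred, target)]
    c = berhu_threshold(residuals)
    if c == 0.0:
        # Every residual is zero, so the prediction is exact and the loss is 0.
        return 0.0
    return sum(berhu(r, c) for r in residuals) / len(residuals)


if __name__ == "__main__":
    # Residuals are -0.5, 0.0, 1.0, -5.0, so c = 0.2 * 5.0 = 1.0.
    # Per-pixel penalties are 0.5, 0.0, 1.0 and (25 + 1) / 2 = 13.0.
    print(berhu_loss([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.0, 9.0]))  # 3.625
```

Under this reading, small residuals keep an L1-style constant-magnitude gradient, while residuals beyond c are penalized quadratically. Large errors therefore dominate the update. This is one plausible interpretation of the abstract's remark that the loss is driven by the value distributions found in depth maps.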