Deeper Depth Prediction with Fully Convolutional Residual Networks

1 October 2016

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 239-248
https://doi.org/10.1109/3dv.2016.32

Abstract

This paper addresses the problem of estimating the depth map of a scene given a single RGB image. We propose a fully convolutional architecture, encompassing residual learning, to model the ambiguous mapping between monocular images and depth maps. In order to improve the output resolution, we present a novel way to efficiently learn feature map up-sampling within the network. For optimization, we introduce the reverse Huber loss that is particularly suited for the task at hand and driven by the value distributions commonly present in depth maps. Our model is composed of a single architecture that is trained end-to-end and does not rely on post-processing techniques, such as CRFs or other additional refinement steps. As a result, it runs in real-time on images or videos. In the evaluation, we show that the proposed model contains fewer parameters and requires less training data than the current state of the art, while outperforming all approaches on depth estimation. Code and models are publicly available.
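The abstract names the reverse Huber (berHu) loss without defining it. As a rough illustration only, the sketch below uses the commonly stated form: the absolute residual |x| when |x| <= c, and (x^2 + c^2) / (2c) otherwise, with the threshold c set to one fifth of the largest absolute residual in the batch. The function name `berhu_loss`, the `frac` parameter, and the use of flat Python lists are assumptions made for this example. It is not the authors' released code.

```python
def berhu_loss(pred, target, frac=0.2):
    """Mean reverse Huber (berHu) loss over two flat sequences of depth values.

    Assumption: the threshold c is frac * max(|pred - target|) over the batch.
    Residuals up to c are penalised linearly (L1); larger ones quadratically.
    """
    residuals = [abs(p - t) for p, t in zip(pred, target)]
    if not residuals:
        raise ValueError("need at least one depth value")

    c = frac * max(residuals)
    if c == 0.0:
        # All predictions match the targets exactly, so the loss is zero.
        return 0.0

    total = 0.0
    for r in residuals:
        if r <= c:
            total += r                            # L1 branch
        else:
            total += (r * r + c * c) / (2.0 * c)  # scaled L2 branch
    return total / len(residuals)


if __name__ == "__main__":
    predicted = [1.0, 2.0, 3.0]
    ground_truth = [1.0, 2.5, 8.0]
    print(berhu_loss(predicted, ground_truth))
```

The two branches meet at |x| = c, where both evaluate to c, so the loss is continuous. Large residuals receive the stronger quadratic penalty, while small ones keep the constant gradient of an L1 term.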

Other Versions

Version 2, 2016-06-01, preprints

This publication has 26 references indexed in Scilit.

Cited by 1343 articles