Deep Stereo: Learning to Predict New Views from the World's Imagery

1 June 2016

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 5515-5524
https://doi.org/10.1109/cvpr.2016.595

Abstract

Deep networks have recently enjoyed enormous success when applied to recognition and classification problems in computer vision [22, 33], but their use in graphics problems has been limited ([23, 7] are notable recent exceptions). In this work, we present a novel deep architecture that performs new view synthesis directly from pixels, trained from a large number of posed image sets. In contrast to traditional approaches, which consist of multiple complex stages of processing, each of which requires careful tuning and can fail in unexpected ways, our system is trained end-to-end. The pixels from neighboring views of a scene are presented to the network, which then directly produces the pixels of the unseen view. The benefits of our approach include generality (we only require posed image sets and can easily apply our method to different domains), and high quality results on traditionally difficult scenes. We believe this is due to the end-to-end nature of our system, which is able to plausibly generate pixels according to color, depth, and texture priors learnt automatically from the training data. We show view interpolation results on imagery from the KITTI dataset [12], from data from [1] as well as on Google Street View images. To our knowledge, our work is the first to apply deep learning to the problem of new view synthesis from sets of real-world, natural imagery.

This publication has 28 references indexed in Scilit:

First-person hyper-lapse videos
ACM Transactions on Graphics, 2014
Depth Transfer: Depth Extraction from Video Using Non-Parametric Sampling
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014
Depth synthesis and local warps for plausible image-based navigation
ACM Transactions on Graphics, 2013
Silhouette‐Aware Warping for Image‐Based Rendering
Computer Graphics Forum, 2011
Floating Textures
Computer Graphics Forum, 2008
On New View Synthesis Using Multiview Stereo
Published by British Machine Vision Association and Society for Pattern Recognition, 2007
Efficient Dense Stereo with Occlusions for New View-Synthesis by Four-State Dynamic Programming
International Journal of Computer Vision, 2006
High-quality video view interpolation using a layered representation
Published by Association for Computing Machinery (ACM), 2004
View morphing
Published by Association for Computing Machinery (ACM), 1996
Light field rendering
Published by Association for Computing Machinery (ACM), 1996

Cited by 307 articles