Deep View Synthesis via Self-Consistent Generative Network
- 28 January 2021
- Research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Multimedia
- Vol. 24, pp. 451-465
- https://doi.org/10.1109/tmm.2021.3053401
Abstract
View synthesis aims to produce unseen views from a set of views captured by two or more cameras at different positions. This task is non-trivial because pixel-level matching among different views is hard to establish. To address this issue, most existing methods exploit geometric information to match pixels. However, when the cameras have a large baseline (i.e., are far away from each other), severe geometric distortion occurs and the geometric information may fail to provide useful guidance, resulting in very blurry synthesized images. To address these issues, in this paper we propose a novel deep generative model, called the Self-Consistent Generative Network (SCGN), which synthesizes novel views from the given input views without explicitly exploiting geometric information. The proposed SCGN model consists of two main components, a View Synthesis Network (VSN) and a View Decomposition Network (VDN), both employing an encoder-decoder structure. The VDN seeks to reconstruct the input views from the synthesized novel view in order to preserve the consistency of view synthesis. Thanks to the VDN, SCGN can synthesize novel views without any geometric rectification before encoding, making it easier both to train and to apply. Finally, an adversarial loss is introduced to improve the photo-realism of the novel views. Both qualitative and quantitative comparisons against several state-of-the-art methods on two benchmark tasks demonstrate the superiority of our approach.
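The abstract describes a two-network cycle: the VSN maps a pair of input views to a novel view, and the VDN maps that novel view back to reconstructions of the inputs, so a consistency loss can be imposed without geometric rectification. The sketch below illustrates this self-consistency idea in Python (PyTorch); the layer counts, channel widths, input resolution, and L1 consistency term are illustrative assumptions, not the paper's published configuration, and the adversarial term is omitted.

```python
# Minimal sketch of the SCGN self-consistency cycle from the abstract.
# All architectural details here are assumptions for illustration only.
import torch
import torch.nn as nn

def encoder_decoder(in_ch, out_ch):
    # A toy encoder-decoder; the paper's VSN and VDN share this overall shape.
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.ConvTranspose2d(64, out_ch, 4, stride=2, padding=1), nn.Tanh(),
    )

class SCGN(nn.Module):
    def __init__(self):
        super().__init__()
        # VSN: two input views, concatenated along channels -> one novel view.
        self.vsn = encoder_decoder(6, 3)
        # VDN: synthesized novel view -> reconstructions of both input views.
        self.vdn = encoder_decoder(3, 6)

    def forward(self, left, right):
        novel = self.vsn(torch.cat([left, right], dim=1))
        recon = self.vdn(novel)
        return novel, recon.split(3, dim=1)

# Self-consistency: the VDN must recover the inputs from the novel view.
model = SCGN()
left, right = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
novel, (rec_l, rec_r) = model(left, right)
consistency_loss = (nn.functional.l1_loss(rec_l, left)
                    + nn.functional.l1_loss(rec_r, right))
```

In this reading, the VDN acts as a regularizer on the VSN: if the novel view were geometrically implausible or blurry, the inputs could not be decomposed back out of it, which is what lets SCGN skip explicit geometric matching.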
Funding Information
- Science and Technology Program of Guangzhou, China (202007030007)
- Key-Area Research and Development Program of Guangdong Province (2018B010107001)
- National Natural Science Foundation of China (61836003)
- Guangdong Project (2017ZT07X183)
- Fundamental Research Funds for the Central Universities (D2191240)