Deep View Synthesis via Self-Consistent Generative Network
- 28 January 2021
- Research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Multimedia
- Vol. 24, pp. 451-465
- https://doi.org/10.1109/tmm.2021.3053401
Abstract
View synthesis aims to produce unseen views from a set of views captured by two or more cameras at different positions. This task is non-trivial because pixel-level matching among different views is hard to establish. To address this issue, most existing methods exploit geometric information to match pixels. However, when the cameras have a large baseline (i.e., are far away from each other), severe geometric distortion occurs and the geometric information may fail to provide useful guidance, resulting in very blurry synthesized images. To address these issues, in this paper we propose a novel deep generative model, called the Self-Consistent Generative Network (SCGN), which synthesizes novel views from the given input views without explicitly exploiting geometric information. The proposed SCGN model consists of two main components, a View Synthesis Network (VSN) and a View Decomposition Network (VDN), both employing an encoder-decoder structure. The VDN seeks to reconstruct the input views from the synthesized novel view in order to preserve the consistency of view synthesis. Thanks to the VDN, SCGN can synthesize novel views without any geometric rectification before encoding, making it easier both to train and to apply. Finally, an adversarial loss is introduced to improve the photo-realism of the novel views. Both qualitative and quantitative comparisons against several state-of-the-art methods on two benchmark tasks demonstrate the superiority of our approach.
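The abstract describes a two-network cycle: the VSN maps a pair of input views to a novel view, and the VDN maps that novel view back to reconstructions of the inputs, so a consistency loss can be imposed without geometric rectification. The sketch below illustrates this self-consistency idea in Python (PyTorch); the layer counts, channel widths, input resolution, and L1 consistency term are illustrative assumptions, not the paper's published configuration, and the adversarial term is omitted.

```python
# Minimal sketch of the SCGN self-consistency cycle from the abstract.
# All architectural details here are assumptions for illustration only.
import torch
import torch.nn as nn

def encoder_decoder(in_ch, out_ch):
    # A toy encoder-decoder; the paper's VSN and VDN share this overall shape.
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.ConvTranspose2d(64, out_ch, 4, stride=2, padding=1), nn.Tanh(),
    )

class SCGN(nn.Module):
    def __init__(self):
        super().__init__()
        # VSN: two input views, concatenated along channels -> one novel view.
        self.vsn = encoder_decoder(6, 3)
        # VDN: synthesized novel view -> reconstructions of both input views.
        self.vdn = encoder_decoder(3, 6)

    def forward(self, left, right):
        novel = self.vsn(torch.cat([left, right], dim=1))
        recon = self.vdn(novel)
        return novel, recon.split(3, dim=1)

# Self-consistency: the VDN must recover the inputs from the novel view.
model = SCGN()
left, right = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
novel, (rec_l, rec_r) = model(left, right)
consistency_loss = (nn.functional.l1_loss(rec_l, left)
                    + nn.functional.l1_loss(rec_r, right))
```

In this reading, the VDN acts as a regularizer on the VSN: if the novel view were geometrically implausible or blurry, the inputs could not be decomposed back out of it, which is what lets SCGN skip explicit geometric matching.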
Funding Information
- Science and Technology Program of Guangzhou, China (202007030007)
- Key-Area Research and Development Program of Guangdong Province (2018B010107001)
- National Natural Science Foundation of China (61836003)
- Guangdong Project (2017ZT07X183)
- Fundamental Research Funds for the Central Universities (D2191240)