Synthesizing novel views from a single image requires inferring unseen regions from the visible scene contents, and advanced autoregressive models have driven tremendous progress on this task. Although recent methods generate high-quality novel views, relying on only one type of 3D geometry, explicit or implicit, incurs a trade-off between two objectives, which we call the "seesaw" problem: 1) preserving reprojected contents and 2) completing realistic out-of-view regions. In addition, autoregressive models incur considerable computational cost. In this paper, we propose a single-image view synthesis framework that mitigates the seesaw problem while using an efficient non-autoregressive model. Motivated by the observation that explicit methods preserve reprojected pixels well while implicit methods complete realistic out-of-view regions, we introduce a loss function that lets the two renderers complement each other. The loss encourages explicit features to improve the reprojected area of implicit features, and implicit features to improve the out-of-view area of explicit features. With the proposed architecture and loss function, we alleviate the seesaw problem, outperforming autoregressive state-of-the-art methods while generating an image $\approx$100 times faster. We validate the efficiency and effectiveness of our method with experiments on the RealEstate10K and ACID datasets.
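To make the idea of two renderers complementing each other more concrete, the following is a minimal sketch of a mask-weighted feature loss. It is not the loss defined in the paper, whose exact form the abstract does not give. The function name `complementary_loss`, the L1 distance, the `(C, H, W)` feature layout, and the binary `visible_mask` (1 for reprojected pixels, 0 for out-of-view pixels) are all assumptions made for illustration.

```python
import numpy as np


def complementary_loss(explicit_feat, implicit_feat, visible_mask):
    """Hypothetical mask-weighted L1 loss between two renderers' features.

    explicit_feat, implicit_feat: arrays of shape (C, H, W).
    visible_mask: array of shape (H, W); 1 = reprojected, 0 = out-of-view.
    Returns (loss_reproj, loss_outview) as floats.
    """
    mask = visible_mask.astype(np.float64)[None, :, :]  # broadcast over channels
    diff = np.abs(explicit_feat - implicit_feat)        # per-element L1 distance
    channels = explicit_feat.shape[0]

    # Number of feature elements in each region (guard against empty regions).
    n_reproj = max(mask.sum() * channels, 1.0)
    n_outview = max((1.0 - mask).sum() * channels, 1.0)

    # Reprojected area: in a training framework the explicit features would be
    # treated as a fixed target here (assumption), so only the implicit
    # features are pulled toward them.
    loss_reproj = float((diff * mask).sum() / n_reproj)

    # Out-of-view area: the roles are swapped (assumption), so only the
    # explicit features are pulled toward the implicit ones.
    loss_outview = float((diff * (1.0 - mask)).sum() / n_outview)

    return loss_reproj, loss_outview


# Usage: a 1-channel 2x2 example where the top row is reprojected.
explicit = np.zeros((1, 2, 2))
implicit = np.array([[[1.0, 1.0], [3.0, 3.0]]])
mask = np.array([[1, 0], [1, 0]]).T  # rows: [1, 1] visible, [0, 0] out-of-view
print(complementary_loss(explicit, implicit, mask))
```

Because NumPy has no automatic differentiation, the two terms differ here only in the region they cover. In an autograd framework, the direction of each term would be set by detaching the features that serve as the target in that region.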