Existing image-based rendering methods usually adopt a depth-based image warping operation to synthesize novel views. In this paper, we argue that the essential limitations of the traditional warping operation are its limited neighborhood and its purely distance-based interpolation weights. To this end, we propose content-aware warping, which adaptively learns the interpolation weights for pixels in a relatively large neighborhood from their contextual information via a lightweight neural network. Based on this learnable warping module, we propose a new end-to-end learning-based framework for novel view synthesis from a set of input source views. The framework adds two further modules, confidence-based blending and feature-assistant spatial refinement, which handle occlusion and capture the spatial correlation among pixels of the synthesized view, respectively. We also propose a weight-smoothness loss term to regularize the network. Experimental results on light field datasets with wide baselines and on multi-view datasets show that the proposed method significantly outperforms state-of-the-art methods both quantitatively and visually. The source code will be publicly available at https://github.com/MantangGuo/CW4VS.
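To make the contrast between distance-based and content-aware warping concrete, the following is a minimal NumPy sketch under stated assumptions, not the released implementation. All function names, the two-layer scoring network, the softmax normalization of the weights, and the softmax-based blending rule are hypothetical choices made for illustration; the abstract does not specify them, and the network weights here are random rather than trained.

```python
import numpy as np

def bilinear_weights(dx, dy):
    """Traditional warping: weights for the 2x2 neighbourhood depend only on
    the fractional offsets (dx, dy) from the top-left neighbour."""
    return np.array([(1 - dx) * (1 - dy), dx * (1 - dy),
                     (1 - dx) * dy, dx * dy])

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def content_aware_weights(neigh_feats, w1, w2):
    """Assumed stand-in for the lightweight network: a two-layer MLP scores
    each of the K neighbours from its C-dimensional context features, and a
    softmax turns the scores into interpolation weights (an assumption)."""
    hidden = np.maximum(neigh_feats @ w1, 0.0)   # (K, H), ReLU
    scores = (hidden @ w2).ravel()               # (K,)
    return softmax(scores)

def warp_pixel(neigh_colors, weights):
    """Weighted sum of the K neighbour colours, shape (K, 3) -> (3,)."""
    return weights @ neigh_colors

def blend_views(warped, confidences):
    """Assumed blending rule: per-view confidences are softmax-normalized and
    used to average the N warped colours, shape (N, 3) -> (3,)."""
    return softmax(confidences) @ warped

# Toy example: a 5x5 neighbourhood (K = 25) instead of the usual 2x2.
rng = np.random.default_rng(0)
K, C, H = 25, 8, 16
feats = rng.normal(size=(K, C))        # hypothetical per-neighbour features
w1 = rng.normal(size=(C, H))           # untrained, random parameters
w2 = rng.normal(size=(H, 1))
weights = content_aware_weights(feats, w1, w2)
colors = rng.uniform(size=(K, 3))
pixel = warp_pixel(colors, weights)
```

In this sketch the bilinear weights are fixed by geometry alone, whereas the content-aware weights change whenever the neighbour features change, which is the distinction the abstract draws.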