We propose ReNoV (representation-guided novel view synthesis), a framework for diffusion-based novel view synthesis that conditions generation on external visual representations, exploiting their geometric and semantic correspondence properties to improve the geometric consistency of synthesized viewpoints. We first present a detailed analysis of the correspondence capabilities that emerge in the spatial attention of external visual representations. Building on these insights, we introduce dedicated representation projection modules that inject the external representations into the diffusion process. Our experiments show that this design yields marked improvements in both reconstruction fidelity and inpainting quality, outperforming prior diffusion-based novel-view synthesis methods on standard benchmarks and enabling robust synthesis from sparse, unposed image collections.
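To make the idea of a representation projection module concrete, the following is a minimal, hypothetical sketch (not the paper's implementation): external representation tokens, e.g. patch features from a pretrained vision encoder, are linearly projected to the diffusion model's feature width and fused into a U-Net block via cross-attention with a residual connection. All module names, dimensions, and the cross-attention fusion choice here are illustrative assumptions.

```python
# Hypothetical sketch of a "representation projection module" (illustrative only,
# not the authors' code): project external representation tokens and inject them
# into diffusion features through cross-attention with a residual connection.
import torch
import torch.nn as nn


class RepresentationProjection(nn.Module):
    def __init__(self, diff_dim: int, rep_dim: int, num_heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(rep_dim, diff_dim)   # map external reps to the U-Net width
        self.norm = nn.LayerNorm(diff_dim)
        self.attn = nn.MultiheadAttention(diff_dim, num_heads, batch_first=True)

    def forward(self, diff_tokens: torch.Tensor, rep_tokens: torch.Tensor) -> torch.Tensor:
        # diff_tokens: (B, N, diff_dim) flattened U-Net feature map of the target view
        # rep_tokens:  (B, M, rep_dim)  external representation tokens (e.g., reference views)
        context = self.proj(rep_tokens)
        fused, _ = self.attn(self.norm(diff_tokens), context, context)
        return diff_tokens + fused                  # residual injection into the diffusion features


# Example usage with assumed dimensions: 1024-dim external tokens, 320-dim U-Net level.
module = RepresentationProjection(diff_dim=320, rep_dim=1024)
x = torch.randn(2, 4096, 320)   # diffusion features (e.g., a 64x64 spatial grid)
r = torch.randn(2, 256, 1024)   # external representation tokens
out = module(x, r)              # same shape as x, now conditioned on the representations
```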