This work addresses pose-free novel view synthesis from stereo pairs, a challenging task in 3D vision. Unlike prior approaches, our framework integrates 2D correspondence matching, camera pose estimation, and NeRF rendering in a single model, so that each task benefits from the others. To this end, we design an architecture built on a shared representation, which serves as a common foundation for improved 3D geometry understanding. Exploiting the inherent interplay between the three tasks, we train the unified framework end-to-end with the proposed training strategy to improve overall accuracy. Extensive evaluations on diverse indoor and outdoor scenes from two real-world datasets show that our approach substantially improves over previous methods, especially under extreme viewpoint changes and when accurate camera poses are unavailable.