Radiance fields, including NeRFs and 3D Gaussians, demonstrate great potential for high-fidelity rendering and scene reconstruction, but they require a substantial number of posed images as input. COLMAP is frequently employed for preprocessing to estimate poses, but it requires a large number of feature matches to operate effectively, and it struggles with scenes characterized by sparse features, large baselines between images, or a limited number of input images. We aim to tackle few-view NeRF reconstruction using only 3 to 6 unposed scene images. Traditional methods often rely on calibration boards, which are rarely present in everyday images. We propose the novel idea of utilizing everyday objects, commonly found in both images and real life, as "pose probes". The probe object is automatically segmented by SAM, and its shape is initialized from a cube. We apply a dual-branch volume rendering optimization (object NeRF and scene NeRF) to constrain the pose optimization and jointly refine the geometry. Specifically, the object poses of two views are first estimated by PnP matching in an SDF representation, and these serve as initial poses. PnP matching, which requires only a few features, is suitable for feature-sparse scenes. Additional views are then incrementally incorporated to refine the poses from preceding views. In experiments, PoseProbe achieves state-of-the-art performance in both pose estimation and novel view synthesis across multiple datasets. We demonstrate its effectiveness particularly in few-view and large-baseline scenes, where COLMAP struggles. In ablations, using different probe objects in a scene yields comparable performance. Our project page is available at: \href{https://zhirui-gao.github.io/PoseProbe.github.io/}{this https URL}
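To illustrate why PnP needs only a handful of correspondences (unlike COLMAP's dense feature matching), here is a minimal sketch of a Direct Linear Transform PnP solver in NumPy. This is a hypothetical illustration, not the paper's implementation: the method itself matches against an SDF representation of the probe object, whereas this sketch assumes 2D-3D correspondences are already given in normalized image coordinates.

```python
import numpy as np

def pnp_dlt(pts3d, pts2d_norm):
    """DLT-based PnP: recover a camera pose [R|t] from n >= 6 correspondences
    between 3D points (n,3) and normalized 2D image coordinates (n,2),
    i.e. pixel coordinates premultiplied by K^-1."""
    n = len(pts3d)
    Xh = np.hstack([pts3d, np.ones((n, 1))])      # homogeneous 3D points (n,4)
    # Each correspondence contributes two linear equations in the 12
    # entries of the 3x4 projection matrix P.
    A = np.zeros((2 * n, 12))
    for i in range(n):
        u, v = pts2d_norm[i]
        A[2 * i, 0:4] = Xh[i]
        A[2 * i, 8:12] = -u * Xh[i]
        A[2 * i + 1, 4:8] = Xh[i]
        A[2 * i + 1, 8:12] = -v * Xh[i]
    # Null-space solution via SVD, defined only up to scale.
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)
    if (Xh @ P[2]).mean() < 0:                    # enforce positive depths
        P = -P
    # Project the left 3x3 block onto a rotation and undo the DLT scale.
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt
    t = P[:, 3] / S.mean()
    return R, t
```

With as few as six correspondences this recovers the pose exactly on noise-free data; a robust pipeline would typically wrap such a solver in RANSAC and refine with nonlinear reprojection-error minimization.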