We present iFusion, a novel 3D object reconstruction framework that requires only two views with unknown camera poses. While single-view reconstruction yields visually appealing results, it can deviate significantly from the actual object, especially on unseen sides. Additional views improve reconstruction fidelity but require known camera poses. However, assuming that poses are available may be unrealistic, and existing pose estimators fail in sparse-view scenarios. To address this, we harness a pre-trained novel view synthesis diffusion model, which embeds implicit knowledge about the geometry and appearance of diverse objects. Our strategy unfolds in three steps: (1) We invert the diffusion model to estimate camera poses instead of synthesizing novel views. (2) We fine-tune the diffusion model on the provided views and estimated poses, turning it into a novel view synthesizer tailored to the target object. (3) Leveraging the registered views and the fine-tuned diffusion model, we reconstruct the 3D object. Experiments demonstrate strong performance in both pose estimation and novel view synthesis. Moreover, iFusion integrates seamlessly with various reconstruction methods and enhances them.
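The idea behind step (1), inverting a pose-conditioned generative model to recover the pose, can be illustrated with a toy sketch. This is not the iFusion implementation: `toy_view` and `toy_noise_predictor` are hypothetical stand-ins for a pre-trained pose-conditioned diffusion model, the three-parameter relative pose (elevation, azimuth, radius) is an assumed parameterization, and finite-difference gradients replace backpropagation through a real network. The sketch only shows the general pattern: keep the model frozen and optimize the pose so that the model's noise prediction matches the noise actually added to the second view.

```python
import math
import random


def toy_view(pose):
    """Hypothetical stand-in for 'the view seen from this pose' (here: the camera position)."""
    elevation, azimuth, radius = pose
    return [
        radius * math.cos(elevation) * math.sin(azimuth),
        radius * math.cos(elevation) * math.cos(azimuth),
        radius * math.sin(elevation),
    ]


def toy_noise_predictor(noisy, sigma, pose):
    """Frozen toy 'denoiser': assumes the clean signal is toy_view(pose)
    and returns the noise that would explain the noisy input."""
    clean_guess = toy_view(pose)
    return [(n - c) / sigma for n, c in zip(noisy, clean_guess)]


def denoising_loss(pose, target, sigma, noise):
    """Squared error between the predicted noise and the noise actually added."""
    noisy = [x + sigma * e for x, e in zip(target, noise)]
    predicted = toy_noise_predictor(noisy, sigma, pose)
    return sum((p - e) ** 2 for p, e in zip(predicted, noise))


def estimate_pose(target, init_pose, steps=800, lr=0.02, seed=0):
    """Gradient descent on the pose only; the 'model' itself is never updated."""
    rng = random.Random(seed)
    pose = list(init_pose)
    h = 1e-5  # step for central finite differences
    for _ in range(steps):
        # Sample a noise level and a noise vector, as in diffusion training.
        sigma = rng.uniform(0.5, 1.0)
        noise = [rng.gauss(0.0, 1.0) for _ in target]
        grad = []
        for i in range(len(pose)):
            plus = pose.copy()
            plus[i] += h
            minus = pose.copy()
            minus[i] -= h
            grad.append(
                (denoising_loss(plus, target, sigma, noise)
                 - denoising_loss(minus, target, sigma, noise)) / (2 * h)
            )
        pose = [p - lr * g for p, g in zip(pose, grad)]
    return pose


# The second view is observed, but its pose is unknown to the optimizer.
true_pose = (0.3, 0.8, 1.5)
second_view = toy_view(true_pose)
estimated_pose = estimate_pose(second_view, init_pose=(0.0, 0.0, 1.0))
print([round(p, 3) for p in estimated_pose])
```

In this toy setting the loss has a single well-behaved minimum near the starting point, so plain gradient descent suffices. A real diffusion model would likely need automatic differentiation and a more careful initialization of the pose.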