Reconstructing 3D objects from extremely sparse views is a long-standing and challenging problem. Recent techniques either use image diffusion models to generate plausible images at novel viewpoints or distill pre-trained diffusion priors into 3D representations via score distillation sampling (SDS), but they often struggle to achieve high-quality, consistent, and detailed results for both novel-view synthesis (NVS) and geometry at the same time. In this work, we present Sparse3D, a novel 3D reconstruction method tailored to sparse-view inputs. Our approach distills robust priors from a multiview-consistent diffusion model to refine a neural radiance field. Specifically, we employ a controller that harnesses epipolar features from the input views to guide a pre-trained diffusion model, such as Stable Diffusion, so that the novel-view images it produces remain 3D-consistent with the input. By tapping into the 2D priors of powerful image diffusion models, our integrated model consistently delivers high-quality results, even on open-world objects. To address the blurriness introduced by conventional SDS, we propose category-score distillation sampling (C-SDS) to enhance detail. We conduct experiments on CO3DV2, a multi-view dataset of real-world objects. Both quantitative and qualitative evaluations demonstrate that our approach outperforms previous state-of-the-art methods on NVS and geometry reconstruction metrics.
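The abstract does not describe how the controller gathers epipolar features, so the following is only a minimal sketch of the underlying two-view geometry, not the Sparse3D implementation. It assumes pinhole cameras, a relative pose convention `X_tgt = R @ X_src + t`, and hypothetical helper names (`fundamental_matrix`, `epipolar_line_in_source`, and so on) introduced here for illustration. Under those assumptions, a pixel in the novel (target) view maps to a line in an input (source) view, and features for that pixel would be sampled along this line.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def skew(t):
    """Cross-product matrix [t]_x, so that matvec(skew(t), v) == t x v."""
    x, y, z = t
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def intrinsics(fx, fy, cx, cy):
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]

def intrinsics_inverse(fx, fy, cx, cy):
    """Closed-form inverse of a skew-free pinhole intrinsics matrix."""
    return [[1.0 / fx, 0.0, -cx / fx], [0.0, 1.0 / fy, -cy / fy], [0.0, 0.0, 1.0]]

def fundamental_matrix(K_src_inv, K_tgt_inv, R, t):
    """F with x_tgt^T F x_src = 0, assuming X_tgt = R @ X_src + t."""
    E = matmul(skew(t), R)  # essential matrix
    return matmul(transpose(K_tgt_inv), matmul(E, K_src_inv))

def epipolar_line_in_source(F, pixel_tgt):
    """Line (a, b, c) in the source image, scaled so a*u + b*v + c is in pixels."""
    a, b, c = matvec(transpose(F), [pixel_tgt[0], pixel_tgt[1], 1.0])
    norm = math.hypot(a, b)
    return [a / norm, b / norm, c / norm]

def project(K, X):
    u, v, w = matvec(K, X)
    return [u / w, v / w]

# Toy two-view setup (all values are illustrative assumptions).
cam = (500.0, 500.0, 320.0, 240.0)
K, K_inv = intrinsics(*cam), intrinsics_inverse(*cam)
angle = math.radians(20.0)
R = [[math.cos(angle), 0.0, math.sin(angle)],
     [0.0, 1.0, 0.0],
     [-math.sin(angle), 0.0, math.cos(angle)]]
t = [-0.5, 0.1, 0.2]

X_src = [0.2, -0.1, 3.0]  # 3D point in source-camera coordinates
X_tgt = [r + ti for r, ti in zip(matvec(R, X_src), t)]

F = fundamental_matrix(K_inv, K_inv, R, t)
p_src, p_tgt = project(K, X_src), project(K, X_tgt)
a, b, c = epipolar_line_in_source(F, p_tgt)

# Distance (in pixels) from the true source pixel to the epipolar line.
distance = abs(a * p_src[0] + b * p_src[1] + c)
```

In this toy setup the source-view projection of the 3D point lies on the epipolar line of its target-view pixel up to floating-point error, which is the constraint that makes features along that line informative for the novel view.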