We propose NeRFiller, an approach that completes missing portions of a 3D capture via generative 3D inpainting using off-the-shelf 2D visual generative models. Parts of a captured 3D scene or object are often missing because of mesh reconstruction failures or a lack of observations (e.g., contact regions, such as the bottom of objects, or hard-to-reach areas). We approach this challenging 3D inpainting problem by leveraging a 2D inpainting diffusion model. We identify a surprising behavior of these models: they generate more 3D-consistent inpaints when the input images are arranged in a 2$\times$2 grid. We show how to generalize this behavior to more than four images. We then present an iterative framework to distill these inpainted regions into a single consistent 3D scene. In contrast to related work, we focus on completing scenes rather than deleting foreground objects, and our approach does not require tight 2D object masks or text. We compare our approach to relevant baselines adapted to our setting on a variety of scenes, where NeRFiller creates the most 3D-consistent and plausible scene completions. Our project page is at https://ethanweber.me/nerfiller.
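To make the 2$\times$2 grid observation concrete, the following is a minimal sketch of tiling four views and their masks into one grid, inpainting the grid in a single pass, and splitting the result back into views. It is an illustration under stated assumptions, not the authors' implementation: the function names `tile_2x2`, `untile_2x2`, and `inpaint_as_grid` are hypothetical, and `inpaint_fn` is a stand-in for any 2D inpainting diffusion model. The generalization to more than four images is not covered here, since the abstract does not describe how it is done.

```python
import numpy as np


def tile_2x2(images):
    """Tile four equally sized H x W x C arrays into one 2H x 2W x C grid."""
    if len(images) != 4:
        raise ValueError("expected exactly four images")
    top = np.concatenate([images[0], images[1]], axis=1)
    bottom = np.concatenate([images[2], images[3]], axis=1)
    return np.concatenate([top, bottom], axis=0)


def untile_2x2(grid):
    """Split a 2H x 2W x C grid back into its four H x W x C tiles."""
    h, w = grid.shape[0] // 2, grid.shape[1] // 2
    return [grid[:h, :w], grid[:h, w:], grid[h:, :w], grid[h:, w:]]


def inpaint_as_grid(images, masks, inpaint_fn):
    """Inpaint four views jointly by presenting them as a single grid image.

    inpaint_fn(image, mask) -> image is a hypothetical callable standing in
    for a 2D inpainting model; it is an assumption, not a real library API.
    """
    grid_image = tile_2x2(images)
    grid_mask = tile_2x2(masks)
    return untile_2x2(inpaint_fn(grid_image, grid_mask))
```

In use, `images` would be four renders of the scene from different viewpoints and `masks` would mark the missing regions in each render. The only design point the sketch captures is that the model sees all four views in one image, which is the setting in which the abstract reports more 3D-consistent inpaints.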