Neural reconstruction approaches are rapidly emerging as the preferred representation for 3D scenes, but their limited editability remains a challenge. In this work, we propose an approach for 3D scene inpainting -- the task of coherently replacing parts of the reconstructed scene with desired content. Scene inpainting is an inherently ill-posed task, as many solutions plausibly replace the missing content. A good inpainting method should therefore enable not only high-quality synthesis but also a high degree of control. Based on this observation, we focus on enabling explicit control over the inpainted content and leverage a reference image as an efficient means to achieve this goal. Specifically, we introduce RefFusion, a novel 3D inpainting method based on a multi-scale personalization of an image inpainting diffusion model to the given reference view. The personalization effectively adapts the prior distribution to the target scene, resulting in a lower variance of the score distillation objective and hence significantly sharper details. Our framework achieves state-of-the-art results for object removal while maintaining high controllability. We further demonstrate the generality of our formulation on other downstream tasks such as object insertion, scene outpainting, and sparse view reconstruction.
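The variance-reduction argument can be illustrated with a toy model (a sketch, not the paper's implementation): for a 1-D Gaussian data prior, the optimal diffusion noise predictor has a closed form, and the per-sample score-distillation (SDS) gradient variance shrinks as the prior narrows -- mimicking the effect of personalizing the diffusion model to the target scene. The function name and all numeric settings below are illustrative assumptions.

```python
import numpy as np

# Toy SDS with a 1-D Gaussian "diffusion prior" N(mu, s2).
# At noise level a (alpha_bar), the marginal of x_t is
#   N(sqrt(a)*mu, a*s2 + (1-a)),
# whose optimal noise prediction is
#   eps_hat(x_t) = sqrt(1-a) * (x_t - sqrt(a)*mu) / (a*s2 + (1-a)).
# The per-sample SDS gradient is eps_hat - eps.

def sds_gradient_samples(theta, mu, s2, a, n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    x_t = np.sqrt(a) * theta + np.sqrt(1.0 - a) * eps          # forward diffusion
    eps_hat = np.sqrt(1.0 - a) * (x_t - np.sqrt(a) * mu) / (a * s2 + (1.0 - a))
    return eps_hat - eps                                        # per-sample SDS gradient

theta, a = 0.3, 0.5                                             # current "scene" value, noise level
g_generic = sds_gradient_samples(theta, mu=0.0, s2=4.0, a=a)    # broad, generic prior
g_personal = sds_gradient_samples(theta, mu=0.3, s2=0.1, a=a)   # narrow, "personalized" prior

print(f"generic prior:      grad var = {g_generic.var():.4f}")
print(f"personalized prior: grad var = {g_personal.var():.4f}")
```

Analytically, the gradient variance here is (a*s2 / (a*s2 + 1-a))^2, so narrowing the prior (smaller s2, mean closer to the scene) directly reduces the noise of the distillation signal -- the mechanism the abstract credits for sharper details.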