Neural Radiance Fields (NeRF) are a representation for 3D reconstruction from multi-view images. Although recent work has shown preliminary success in editing a reconstructed NeRF with diffusion priors, these methods still struggle to synthesize reasonable geometry in completely uncovered regions. One major reason is the high diversity of content synthesized by the diffusion model, which hinders the radiance field from converging to crisp, deterministic geometry. Moreover, applying latent diffusion models to real data often yields a textural shift incoherent with the image condition due to auto-encoding errors. Both problems are further reinforced by the use of pixel-distance losses. To address these issues, we propose tempering the diffusion model's stochasticity with per-scene customization and mitigating the textural shift with masked adversarial training. In our analyses, we also found that the commonly used pixel and perceptual losses are harmful to the NeRF inpainting task. Through rigorous experiments, our framework yields state-of-the-art NeRF inpainting results on various real-world scenes. Project page: https://hubert0527.github.io/MALD-NeRF