We introduce DiffRF, a novel approach for 3D radiance field synthesis based on denoising diffusion probabilistic models. While existing diffusion-based methods operate on images, latent codes, or point cloud data, we are the first to directly generate volumetric radiance fields. To this end, we propose a 3D denoising model that operates directly on an explicit voxel grid representation. However, because radiance fields fitted to a set of posed images can be ambiguous and contain artifacts, obtaining ground truth radiance field samples is non-trivial. We address this challenge by pairing the denoising formulation with a rendering loss, enabling our model to learn a deviated prior that favours good image quality instead of trying to replicate fitting errors such as floating artifacts. In contrast to 2D diffusion models, our model learns multi-view consistent priors, enabling free-view synthesis and accurate shape generation. Compared to 3D GANs, our diffusion-based approach naturally enables conditional generation such as masked completion or single-view 3D synthesis at inference time.
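To make the pairing of a denoising objective with a rendering loss concrete, the following is a minimal sketch and not the DiffRF implementation. It assumes a voxel grid of shape (D, H, W, 4) with density in channel 0 and RGB in channels 1 to 3, a noise-prediction (epsilon) parameterisation, and a toy orthographic renderer that marches rays along the first grid axis. All function names, the `render_weight` parameter, and the renderer itself are hypothetical choices made for illustration.

```python
import numpy as np

def q_sample(x0, alpha_bar_t, noise):
    """Forward diffusion: blend the clean voxel grid with Gaussian noise."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

def predict_x0(x_t, alpha_bar_t, eps_pred):
    """Recover an estimate of the clean grid from a predicted noise term."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

def render_orthographic(grid, step=1.0):
    """Toy volume renderer (assumption): rays run along axis 0 of the grid."""
    sigma = np.maximum(grid[..., 0], 0.0)        # non-negative density, (D, H, W)
    rgb = grid[..., 1:]                          # colour per voxel, (D, H, W, 3)
    alpha = 1.0 - np.exp(-sigma * step)          # opacity of each sample
    # Transmittance: fraction of light surviving all earlier samples on the ray.
    trans = np.cumprod(1.0 - alpha, axis=0)
    trans = np.concatenate([np.ones_like(trans[:1]), trans[:-1]], axis=0)
    weights = trans * alpha
    return (weights[..., None] * rgb).sum(axis=0)  # image of shape (H, W, 3)

def combined_loss(x0, eps_pred, noise, alpha_bar_t, target_image, render_weight=1.0):
    """Denoising MSE plus a photometric loss on the rendered x0 estimate."""
    x_t = q_sample(x0, alpha_bar_t, noise)
    denoise = np.mean((eps_pred - noise) ** 2)
    x0_hat = predict_x0(x_t, alpha_bar_t, eps_pred)
    render = np.mean((render_orthographic(x0_hat) - target_image) ** 2)
    return denoise + render_weight * render
```

In this sketch the rendering term compares images rather than voxel values, so a denoiser trained with it is pulled toward grids that render well even if the fitted training grids contain artifacts. In practice `eps_pred` would come from a learned 3D network and `target_image` from a posed training view; both are stand-ins here.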