We introduce MVDream, a multi-view diffusion model that generates consistent multi-view images from a given text prompt. By learning from both 2D and 3D data, a multi-view diffusion model can achieve the generalizability of 2D diffusion models and the consistency of 3D renderings. We demonstrate that such a multi-view prior can serve as a generalizable 3D prior that is agnostic to 3D representations. Applied to 3D generation via Score Distillation Sampling, it significantly enhances the consistency and stability of existing 2D-lifting methods. It can also learn new concepts from a few 2D examples, akin to DreamBooth, but for 3D generation.
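For readers unfamiliar with Score Distillation Sampling, its gradient can be sketched in the form commonly attributed to DreamFusion. The notation here is an assumption and does not appear in the abstract: $\theta$ parameterizes the 3D representation, $g(\theta, c)$ renders it from camera $c$, $\hat{\epsilon}_\phi$ is the diffusion model's noise prediction for prompt $y$ (conditioning it on $c$ is our guess at how a multi-view model would enter), $\alpha_t$ and $\sigma_t$ are the noise-schedule coefficients, and $w(t)$ is a timestep weighting. The exact objective used by MVDream may differ.

```latex
% Sketch of the SDS gradient; camera conditioning on c is an assumption.
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\,c,\,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, c,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad x = g(\theta, c),
\qquad x_t = \alpha_t\, x + \sigma_t\, \epsilon .
```

Read this way, the diffusion model acts only as a critic of rendered images, which is consistent with the abstract's description of the prior as agnostic to the underlying 3D representation.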