Driven by advances in text-to-image diffusion models, text-to-3D generation has made significant strides. Two primary paradigms currently dominate the field: feed-forward generation methods, which produce 3D assets quickly but often yield coarse results, and Score Distillation Sampling (SDS) based methods, which generate high-fidelity 3D assets but at a slower pace. Combining the two holds substantial promise for advancing 3D generation. In this paper, we present BoostDream, a highly efficient plug-and-play 3D refining method designed to transform coarse 3D assets into high-quality ones. The BoostDream framework comprises three distinct processes: (1) We introduce 3D model distillation, which fits differentiable representations to the 3D assets obtained through feed-forward generation. (2) We design a novel multi-view SDS loss, which uses a multi-view aware 2D diffusion model to refine the 3D assets. (3) We propose using the prompt and multi-view consistent normal maps as guidance during refinement. Extensive experiments on different differentiable 3D representations show that BoostDream generates high-quality 3D assets rapidly and overcomes the Janus problem that affects conventional SDS-based methods. These results represent a substantial advance in both the efficiency and the quality of 3D generation.
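To make the SDS-based refinement idea concrete, the following is a minimal sketch of a plain, single-view SDS update, not BoostDream's multi-view loss, whose exact form is not given in this abstract. It rests on several assumptions made purely for illustration: the "3D representation" is a flat list of values rendered by an identity function, the noise predictor `make_toy_predictor` is a hypothetical stand-in for a pretrained diffusion model (it is exact only for a data distribution concentrated on a single target), and the names `sds_grad` and `refine` are invented here. The update follows the standard SDS form, in which the gradient is the weighted difference between predicted and injected noise.

```python
import math
import random


def make_toy_predictor(target):
    """Hypothetical stand-in for a diffusion model's noise predictor.

    It is the exact noise estimate when the data distribution is a single
    point `target`; a real system would call a pretrained 2D diffusion model.
    """
    def predict(x_t, alpha_bar):
        scale = math.sqrt(1.0 - alpha_bar)
        return [(xt - math.sqrt(alpha_bar) * tg) / scale
                for xt, tg in zip(x_t, target)]
    return predict


def sds_grad(x, predict, alpha_bar, rng, weight=1.0):
    """One-sample SDS gradient with respect to the rendered values x."""
    # Sample the noise that will be injected into the rendering.
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    # Forward diffusion: x_t = sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps.
    x_t = [math.sqrt(alpha_bar) * xi + math.sqrt(1.0 - alpha_bar) * ei
           for xi, ei in zip(x, eps)]
    eps_hat = predict(x_t, alpha_bar)
    # SDS skips the denoiser Jacobian: gradient = w(t) * (eps_hat - eps).
    return [weight * (eh - ei) for eh, ei in zip(eps_hat, eps)]


def refine(coarse, target, steps=200, lr=0.05, seed=0):
    """Refine a coarse asset by gradient descent on the SDS gradient.

    The renderer is assumed to be the identity, so dx/dtheta = I and the
    SDS gradient is applied to the parameters directly.
    """
    rng = random.Random(seed)
    predict = make_toy_predictor(target)
    x = list(coarse)
    for _ in range(steps):
        # Sample a noise level; the range [0.2, 0.8] is an arbitrary choice.
        alpha_bar = rng.uniform(0.2, 0.8)
        grad = sds_grad(x, predict, alpha_bar, rng)
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x


if __name__ == "__main__":
    coarse = [0.0] * 8
    target = [1.0] * 8
    refined = refine(coarse, target)
    print(max(abs(r - t) for r, t in zip(refined, target)))
```

In this toy setting the coarse input plays the role of the feed-forward result, and each step pulls it toward what the stand-in predictor considers likely. A multi-view variant would, under the same assumptions, render several views per step and feed them jointly to a multi-view aware predictor.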