Lifting 2D results from pre-trained diffusion models into 3D for text-to-3D generation is inherently ambiguous. 2D diffusion models learn only view-agnostic priors and therefore lack 3D knowledge during lifting, which leads to the multi-view inconsistency problem. We find that this problem stems primarily from geometric inconsistency, and that avoiding misplaced geometric structures substantially mitigates it in the final outputs. We therefore improve consistency by aligning the 2D geometric priors in diffusion models with well-defined 3D shapes during lifting, which addresses the vast majority of the problem. This is achieved by fine-tuning the 2D diffusion model to be viewpoint-aware and to produce view-specific coordinate maps of canonically oriented 3D objects. Only coarse 3D information is used for the alignment. This "coarse" alignment not only resolves multi-view inconsistency in geometry but also retains the ability of 2D diffusion models to generate detailed, diverse, high-quality objects unseen in the 3D datasets. Furthermore, our aligned geometric priors (AGP) are generic and can be seamlessly integrated into various state-of-the-art pipelines, generalizing well to unseen shapes and visual appearances while greatly alleviating the multi-view inconsistency problem. Our method sets a new state of the art with a consistency rate above 85% in human evaluation, whereas many previous methods are around 30%. Our project page is https://sweetdreamer3d.github.io/
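To make the notion of a view-specific coordinate map more concrete, the following is a minimal sketch of how such a map could be rendered from a point cloud. It is not the paper's pipeline, which fine-tunes a diffusion model to predict these maps; it only illustrates the target representation. The function names `look_at_rotation` and `render_ccm`, the orthographic camera, the y-up convention, and the assumption that the object is canonically oriented inside the cube [-0.5, 0.5]^3 are all choices made for this illustration.

```python
import numpy as np

def look_at_rotation(azimuth_deg, elevation_deg):
    """World-to-camera rotation for a camera orbiting the origin (y is up).

    Illustrative helper; degenerate at elevations of exactly +/-90 degrees.
    """
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    # Unit vector pointing from the origin towards the camera.
    to_cam = np.array([np.cos(el) * np.sin(az), np.sin(el), np.cos(el) * np.cos(az)])
    forward = -to_cam                          # the camera looks at the origin
    right = np.cross(forward, [0.0, 1.0, 0.0])
    right /= np.linalg.norm(right)
    up = np.cross(right, forward)
    return np.stack([right, up, forward])      # rows: camera x, camera y, depth

def render_ccm(points, azimuth_deg, elevation_deg, size=64):
    """Render a coordinate map: each pixel stores the canonical XYZ (shifted to
    [0, 1]) of the nearest point seen through it. Assumes `points` is an (N, 3)
    array inside [-0.5, 0.5]^3. Returns the map and a foreground mask.
    """
    cam = points @ look_at_rotation(azimuth_deg, elevation_deg).T
    half = np.sqrt(3.0) / 2.0                  # half-diagonal of the unit cube
    # Orthographic projection to pixel indices (image row 0 is the top).
    u = np.rint((cam[:, 0] / half + 1.0) * 0.5 * (size - 1)).astype(int)
    v = np.rint((1.0 - (cam[:, 1] / half + 1.0) * 0.5) * (size - 1)).astype(int)
    depth = cam[:, 2]                          # smaller means closer to the camera

    # Z-buffer: keep the smallest depth that lands on each pixel.
    zbuf = np.full((size, size), np.inf)
    np.minimum.at(zbuf, (v, u), depth)
    visible = depth <= zbuf[v, u]

    ccm = np.zeros((size, size, 3))
    # The stored value is the view-independent canonical coordinate,
    # but which point is visible depends on the viewpoint.
    ccm[v[visible], u[visible]] = points[visible] + 0.5
    return ccm, np.isfinite(zbuf)
```

With two points that lie on the same line of sight, for example `[0.1, 0.2, 0.4]` and `[0.1, 0.2, -0.4]`, a view from azimuth 0 stores the first point's coordinates and a view from azimuth 180 stores the second's. This is the sense in which the map is view-specific while its values remain tied to one canonical frame.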