Lifting 2D results from pre-trained diffusion models to 3D for text-to-3D generation is inherently ambiguous. 2D diffusion models learn only view-agnostic priors and therefore lack 3D knowledge during lifting, which leads to the multi-view inconsistency problem. We find that this problem stems primarily from geometric inconsistency, and that avoiding misplaced geometric structures substantially mitigates it in the final outputs. We therefore improve consistency by aligning the 2D geometric priors in diffusion models with well-defined 3D shapes during lifting, which addresses the vast majority of the problem. This is achieved by fine-tuning the 2D diffusion model to be viewpoint-aware and to produce view-specific coordinate maps of canonically oriented 3D objects. Only coarse 3D information is used for the alignment. This "coarse" alignment not only resolves multi-view inconsistency in geometry but also retains the ability of 2D diffusion models to generate detailed, diverse, high-quality objects unseen in the 3D datasets. Furthermore, our aligned geometric priors (AGP) are generic and can be seamlessly integrated into various state-of-the-art pipelines, generalizing well to unseen shapes and visual appearances while greatly alleviating the multi-view inconsistency problem. Our method sets a new state of the art, with a consistency rate above 85% in human evaluation, whereas many previous methods achieve around 30%. Our project page is https://sweetdreamer3d.github.io/
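To make the notion of a view-specific coordinate map of a canonically oriented object concrete, the following is a minimal sketch and not the authors' implementation. It assumes a point-sampled object inside the unit cube [-0.5, 0.5]^3 with y up, an orthographic camera that orbits about the vertical axis, and a simple per-pixel depth test; the function name `canonical_coordinate_map` and all parameters are hypothetical. Each foreground pixel stores the canonical (object-space) xyz of the nearest visible point, remapped to [0, 1] so it can be viewed as RGB.

```python
import numpy as np


def canonical_coordinate_map(points, azimuth_deg, size=64):
    """Render a canonical coordinate map (illustrative sketch only).

    points      : (N, 3) array in canonical object space, inside [-0.5, 0.5]^3.
    azimuth_deg : camera orbit angle about the vertical (y) axis, in degrees.
    size        : output resolution (size x size).

    Returns (ccm, mask): ccm is (size, size, 3) holding canonical xyz in [0, 1]
    at foreground pixels and 0 elsewhere; mask is a (size, size) bool array.
    """
    points = np.asarray(points, dtype=np.float64)
    theta = np.deg2rad(azimuth_deg)
    c, s = np.cos(theta), np.sin(theta)
    # Rotation from canonical space to camera space (rotation about y).
    R = np.array([[c, 0.0, -s],
                  [0.0, 1.0, 0.0],
                  [s, 0.0, c]])
    cam = points @ R.T

    # Orthographic projection. The half-diagonal of the unit cube bounds the
    # projected extent for any rotation, so every point lands inside the image.
    extent = np.sqrt(3.0) / 2.0
    u = np.rint((cam[:, 0] / extent + 1.0) / 2.0 * (size - 1)).astype(int)
    v = np.rint((1.0 - (cam[:, 1] / extent + 1.0) / 2.0) * (size - 1)).astype(int)

    ccm = np.zeros((size, size, 3))
    mask = np.zeros((size, size), dtype=bool)
    # Assumed convention: the camera looks along -z, so larger z is nearer.
    # Paint far-to-near so the nearest point at each pixel is written last.
    for i in np.argsort(cam[:, 2]):
        ccm[v[i], u[i]] = points[i] + 0.5  # canonical xyz, not camera xyz
        mask[v[i], u[i]] = True
    return ccm, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Points on a sphere of radius 0.4 as a stand-in for a 3D shape.
    d = rng.normal(size=(5000, 3))
    sphere = 0.4 * d / np.linalg.norm(d, axis=1, keepdims=True)
    for az in (0.0, 90.0, 180.0):
        ccm, mask = canonical_coordinate_map(sphere, az)
        print(az, ccm.shape, int(mask.sum()))
```

Because the stored values are object-space coordinates rather than camera-space ones, the same surface point receives the same color from every viewpoint, while the set of visible points changes with the view. This is the property that makes such maps a plausible supervision signal for viewpoint-aware fine-tuning.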