Retrieval-Augmented Score Distillation for Text-to-3D Generation

Text-to-3D generation has achieved significant success by incorporating powerful 2D diffusion models, but their lack of 3D prior knowledge also leads to inconsistent 3D geometry. Since the recent release of large-scale multi-view datasets, fine-tuning the diffusion model on these datasets has become a mainstream approach to the 3D inconsistency problem. However, this approach faces fundamental difficulties stemming from the limited quality and diversity of 3D data compared with 2D data. To sidestep these trade-offs, we explore a retrieval-augmented approach tailored for score distillation, dubbed ReDream. We postulate that both the expressiveness of 2D diffusion models and the geometric consistency of 3D assets can be fully leveraged by employing semantically relevant assets directly within the optimization process. To this end, we introduce a novel framework for retrieval-based quality enhancement in text-to-3D generation. We leverage the retrieved asset to incorporate its geometric prior into the variational objective and to adapt the diffusion model's 2D prior toward view consistency, achieving substantial improvements in both the geometry and the fidelity of generated scenes. We conduct extensive experiments to demonstrate that ReDream exhibits superior quality with increased geometric consistency. The project page is available at https://ku-cvlab.github.io/ReDream/.

