Instant3D: Instant Text-to-3D Generation

Text-to-3D generation has attracted much attention from the computer vision community. Existing methods mainly optimize a neural field from scratch for each text prompt, incurring a heavy, repeated training cost that impedes their practical deployment. In this paper, we propose a novel framework for fast text-to-3D generation, dubbed Instant3D. Once trained, Instant3D can create a 3D object for an unseen text prompt in less than one second with a single run of a feedforward network. We achieve this speed by devising a new network that directly constructs a 3D triplane from a text prompt. The core innovation of Instant3D lies in our exploration of strategies for effectively injecting text conditions into the network. In particular, we propose to combine three key mechanisms: cross-attention, style injection, and token-to-plane transformation, which together ensure precise alignment of the output with the input text. Furthermore, we propose a simple yet effective activation function, the scaled-sigmoid, to replace the original sigmoid function; it speeds up training convergence by more than ten times. Finally, to address the Janus (multi-head) problem in 3D generation, we propose an adaptive Perp-Neg algorithm that dynamically adjusts its concept negation scales according to the severity of the Janus problem during training, effectively reducing the multi-head effect. Extensive experiments on a wide variety of benchmark datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art methods both qualitatively and quantitatively, while achieving significantly better efficiency. The code, data, and models are available at https://github.com/ming1993li/Instant3DCodes.
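As a rough illustration of the scaled-sigmoid idea, the sketch below assumes one plausible form: the standard sigmoid applied to an input multiplied by a factor alpha. The abstract does not state the formula, so the function names, the form sigmoid(alpha * x), and the default alpha = 0.5 are all assumptions made for illustration, not the authors' definition. Under this assumption, an alpha below 1 widens the region where the gradient is non-negligible, which is one way an activation could ease optimization.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable standard sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def scaled_sigmoid(x: float, alpha: float = 0.5) -> float:
    """Hypothetical scaled sigmoid: sigmoid(alpha * x).

    The form and the default alpha are assumptions for illustration;
    the abstract does not specify them.
    """
    return sigmoid(alpha * x)


def scaled_sigmoid_grad(x: float, alpha: float = 0.5) -> float:
    """Derivative of scaled_sigmoid with respect to x (chain rule)."""
    s = scaled_sigmoid(x, alpha)
    return alpha * s * (1.0 - s)


if __name__ == "__main__":
    # Compare gradients away from the origin: with alpha < 1 the
    # gradient decays more slowly as |x| grows.
    for x in (0.0, 2.0, 4.0, 6.0):
        print(x, scaled_sigmoid_grad(x, alpha=1.0), scaled_sigmoid_grad(x, alpha=0.5))
```

With alpha = 1 the function reduces to the ordinary sigmoid, so the two can be compared directly by changing a single argument.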

