Instant3D: Instant Text-to-3D Generation

Text-to-3D generation, which aims to synthesize vivid 3D objects from text prompts, has attracted much attention from the computer vision community. While several existing works have achieved impressive results for this task, they mainly rely on a time-consuming optimization paradigm. Specifically, these methods optimize a neural field from scratch for each text prompt, taking approximately one hour or more to generate one object. This heavy and repetitive training cost impedes their practical deployment. In this paper, we propose a novel framework for fast text-to-3D generation, dubbed Instant3D. Once trained, Instant3D is able to create a 3D object for an unseen text prompt in less than one second with a single run of a feedforward network. We achieve this remarkable speed by devising a new network that directly constructs a 3D triplane from a text prompt. The core innovation of our Instant3D lies in our exploration of strategies to effectively inject text conditions into the network. Furthermore, we propose a simple yet effective activation function, the scaled-sigmoid, to replace the original sigmoid function, which speeds up the training convergence by more than ten times. Finally, to address the Janus (multi-head) problem in 3D generation, we propose an adaptive Perp-Neg algorithm that can dynamically adjust its concept negation scales according to the severity of the Janus problem during training, effectively reducing the multi-head effect. Extensive experiments on a wide variety of benchmark datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods both qualitatively and quantitatively, while achieving significantly better efficiency. The project page is at https://ming1993li.github.io/Instant3DProj.
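The abstract names a scaled-sigmoid activation but does not give its formula. As a minimal sketch, assume it takes the form f(x) = 1 / (1 + exp(-alpha * x)), where `alpha` is a hypothetical scale hyperparameter (the name, the form, and the value 0.1 used below are illustrative assumptions, not taken from the text above). One possible intuition for the faster convergence is that a scale below 1 widens the region where the gradient is not vanishingly small:

```python
import math


def scaled_sigmoid(x: float, alpha: float = 1.0) -> float:
    """Assumed form: sigmoid applied to alpha * x (alpha = 1 is the plain sigmoid)."""
    z = alpha * x
    # Branch on the sign of z so exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def scaled_sigmoid_grad(x: float, alpha: float = 1.0) -> float:
    """Derivative with respect to x: alpha * f(x) * (1 - f(x))."""
    s = scaled_sigmoid(x, alpha)
    return alpha * s * (1.0 - s)


if __name__ == "__main__":
    # Compare gradients at a large pre-activation, where the plain sigmoid saturates.
    x = 8.0
    print("plain sigmoid gradient: ", scaled_sigmoid_grad(x, alpha=1.0))
    print("scaled sigmoid gradient:", scaled_sigmoid_grad(x, alpha=0.1))
```

Under these assumptions, the scaled variant keeps a noticeably larger gradient at x = 8 than the plain sigmoid does, which is consistent with, though not a proof of, the convergence speed-up the abstract reports.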

