This paper presents a new text-guided technique for generating 3D shapes. The technique leverages a hybrid 3D shape representation, EXIM, that combines the strengths of explicit and implicit representations. Specifically, the explicit stage controls the topology of the generated 3D shapes and enables local modifications, whereas the implicit stage refines the shape and paints it with plausible colors. The hybrid approach also separates shape from color and generates color conditioned on shape, which ensures shape-color consistency. Unlike existing state-of-the-art methods, we achieve high-fidelity shape generation from natural-language descriptions without requiring time-consuming per-shape optimization or relying on human-annotated texts during training or test-time optimization. Further, we demonstrate the applicability of our approach to generating indoor scenes with consistent styles using text-induced 3D shapes. Extensive experiments demonstrate the compelling quality of our results and the high coherence of the generated shapes with the input texts, surpassing existing methods by a significant margin. Code and models are released at https://github.com/liuzhengzhe/EXIM.