Recent text-to-3D generation methods achieve impressive 3D content creation capacity thanks to advances in image diffusion models and optimization strategies. However, current methods struggle to generate correct 3D content for semantically complex prompts, i.e., prompts describing multiple interacting objects bound to different attributes. In this work, we propose a general framework named Progressive3D, which decomposes the entire generation into a series of locally progressive editing steps to create precise 3D content for complex prompts. In each editing step, we constrain content changes to occur only in regions determined by user-defined region prompts. Furthermore, we propose an overlapped semantic component suppression technique that encourages the optimization process to focus on the semantic differences between prompts. Extensive experiments demonstrate that Progressive3D generates precise 3D content for prompts with complex semantics and generalizes across text-to-3D methods driven by different 3D representations.
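The abstract does not specify how overlapped semantic components are suppressed. As a rough illustration only, the sketch below assumes one plausible reading: the guidance direction for the target prompt is split into a part parallel to the source prompt's direction (the overlap) and a remaining part (the semantic difference), and the overlap is down-weighted. The function name `suppress_overlap`, the `weight` parameter, and the use of plain Python lists as stand-ins for model outputs are all hypothetical and not taken from the source.

```python
def suppress_overlap(delta_target, delta_source, weight=4.0):
    """Down-weight the part of delta_target that is parallel to delta_source.

    Both arguments are flat lists of floats standing in for guidance
    directions of the target and source prompts (an assumption made
    for illustration; a real system would use tensors).
    """
    if weight <= 0:
        raise ValueError("weight must be positive")
    if len(delta_target) != len(delta_source):
        raise ValueError("inputs must have the same length")

    dot_ts = sum(t * s for t, s in zip(delta_target, delta_source))
    dot_ss = sum(s * s for s in delta_source)
    if dot_ss == 0.0:
        # No source direction, so there is nothing to suppress.
        return list(delta_target)

    # Component of the target direction shared with the source direction.
    scale = dot_ts / dot_ss
    overlap = [scale * s for s in delta_source]
    # Remaining component: what the target prompt adds beyond the source.
    difference = [t - o for t, o in zip(delta_target, overlap)]
    # Keep the difference in full and shrink the overlap by 1 / weight.
    return [d + o / weight for d, o in zip(difference, overlap)]


result = suppress_overlap([3.0, 4.0], [1.0, 0.0], weight=4.0)
print(result)
```

With `weight` set to 1 the input direction is returned unchanged; larger values push the update toward the component that distinguishes the target prompt from the source prompt.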