Understanding and generating 3D objects as compositions of meaningful parts is fundamental to human perception and reasoning. However, most text-to-3D methods overlook the semantic and functional structure of parts. While recent part-aware approaches introduce decomposition, they remain largely geometry-focused: they lack semantic grounding and fail to model both how parts align with textual descriptions and how parts relate to one another. We propose DreamPartGen, a framework for semantically grounded, part-aware text-to-3D generation. DreamPartGen introduces Duplex Part Latents (DPLs) that jointly model each part's geometry and appearance, and Relational Semantic Latents (RSLs) that capture inter-part dependencies derived from language. A synchronized co-denoising process enforces mutual geometric and semantic consistency, enabling coherent, interpretable, and text-aligned 3D synthesis. Across multiple benchmarks, DreamPartGen delivers state-of-the-art performance in geometric fidelity and text-shape alignment.