To address the scarcity of 3D assets, 2D-lifting techniques such as Score Distillation Sampling (SDS) have become widely adopted in text-to-3D generation pipelines. However, the 2D diffusion models used in these techniques are prone to viewpoint bias, which leads to geometric inconsistencies such as the Janus problem (an object generated with multiple faces). To counter this, we introduce MT3D, a text-to-3D generative model that leverages a high-fidelity 3D object to overcome viewpoint bias and explicitly infuse geometric understanding into the generation pipeline. First, we use depth maps derived from a high-quality 3D model as control signals, ensuring that the generated 2D images preserve its fundamental shape and structure and thereby reducing the inherent viewpoint bias. Next, we use deep geometric moments to explicitly enforce geometric consistency in the 3D representation. By incorporating geometric details from a 3D asset, MT3D enables the creation of diverse and geometrically consistent objects, improving the quality and usability of our 3D representations.
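The abstract does not say how the deep geometric moments are computed. To give a rough sense of what "geometric moments" measure, the following is a minimal sketch that uses classical, hand-computed image moments, not MT3D's deep geometric moments. Central moments summarize the mass distribution of an image (for example, a depth map or silhouette) independently of its position. The helper names, including `moment_consistency_loss`, are hypothetical and only illustrate how such a descriptor could be compared between two views.

```python
from typing import List, Tuple

Image = List[List[float]]  # row-major grid: img[y][x] holds intensity or depth


def raw_moment(img: Image, p: int, q: int) -> float:
    """Classical raw moment m_pq = sum over pixels of x^p * y^q * I(x, y)."""
    total = 0.0
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            total += (x ** p) * (y ** q) * value
    return total


def centroid(img: Image) -> Tuple[float, float]:
    """Center of mass (m10 / m00, m01 / m00)."""
    m00 = raw_moment(img, 0, 0)
    if m00 == 0:
        raise ValueError("image has zero total mass")
    return raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00


def central_moment(img: Image, p: int, q: int) -> float:
    """Central moment mu_pq, measured about the centroid (translation invariant)."""
    cx, cy = centroid(img)
    total = 0.0
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            total += ((x - cx) ** p) * ((y - cy) ** q) * value
    return total


def moment_descriptor(img: Image, max_order: int = 3) -> List[float]:
    """Central moments of order 2..max_order (order 0 is mass, order 1 is always 0)."""
    return [
        central_moment(img, p, q)
        for p in range(max_order + 1)
        for q in range(max_order + 1 - p)
        if p + q >= 2
    ]


def moment_consistency_loss(a: Image, b: Image, max_order: int = 3) -> float:
    """Hypothetical loss: mean squared difference between two moment descriptors."""
    da, db = moment_descriptor(a, max_order), moment_descriptor(b, max_order)
    return sum((u - v) ** 2 for u, v in zip(da, db)) / len(da)


if __name__ == "__main__":
    square = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    shifted = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    bar = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(moment_consistency_loss(square, shifted))  # same shape, moved
    print(moment_consistency_loss(square, bar))      # different shape
```

Because central moments are taken about the centroid, a translated copy of the same shape yields a loss of zero, while a differently shaped region does not. A learned moment network, as the name "deep geometric moments" suggests, would presumably replace these fixed polynomial sums with trained features.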