We introduce Meta 3D Gen (3DGen), a new state-of-the-art, fast pipeline for text-to-3D asset generation. 3DGen offers 3D asset creation with high prompt fidelity and high-quality 3D shapes and textures in under a minute. It supports physically-based rendering (PBR), necessary for 3D asset relighting in real-world applications. Additionally, 3DGen supports generative retexturing of previously generated (or artist-created) 3D shapes using additional textual inputs provided by the user. 3DGen integrates two key technical components, Meta 3D AssetGen and Meta 3D TextureGen, which we developed for text-to-3D and text-to-texture generation, respectively. By combining their strengths, 3DGen represents 3D objects simultaneously in three ways: in view space, in volumetric space, and in UV (or texture) space. Integrating the two stages achieves a win rate of 68% over the single-stage model. We compare 3DGen to numerous industry baselines and show that it outperforms them in terms of prompt fidelity and visual quality for complex textual prompts, while being significantly faster.
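The two-stage structure described above (shape generation followed by texture generation, with the second stage reusable for retexturing) can be sketched as follows. This is a minimal illustrative sketch: all names (`Mesh`, `asset_gen_stage`, `texture_gen_stage`) are hypothetical placeholders, not the actual 3DGen API.

```python
from dataclasses import dataclass

@dataclass
class Mesh:
    """Placeholder for a generated 3D asset (geometry plus texture)."""
    shape_id: str   # identifier for the generated geometry
    texture: str    # stand-in for PBR texture maps

def asset_gen_stage(prompt: str) -> Mesh:
    """Stage 1 (AssetGen role): text -> 3D shape with an initial texture."""
    return Mesh(shape_id=f"shape({prompt})", texture=f"initial({prompt})")

def texture_gen_stage(mesh: Mesh, prompt: str) -> Mesh:
    """Stage 2 (TextureGen role): refine or replace the texture in UV space,
    leaving the geometry untouched."""
    return Mesh(shape_id=mesh.shape_id, texture=f"refined({prompt})")

def text_to_3d(prompt: str) -> Mesh:
    """Full text-to-3D pipeline: stage 1 then stage 2 on the same prompt."""
    return texture_gen_stage(asset_gen_stage(prompt), prompt)

def retexture(mesh: Mesh, new_prompt: str) -> Mesh:
    """Generative retexturing: apply stage 2 alone to an existing
    (generated or artist-created) mesh with a new textual prompt."""
    return texture_gen_stage(mesh, new_prompt)
```

The key design point the sketch captures is that retexturing reuses the second stage unchanged: the same texture generator serves both the end-to-end pipeline and standalone retexturing of existing shapes.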