Recent text-to-3D generation approaches produce impressive 3D results but require time-consuming optimization that can take up to an hour per prompt. Amortized methods like ATT3D optimize multiple prompts simultaneously to improve efficiency, enabling fast text-to-3D synthesis. However, they cannot capture high-frequency geometry and texture details and struggle to scale to large prompt sets, so they generalize poorly. We introduce LATTE3D, addressing these limitations to achieve fast, high-quality generation on a significantly larger prompt set. Key to our method is (1) building a scalable architecture and (2) leveraging 3D data during optimization, through 3D-aware diffusion priors, shape regularization, and model initialization, to achieve robustness to diverse and complex training prompts. LATTE3D amortizes both neural field and textured surface generation to produce highly detailed textured meshes in a single forward pass. LATTE3D generates 3D objects in 400 ms, and can be further enhanced with fast test-time optimization.
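The core efficiency idea above, amortization, can be illustrated with a deliberately simplified toy. This is a hypothetical sketch, not LATTE3D's actual architecture or losses: a single shared network (here just a weight matrix) maps a prompt embedding to output parameters and is trained jointly over the whole prompt set, so that inference for any prompt is one forward pass rather than a per-prompt optimization loop.

```python
import numpy as np

# Toy amortized optimization (illustrative only; the real method trains a
# network producing 3D representations under diffusion-based losses).
rng = np.random.default_rng(0)
n_prompts, emb_dim, out_dim = 8, 16, 32

# Stand-ins for text embeddings and per-prompt target parameters.
prompt_emb = rng.normal(size=(n_prompts, emb_dim))
targets = rng.normal(size=(n_prompts, out_dim))

W = np.zeros((emb_dim, out_dim))  # shared, amortized weights
lr = 0.05
for step in range(2000):
    pred = prompt_emb @ W                          # forward pass, all prompts
    grad = prompt_emb.T @ (pred - targets) / n_prompts
    W -= lr * grad                                 # joint update over prompt set

# At test time, a (seen) prompt costs a single matrix multiply,
# not a fresh optimization run.
final_loss = np.mean((prompt_emb @ W - targets) ** 2)
```

The contrast with per-prompt methods is that the optimization cost here is paid once for the whole prompt set; the remaining challenge, which the paper targets, is making such a shared model scale to large, diverse prompt sets without losing detail.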