The recent availability and adaptability of text-to-image models have sparked a new era in many related domains that benefit from their learned text priors as well as high-quality, fast generation capabilities; one such domain is texture generation for 3D objects. Although recent texture generation methods achieve impressive results by using text-to-image networks, the combination of global consistency, quality, and speed, which is crucial for advancing texture generation to real-world applications, remains elusive. To that end, we introduce Meta 3D TextureGen: a new feedforward method comprising two sequential networks that generates high-quality, globally consistent textures for arbitrary geometries of any degree of complexity in less than 20 seconds. Our method achieves state-of-the-art results in quality and speed by conditioning a text-to-image model on 3D semantics in 2D space and fusing the outputs into a complete, high-resolution UV texture map, as demonstrated by extensive qualitative and quantitative evaluations. In addition, we introduce a texture enhancement network capable of upscaling any texture by an arbitrary ratio, producing textures at 4k pixel resolution.
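The abstract describes a two-stage feedforward pipeline followed by an enhancement step. The sketch below illustrates only the data flow implied by that description; every function name, signature, and stub body is a hypothetical placeholder, not the authors' actual code or API.

```python
# Hypothetical sketch of the pipeline structure described in the abstract:
# stage 1 conditions a text-to-image model on 3D semantics in 2D space,
# stage 2 fuses the results into a complete UV texture map, and an
# enhancement network upscales the map by an arbitrary ratio.
# All names and shapes here are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class UVTexture:
    """Placeholder for a square UV texture map of a given resolution."""
    resolution: int


def stage1_generate_views(prompt: str, geometry: str) -> list[str]:
    """Stage 1 (assumed interface): a text-to-image network, conditioned
    on 3D semantics rendered into 2D space, produces per-view images."""
    return [f"{prompt}|{geometry}|view{i}" for i in range(4)]


def stage2_fuse_to_uv(views: list[str]) -> UVTexture:
    """Stage 2 (assumed interface): fuse the generated views into one
    complete, high-resolution UV texture map."""
    return UVTexture(resolution=1024)


def enhance(texture: UVTexture, ratio: int) -> UVTexture:
    """Enhancement network (assumed interface): upscale any texture by
    an arbitrary ratio, e.g. 1024 -> 4096 (4k)."""
    return UVTexture(resolution=texture.resolution * ratio)


views = stage1_generate_views("rusty steampunk robot", "robot.obj")
uv = stage2_fuse_to_uv(views)
final = enhance(uv, ratio=4)
print(final.resolution)  # 4096
```

The split mirrors the abstract's claim: generation happens in 2D view space first, and UV-space fusion plus enhancement are separate, subsequent steps.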