We present GenesisTex, a novel method for synthesizing textures for 3D geometries from text descriptions. GenesisTex adapts a pretrained image diffusion model to texture space through texture space sampling. Specifically, we maintain a latent texture map for each viewpoint, which is updated with the noise predicted on the rendering of the corresponding viewpoint. The sampled latent texture maps are then decoded into a final texture map. During the sampling process, we enforce both global and local consistency across multiple viewpoints: global consistency is achieved by integrating style consistency mechanisms into the noise prediction network, and local (low-level) consistency is achieved by dynamically aligning the latent textures. Finally, we apply reference-based inpainting and image-to-image (img2img) translation on denser views for texture refinement. Our approach overcomes the slow optimization of distillation-based methods and the instability of inpainting-based methods. Experiments on meshes from various sources demonstrate that our method surpasses the baseline methods both quantitatively and qualitatively.
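
To make the per-view update concrete, the following is a minimal sketch of one texture space sampling step, not the implementation used in this work. Everything named here is an assumption made for illustration: the functions `render`, `scatter_to_texture`, `dummy_noise_predictor`, and `sample_step`, the representation of a viewpoint as an integer map `uv_idx` from pixels to flat texel indices (with -1 marking background), and the simplified update rule `tex - step_size * eps`, which stands in for a real diffusion scheduler step. The noise predictor is a placeholder for the pretrained diffusion model, and the style consistency and latent alignment mechanisms are omitted.

```python
import numpy as np

def render(latent_tex, uv_idx):
    """Gather texels into a view image.

    uv_idx is an (H, W) integer array of flat texel indices; -1 marks background.
    """
    channels = latent_tex.shape[-1]
    flat = latent_tex.reshape(-1, channels)
    img = np.zeros(uv_idx.shape + (channels,), dtype=latent_tex.dtype)
    mask = uv_idx >= 0
    img[mask] = flat[uv_idx[mask]]
    return img, mask

def scatter_to_texture(img, uv_idx, mask, tex_shape):
    """Average per-pixel values back onto the texels they were rendered from."""
    channels = img.shape[-1]
    n_texels = tex_shape[0] * tex_shape[1]
    acc = np.zeros((n_texels, channels))
    cnt = np.zeros(n_texels)
    np.add.at(acc, uv_idx[mask], img[mask])   # unbuffered add handles repeated texels
    np.add.at(cnt, uv_idx[mask], 1.0)
    visible = cnt > 0
    acc[visible] /= cnt[visible, None]
    return acc.reshape(tex_shape + (channels,)), visible.reshape(tex_shape)

def dummy_noise_predictor(img, t):
    """Hypothetical stand-in for the pretrained diffusion model's noise prediction."""
    return 0.1 * img

def sample_step(latent_texs, uv_idxs, t, step_size=1.0):
    """Update each view's latent texture using noise predicted on its rendering."""
    updated_texs = []
    for tex, uv_idx in zip(latent_texs, uv_idxs):
        img, mask = render(tex, uv_idx)
        eps = dummy_noise_predictor(img, t)
        eps_tex, visible = scatter_to_texture(eps, uv_idx, mask, tex.shape[:2])
        updated = tex.copy()
        # Simplified update (assumption): only texels visible in this view change.
        updated[visible] = tex[visible] - step_size * eps_tex[visible]
        updated_texs.append(updated)
    return updated_texs
```

In this sketch, texels that a viewpoint does not see keep their previous latent values, so each per-view texture is only partially updated at every step; this is one reason an explicit cross-view alignment of the kind described above would be needed in practice.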