High-quality 3D texture generation remains a fundamental challenge due to the view inconsistency inherent in current mainstream multi-view diffusion pipelines. Existing representations either rely on UV maps, which suffer from distortion during unwrapping, or on point-based methods, which tightly couple texture fidelity to geometric density and thus limit high-resolution texture generation. To address these limitations, we introduce TexSpot, a diffusion-based texture enhancement framework. At its core is Texlet, a novel 3D texture representation that combines the geometric expressiveness of point-based 3D textures with the compactness of UV-based representations. Each Texlet latent vector encodes a local texture patch via a 2D encoder and is further aggregated by a 3D encoder to incorporate global shape context. A cascaded 3D-to-2D decoder reconstructs high-quality texture patches, enabling learning of the Texlet latent space. Leveraging this representation, we train a diffusion transformer conditioned on Texlets to refine and enhance textures produced by multi-view diffusion methods. Extensive experiments demonstrate that TexSpot significantly improves visual fidelity, geometric consistency, and robustness over existing state-of-the-art 3D texture generation and enhancement approaches. Project page: https://texlet-arch.github.io/TexSpot-page.
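The encode–aggregate–decode pipeline described above can be sketched with toy stand-ins. This is a minimal NumPy illustration, not the paper's architecture: the patch size, latent dimension, linear encoder/decoder, and k-nearest-neighbor averaging (a crude proxy for a learned 3D encoder) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: N surface anchor points, each with a PxP RGB texture patch.
N, P, D = 64, 8, 32

patches = rng.random((N, P, P, 3))        # local texture patches
positions = rng.random((N, 3))            # 3D anchor positions on the surface

# Stand-in "2D encoder": flatten each patch and linearly project to a D-dim latent.
W_enc = rng.standard_normal((P * P * 3, D)) * 0.02
latents = patches.reshape(N, -1) @ W_enc  # (N, D) per-patch Texlet latents

# Stand-in "3D encoder": fuse each latent with its k nearest neighbors in 3D,
# a toy proxy for injecting global shape context via point-cloud attention.
k = 8
dists = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
knn = np.argsort(dists, axis=1)[:, :k]    # (N, k) neighbor indices (includes self)
context = latents[knn].mean(axis=1)       # (N, D) neighborhood-averaged latents
fused = latents + context                 # (N, D) context-aware Texlet latents

# Stand-in "3D-to-2D decoder": project fused latents back to texture patches.
W_dec = rng.standard_normal((D, P * P * 3)) * 0.02
recon = (fused @ W_dec).reshape(N, P, P, 3)
```

In the actual framework the linear maps would be learned 2D/3D networks and the decoder would be cascaded, but the sketch shows the data flow: per-patch 2D encoding, 3D contextual aggregation, then decoding back to 2D patches.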