Texturing 3D humans with semantic UV maps remains a challenge due to the difficulty of acquiring reasonably unfolded UV maps. Despite recent text-to-3D advancements in supervising multi-view renderings using large text-to-image (T2I) models, issues persist with generation speed, text consistency, and texture quality, resulting in data scarcity among existing datasets. We present TexDreamer, the first zero-shot multimodal high-fidelity 3D human texture generation model. Utilizing an efficient texture adaptation finetuning strategy, we adapt a large T2I model to a semantic UV structure while preserving its original generalization capability. Leveraging a novel feature translator module, the trained model is capable of generating high-fidelity 3D human textures from either text or an image within seconds. Furthermore, we introduce ArTicuLated humAn textureS (ATLAS), the largest high-resolution (1024 × 1024) 3D human texture dataset, which contains 50k high-fidelity textures with text descriptions.