SMPL-GPTexture: Dual-View 3D Human Texture Estimation using Text-to-Image Generation Models

Generating high-quality, photorealistic textures for 3D human avatars remains a fundamental yet challenging task in computer vision and multimedia. Real paired front and back images of human subjects are rarely available because of privacy concerns, ethical constraints, and acquisition cost, which limits how far such data can scale. Additionally, using deep generative models such as GANs or diffusion models to learn priors from image inputs and infer unseen regions, such as the human back, often leads to artifacts, structural inconsistencies, or loss of fine-grained detail. To address these issues, we present SMPL-GPTexture (Skinned Multi-Person Linear model, General-Purpose Texture), a novel pipeline that takes natural language prompts as input and leverages a state-of-the-art text-to-image generation model to produce paired high-resolution front and back images of a human subject as the starting point for texture estimation. Using the generated dual-view image pair, we first employ a human mesh recovery model to obtain, for each view, a robust 2D-to-3D SMPL alignment between image pixels and the 3D model's UV coordinates. Second, we use an inverted rasterization technique that explicitly projects the observed colour from the input images into UV space, producing an accurate texture map for each view. Finally, we apply a diffusion-based inpainting module to fill in the missing regions, and a fusion mechanism combines the results into a unified full texture map. Extensive experiments show that SMPL-GPTexture can generate high-resolution textures aligned with the user's prompts.
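As an illustration only, and not the authors' implementation, the projection and fusion steps can be sketched in NumPy. The sketch assumes that the mesh recovery stage has already produced, for every foreground pixel, a UV coordinate in [0, 1]; it then scatters pixel colours into the texture (a simplified stand-in for inverted rasterization) and averages the front and back partial textures where both are observed. The function names `project_to_uv` and `fuse_textures`, the UV-to-texel convention, and the averaging rule are all hypothetical choices made for this example.

```python
import numpy as np


def project_to_uv(image, pixel_uv, valid, tex_size):
    """Scatter observed pixel colours into a (tex_size, tex_size, 3) texture.

    image:    (H, W, 3) float colours
    pixel_uv: (H, W, 2) per-pixel (u, v) in [0, 1] (assumed given)
    valid:    (H, W) bool, True where the pixel lies on the body
    Returns the partial texture and a bool mask of texels that were observed.
    """
    tex_sum = np.zeros((tex_size, tex_size, 3), dtype=np.float64)
    tex_cnt = np.zeros((tex_size, tex_size), dtype=np.float64)

    uv = pixel_uv[valid]
    # Assumed convention: u indexes columns, v indexes rows.
    cols = np.clip(np.rint(uv[:, 0] * (tex_size - 1)).astype(int), 0, tex_size - 1)
    rows = np.clip(np.rint(uv[:, 1] * (tex_size - 1)).astype(int), 0, tex_size - 1)

    # Unbuffered accumulation so pixels hitting the same texel are all counted.
    np.add.at(tex_sum, (rows, cols), image[valid])
    np.add.at(tex_cnt, (rows, cols), 1.0)

    mask = tex_cnt > 0
    texture = np.zeros_like(tex_sum)
    texture[mask] = tex_sum[mask] / tex_cnt[mask][:, None]
    return texture, mask


def fuse_textures(tex_a, mask_a, tex_b, mask_b):
    """Average two partial textures where observed; report remaining holes."""
    weight = mask_a.astype(np.float64) + mask_b.astype(np.float64)
    covered = weight > 0
    blended = tex_a * mask_a[..., None] + tex_b * mask_b[..., None]
    fused = np.zeros_like(tex_a)
    fused[covered] = blended[covered] / weight[covered][:, None]
    holes = ~covered  # texels left for an inpainting step
    return fused, holes
```

In this sketch, the `holes` mask returned by `fuse_textures` marks the texels that neither view observed, which is the region a diffusion-based inpainting module would be asked to fill.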
