Reconstructing photo-realistic drivable human avatars from multi-view image sequences is a popular and challenging topic in computer vision and graphics. Existing NeRF-based methods can render human models from novel views at high quality, but both their training and inference are time-consuming. Recent approaches represent the human body with 3D Gaussians, which makes training and rendering faster. However, these approaches overlook the importance of mesh guidance: they predict Gaussians directly in 3D space with only coarse mesh guidance. This hinders the learning of the Gaussians and tends to produce blurry textures. We therefore propose UV Gaussians, which model the 3D human body by jointly learning mesh deformations and 2D UV-space Gaussian textures. We use the UV map embedding to learn Gaussian textures in 2D space, so that powerful 2D networks can extract the features. In addition, an independent Mesh network optimizes pose-dependent geometric deformations. These deformations guide Gaussian rendering and significantly improve rendering quality. We also collect and process a new human motion dataset that includes multi-view images, scanned models, parametric model registrations, and the corresponding texture maps. Experimental results show that our method achieves state-of-the-art novel view and novel pose synthesis. The code and data will be made available on the homepage https://alex-jyj.github.io/UV-Gaussians/ once the paper is accepted.
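The abstract does not say how a 2D UV-space Gaussian texture is tied to the deformed mesh. The sketch below shows one plausible mechanism and should not be read as the authors' implementation. It assumes the following:

- Each valid texel of the UV map has a precomputed face index and barycentric weights, for example obtained by rasterizing the mesh's UV layout.
- The 2D network outputs a small per-texel 3D offset.

Under these assumptions, each Gaussian center is the barycentric interpolation of the pose-deformed triangle plus that offset. The names `Texel` and `lift_uv_gaussians` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Texel:
    # Hypothetical per-texel record, assumed precomputed from the UV layout.
    face: int                          # index of the mesh triangle covering this texel
    bary: Tuple[float, float, float]   # barycentric weights (w0, w1, w2), summing to 1


def lift_uv_gaussians(
    vertices: Sequence[Vec3],              # pose-deformed mesh vertices (e.g. from a mesh network)
    faces: Sequence[Tuple[int, int, int]], # triangle vertex indices
    texels: Sequence[Texel],               # valid texels of the UV map
    offsets: Sequence[Vec3],               # assumed per-texel offsets predicted by a 2D network
) -> List[Vec3]:
    """Place one Gaussian center per texel on the deformed mesh surface."""
    centers: List[Vec3] = []
    for texel, off in zip(texels, offsets):
        i0, i1, i2 = faces[texel.face]
        w0, w1, w2 = texel.bary
        # Barycentric interpolation of the triangle's three deformed vertices.
        base = [
            w0 * a + w1 * b + w2 * c
            for a, b, c in zip(vertices[i0], vertices[i1], vertices[i2])
        ]
        # Displace the surface point by the predicted offset.
        centers.append(tuple(p + d for p, d in zip(base, off)))
    return centers


# Usage: one triangle in the z = 0 plane, one texel at its centroid.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
tris = [(0, 1, 2)]
uv_texels = [Texel(face=0, bary=(1 / 3, 1 / 3, 1 / 3))]
print(lift_uv_gaussians(verts, tris, uv_texels, [(0.0, 0.0, 0.1)]))
```

Because every center is anchored to a mesh triangle, a better pose-dependent deformation moves all the texture-space Gaussians with it. This is one way mesh guidance could constrain Gaussian learning.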