Gaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations. In this paper, we seek to leverage Gaussian splatting to generate realistic animatable avatars from textual descriptions, addressing the limitations (e.g., in flexibility and efficiency) imposed by mesh- or NeRF-based representations. However, a naive application of Gaussian splatting cannot generate high-quality animatable avatars and suffers from learning instability; it also cannot capture fine avatar geometries and often leads to degenerate body parts. To tackle these problems, we first propose a primitive-based 3D Gaussian representation where Gaussians are defined inside pose-driven primitives to facilitate animation. Second, to stabilize and amortize the learning of millions of Gaussians, we propose to use neural implicit fields to predict the Gaussian attributes (e.g., colors). Finally, to capture fine avatar geometries and extract detailed meshes, we propose a novel SDF-based implicit mesh learning approach for 3D Gaussians that regularizes the underlying geometries and extracts highly detailed textured meshes. Our proposed method, GAvatar, enables the large-scale generation of diverse animatable avatars using only text prompts. GAvatar significantly surpasses existing methods in terms of both appearance and geometry quality, and achieves extremely fast rendering (100 fps) at 1K resolution.
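To illustrate the idea of defining Gaussians inside pose-driven primitives, the following is a minimal NumPy sketch and not GAvatar's actual implementation. It assumes each primitive carries a rotation, a translation, and a per-axis scale, and that each Gaussian is stored as a mean and covariance in the primitive's local frame. The function name `gaussians_to_world` and all parameter names are hypothetical. Under these assumptions, animating the avatar only requires updating the primitive's pose; the local Gaussian parameters stay fixed.

```python
import numpy as np

def gaussians_to_world(local_means, local_covs, prim_rot, prim_trans, prim_scale):
    """Map Gaussians from a primitive's local frame to world space.

    Hypothetical helper for illustration only.
    local_means: (N, 3) Gaussian means in the primitive's local coordinates
    local_covs:  (N, 3, 3) Gaussian covariances in the local frame
    prim_rot:    (3, 3) rotation matrix of the primitive
    prim_trans:  (3,)   translation of the primitive
    prim_scale:  (3,)   per-axis scale of the primitive
    """
    # Affine map x_world = A @ x_local + t, with A = R @ diag(s).
    A = prim_rot @ np.diag(prim_scale)
    # Means transform as points under the affine map.
    world_means = local_means @ A.T + prim_trans
    # Covariances transform as A @ Sigma @ A^T (broadcast over N).
    world_covs = A @ local_covs @ A.T
    return world_means, world_covs

# Example: one primitive rotated 90 degrees about z, scaled by 2, shifted along x.
rot_z90 = np.array([[0.0, -1.0, 0.0],
                    [1.0,  0.0, 0.0],
                    [0.0,  0.0, 1.0]])
means = np.array([[1.0, 0.0, 0.0]])
covs = np.diag([1.0, 4.0, 9.0])[None, :, :]
world_means, world_covs = gaussians_to_world(
    means, covs, rot_z90, np.array([1.0, 0.0, 0.0]), np.array([2.0, 2.0, 2.0])
)
```

When the primitive's pose changes (for example, driven by a skeleton), calling the function again with the new rotation and translation moves all of its Gaussians rigidly with it.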