We present X-Avatar, a novel avatar model that captures the full expressiveness of digital humans to bring about life-like experiences in telepresence, AR/VR and beyond. Our method models bodies, hands, facial expressions and appearance in a holistic fashion and can be learned from either full 3D scans or RGB-D data. To achieve this, we propose a part-aware learned forward skinning module that can be driven by the parameter space of SMPL-X, allowing for expressive animation of X-Avatars. To efficiently learn the neural shape and deformation fields, we propose novel part-aware sampling and initialization strategies. This leads to higher-fidelity results, especially for smaller body parts, while maintaining efficient training despite the increased number of articulated bones. To capture the appearance of the avatar with high-frequency details, we extend the geometry and deformation fields with a texture network that is conditioned on pose, facial expression, geometry and the normals of the deformed surface. We show experimentally that our method outperforms strong baselines on the animation task in both data domains, quantitatively and qualitatively. To facilitate future research on expressive avatars, we contribute a new dataset, called X-Humans, containing 233 sequences of high-quality textured scans from 20 participants, totalling 35,500 data frames.
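To make the notion of forward skinning concrete, the following is a minimal sketch of standard linear blend skinning, in which a canonical point is mapped to posed space as a weighted sum of per-bone rigid transforms. It is an illustration under stated assumptions, not the X-Avatar implementation: the abstract does not give the formulation, the function names (`rigid_transform`, `forward_skin`) are hypothetical, and the skinning weights are fixed constants here, whereas the method described above learns them with a part-aware module.

```python
import math


def rigid_transform(angle_z, translation):
    """Hypothetical helper: rotation about the z-axis by angle_z (radians) plus a translation."""
    c, s = math.cos(angle_z), math.sin(angle_z)
    rotation = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    return rotation, list(translation)


def apply_transform(rotation, translation, point):
    """Compute R @ x + t for a single 3D point."""
    return [
        sum(rotation[r][k] * point[k] for k in range(3)) + translation[r]
        for r in range(3)
    ]


def forward_skin(x_canonical, weights, bone_transforms):
    """Linear blend skinning: x_deformed = sum_i w_i * (R_i x_canonical + t_i).

    Assumption: weights are given and sum to one; in a learned model they
    would be predicted by a network evaluated at x_canonical.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("skinning weights must sum to 1")
    x_deformed = [0.0, 0.0, 0.0]
    for w, (rotation, translation) in zip(weights, bone_transforms):
        moved = apply_transform(rotation, translation, x_canonical)
        x_deformed = [acc + w * m for acc, m in zip(x_deformed, moved)]
    return x_deformed


# Two toy bones: one at rest, one rotated 90 degrees about z.
bones = [
    rigid_transform(0.0, (0.0, 0.0, 0.0)),
    rigid_transform(math.pi / 2, (0.0, 0.0, 0.0)),
]
point = [1.0, 0.0, 0.0]

print(forward_skin(point, [1.0, 0.0], bones))  # bound only to the resting bone
print(forward_skin(point, [0.5, 0.5], bones))  # blended equally between both bones
```

A body model with hands and face has many more bones than a body-only model, which is why the abstract pairs the skinning module with part-aware sampling and initialization to keep training efficient.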