We present X-Avatar, a novel avatar model that captures the full expressiveness of digital humans to bring about life-like experiences in telepresence, AR/VR and beyond. Our method models bodies, hands, facial expressions and appearance in a holistic fashion and can be learned from either full 3D scans or RGB-D data. To achieve this, we propose a part-aware learned forward skinning module that can be driven by the parameter space of SMPL-X, allowing for expressive animation of X-Avatars. To efficiently learn the neural shape and deformation fields, we propose novel part-aware sampling and initialization strategies. This leads to higher-fidelity results, especially for smaller body parts, while maintaining efficient training despite the increased number of articulated bones. To capture the appearance of the avatar with high-frequency details, we extend the geometry and deformation fields with a texture network that is conditioned on pose, facial expression, geometry and the normals of the deformed surface. We show experimentally that our method outperforms strong baselines on the animation task in both data domains, quantitatively and qualitatively. To facilitate future research on expressive avatars, we contribute a new dataset, called X-Humans, containing 233 sequences of high-quality textured scans from 20 participants, totalling 35,500 data frames.