Recent advances in 3D-aware GAN models have enabled the generation of realistic and controllable human body images. However, existing methods focus on the control of major body joints and neglect the manipulation of expressive attributes such as facial expressions, jaw poses, and hand poses. In this work, we present XAGen, the first 3D generative model for human avatars capable of expressive control over body, face, and hands. To enhance the fidelity of small-scale regions like face and hands, we devise a multi-scale and multi-part 3D representation that models fine details. Based on this representation, we propose a multi-part rendering technique that disentangles the synthesis of body, face, and hands to ease model training and enhance geometric quality. Furthermore, we design multi-part discriminators that evaluate the quality of the generated avatars with respect to their appearance and fine-grained control capabilities. Experiments show that XAGen surpasses state-of-the-art methods in terms of realism, diversity, and expressive control abilities. Code and data will be made available at https://showlab.github.io/xagen.