Monocular facial performance capture in the wild is challenging due to varied capture conditions, face shapes, and expressions. Most current methods rely on linear 3D Morphable Models, which represent facial expressions independently of identity at the vertex-displacement level. We propose SEREP (Semantic Expression Representation), a model that disentangles expression from identity at the semantic level. It first learns an expression representation from unpaired 3D facial expressions using a cycle consistency loss. Then we train a model to predict expression from monocular images using a novel semi-supervised scheme that relies on domain adaptation. In addition, we introduce MultiREX, a benchmark addressing the lack of evaluation resources for the expression capture task. Our experiments show that SEREP outperforms state-of-the-art methods, capturing challenging expressions and transferring them to novel identities.
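To make the cycle-consistency idea concrete, the sketch below shows the round-trip constraint on an expression code learned from unpaired 3D faces: an expression extracted from one identity, transferred to another identity's neutral face, and re-encoded should yield the same code. The linear encoder/decoder, dimensions, and meshes here are hypothetical stand-ins, not SEREP's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
V, Z = 12, 4                      # flattened vertex dim, latent expression dim (illustrative)

W_enc = rng.normal(size=(Z, V))   # stand-in encoder E: mesh -> expression code
W_dec = rng.normal(size=(V, Z))   # stand-in decoder D: expression code + identity -> mesh

def encode(mesh):
    """Map a face mesh to a semantic expression code."""
    return W_enc @ mesh

def decode(code, identity_neutral):
    """Re-express the code on another identity's neutral face."""
    return identity_neutral + W_dec @ code

mesh_a = rng.normal(size=V)       # expressive mesh of identity A
neutral_b = rng.normal(size=V)    # neutral mesh of unpaired identity B

z_a = encode(mesh_a)              # extract expression from A
mesh_b = decode(z_a, neutral_b)   # transfer the expression onto B
z_b = encode(mesh_b)              # re-encode the transferred mesh

# Cycle-consistency loss: the expression code should survive the round trip.
cycle_loss = float(np.mean((z_a - z_b) ** 2))
```

In training, minimizing this loss pushes the code to carry expression information that is recoverable regardless of which identity it is rendered on, which is what disentangles expression from identity.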