Monocular facial performance capture in the wild is challenging due to varied capture conditions, face shapes, and expressions. Most current methods rely on linear 3D Morphable Models, which represent facial expressions independently of identity at the vertex displacement level. We propose SEREP (Semantic Expression Representation), a model that disentangles expression from identity at the semantic level. It first learns an expression representation from unpaired 3D facial expressions using a cycle consistency loss. Then we train a model to predict expression from monocular images using a novel semi-supervised scheme that relies on domain adaptation. In addition, we introduce MultiREX, a benchmark addressing the lack of evaluation resources for the expression capture task. Our experiments show that SEREP outperforms state-of-the-art methods, capturing challenging expressions and transferring them to novel identities.
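To make the cycle-consistency idea concrete, below is a minimal, hypothetical PyTorch sketch of how an identity-agnostic expression code can be learned from unpaired 3D faces: an encoded expression is transferred onto another identity, and re-encoding the transferred result must recover the original code. All module names, mesh dimensions, and the exact loss terms here are illustrative assumptions, not SEREP's actual architecture.

```python
# Illustrative sketch only: a generic cycle-consistency loss for learning an
# identity-agnostic expression code from unpaired 3D faces. Module names,
# dimensions, and loss terms are assumptions for exposition.
import torch
import torch.nn as nn

class ExprEncoder(nn.Module):
    """Maps a flattened face mesh to a low-dimensional expression code."""
    def __init__(self, n_verts=5023, code_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_verts * 3, 256), nn.ReLU(),
            nn.Linear(256, code_dim),
        )

    def forward(self, verts):                      # verts: (B, n_verts, 3)
        return self.net(verts.flatten(1))          # -> (B, code_dim)

class Decoder(nn.Module):
    """Applies an expression code to a neutral face of a given identity."""
    def __init__(self, n_verts=5023, code_dim=32):
        super().__init__()
        self.n_verts = n_verts
        self.net = nn.Sequential(
            nn.Linear(n_verts * 3 + code_dim, 256), nn.ReLU(),
            nn.Linear(256, n_verts * 3),
        )

    def forward(self, neutral, expr_code):         # neutral: (B, n_verts, 3)
        x = torch.cat([neutral.flatten(1), expr_code], dim=1)
        return self.net(x).view(-1, self.n_verts, 3)

def cycle_consistency_loss(enc, dec, face_a, neutral_a, neutral_b):
    """Transfer A's expression to identity B, then require the expression
    code to survive the round trip (plus self-reconstruction on A)."""
    code_a = enc(face_a)
    recon_a = dec(neutral_a, code_a)               # same identity: reconstruction
    face_b = dec(neutral_b, code_a)                # expression transferred to B
    code_cycle = enc(face_b)                       # re-encode transferred result
    return nn.functional.mse_loss(recon_a, face_a) + \
           nn.functional.mse_loss(code_cycle, code_a)

if __name__ == "__main__":
    enc, dec = ExprEncoder(), Decoder()
    face_a = torch.randn(4, 5023, 3)               # expressive faces, identity A
    neutral_a = torch.randn(4, 5023, 3)            # neutral faces, identity A
    neutral_b = torch.randn(4, 5023, 3)            # neutral faces, identity B
    loss = cycle_consistency_loss(enc, dec, face_a, neutral_a, neutral_b)
    loss.backward()
    print(float(loss))
```

Because the decoder only ever sees identity through the neutral mesh, any identity information leaking into the expression code is penalized by the cycle term; this is one common way such disentanglement objectives are set up.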