Modern generators render talking-head videos with impressive photorealism, enabling new user experiences such as videoconferencing under constrained bandwidth budgets. Their safe adoption, however, requires a mechanism to verify whether the rendered video is trustworthy. For instance, in videoconferencing we must identify cases in which a synthetic video portrait uses the appearance of an individual without their consent. We term this task avatar fingerprinting. We propose to tackle it by leveraging the facial motion signatures unique to each person. Specifically, we learn an embedding in which the motion signatures of one identity are grouped together and pushed away from those of other identities, regardless of the appearance in the synthetic video. Avatar fingerprinting algorithms will be critical as talking-head generators become ubiquitous, yet no large-scale datasets exist for this new task. Therefore, we contribute a large dataset of people delivering scripted and improvised short monologues, accompanied by synthetic videos in which we render videos of one person using the facial appearance of another. Project page: https://research.nvidia.com/labs/nxp/avatar-fingerprinting/.
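The embedding objective described in the abstract (group the motion signatures of one identity, push away those of other identities) can be illustrated with a generic margin-based contrastive loss. The sketch below is an assumption for illustration only and is not necessarily the loss used by the authors. The names `contrastive_loss`, `driver_ids`, and `margin` are hypothetical. Labels denote the identity driving the motion, not the rendered appearance, so two clips driven by the same person count as a positive pair even when they are rendered with different faces.

```python
import numpy as np


def pairwise_sq_dists(emb):
    """Squared Euclidean distance between every pair of rows of emb."""
    diff = emb[:, None, :] - emb[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def contrastive_loss(emb, driver_ids, margin=1.0):
    """Generic margin-based contrastive loss (illustrative assumption).

    emb:        (N, D) array, one motion embedding per clip.
    driver_ids: (N,) array, identity driving the motion in each clip
                (hypothetical labels; independent of rendered appearance).
    margin:     different-identity pairs closer than this are penalized.
    """
    emb = np.asarray(emb, dtype=float)
    driver_ids = np.asarray(driver_ids)

    d2 = pairwise_sq_dists(emb)
    d = np.sqrt(d2)

    same = driver_ids[:, None] == driver_ids[None, :]
    off_diag = ~np.eye(len(emb), dtype=bool)
    pos = same & off_diag   # same driving identity, distinct clips
    neg = ~same             # different driving identities

    # Pull term: mean squared distance between same-identity clips.
    pull = d2[pos].mean() if pos.any() else 0.0
    # Push term: penalize different-identity clips that fall inside the margin.
    push = (np.maximum(0.0, margin - d[neg]) ** 2).mean() if neg.any() else 0.0
    return float(pull + push)


# Toy example: four clips embedded in 2-D.
emb = np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 5.0], [5.0, 5.0]])
loss_clustered = contrastive_loss(emb, [0, 0, 1, 1])  # identities form tight clusters
loss_mixed = contrastive_loss(emb, [0, 1, 0, 1])      # identities are interleaved
```

In this toy setup the loss is lower when clips with the same driving identity sit together, which is the property the learned embedding is meant to have. In practice the embeddings would come from a trained network operating on facial motion features rather than from hand-written arrays.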