Advances in face swapping have enabled the automatic generation of highly realistic faces. Yet viewers perceive face swaps differently from real faces, with key differences in viewing behavior around the eyes. Face swapping algorithms generally place no emphasis on the eyes, relying instead on pixel or feature matching losses that consider the entire face to guide training. We further investigate viewer perception of face swaps, focusing our analysis on the presence of an uncanny valley effect. We additionally propose a novel loss equation for training face swapping models that leverages a pretrained gaze estimation network to directly improve the representation of the eyes. We confirm that face swaps do elicit uncanny responses from viewers. Our proposed improvements significantly reduce viewing angle errors between face swaps and their source material. Our method additionally reduces the prevalence of the eyes as a deciding factor when viewers perform deepfake detection tasks. Our findings have implications for face swapping in special effects, digital avatars, privacy mechanisms, and more; negative responses from users could limit effectiveness in these applications. Our gaze improvements are a first step towards alleviating negative viewer perceptions via a targeted approach.
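The abstract does not state the form of the proposed loss, so the following is only an illustrative sketch of how a gaze term might be added to an existing face swapping objective. It assumes that a pretrained gaze estimator returns a (pitch, yaw) pair in radians for each face, that the gaze term is the angular error between the reference face and the swapped face, and that the term is added to the base loss with a scalar weight. The names `gaze_vector`, `angular_error`, `total_loss`, and `gaze_weight` are hypothetical and do not come from the paper.

```python
import math


def gaze_vector(pitch, yaw):
    """Convert (pitch, yaw) in radians to a 3D unit gaze vector.

    Assumption: this uses one common angle convention; the paper's
    gaze estimator may use a different one.
    """
    return (
        -math.cos(pitch) * math.sin(yaw),
        -math.sin(pitch),
        -math.cos(pitch) * math.cos(yaw),
    )


def angular_error(g_a, g_b):
    """Angle in radians between two unit gaze vectors."""
    dot = sum(a * b for a, b in zip(g_a, g_b))
    # Clamp to the valid acos domain to guard against rounding error.
    dot = max(-1.0, min(1.0, dot))
    return math.acos(dot)


def total_loss(base_loss, gaze_reference, gaze_swap, gaze_weight=1.0):
    """Hypothetical combined objective: base loss plus a weighted gaze term.

    base_loss      -- value of the existing pixel/feature matching loss
    gaze_reference -- (pitch, yaw) estimated on the reference face
    gaze_swap      -- (pitch, yaw) estimated on the swapped face
    gaze_weight    -- assumed scalar weight on the gaze term
    """
    g_ref = gaze_vector(*gaze_reference)
    g_swap = gaze_vector(*gaze_swap)
    return base_loss + gaze_weight * angular_error(g_ref, g_swap)
```

In an actual training setup the gaze term would need to be computed with a differentiable framework so that gradients flow back through the frozen gaze estimator into the face swapping model; the scalar version above only illustrates the arithmetic.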