Facial video inpainting plays a crucial role in a wide range of applications, including the removal of obstructions in video conferencing and telemedicine, enhancement of facial expression analysis, privacy protection, integration of graphical overlays, and virtual makeup. The task is challenging because facial features are intricate and humans are highly familiar with faces, so completions must be both accurate and convincing. To address occlusion removal in this setting, we focus on the progressive generation of complete images from facial data covered by masks while ensuring both spatial and temporal coherence. We introduce a network for expression-based video inpainting that employs generative adversarial networks (GANs) to handle static and moving occlusions across all frames. By using facial landmarks and an occlusion-free reference image, our model keeps the user's identity consistent across frames. We further improve the preservation of emotion through a customized facial expression recognition (FER) loss function, which helps produce detailed inpainted outputs. The proposed framework adaptively removes occlusions from facial videos, whether they are static or dynamic across frames, and produces realistic and coherent results.