Head-mounted displays (HMDs) are essential for viewing extended reality (XR) environments and virtual content. However, HMDs hinder external recording techniques because they occlude the user's upper face. This limitation significantly affects social XR applications, particularly teleconferencing, where facial features and eye gaze play a vital role in creating an immersive user experience. In this study, we propose EVI-HRnet, a new network for expression-aware video inpainting for HMD removal, based on generative adversarial networks (GANs). The model fills in the missing information using facial landmarks and a single occlusion-free reference image of the user. The reference image allows the framework and its components to preserve the user's identity across frames. To further improve the realism of the inpainted output, we introduce a novel facial expression recognition (FER) loss function for emotion preservation. Our results show that the proposed framework removes HMDs from facial videos while maintaining the subject's facial expression and identity. Moreover, the outputs are temporally consistent across the inpainted frames. This lightweight framework offers a practical approach to HMD occlusion removal, with the potential to enhance various collaborative XR applications without additional hardware.