Virtual Reality (VR) headsets have become increasingly popular for remote collaboration, but video conferencing is challenging when the user's face is occluded by the headset. Existing solutions have limited accessibility. In this paper, we propose HeadsetOff, a novel system that achieves photorealistic video conferencing on economical VR headsets by leveraging voice-driven face reconstruction. HeadsetOff consists of three main components: a multimodal attention-based predictor, a generator, and an adaptive controller. The predictor anticipates future user behavior from multiple input modalities. The generator uses voice input, head motion, and eye blinks to animate the user's face. The adaptive controller dynamically selects the appropriate generator model based on the trade-off between video quality and delay, maximizing Quality of Experience (QoE) while minimizing latency. Experimental results demonstrate that HeadsetOff achieves high-quality, low-latency video conferencing on economical VR headsets.
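The adaptive controller's selection step can be sketched as a budget-constrained choice among generator variants. This is a minimal illustration, not the paper's implementation: the model names, quality scores, and latency figures below are hypothetical placeholders.

```python
# Hypothetical sketch of an adaptive controller choosing a generator model:
# pick the highest-quality variant whose estimated delay fits the latency
# budget, falling back to the fastest variant when none fits.

def select_generator(models, latency_budget_ms):
    """Return the model dict that maximizes quality under the latency budget."""
    feasible = [m for m in models if m["latency_ms"] <= latency_budget_ms]
    if not feasible:
        # No model meets the budget: degrade gracefully to the fastest one.
        return min(models, key=lambda m: m["latency_ms"])
    return max(feasible, key=lambda m: m["quality"])

# Illustrative model pool (values are made up for the sketch).
models = [
    {"name": "large",  "quality": 0.95, "latency_ms": 120},
    {"name": "medium", "quality": 0.90, "latency_ms": 60},
    {"name": "small",  "quality": 0.80, "latency_ms": 25},
]

print(select_generator(models, latency_budget_ms=80)["name"])  # medium
print(select_generator(models, latency_budget_ms=10)["name"])  # small
```

In practice such a controller would re-run this choice as network conditions change, which is what makes the quality/delay trade-off dynamic rather than fixed at session start.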