VOODOO XP: Expressive One-Shot Head Reenactment for VR Telepresence

We introduce VOODOO XP: a 3D-aware one-shot head reenactment method that can generate highly expressive facial expressions from any input driver video and a single 2D portrait. Our solution is real-time, view-consistent, and can be instantly used without calibration or fine-tuning. We demonstrate our solution in a monocular video setting and in an end-to-end VR telepresence system for two-way communication. Compared to 2D head reenactment methods, 3D-aware approaches aim to preserve the identity of the subject and ensure view-consistent facial geometry for novel camera poses, which makes them suitable for immersive applications. While various facial disentanglement techniques have been introduced, cutting-edge 3D-aware neural reenactment techniques still lack expressiveness and fail to reproduce complex and fine-scale facial expressions. We present a novel cross-reenactment architecture that directly transfers the driver's facial expressions to transformer blocks of the input source's 3D lifting module. We show that highly effective disentanglement is possible using an innovative multi-stage self-supervision approach, which is based on a coarse-to-fine strategy, combined with an explicit face neutralization and 3D lifted frontalization during its initial training stage. We further integrate our novel head reenactment solution into an accessible high-fidelity VR telepresence system, where any person can instantly build a personalized neural head avatar from any photo and bring it to life using the headset. We demonstrate state-of-the-art performance in terms of expressiveness and likeness preservation on a large set of diverse subjects and capture conditions.
