We introduce VOODOO XP: a 3D-aware one-shot head reenactment method that can generate highly expressive facial expressions from any input driver video and a single 2D portrait. Our solution is real-time, view-consistent, and can be used instantly without calibration or fine-tuning. We demonstrate our solution in a monocular video setting and in an end-to-end VR telepresence system for two-way communication. Compared to 2D head reenactment methods, 3D-aware approaches aim to preserve the identity of the subject and ensure view-consistent facial geometry for novel camera poses, which makes them suitable for immersive applications. While various facial disentanglement techniques have been introduced, cutting-edge 3D-aware neural reenactment techniques still lack expressiveness and fail to reproduce complex and fine-scale facial expressions. We present a novel cross-reenactment architecture that directly transfers the driver's facial expressions to the transformer blocks of the input source's 3D lifting module. We show that highly effective disentanglement is possible using an innovative multi-stage self-supervision approach, which is based on a coarse-to-fine strategy combined with explicit face neutralization and 3D-lifted frontalization during its initial training stage. We further integrate our novel head reenactment solution into an accessible high-fidelity VR telepresence system, where any person can instantly build a personalized neural head avatar from any photo and bring it to life using the headset. We demonstrate state-of-the-art performance in terms of expressiveness and likeness preservation on a large set of diverse subjects and capture conditions.