Towards Practical Capture of High-Fidelity Relightable Avatars

In this paper, we propose a novel framework, Tracking-free Relightable Avatar (TRAvatar), for capturing and reconstructing high-fidelity 3D avatars. Compared to previous methods, TRAvatar works in a more practical and efficient setting. Specifically, TRAvatar is trained with dynamic image sequences captured in a Light Stage under varying lighting conditions, enabling realistic relighting and real-time animation for avatars in diverse scenes. Additionally, TRAvatar allows for tracking-free avatar capture and obviates the need for accurate surface tracking under varying illumination conditions. Our contributions are two-fold. First, we propose a novel network architecture that is explicitly built on, and guaranteed to satisfy, the linear nature of lighting. Trained on simple group light captures, TRAvatar can predict the appearance in real time with a single forward pass, achieving high-quality relighting under illumination from arbitrary environment maps. Second, we jointly optimize the facial geometry and relightable appearance from scratch based on image sequences, where the tracking is implicitly learned. This tracking-free approach makes it more robust to establish temporal correspondences between frames under different lighting conditions. Extensive qualitative and quantitative experiments demonstrate that our framework achieves superior performance for photorealistic avatar animation and relighting.
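
The "linear nature of lighting" mentioned above is the superposition property of light transport: an image lit by several lights equals the weighted sum of the images lit by each light alone. The sketch below illustrates only that property on toy data. It is not the TRAvatar network, and the names `relight`, `basis_images`, and `light_weights` are hypothetical, introduced here for illustration.

```python
def relight(basis_images, light_weights):
    """Weighted sum of one-light-at-a-time images.

    basis_images: list of images, each a flat list of pixel values captured
                  with a single light (or light group) on at unit intensity.
    light_weights: one intensity per light, e.g. sampled from an environment map.
    """
    if len(basis_images) != len(light_weights):
        raise ValueError("need exactly one weight per basis image")
    out = [0.0] * len(basis_images[0])
    for image, weight in zip(basis_images, light_weights):
        for i, value in enumerate(image):
            out[i] += weight * value
    return out


# Toy data (made-up values): 3 lights, 4 pixels per image.
basis = [
    [0.2, 0.0, 0.1, 0.4],
    [0.0, 0.5, 0.3, 0.1],
    [0.1, 0.1, 0.0, 0.6],
]
env_a = [1.0, 0.0, 2.0]
env_b = [0.5, 1.5, 0.0]

# Linearity check: relighting under a mix of two environments should match
# the same mix of the two individually relit images.
mixed = relight(basis, [2.0 * a + b for a, b in zip(env_a, env_b)])
separate = [
    2.0 * x + y
    for x, y in zip(relight(basis, env_a), relight(basis, env_b))
]
```

Under this property, appearance under an arbitrary environment map can in principle be composed from responses to simpler lighting, which is why an architecture that respects linearity can be trained on group-light captures alone.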
