Recent years have witnessed considerable achievements in facial avatar reconstruction with neural volume rendering. Despite notable advancements, the reconstruction of complex and dynamic head movements from monocular videos still struggles to capture and restore fine-grained details. In this work, we propose a novel approach, named Tri$^2$-plane, for monocular photo-realistic volumetric head avatar reconstruction. Unlike existing works that rely on a single tri-plane deformation field for dynamic facial modeling, the proposed Tri$^2$-plane leverages the principle of feature pyramids, linking three tri-planes through top-down lateral connections to improve detail. It samples and renders facial details at multiple scales, transitioning from the entire face to specific local regions and then to even more refined sub-regions. Moreover, we incorporate a camera-based geometry-aware sliding window method as a training augmentation, which improves robustness beyond the canonical space and particularly benefits cross-identity generation. Experimental results show that Tri$^2$-plane surpasses existing methods, achieving superior performance in both quantitative metrics and qualitative assessments.
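To make the multi-scale idea concrete, the following is a minimal sketch, not the authors' implementation, of cascaded tri-plane feature sampling in PyTorch. All names (`sample_triplane`, `cascaded_sample`) and the fixed 2x/4x zoom factors are illustrative assumptions; the paper's actual top-down lateral connections and region selection are more involved.

```python
# Minimal sketch: cascaded tri-plane sampling at three scales (face -> region -> sub-region).
# Illustrative only; names and zoom factors are assumptions, not the paper's API.
import torch
import torch.nn.functional as F


def sample_triplane(planes, pts):
    """Bilinearly sample a tri-plane at 3D query points.

    planes: (3, C, H, W) feature maps for the XY, XZ, YZ planes.
    pts:    (N, 3) query points normalized to [-1, 1].
    Returns (N, C) features averaged over the three planes.
    """
    # Project each 3D point onto the three axis-aligned planes.
    coords = torch.stack(
        [pts[:, [0, 1]], pts[:, [0, 2]], pts[:, [1, 2]]], dim=0
    )                                                # (3, N, 2)
    grid = coords.unsqueeze(1)                       # (3, 1, N, 2)
    feat = F.grid_sample(planes, grid, align_corners=True)  # (3, C, 1, N)
    return feat.squeeze(2).mean(dim=0).t()           # (N, C)


def cascaded_sample(planes_face, planes_region, planes_subregion, pts):
    """Accumulate features from face-, region-, and sub-region-level tri-planes.

    Finer levels cover smaller areas, so their query coordinates are rescaled
    here by fixed factors as a stand-in for re-normalizing points into each
    level's local frame; out-of-range queries fall back to grid_sample's
    zero padding.
    """
    f = sample_triplane(planes_face, pts)
    f = f + sample_triplane(planes_region, pts * 2.0)     # local region (2x zoom, assumed)
    f = f + sample_triplane(planes_subregion, pts * 4.0)  # sub-region (4x zoom, assumed)
    return f
```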