The introduction of 3D Gaussian blendshapes has enabled real-time reconstruction of animatable head avatars from monocular video. Toonify, a StyleGAN-based method, is widely used for facial image stylization. To extend Toonify to synthesizing diverse stylized 3D head avatars with Gaussian blendshapes, we propose an efficient two-stage framework, ToonifyGB. In Stage 1 (stylized video generation), we adopt an improved StyleGAN to generate a stylized video from the input video frames, which removes the standard StyleGAN requirement of cropping aligned faces at a fixed resolution as a preprocessing step. This yields a more temporally stable stylized video, allowing the Gaussian blendshapes to better capture the high-frequency details of the video frames and facilitating the synthesis of high-quality animations in the next stage. In Stage 2 (Gaussian blendshape synthesis), our method learns a stylized neutral head model and a set of expression blendshapes from the generated stylized video. By combining the neutral head model with the expression blendshapes, ToonifyGB can efficiently render stylized avatars with arbitrary expressions. We validate the effectiveness of ToonifyGB on benchmark datasets using two representative styles: Arcane and Pixar.
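The combination of a neutral head model with weighted expression blendshapes can be sketched as follows. This is a minimal illustration of a linear blendshape model over Gaussian attributes (here, mean positions); the function and variable names are hypothetical and not taken from the paper's implementation.

```python
import numpy as np

def combine_blendshapes(neutral, offsets, weights):
    """Linearly combine a neutral model with expression blendshape offsets.

    neutral: (N, 3) array, e.g. mean positions of N Gaussians in the neutral head.
    offsets: (K, N, 3) array of K per-expression displacement bases.
    weights: (K,) expression coefficients (e.g. tracked per video frame).
    Returns the (N, 3) positions of the expressive avatar.
    """
    # Weighted sum of blendshape offsets added to the neutral model:
    # result = neutral + sum_k weights[k] * offsets[k]
    return neutral + np.tensordot(weights, offsets, axes=1)

# Toy example: 2 expression blendshapes over 4 Gaussians.
neutral = np.zeros((4, 3))
offsets = np.stack([np.ones((4, 3)), 2.0 * np.ones((4, 3))])
weights = np.array([0.5, 0.25])
expressive = combine_blendshapes(neutral, offsets, weights)
# each coordinate = 0.5 * 1.0 + 0.25 * 2.0 = 1.0
```

In practice, the same linear combination would be applied per frame with weights driven by tracked expression parameters, which is what makes rendering arbitrary expressions efficient once the neutral model and blendshape bases are learned.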