3D Face Style Transfer with a Hybrid Solution of NeRF and Mesh Rasterization

Style transfer for human faces has been widely researched in recent years. The majority of existing approaches work in the 2D image domain and suffer from 3D inconsistency when applied to different viewpoints of the same face. In this paper, we tackle the problem of 3D face style transfer, which aims to generate stylized novel views of a 3D human face with multi-view consistency. We propose to use a neural radiance field (NeRF) to represent the 3D human face and to combine it with 2D style transfer to stylize the 3D face. We find that directly training a NeRF on stylized images produced by 2D style transfer introduces 3D inconsistency and causes blurriness. On the other hand, training a NeRF jointly with 2D style transfer objectives converges poorly because of the identity and head pose gap between the style image and the content image. It is also costly in training time and memory, because the style transfer loss functions are applied to full images, each of which must be produced by volume rendering. We therefore propose a hybrid framework of NeRF and mesh rasterization that combines the high-fidelity geometry reconstruction of NeRF with the fast rendering speed of meshes. Our framework consists of three stages: 1. Training a NeRF model on input face images to learn the 3D geometry; 2. Extracting a mesh from the trained NeRF model and optimizing it with style transfer objectives via differentiable rasterization; 3. Training a new color network in NeRF, conditioned on a style embedding, to enable arbitrary style transfer to the 3D face. Experimental results show that our approach produces high-quality face style transfer with strong 3D consistency while also enabling flexible style control.
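
To make the cost of full-image volume rendering concrete, the following is a minimal sketch of how a single pixel is composited along one camera ray using the standard NeRF quadrature. It is not taken from the paper's implementation: the function name `composite_ray` and its arguments are hypothetical, and in a real NeRF the densities and colors would come from network queries rather than from fixed arrays.

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """Composite samples along one ray with the standard NeRF quadrature.

    Hypothetical helper for illustration only.
    sigmas: (N,) non-negative densities at the samples
    colors: (N, 3) RGB values at the samples
    deltas: (N,) lengths of the segments between consecutive samples
    """
    sigmas = np.asarray(sigmas, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    deltas = np.asarray(deltas, dtype=np.float64)
    # Opacity contributed by each sample segment.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: fraction of light that reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    weights = trans * alphas
    # Pixel color is the weighted sum of the sample colors.
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights

# A fully opaque red sample in front hides the blue sample behind it.
rgb, weights = composite_ray(
    sigmas=[1e3, 1e3],
    colors=[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    deltas=[1.0, 1.0],
)
```

Every pixel needs one such ray, and every sample on the ray needs a network query. As an illustrative figure (not from the paper), a 512 x 512 image with 128 samples per ray requires 512 * 512 * 128 = 33,554,432 queries per rendered image, whereas a rasterized mesh is shaded once per covered pixel. This is the kind of gap that motivates moving the style transfer objectives to the mesh stage.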
