Free-style and Fast 3D Portrait Synthesis

Efficiently generating free-style 3D portraits with high quality and consistency is a promising yet challenging task. The portrait styles that most existing methods can generate are restricted by their 3D generators, which are trained on specific facial datasets such as FFHQ. To obtain a free-style 3D portrait, one can either build a large-scale multi-style database to retrain the 3D generator or use an off-the-shelf tool to perform style translation. However, the former is time-consuming because of the data collection and training involved, while the latter may destroy multi-view consistency. To tackle this problem, we propose a fast 3D portrait synthesis framework that enables users to specify styles with text prompts. Specifically, for a given portrait style, we first leverage two generative priors, a 3D-aware GAN generator (EG3D) and a text-guided image editor (Ip2p), to quickly construct a few-shot training set, where the inference process of Ip2p is optimized to make editing more stable. We then replace the original triplane generator of EG3D with an Image-to-Triplane (I2T) module for two purposes: 1) removing the style constraints of the pre-trained EG3D by fine-tuning I2T on the few-shot dataset; 2) improving training efficiency by fixing all parts of EG3D except I2T. Furthermore, we construct a multi-style and multi-identity 3D portrait database to demonstrate the scalability and generalization of our method. Experimental results show that our method can synthesize high-quality 3D portraits in specified styles within a few minutes, outperforming state-of-the-art methods.
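
The second design choice, training only I2T while every other component stays fixed, can be illustrated with a minimal PyTorch sketch. This is not the paper's implementation: `ToyPortraitModel`, its three convolution layers, the helper `freeze_all_but`, the MSE loss, and the random tensors are all hypothetical stand-ins chosen only to show the freezing pattern. The real EG3D renderer and I2T architecture are far larger, and the abstract does not state the training loss.

```python
import torch
from torch import nn


class ToyPortraitModel(nn.Module):
    """Toy stand-in; NOT the real EG3D or I2T architecture."""

    def __init__(self):
        super().__init__()
        # Hypothetical image-to-triplane module (the only trainable part).
        self.i2t = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        # Hypothetical stand-ins for the pre-trained parts that stay fixed.
        self.renderer = nn.Conv2d(8, 8, kernel_size=1)
        self.super_res = nn.Conv2d(8, 3, kernel_size=1)

    def forward(self, image):
        planes = self.i2t(image)
        return self.super_res(self.renderer(planes))


def freeze_all_but(model, trainable_prefix):
    """Disable gradients for every parameter whose name lacks the prefix."""
    for name, param in model.named_parameters():
        param.requires_grad_(name.startswith(trainable_prefix))
    # Return only the parameters the optimizer should update.
    return [p for p in model.parameters() if p.requires_grad]


torch.manual_seed(0)
model = ToyPortraitModel()
trainable = freeze_all_but(model, "i2t.")
optimizer = torch.optim.Adam(trainable, lr=1e-3)

# Random tensors stand in for a few-shot batch of (input, stylized target) pairs.
inputs = torch.randn(4, 3, 16, 16)
targets = torch.randn(4, 3, 16, 16)

frozen_before = model.renderer.weight.clone()
i2t_before = model.i2t.weight.clone()

# One optimization step; MSE is an assumed placeholder loss.
loss = nn.functional.mse_loss(model(inputs), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

renderer_unchanged = torch.equal(frozen_before, model.renderer.weight)
i2t_updated = not torch.equal(i2t_before, model.i2t.weight)
```

Because gradients still flow through the frozen layers back to `i2t`, the module learns against the fixed pre-trained components, while the optimizer only tracks the small set of I2T parameters. Under these assumptions, this is where the efficiency gain in a setup of this kind would come from.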
