Existing 3D-aware portrait synthesis methods can generate high-quality images while preserving strong 3D consistency. However, most of them do not support fine-grained, part-level control over the synthesized images. Conversely, some GAN-based 2D portrait synthesis methods achieve clear disentanglement of facial regions but cannot preserve view consistency because they lack 3D modeling. To address these issues, we propose 3D-SSGAN, a novel framework for 3D-aware compositional portrait image synthesis. First, a simple yet effective depth-guided 2D-to-3D lifting module maps the generated 2D part features and semantics to 3D. Then, a volume renderer with a novel 3D-aware semantic mask renderer produces the composed face features and the corresponding masks. The whole framework is trained end-to-end by discriminating between real and synthesized 2D images and their semantic masks. Quantitative and qualitative evaluations demonstrate the superiority of 3D-SSGAN in controllable part-level synthesis while preserving 3D view consistency.
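The lifting and rendering steps named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not specify how depth guides the lifting or how masks are rendered, so the sketch assumes (1) each part's density along a ray is a Gaussian bump centred on that part's predicted depth, scaled by its 2D mask, and (2) parts are composed by summing densities and splitting the standard volume-rendering weights in proportion to each part's density. The function names `lift_part_to_3d` and `render_composition` and the `sharpness` parameter are hypothetical.

```python
import numpy as np

def lift_part_to_3d(depth, mask, z_vals, sharpness=50.0):
    """Assumed depth-guided lifting: 2D depth and soft mask -> density per ray sample.

    depth, mask: (H, W) arrays for one part; z_vals: (S,) sample depths.
    Returns sigma with shape (H, W, S).
    """
    # Density peaks where the sample depth equals the predicted part depth.
    dist2 = (z_vals[None, None, :] - depth[..., None]) ** 2
    return mask[..., None] * sharpness * np.exp(-sharpness * dist2)

def render_composition(sigmas, feats, z_vals):
    """Assumed compositional volume rendering of K parts.

    sigmas: (K, H, W, S) part densities; feats: (K, H, W, C) 2D part features,
    shared by every sample on a pixel's ray.
    Returns composed features (H, W, C) and per-part masks (K, H, W).
    """
    step = z_vals[-1] - z_vals[-2]
    deltas = np.diff(z_vals, append=z_vals[-1] + step)        # (S,)
    sigma_total = sigmas.sum(axis=0)                          # (H, W, S)
    alpha = 1.0 - np.exp(-sigma_total * deltas)               # per-sample opacity
    trans = np.cumprod(1.0 - alpha, axis=-1)
    # Transmittance before sample i is the product over samples j < i.
    trans = np.concatenate([np.ones_like(trans[..., :1]), trans[..., :-1]], axis=-1)
    weights = trans * alpha                                   # (H, W, S)
    # Each part receives the fraction of the weight matching its density share.
    share = sigmas / (sigma_total[None] + 1e-10)              # (K, H, W, S)
    masks = (weights[None] * share).sum(axis=-1)              # (K, H, W)
    feat = (masks[..., None] * feats).sum(axis=0)             # (H, W, C)
    return feat, masks

# Toy scene: part 0 sits in front (depth 0.8) of part 1 (depth 1.2).
z_vals = np.linspace(0.5, 1.5, 64)
depths = np.stack([np.full((2, 2), 0.8), np.full((2, 2), 1.2)])
part_masks = np.ones((2, 2, 2))
feats = np.stack([np.ones((2, 2, 3)), -np.ones((2, 2, 3))])
sigmas = np.stack([lift_part_to_3d(d, m, z_vals) for d, m in zip(depths, part_masks)])
feat, masks = render_composition(sigmas, feats, z_vals)

# Part-level edit: remove part 0 by zeroing its density, then re-render.
edited = sigmas.copy()
edited[0] = 0.0
feat_edited, masks_edited = render_composition(edited, feats, z_vals)
```

In this toy scene the front part occludes the back one, and removing the front part's density lets the back part take over the rendered mask. Because the masks come from the same weights as the features, they stay aligned with the rendered content under this assumed formulation.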