Portrait stylization aims to imbue portrait photos with vivid artistic effects drawn from style examples. Despite the availability of enormous training datasets and large network weights, existing methods struggle to maintain geometric consistency and achieve satisfactory stylization because the distribution of facial features differs between photographs and stylized images, which limits their application to rare styles and mobile devices. To alleviate this, we propose to establish meaningful geometric correlations between portraits and style samples, simplifying stylization by aligning corresponding facial characteristics. Specifically, we integrate differentiable Thin-Plate-Spline (TPS) modules into an end-to-end Generative Adversarial Network (GAN) framework to improve training efficiency and promote the consistency of facial identities. By leveraging the inherent structural information of faces, e.g., facial landmarks, the TPS modules establish geometric alignments between the two domains at global and local scales, in both pixel and feature spaces, thereby overcoming the aforementioned challenges. Quantitative and qualitative comparisons on a range of portrait stylization tasks demonstrate that our models not only outperform existing models in fidelity and stylistic consistency, but also achieve a 2x improvement in training data efficiency and a 100x reduction in computational complexity, allowing our lightweight model to run real-time inference (30 FPS) at 512×512 resolution on mobile devices.
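The abstract does not spell out how the TPS modules are formulated, so the following is only a minimal sketch of classical thin-plate-spline interpolation between two landmark sets, written in plain Python to stay self-contained. It is not the authors' differentiable module: the function names, the kernel convention U(r) = r² log r², and the five normalized landmark coordinates are all assumptions made for illustration. A differentiable variant would presumably express the same linear solve inside an autodiff framework and apply the resulting map to image or feature grids.

```python
import math

def tps_kernel(r2):
    # Radial basis U(r) = r^2 * log(r^2), taking squared distance; U(0) = 0.
    return 0.0 if r2 == 0.0 else r2 * math.log(r2)

def solve_linear(A, B):
    # Gauss-Jordan elimination with partial pivoting for A X = B.
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: landmarks collinear or duplicated?")
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def fit_tps(src, dst):
    # Solve [[K, P], [P^T, 0]] [w; a] = [dst; 0] for the n radial weights w
    # and the 3 affine coefficients a, one column per output coordinate.
    n = len(src)
    size = n + 3
    A = [[0.0] * size for _ in range(size)]
    B = [[0.0, 0.0] for _ in range(size)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            A[i][j] = tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        for k, p in enumerate((1.0, xi, yi)):
            A[i][n + k] = p
            A[n + k][i] = p
        B[i] = [dst[i][0], dst[i][1]]
    return solve_linear(A, B)

def warp_point(params, src, point):
    # Evaluate the fitted spline: affine part plus weighted radial terms.
    n = len(src)
    x, y = point
    out = []
    for d in range(2):
        val = params[n][d] + params[n + 1][d] * x + params[n + 2][d] * y
        for i, (xi, yi) in enumerate(src):
            val += params[i][d] * tps_kernel((x - xi) ** 2 + (y - yi) ** 2)
        out.append(val)
    return tuple(out)

# Hypothetical normalized landmarks: eyes, nose tip, mouth corners.
SRC = [(0.30, 0.40), (0.70, 0.40), (0.50, 0.60), (0.35, 0.80), (0.65, 0.80)]
# Hypothetical positions of the same landmarks in a stylized exemplar.
DST = [(0.28, 0.42), (0.72, 0.42), (0.50, 0.62), (0.37, 0.78), (0.63, 0.78)]

params = fit_tps(SRC, DST)
print(warp_point(params, SRC, (0.50, 0.50)))  # warped position of a non-landmark point
```

The fitted map sends each source landmark exactly onto its target and interpolates smoothly elsewhere, which is the sense in which a TPS warp can align facial geometry between a photo and a style sample. When the two landmark sets differ only by an affine transform, the radial weights vanish and the warp reduces to that affine map.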