StyO: Stylize Your Face in Only One-shot

This paper focuses on face stylization with a single artistic target. Existing works for this task often fail to retain the source content while achieving geometry variation. To address this, we present StyO (Stylize the face in only One-shot). StyO follows a disentanglement and recombination strategy: it first disentangles the content and style of the source and target images into identifiers, then recombines them crosswise to derive the stylized face image. In this way, StyO decomposes complex images into independent, specific attributes and reduces one-shot face stylization to combining attributes from the input images, producing results that better match the face geometry of the target image and the content of the source image. StyO is implemented with latent diffusion models (LDM) and comprises two key modules. 1) The Identifier Disentanglement Learner (IDL) handles the disentanglement phase. It represents identifiers as contrastive text prompts, i.e., positive and negative descriptions, and introduces a novel triple reconstruction loss to fine-tune the pre-trained LDM so that style and content are encoded into their corresponding identifiers. 2) The Fine-grained Content Controller (FCC) handles the recombination phase. It recombines the disentangled identifiers from IDL into an augmented text prompt for generating stylized faces, and it constrains the cross-attention maps between latent and text features to preserve source face details in the results. Extensive evaluation shows that StyO produces high-quality images on numerous paintings of various styles and outperforms the current state of the art.
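The cross recombination of identifiers can be illustrated with a small sketch. This is not the paper's implementation: the prompt template, the placeholder tokens such as `<S-src>`, and the names `Identifiers`, `build_prompt`, `reconstruction_prompt`, and `stylization_prompt` are all hypothetical, chosen only to show how positive and negative descriptions might be assembled into a text prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifiers:
    """Hypothetical placeholder tokens standing for one image's style and content."""
    style: str
    content: str


def build_prompt(style_pos: str, style_neg: str,
                 content_pos: str, content_neg: str) -> str:
    # Assumed template: each attribute gets a positive and a negative description.
    return (f"a drawing with {style_pos} not {style_neg} style "
            f"of {content_pos} not {content_neg} portrait")


def reconstruction_prompt(own: Identifiers, other: Identifiers) -> str:
    # Disentanglement phase: describe an image by its own identifiers,
    # contrasted against the identifiers of the other image.
    return build_prompt(own.style, other.style, own.content, other.content)


def stylization_prompt(source: Identifiers, target: Identifiers) -> str:
    # Recombination phase: take the style from the target and the content
    # from the source, i.e. combine the identifiers crosswise.
    return build_prompt(target.style, source.style, source.content, target.content)


src = Identifiers(style="<S-src>", content="<C-src>")
tgt = Identifiers(style="<S-tgt>", content="<C-tgt>")

print(reconstruction_prompt(src, tgt))
print(stylization_prompt(src, tgt))
```

In this sketch the stylization prompt swaps only the style slot relative to the source reconstruction prompt, which mirrors the idea that the result should keep the source content while adopting the target style.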
