This technical report presents a diffusion-model-based framework for face swapping between two portrait images. The basic framework consists of three components, namely IP-Adapter, ControlNet, and Stable Diffusion's inpainting pipeline, which handle face feature encoding, multi-conditional generation, and face inpainting, respectively. In addition, we introduce facial guidance optimization and CodeFormer-based blending to further improve generation quality. Specifically, we employ a recent lightweight customization method, DreamBooth-LoRA, to preserve identity consistency by 1) using a rare identifier "sks" to represent the source identity, and 2) injecting the image features of the source portrait into each cross-attention layer in the same way as the text features. We then rely on the strong inpainting ability of Stable Diffusion and use the Canny edge image and the face detection annotation of the target portrait as conditions to guide ControlNet's generation and align the source portrait with the target portrait. To further correct face alignment, we add a facial guidance loss that optimizes the text embedding during sample generation.
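The idea of optimizing an embedding against a facial guidance loss can be illustrated with a toy sketch. This is not the report's implementation: the report gives no formula for the loss, so the example assumes a mean squared error over landmark coordinates, and a fixed linear map (the hypothetical `predict_landmarks`) stands in for the diffusion model plus a landmark detector. All function names, the weights, and the target values are made up for illustration.

```python
# Toy sketch of guidance-style optimization; NOT the report's implementation.
# Assumption: a fixed linear map stands in for "embedding -> facial landmarks".

def predict_landmarks(weights, embedding):
    """Hypothetical stand-in for the generator plus a landmark detector."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in weights]

def guidance_loss(pred, target):
    """Assumed loss: mean squared error between predicted and target landmarks."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def guidance_grad(weights, embedding, target):
    """Analytic gradient of guidance_loss with respect to the embedding."""
    pred = predict_landmarks(weights, embedding)
    residual = [p - t for p, t in zip(pred, target)]
    n = len(target)
    return [
        2.0 / n * sum(weights[i][j] * residual[i] for i in range(n))
        for j in range(len(embedding))
    ]

def optimize_embedding(weights, embedding, target, lr=0.1, steps=50):
    """Plain gradient descent on the embedding; returns (embedding, loss history)."""
    emb = list(embedding)
    history = [guidance_loss(predict_landmarks(weights, emb), target)]
    for _ in range(steps):
        grad = guidance_grad(weights, emb, target)
        emb = [e - lr * g for e, g in zip(emb, grad)]
        history.append(guidance_loss(predict_landmarks(weights, emb), target))
    return emb, history

# Illustrative values only: 2-dimensional "embedding", 3 landmark coordinates.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
TARGET = [0.2, 0.8, 0.5]
final_emb, losses = optimize_embedding(WEIGHTS, [0.0, 0.0], TARGET)
```

In an actual diffusion setting the gradient would presumably come from automatic differentiation through the denoiser and a differentiable face model rather than from a closed form; the loop structure (evaluate the loss, step the embedding, repeat) is the only part this sketch is meant to convey.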