ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

Diffusion-based technologies have made significant strides, particularly in personalized and customized facial generation. However, existing methods struggle to achieve high-fidelity, detailed identity (ID) consistency, primarily because they lack fine-grained control over facial areas and a comprehensive ID-preservation strategy that considers both intricate facial details and the overall face. To address these limitations, we introduce ConsistentID, a method for diverse identity-preserving portrait generation under fine-grained multimodal facial prompts, using only a single reference image. ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions, and the overall facial context to enhance precision in facial details, and an ID-preservation network optimized through a facial attention localization strategy, aimed at preserving ID consistency in facial regions. Together, these components significantly enhance the accuracy of ID preservation by introducing fine-grained multimodal ID information from facial regions. To facilitate training of ConsistentID, we present a fine-grained portrait dataset, FGID, with over 500,000 facial images, offering greater diversity and comprehensiveness than existing public facial datasets. Experimental results show that ConsistentID achieves exceptional precision and diversity in personalized facial generation, surpassing existing methods on the MyStyle dataset. Furthermore, although ConsistentID introduces more multimodal ID information, it maintains a fast inference speed during generation.
