The rapid development of diffusion models has enabled a wide range of applications. Identity-preserving text-to-image generation (ID-T2I) in particular has received significant attention because of its many application scenarios, such as AI portraits and advertising. While existing ID-T2I methods have demonstrated impressive results, several key challenges remain: (1) it is hard to accurately maintain the identity characteristics of reference portraits, (2) the generated images lack aesthetic appeal, especially when identity retention is enforced, and (3) existing solutions cannot be applied to LoRA-based and Adapter-based methods simultaneously. To address these issues, we present \textbf{ID-Aligner}, a general feedback learning framework to enhance ID-T2I performance. To address the loss of identity features, we introduce identity consistency reward fine-tuning, which uses feedback from face detection and recognition models to improve identity preservation in generated images. Furthermore, we propose identity aesthetic reward fine-tuning, which leverages rewards from human-annotated preference data and automatically constructed feedback on character structure generation to provide aesthetic tuning signals. Thanks to its universal feedback fine-tuning framework, our method can be readily applied to both LoRA and Adapter models, achieving consistent performance gains. Extensive experiments on SD1.5 and SDXL diffusion models validate the effectiveness of our approach. \textbf{Project Page: \url{https://idaligner.github.io/}}
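
As a rough illustration of the identity consistency reward described in the abstract, the sketch below scores a generated face against a reference face by the cosine similarity of their embeddings. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the similarity measure, so cosine similarity is assumed here, and the function names, the zero reward when no face is detected, and the `1 - reward` loss are all hypothetical choices made for illustration. The embeddings stand in for the output of a face detection and recognition pipeline, which is not shown.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identity_consistency_reward(
    generated_embedding: Optional[Sequence[float]],
    reference_embedding: Sequence[float],
) -> float:
    """Hypothetical reward: similarity of the generated face to the reference face.

    `generated_embedding` is None when the (assumed) face detector finds no
    face in the generated image; this sketch assigns a reward of 0.0 then.
    """
    if generated_embedding is None:
        return 0.0
    return cosine_similarity(generated_embedding, reference_embedding)


def identity_loss(reward: float) -> float:
    """Assumed fine-tuning objective: a higher reward gives a lower loss."""
    return 1.0 - reward
```

In an actual feedback fine-tuning loop the embeddings would have to come from a differentiable face recognition model so that the loss can be backpropagated into the diffusion model; the plain-Python version above only shows how the reward signal is shaped.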