Personalized image generation has made significant strides in adapting content to novel concepts. However, a persistent challenge remains: balancing faithful reconstruction of an unseen concept against editability under the text prompt, especially for the fine-grained detail of facial features. In this study, we examine the temporal dynamics of the text-to-image conditioning process and show that stage partitioning plays a crucial role in introducing new concepts. We present PersonaMagic, a stage-regulated generative technique designed for high-fidelity face customization. Using a simple MLP network, our method learns a series of embeddings within a specific timestep interval to capture face concepts. Additionally, we develop a Tandem Equilibrium mechanism that adjusts self-attention responses in the text encoder to balance adherence to the text description against identity preservation, improving both. Extensive experiments show that PersonaMagic outperforms state-of-the-art methods in both qualitative and quantitative evaluations. Moreover, its robustness and flexibility are validated in non-facial domains, and it can also serve as a plug-in that enhances the performance of pretrained personalization models.
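To make the idea of learning embeddings only within a specific timestep interval more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. Everything in it is an assumption made for illustration: the names `TinyMLP`, `timestep_features`, and `concept_embedding`, the toy embedding size, the 1000-step schedule, the interval `STAGE`, and the choice to fall back to a fixed base embedding outside the interval. The abstract does not state which interval is used or how the MLP is conditioned, and training of the MLP weights is omitted here.

```python
import math
import random

EMBED_DIM = 8          # toy size (assumption); real text embeddings are much larger
NUM_TIMESTEPS = 1000   # assumed length of the diffusion schedule
STAGE = (400, 1000)    # hypothetical interval [lo, hi) in which the concept is active


def timestep_features(t, num_freqs=4):
    """Encode a timestep as sinusoidal features of its normalized value."""
    x = t / NUM_TIMESTEPS
    feats = []
    for k in range(num_freqs):
        feats.append(math.sin((2 ** k) * math.pi * x))
        feats.append(math.cos((2 ** k) * math.pi * x))
    return feats


class TinyMLP:
    """Two-layer MLP with tanh activation and randomly initialized weights."""

    def __init__(self, in_dim, hidden_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(hidden_dim)]
                   for _ in range(out_dim)]
        self.b2 = [0.0] * out_dim

    def __call__(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return [sum(w * h for w, h in zip(row, hidden)) + b
                for row, b in zip(self.w2, self.b2)]


def concept_embedding(t, mlp, base_embedding, stage=STAGE):
    """Return the embedding of the concept token at timestep t.

    Inside the stage, the MLP output is added to the base embedding, so each
    timestep gets its own embedding. Outside the stage, the base embedding is
    returned unchanged (an assumption of this sketch).
    """
    lo, hi = stage
    if not (lo <= t < hi):
        return list(base_embedding)
    offset = mlp(timestep_features(t))
    return [b + o for b, o in zip(base_embedding, offset)]
```

In a real pipeline, the returned vector would replace the placeholder token's embedding before the text encoder runs at each denoising step, and the MLP weights would be optimized with the usual denoising loss restricted to timesteps inside the chosen interval.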