Customizing image generation remains a core challenge in controllable image synthesis. For single-concept generation, it is difficult to preserve subject identity while staying aligned with the prompt. In multi-concept scenarios, relying solely on a prompt, without additional conditions such as layout boxes or semantic masks, often leads to identity loss and concept omission. In this paper, we introduce ShowFlow, a comprehensive framework designed to tackle these challenges. We propose ShowFlow-S for single-concept image generation and ShowFlow-M for handling multiple concepts. ShowFlow-S introduces a KronA-WED adapter, which integrates a Kronecker adapter with weight and embedding decomposition, and pairs it with a novel Semantic-Aware Attention Regularization (SAR) training objective to enhance single-concept generation. Building on this foundation, ShowFlow-M directly reuses the robust models learned by ShowFlow-S to support multi-concept generation without extra conditions, incorporating Subject-Adaptive Matching Attention (SAMA) and layout consistency guidance as plug-and-play modules. Extensive experiments and user studies validate ShowFlow's effectiveness, highlighting its potential in real-world applications such as advertising and virtual dressing. Our source code will be publicly available at: https://htrvu.github.io/showflow.
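As a rough illustration of the Kronecker adapter idea named in the abstract, the sketch below adds a Kronecker-product update to a frozen weight matrix. It is not the KronA-WED adapter itself: the weight and embedding decomposition and the SAR objective are not shown, and the function name `kron_adapted_weight`, the `scale` parameter, the matrix sizes, and the zero initialisation of one factor are all assumptions made for this example.

```python
import numpy as np


def kron_adapted_weight(W, A, B, scale=1.0):
    """Return W + scale * (A kron B) for a frozen weight W.

    Hypothetical helper: A and B are the small trainable factors, and
    their Kronecker product must have the same shape as W.
    """
    delta = np.kron(A, B)
    if delta.shape != W.shape:
        raise ValueError(
            f"kron(A, B) has shape {delta.shape}, expected {W.shape}"
        )
    return W + scale * delta


rng = np.random.default_rng(0)

# Frozen pretrained weight (illustrative size, not taken from the paper).
W = rng.standard_normal((8, 12))

# Trainable factors: (2, 3) kron (4, 4) gives an (8, 12) update.
A = rng.standard_normal((2, 3))
# Assumed initialisation: a zero factor makes the adapter a no-op at start.
B = np.zeros((4, 4))

W_adapted = kron_adapted_weight(W, A, B)

# The adapter stores far fewer values than the full weight matrix.
n_full = W.size                 # 8 * 12 entries
n_adapter = A.size + B.size     # 2 * 3 + 4 * 4 entries
```

With these sizes the two factors hold 22 trainable values against 96 in the full matrix, which shows why Kronecker-structured updates are attractive for parameter-efficient fine-tuning.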