The customization of text-to-image models has seen significant advancements, yet generating multiple personalized concepts remains a challenging task. Current methods struggle with attribute leakage and layout confusion when handling multiple concepts, leading to reduced concept fidelity and semantic consistency. In this work, we introduce a novel training-free framework, Concept Conductor, designed to ensure visual fidelity and correct layout in multi-concept customization. Concept Conductor isolates the sampling processes of multiple custom models to prevent attribute leakage between different concepts and corrects erroneous layouts through self-attention-based spatial guidance. Additionally, we present a concept injection technique that employs shape-aware masks to specify the generation area for each concept. This technique injects the structure and appearance of personalized concepts through feature fusion in the attention layers, ensuring harmony in the final image. Extensive qualitative and quantitative experiments demonstrate that Concept Conductor can consistently generate composite images with accurate layouts while preserving the visual details of each concept. Compared to existing baselines, Concept Conductor shows significant performance improvements. Our method supports the combination of any number of concepts and maintains high fidelity even when dealing with visually similar concepts. The code and models are available at https://github.com/Nihukat/Concept-Conductor.
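The abstract describes concept injection only at a high level: shape-aware masks mark where each concept goes, and features are fused inside the attention layers. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes a hard (binary) mask blend of a single attention layer's output, non-overlapping masks, and features flattened to `(N, C)` token arrays. The function name `fuse_concept_features` and all shapes are hypothetical.

```python
import numpy as np


def fuse_concept_features(base_feats, concept_feats, masks):
    """Blend per-concept attention-layer features into the base features.

    Hypothetical sketch (not the Concept Conductor code):
    base_feats:    (N, C) array, one attention layer's output in the base
                   model for N spatial tokens.
    concept_feats: list of (N, C) arrays, the same layer's output in each
                   custom (personalized) model, each sampled in isolation.
    masks:         list of (N,) binary arrays, one region per concept.
    """
    base = np.asarray(base_feats, dtype=float)
    fused = base.copy()
    covered = np.zeros(base.shape[0], dtype=float)
    for feats, mask in zip(concept_feats, masks):
        m = np.asarray(mask, dtype=float)
        # Assumption: regions are disjoint, so each token comes from at
        # most one concept (this is what prevents attribute mixing here).
        if np.any(covered * m > 0):
            raise ValueError("concept masks overlap")
        covered += m
        # Inside the mask, take the custom model's features; outside it,
        # keep whatever is already in `fused` (base or earlier concepts).
        fused = fused * (1.0 - m)[:, None] + np.asarray(feats, dtype=float) * m[:, None]
    return fused
```

For example, with four tokens, one concept covering tokens 0 and 1 and another covering token 2, token 3 keeps the base model's features. A real system would likely use soft masks and blend at many layers and timesteps; the hard, single-layer version is chosen here only for readability.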