We propose a novel, zero-shot image generation technique called "Visual Concept Blending" that provides fine-grained control over which features from multiple reference images are transferred to a source image. With only a single reference image, it is difficult to isolate which specific elements should be transferred. With multiple reference images, however, the proposed approach distinguishes between the features the references share and the features unique to each, and selectively incorporates either into the generated output. By operating within the partially disentangled Contrastive Language-Image Pre-training (CLIP) embedding space used by IP-Adapter, our method enables the flexible transfer of texture, shape, motion, and style, as well as more abstract conceptual transformations, without requiring additional training or text prompts. We demonstrate its effectiveness across a diverse range of tasks, including style transfer, form metamorphosis, and conceptual transformations, showing how subtle or abstract attributes (e.g., brushstroke style, aerodynamic lines, and dynamism) can be seamlessly combined into a new image. In a user study, participants accurately recognized which features were intended to be transferred. Its simplicity, flexibility, and high-level control make Visual Concept Blending valuable for creative fields such as art, design, and content creation, where combining specific visual qualities from multiple inspirations is crucial.
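The idea of separating common from unique features across several reference embeddings can be illustrated with a minimal sketch. This is not the paper's actual procedure, which the abstract does not specify. It assumes, purely for illustration, that the shared feature is approximated by the mean direction of the reference embeddings and that each reference's unique feature is the residual orthogonal to that direction. The function names `split_common_unique` and `blend` are hypothetical, and random vectors stand in for real CLIP image embeddings.

```python
import numpy as np


def split_common_unique(refs):
    """Split reference embeddings into a shared unit direction and residuals.

    Assumption (not from the paper): the common feature is the mean of the
    references, and each unique feature is what remains after removing the
    component along that mean direction.
    """
    refs = np.asarray(refs, dtype=float)
    common = refs.mean(axis=0)
    common_dir = common / np.linalg.norm(common)
    # Scalar projection of every reference onto the shared direction.
    coeffs = refs @ common_dir
    # Residuals are orthogonal to the shared direction by construction.
    unique = refs - np.outer(coeffs, common_dir)
    return common_dir, unique


def blend(source, direction, weight):
    """Shift a source embedding along a chosen feature direction."""
    return source + weight * direction


# Stand-in data: three references that share one underlying vector plus noise.
rng = np.random.default_rng(0)
shared = rng.normal(size=8)
refs = np.stack([shared + 0.1 * rng.normal(size=8) for _ in range(3)])
source = rng.normal(size=8)

common_dir, unique = split_common_unique(refs)
blended_common = blend(source, common_dir, 0.5)  # transfer the shared feature
blended_unique = blend(source, unique[0], 0.5)   # transfer what only ref 0 has
```

In a real pipeline, the blended embedding would presumably be passed to an image-conditioned generator such as IP-Adapter in place of a single image embedding. That step is omitted here because it depends on details the abstract does not give.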