While generative models have become powerful tools for image synthesis, they are typically optimized for executing carefully crafted textual prompts, offering limited support for the open-ended visual exploration that often precedes idea formation. In contrast, designers frequently draw inspiration from loosely connected visual references, seeking emergent connections that spark new ideas. We propose Inspiration Seeds, a generative framework that shifts image generation from final execution to exploratory ideation. Given two input images, our model produces diverse, visually coherent compositions that reveal latent relationships between the inputs, without relying on user-specified text prompts. Our approach is feed-forward and trained on synthetic triplets of decomposed visual aspects constructed entirely through visual means: we use CLIP Sparse Autoencoders to extract editing directions in CLIP latent space and to isolate concept pairs. By removing the reliance on language and enabling fast, intuitive recombination, our method supports visual ideation at the early and ambiguous stages of creative work.
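To make the latent-space mechanism concrete, the following is a minimal sketch of how a sparse autoencoder over CLIP image embeddings can expose editing directions: each decoder column serves as one concept direction, and shifting an embedding along such a direction edits it. All names, dimensions, and the ReLU-based sparsity here are illustrative assumptions; the paper's actual SAE training objective and concept-pair isolation procedure are not reproduced.

```python
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """Minimal sparse autoencoder over CLIP embeddings (illustrative sketch).

    Each column of the decoder weight acts as one learned concept
    direction in CLIP latent space.
    """

    def __init__(self, d_clip: int = 768, d_dict: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_clip, d_dict)
        self.decoder = nn.Linear(d_dict, d_clip, bias=False)

    def forward(self, z: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        a = torch.relu(self.encoder(z))   # sparse concept activations
        return self.decoder(a), a          # reconstruction, activations


def edit_along_concept(z: torch.Tensor, sae: SparseAutoencoder,
                       concept_idx: int, strength: float = 1.0) -> torch.Tensor:
    """Shift a CLIP embedding along one learned concept direction."""
    direction = sae.decoder.weight[:, concept_idx]   # shape: (d_clip,)
    direction = direction / direction.norm()
    return z + strength * direction


# Usage: z_img would come from a CLIP image encoder (e.g. via open_clip);
# a random vector stands in for it here.
sae = SparseAutoencoder()
z_img = torch.randn(768)
z_edited = edit_along_concept(z_img, sae, concept_idx=42, strength=0.8)
```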