While text-to-image generative models can synthesize diverse and faithful content, subject variation across multiple creations limits their application to long-form content generation. Existing approaches require time-consuming tuning, reference images for every subject, or access to other creations. We introduce Contrastive Concept Instantiation (CoCoIns), a framework that effectively synthesizes consistent subjects across multiple independent creations. The framework consists of a generative model and a mapping network that transforms input latent codes into pseudo-words associated with specific instances of concepts. Users can then generate consistent subjects by reusing the same latent codes. To construct these associations, we propose a contrastive learning approach that trains the network to differentiate combinations of prompts and latent codes. Extensive evaluations on single-subject human faces show that CoCoIns performs comparably to existing methods while offering greater flexibility. We also demonstrate the potential of extending CoCoIns to multiple subjects and other object categories.
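To make the contrastive training idea concrete, here is a minimal sketch of a generic InfoNCE-style objective over paired embeddings, where row i of one batch is the positive for row i of the other and all remaining rows act as negatives. This is only an illustration of the general technique the abstract names; the actual CoCoIns loss, architecture, and how prompt-latent-code combinations are embedded are assumptions here, not the paper's implementation.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE contrastive loss (illustrative, not the CoCoIns loss).

    anchors, positives: (N, D) arrays; row i of `positives` is the match
    for row i of `anchors`, and every other row serves as a negative.
    """
    # L2-normalize so dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                  # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    # Log-softmax over each row; the matched pair sits on the diagonal.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

In a setup like the one the abstract describes, the anchor embedding could come from one prompt-and-latent-code combination and the positive from another combination sharing the same latent code, so that minimizing this loss pushes the mapping network to produce pseudo-words that identify a specific concept instance.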