Inversion methods, such as Textual Inversion, generate personalized images by incorporating concepts of interest from user-provided images. However, existing methods often overfit: the inverted concept dominates the generated image, and other desired concepts go missing. This happens because inversion also encodes the irrelevant semantics in the user images, forcing the inverted concepts to occupy locations far from the core distribution in the embedding space. To address this issue, we propose a method that guides the inversion process towards the core distribution to obtain compositional embeddings. Additionally, we introduce a spatial regularization approach to balance the attention among the concepts being composed. Our method is designed as a post-training approach and can be seamlessly integrated with other inversion methods. Experimental results demonstrate that the proposed approach mitigates the overfitting problem and generates more diverse and balanced compositions of concepts in the synthesized images. The source code is available at https://github.com/zhangxulu1996/Compositional-Inversion.
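To make the idea of guiding an embedding towards the core distribution concrete, the following is a minimal sketch and not the implementation from the linked repository. It assumes, purely for illustration, that the core distribution can be summarized by the mean of a set of anchor embeddings and that the guidance is a squared-distance penalty. The function `pull_toward_core`, its parameters, and the synthetic data are all hypothetical, and the term is shown in isolation, without the reconstruction loss that an actual inversion would also optimize.

```python
import numpy as np


def pull_toward_core(embedding, anchors, weight=0.1, steps=50, lr=0.5):
    """Hypothetical regularizer: gradient descent on
    weight * ||e - mean(anchors)||^2, shown without any reconstruction loss."""
    center = anchors.mean(axis=0)          # assumed proxy for the core distribution
    e = embedding.astype(float).copy()
    for _ in range(steps):
        grad = 2.0 * weight * (e - center)  # gradient of the penalty w.r.t. e
        e -= lr * grad
    return e


rng = np.random.default_rng(0)
anchors = rng.normal(size=(100, 8))  # stand-in for pretrained token embeddings
inverted = np.full(8, 5.0)           # stand-in for an out-of-distribution inverted concept

regularized = pull_toward_core(inverted, anchors)

center = anchors.mean(axis=0)
dist_before = np.linalg.norm(inverted - center)
dist_after = np.linalg.norm(regularized - center)
print(f"distance to core before: {dist_before:.3f}, after: {dist_after:.3f}")
```

With these settings each step shrinks the offset from the anchor mean by a constant factor, so the embedding moves steadily towards the assumed core. In a real inversion the penalty weight would trade off fidelity to the user images against compatibility with other concepts.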