Inversion methods, such as Textual Inversion, generate personalized images by incorporating concepts of interest provided by user images. However, existing methods often suffer from overfitting: the inverted concepts dominate, and other desired concepts go missing. This stems from the fact that, during inversion, irrelevant semantics in the user images are also encoded, forcing the inverted concepts to occupy locations far from the core distribution in the embedding space. To address this issue, we propose a method that guides the inversion process towards the core distribution for compositional embeddings. Additionally, we introduce a spatial regularization approach to balance the attention on the concepts being composed. Our method is designed as a post-training approach and can be seamlessly integrated with other inversion methods. Experimental results demonstrate that the proposed approach mitigates the overfitting problem and generates more diverse and balanced compositions of concepts in the synthesized images. The source code is available at https://github.com/zhangxulu1996/Compositional-Inversion.