In this paper, we investigate the semantic collapsing problem in generative personalization, an under-explored problem in which the learned visual concept ($V$) gradually shifts from its original textual meaning and comes to dominate the other concepts in multi-concept input prompts. This issue not only reduces the semantic richness of a complex input prompt such as "a photo of $V$ wearing glasses and playing guitar" to a simpler, less contextually rich form such as "a photo of $V$", but also leads to simplified output images that fail to capture the intended concept. We identify the root cause as unconstrained optimization, which allows the learned embedding $V$ to drift arbitrarily in the embedding space in both direction and magnitude. To address this, we propose a simple yet effective training-free method that adjusts the magnitude and direction of the pre-trained embedding at inference time, effectively mitigating the semantic collapsing problem. Our method is broadly applicable across different personalization methods and demonstrates significant improvements in text-image alignment in diverse use cases. Our code is anonymously published at https://github.com/tuananhbui89/Embedding-Adjustment
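The abstract does not spell out the adjustment rule, so the following is only a minimal sketch of what a training-free magnitude-and-direction adjustment could look like, not the authors' implementation. It assumes a reference embedding `ref` (for example, the token embedding of the concept's class word), a hypothetical rotation fraction `alpha`, and a hypothetical norm scale `beta`; the function name `adjust_embedding` is likewise illustrative. The direction of the learned embedding is rotated part of the way toward the reference by spherical interpolation, and its norm is reset to a multiple of the reference norm.

```python
import numpy as np


def adjust_embedding(v, ref, alpha=0.5, beta=1.0, eps=1e-8):
    """Illustrative adjustment of a learned embedding `v` (an assumption,
    not the paper's exact rule).

    v     : learned concept embedding, shape (d,)
    ref   : reference embedding, e.g. of the class word, shape (d,)
    alpha : fraction of the angle to rotate from v toward ref, in [0, 1]
    beta  : output norm as a multiple of the reference norm
    """
    v = np.asarray(v, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)

    v_norm = np.linalg.norm(v)
    ref_norm = np.linalg.norm(ref)
    if v_norm < eps or ref_norm < eps:
        raise ValueError("embeddings must be non-zero")

    v_dir = v / v_norm
    ref_dir = ref / ref_norm

    # Angle between the two unit directions.
    cos_theta = np.clip(np.dot(v_dir, ref_dir), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    sin_theta = np.sin(theta)

    if sin_theta < eps:
        # Parallel or antipodal directions: slerp is ill-defined,
        # so keep the original direction unchanged.
        new_dir = v_dir
    else:
        # Spherical interpolation (slerp) from v_dir toward ref_dir.
        new_dir = (
            np.sin((1.0 - alpha) * theta) * v_dir
            + np.sin(alpha * theta) * ref_dir
        ) / sin_theta
        new_dir = new_dir / np.linalg.norm(new_dir)

    # Reset the magnitude relative to the reference embedding.
    return beta * ref_norm * new_dir
```

With `alpha=0` only the magnitude is changed, and with `alpha=1` the result coincides with the reference direction; intermediate values trade concept fidelity against how closely the embedding stays to its textual meaning. The adjusted vector would replace the learned embedding before the prompt is passed to the text encoder at inference time.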