Recent Text-to-Image (T2I) diffusion models generate high-quality images and generalize zero-shot to unseen prompts. Yet current models struggle to adhere closely to prompt semantics, often misrepresenting or overlooking specific attributes. To address this, we propose a simple, training-free approach that modulates the guidance direction of diffusion models during inference. We first decompose the prompt semantics into a set of concepts and monitor the guidance trajectory in relation to each concept. Our key observation is that deviations in the model's adherence to prompt semantics are highly correlated with divergence of the guidance from one or more of these concepts. Based on this observation, we devise a technique that steers the guidance direction towards any concept from which the model diverges. Extensive experiments show that our method improves the semantic alignment between prompts and the images that diffusion models generate from them. The project page is available at: https://korguy.github.io/
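The monitor-and-steer idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how divergence is measured or how the steering is applied, so this sketch assumes a classifier-free-guidance-style direction (conditional minus unconditional noise prediction), cosine similarity as the divergence measure, and an additive correction. The names `steer_guidance`, `threshold`, and `strength` are hypothetical, and noise predictions are represented as flat lists of floats for simplicity.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def steer_guidance(eps_uncond, eps_prompt, eps_concepts,
                   threshold=0.0, strength=0.5):
    """Hypothetical single-step sketch of concept-aware guidance steering.

    eps_uncond   -- unconditional noise prediction
    eps_prompt   -- noise prediction conditioned on the full prompt
    eps_concepts -- one noise prediction per concept decomposed from the prompt
    threshold    -- assumed cutoff: a concept counts as "diverged" when the
                    cosine similarity to the prompt guidance falls below it
    strength     -- assumed weight of the additive correction
    """
    # Guidance direction for the full prompt (assumed CFG-style difference).
    guidance = [p - u for p, u in zip(eps_prompt, eps_uncond)]
    steered = list(guidance)
    diverged = []
    for i, eps_c in enumerate(eps_concepts):
        # Guidance direction for this concept alone.
        concept_dir = [c - u for c, u in zip(eps_c, eps_uncond)]
        # Monitor: does the prompt guidance still point towards the concept?
        if cosine(guidance, concept_dir) < threshold:
            diverged.append(i)
            # Steer: nudge the guidance towards the diverged concept.
            steered = [s + strength * d for s, d in zip(steered, concept_dir)]
    return steered, diverged
```

In a real sampler this check would run at every denoising step on the model's actual predictions. With toy vectors, `steer_guidance([0, 0, 0, 0], [1, 0, 0, 0], [[1, 0, 0, 0], [-1, 1, 0, 0]])` flags only the second concept as diverged and returns the steered direction `[0.5, 0.5, 0.0, 0.0]`.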