Recent advances in Text-to-Image (T2I) diffusion models have demonstrated impressive success in generating high-quality images with zero-shot generalization capabilities. Yet current models struggle to adhere closely to prompt semantics, often misrepresenting or overlooking specific attributes. To address this, we propose a simple, training-free approach that modulates the guidance direction of diffusion models during inference. We first decompose the prompt semantics into a set of concepts and monitor the guidance trajectory in relation to each concept. Our key observation is that deviations in the model's adherence to prompt semantics are highly correlated with divergence of the guidance from one or more of these concepts. Based on this observation, we devise a technique that steers the guidance direction toward any concept from which the model diverges. Extensive experiments validate that our method improves the semantic alignment between prompts and the images that diffusion models generate in response to them. The project page is available at: https://korguy.github.io/
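The monitor-and-steer idea described above can be illustrated with a minimal sketch. It assumes a classifier-free-guidance setup in which, at each denoising step, one noise prediction is available for the unconditional input, one for the full prompt, and one for each decomposed concept. The divergence measure (cosine similarity), the threshold `tau`, the steering strength `alpha`, and the function names are hypothetical choices for illustration, not the authors' formulation.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two arrays, flattened to vectors."""
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def steered_guidance(eps_uncond, eps_prompt, eps_concepts,
                     guidance_scale=7.5, tau=0.5, alpha=1.0):
    """Hypothetical concept-aware guidance step.

    eps_uncond:   unconditional noise prediction
    eps_prompt:   noise prediction conditioned on the full prompt
    eps_concepts: list of noise predictions, one per decomposed concept
    Returns the steered noise prediction and the per-concept similarities.
    """
    # Standard classifier-free guidance direction for the full prompt.
    g_prompt = eps_prompt - eps_uncond
    g = g_prompt.copy()
    sims = []
    for eps_c in eps_concepts:
        # Guidance direction contributed by this concept alone.
        g_c = eps_c - eps_uncond
        # Monitor: how well does the prompt guidance align with the concept?
        s = cosine_similarity(g_prompt, g_c)
        sims.append(s)
        # Steer: if the guidance has diverged from the concept, push it back
        # toward that concept in proportion to how far it fell below tau.
        if s < tau:
            g = g + alpha * (tau - s) * g_c
    return eps_uncond + guidance_scale * g, sims
```

In this sketch every concept is compared against the original prompt guidance rather than the partially corrected one, so the monitoring step does not depend on the order in which concepts are listed. A real sampler would call such a function once per denoising step, with `guidance_scale` playing the usual classifier-free-guidance role.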