Diffusion models have shown impressive performance in many domains, including image generation, time series prediction, and reinforcement learning, outperforming traditional GAN-based and transformer-based methods. However, their ability to follow natural language instructions (e.g., spatial relationships between objects, generating complex scenes) remains unsatisfactory, and improving this capability has become an important research area. Prior works have shown that reinforcement learning (RL) can effectively train diffusion models to enhance fidelity on specific objectives. However, existing RL methods require collecting large amounts of data to train an effective reward model, and they provide no corrective feedback when a generated image is incorrect. In this work, we propose Iterative Prompt Relabeling (IPR), a novel algorithm that aligns images to text through iterative image sampling and prompt relabeling. IPR first samples a batch of images conditioned on text prompts, then relabels the prompts of mismatched text-image pairs using classifier feedback. We conduct thorough experiments on SDv2 and SDXL, testing their ability to follow instructions involving spatial relations. With IPR, we achieve up to a 15.22% absolute improvement on the challenging VISOR spatial-relation benchmark, demonstrating superior performance compared to previous RL methods.
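The sample-classify-relabel loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the diffusion sampler and spatial classifier are stubbed out with toy stand-ins (a real system would sample from SDv2/SDXL and use a learned or detector-based classifier), and all function names here are hypothetical.

```python
import random

# Toy vocabulary of spatial relations and their opposites, used by the stubs below.
RELATIONS = {"left of": "right of", "right of": "left of",
             "above": "below", "below": "above"}

def parse(prompt):
    """Split a prompt like 'a dog left of a cat' into (subject, relation, object)."""
    for rel in RELATIONS:
        if f" {rel} " in prompt:
            subject, obj = prompt.split(f" {rel} ")
            return subject, rel, obj
    raise ValueError(f"no known relation in prompt: {prompt!r}")

def sample_image(prompt, rng):
    """Stub diffusion sampler: records which relation was actually drawn.
    To simulate instruction-following errors, it flips the requested
    relation half the time."""
    subject, relation, obj = parse(prompt)
    drawn = relation if rng.random() < 0.5 else RELATIONS[relation]
    return {"subject": subject, "relation": drawn, "object": obj}

def classify(image, prompt):
    """Stub classifier feedback: does the image match the prompt's relation?"""
    _, relation, _ = parse(prompt)
    return image["relation"] == relation

def relabel(image):
    """Rewrite the prompt to describe what was actually generated."""
    return f'{image["subject"]} {image["relation"]} {image["object"]}'

def ipr_round(prompts, rng):
    """One IPR round: sample images, check each pair with the classifier,
    and relabel the prompts of mismatched pairs. Every returned
    (image, prompt) pair is consistent, so all samples can serve as
    finetuning data (the finetuning step itself is omitted here)."""
    data = []
    for prompt in prompts:
        image = sample_image(prompt, rng)
        if classify(image, prompt):
            data.append((image, prompt))
        else:
            data.append((image, relabel(image)))
    return data

rng = random.Random(0)
batch = ipr_round(["a dog left of a cat", "a cup above a book"], rng)
# After relabeling, every pair in the batch is text-image consistent.
assert all(classify(img, p) for img, p in batch)
```

The key design point mirrored from the abstract: instead of discarding failed generations (or leaving them without feedback, as in reward-model RL), IPR turns each mismatch into a correctly labeled training pair by rewriting the prompt to match the image.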