Recent work has demonstrated that reinforcement learning (RL) with quality rewards can enhance the quality of images generated in text-to-image (T2I) generation. However, simply aggregating multiple rewards may over-optimize some metrics while degrading others, and manually finding the optimal weights is challenging. An effective strategy for jointly optimizing multiple rewards in RL for T2I generation is therefore highly desirable. This paper introduces Parrot, a novel multi-reward RL framework for T2I generation. Using batch-wise Pareto-optimal selection, Parrot automatically identifies the optimal trade-off among different rewards during the RL optimization of T2I generation. Additionally, Parrot jointly optimizes the T2I model and the prompt expansion network, facilitating the generation of quality-aware text prompts and thereby further enhancing final image quality. To counteract potential catastrophic forgetting of the original user prompt caused by prompt expansion, we introduce original-prompt-centered guidance at inference time, which keeps the generated image faithful to the user input. Extensive experiments and a user study demonstrate that Parrot outperforms several baseline methods across various quality criteria, including aesthetics, human preference, image sentiment, and text-image alignment.
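The idea behind batch-wise Pareto-optimal selection can be illustrated with a minimal sketch, assuming each generated image in a batch is scored by K reward models (for example aesthetics, human preference, sentiment, and text-image alignment) and that higher scores are better. A sample is kept only if no other sample in the batch is at least as good on every reward and strictly better on at least one. The helper names `dominates` and `pareto_optimal_indices` are hypothetical and not taken from the Parrot implementation; the abstract also does not specify how the selected samples enter the RL update, so using only the non-dominated samples for the policy-gradient step is an assumption here.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if reward vector `a` Pareto-dominates `b` (higher is better).

    `a` dominates `b` when it is >= on every reward and > on at least one.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_optimal_indices(rewards: List[Sequence[float]]) -> List[int]:
    """Return the indices of non-dominated samples in a batch.

    rewards[i] holds the K reward scores of sample i. In this sketch
    (an assumption, not the paper's stated procedure), only these samples
    would contribute to the RL policy update.
    """
    return [
        i
        for i, r_i in enumerate(rewards)
        if not any(dominates(r_j, r_i) for j, r_j in enumerate(rewards) if j != i)
    ]


if __name__ == "__main__":
    # Four samples, three rewards each; sample 2 is dominated by sample 0.
    batch_rewards = [
        [0.9, 0.2, 0.5],
        [0.4, 0.8, 0.6],
        [0.3, 0.1, 0.4],
        [0.5, 0.7, 0.9],
    ]
    print(pareto_optimal_indices(batch_rewards))  # [0, 1, 3]
```

Because no fixed reward weights are used, samples that excel at different rewards (samples 0, 1, and 3 above) can all survive selection, which is how this kind of filter avoids hand-tuning a weighted sum. The pairwise check costs O(B²K) for a batch of B samples, which is negligible at typical RL batch sizes; samples with identical reward vectors do not dominate each other, so both are kept.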