Recent work has demonstrated that reinforcement learning (RL) with multiple quality rewards can improve the quality of generated images in text-to-image (T2I) generation. However, manually adjusting reward weights is challenging and may lead to over-optimization of certain metrics. To address this, we propose Parrot, which frames the problem as multi-objective optimization and introduces an effective multi-reward optimization strategy to approximate Pareto-optimal solutions. Using batch-wise Pareto-optimal selection, Parrot automatically identifies the optimal trade-off among different rewards. We apply this novel multi-reward optimization algorithm to jointly optimize the T2I model and a prompt expansion network, which substantially improves image quality and also allows the trade-off among rewards to be controlled through a reward-related prompt at inference. Furthermore, we introduce original-prompt-centered guidance at inference time, ensuring fidelity to the user input after prompt expansion. Extensive experiments and a user study validate the superiority of Parrot over several baselines across various quality criteria, including aesthetics, human preference, text-image alignment, and image sentiment.
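To illustrate the core idea of batch-wise Pareto-optimal selection, the following is a minimal sketch, assuming each generated image in a batch is scored with a vector of K rewards; the function name `pareto_optimal_mask` and the example reward values are hypothetical and not taken from the paper. A sample is dominated if another sample in the batch scores at least as high on every reward and strictly higher on at least one; the non-dominated samples approximate the batch's Pareto front.

```python
import numpy as np

def pareto_optimal_mask(rewards: np.ndarray) -> np.ndarray:
    """Return a boolean mask over the batch marking non-dominated samples.

    rewards: array of shape (N, K), one row of K reward scores per image
    (e.g., aesthetics, human preference, alignment, sentiment).
    """
    n = rewards.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        # Sample j dominates i if j >= i on every reward and > on at least one.
        dominated_by = np.all(rewards >= rewards[i], axis=1) & \
                       np.any(rewards > rewards[i], axis=1)
        mask[i] = not dominated_by.any()
    return mask

# Example: a batch of 5 images scored on 3 rewards.
batch_rewards = np.array([
    [0.9, 0.2, 0.5],
    [0.8, 0.3, 0.6],
    [0.9, 0.3, 0.6],  # dominates the two rows above
    [0.1, 0.9, 0.4],
    [0.1, 0.8, 0.3],  # dominated by the row above
])
print(pareto_optimal_mask(batch_rewards))  # [False False  True  True False]
```

In a multi-reward RL loop, one would then restrict the policy-gradient update to the samples where the mask is `True`, so no manual reward weighting is needed to decide which generations count as "good".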