Interest in automated prompt optimization based on reinforcement learning (RL) has grown recently. This approach offers important advantages, such as generating interpretable prompts and being compatible with black-box foundation models. However, the large size of the prompt space poses challenges for RL-based methods and often leads to convergence to suboptimal policies. This paper introduces MultiPrompter, a new framework that views prompt optimization as a cooperative game between prompters that take turns composing a prompt together. This cooperative formulation effectively reduces the problem size and helps prompters learn optimal prompts. We test our method on the text-to-image task and show that it generates higher-quality images than baselines.
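The turn-taking idea can be illustrated with a minimal sketch. This is not the MultiPrompter implementation: the names `make_random_prompter`, `compose_prompt`, and `num_sequences` are hypothetical, the prompters sample tokens at random instead of following learned RL policies, and the counting argument at the end is only one plausible reading of the "reduced problem size" claim, assuming each prompter is responsible for just its own segment of the prompt.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical type: a prompter maps the prompt so far to the tokens it appends.
Prompter = Callable[[List[str]], List[str]]


def make_random_prompter(vocab: Sequence[str], tokens_per_turn: int, seed: int) -> Prompter:
    """Toy stand-in for a learned policy: samples tokens uniformly from its vocabulary."""
    rng = random.Random(seed)

    def prompter(prompt_so_far: List[str]) -> List[str]:
        # A trained prompter would condition on prompt_so_far; this toy one ignores it.
        return [rng.choice(vocab) for _ in range(tokens_per_turn)]

    return prompter


def compose_prompt(base_prompt: str, prompters: Sequence[Prompter], rounds: int) -> str:
    """Let the prompters take turns appending tokens to a shared prompt."""
    tokens = base_prompt.split()
    for _ in range(rounds):
        for prompter in prompters:
            tokens.extend(prompter(tokens))
    return " ".join(tokens)


def num_sequences(vocab_size: int, length: int) -> int:
    """Number of distinct token sequences of a given length over a vocabulary."""
    return vocab_size ** length


if __name__ == "__main__":
    style = make_random_prompter(["watercolor", "cinematic", "minimalist"], 1, seed=0)
    detail = make_random_prompter(["detailed", "vivid", "sharp"], 1, seed=1)
    print(compose_prompt("a cat on a sofa", [style, detail], rounds=2))

    # Assumed illustration: one prompter writing 4 tokens vs. two writing 2 tokens each.
    print(num_sequences(1000, 4), "vs", num_sequences(1000, 2), "per prompter")
```

In an actual RL setup, the composed prompt would be passed to the text-to-image model and the resulting image score would serve as a reward shared by all prompters, which is what makes the game cooperative.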