Well-designed prompts can guide text-to-image models to generate high-quality images. However, prompts that perform well are often model-specific and misaligned with user input. Instead of relying on laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts. Specifically, we first perform supervised fine-tuning of a pretrained language model on a small collection of manually engineered prompts. We then use reinforcement learning to explore better prompts. We define a reward function that encourages the policy to generate more aesthetically pleasing images while preserving the original user intentions. Experimental results on Stable Diffusion show that our method outperforms manual prompt engineering in terms of both automatic metrics and human preference ratings. Moreover, reinforcement learning further boosts performance, especially on out-of-domain prompts. The pretrained checkpoints are available at https://aka.ms/promptist. The demo can be found at https://aka.ms/promptist-demo.
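The reward described above balances two goals: better-looking images and fidelity to what the user asked for. The abstract does not give the formula, so the following is only a minimal sketch of one way such a reward could be combined. The function name `prompt_reward`, the threshold and weight values, and the assumption that relevance and aesthetic scores arrive as plain floats from external scorers are all hypothetical and chosen for illustration.

```python
from statistics import mean
from typing import Sequence


def prompt_reward(
    relevance: Sequence[float],
    aesthetic_adapted: Sequence[float],
    aesthetic_original: Sequence[float],
    relevance_threshold: float = 0.28,  # hypothetical cutoff, not from the abstract
    relevance_weight: float = 20.0,     # hypothetical scale, not from the abstract
) -> float:
    """Combine an aesthetic gain with a relevance penalty into one scalar.

    relevance: similarity between the ORIGINAL user input and each image
        generated from the adapted prompt (assumed to come from some
        text-image similarity scorer).
    aesthetic_adapted / aesthetic_original: aesthetic scores of images
        generated from the adapted prompt and from the original input.
    """
    if not relevance or not aesthetic_adapted or not aesthetic_original:
        raise ValueError("all score lists must be non-empty")

    # Aesthetic term: how much better the adapted prompt's images look on average.
    aesthetic_gain = mean(aesthetic_adapted) - mean(aesthetic_original)

    # Relevance term: only penalize images that drift below the threshold;
    # images that are relevant enough contribute zero, not a bonus.
    penalties = [
        min(relevance_weight * (r - relevance_threshold), 0.0) for r in relevance
    ]

    return aesthetic_gain + mean(penalties)


if __name__ == "__main__":
    # Toy scores: one image stays on topic, one drifts away from the user input.
    reward = prompt_reward(
        relevance=[0.30, 0.20],
        aesthetic_adapted=[6.0, 5.5],
        aesthetic_original=[5.0, 5.0],
    )
    print(f"reward = {reward:.3f}")
```

Clipping the relevance term at zero is a design choice in this sketch: it stops the policy from trading aesthetics for ever-higher similarity once the image is already on topic, while still punishing prompts that abandon the user's intent.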