Text-to-image generative models, particularly those based on diffusion models such as Imagen and Stable Diffusion, have made substantial advances. Recently, interest in the fine-grained refinement of text prompts has surged: users assign weights to certain words in a prompt, or change the denoising time steps at which those words are injected, to improve the quality of the generated images. However, the success of such fine-control prompts depends on the accuracy of the text prompt and on a careful choice of weights and time steps, which requires significant manual effort. To address this, we introduce the \textbf{P}rompt \textbf{A}uto-\textbf{E}diting (PAE) method. Besides refining the original prompts for image generation, we employ an online reinforcement learning strategy to explore the weight and injection time steps of each word, yielding dynamic fine-control prompts. The reward function used during training encourages the model to account for aesthetic score, semantic consistency, and user preferences. Experimental results demonstrate that the proposed method effectively improves the original prompts, generating more visually appealing images while maintaining semantic alignment. Code is available at https://github.com/Mowenyii/PAE.
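To make the notion of a "dynamic fine-control prompt" concrete, the following is a minimal sketch, not the PAE implementation. It assumes each controlled word carries an emphasis weight and an injection window expressed as fractions of the denoising schedule. It also assumes the reward is a plain weighted sum of three precomputed scores (for example, from an aesthetic predictor, an image-text similarity model, and a preference model). The names `WordControl`, `active_weight`, `render`, `reward`, the bracket syntax `[word:start-end:weight]`, and the coefficients are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordControl:
    """Hypothetical per-word modifier: an emphasis weight plus an injection window."""
    word: str
    weight: float = 1.0  # > 1 emphasizes the word, < 1 de-emphasizes it
    start: float = 0.0   # fraction of the denoising schedule where injection begins
    end: float = 1.0     # fraction of the denoising schedule where injection stops


def active_weight(ctrl: WordControl, step: int, num_steps: int) -> float:
    """Weight applied to the word at a given denoising step; 0.0 outside its window."""
    progress = step / num_steps
    return ctrl.weight if ctrl.start <= progress < ctrl.end else 0.0


def render(words: list, controls: dict) -> str:
    """Serialize a prompt, writing controlled words in a hypothetical bracket syntax."""
    parts = []
    for w in words:
        c = controls.get(w)
        if c is None:
            parts.append(w)  # plain word, no fine control
        else:
            parts.append(f"[{w}:{c.start:.1f}-{c.end:.1f}:{c.weight:.1f}]")
    return " ".join(parts)


def reward(aesthetic: float, semantic: float, preference: float,
           w_aes: float = 1.0, w_sem: float = 1.0, w_pref: float = 1.0) -> float:
    """Hypothetical composite reward: weighted sum of three precomputed scores."""
    return w_aes * aesthetic + w_sem * semantic + w_pref * preference


if __name__ == "__main__":
    fluffy = WordControl("fluffy", weight=1.3, start=0.0, end=0.6)
    print(render(["a", "fluffy", "cat"], {"fluffy": fluffy}))
    print([active_weight(fluffy, s, 50) for s in (0, 10, 29, 30, 49)])
    print(reward(aesthetic=6.0, semantic=0.3, preference=0.5))
```

In this sketch, "fluffy" is emphasized only during the first 60% of the denoising steps and is absent afterwards. The weight and the window are exactly the kind of per-word choices the abstract describes a policy learning to set, rather than leaving them to manual tuning.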