Large language models (LLMs) have shown remarkable potential across a wide range of tasks when conditioned on prompts. However, differences in the quality of human-written prompts lead to substantial discrepancies in LLM performance, and improving prompts usually requires considerable human effort and expertise. To this end, this paper proposes Prompt with Actor-Critic Editing (PACE), which enables LLMs to edit prompts automatically. Drawing inspiration from the actor-critic algorithm in reinforcement learning, PACE uses LLMs in the dual roles of actor and critic and treats the prompt as a type of policy. PACE refines a prompt using feedback from both the actors, which execute the prompt, and the critics, which critique the resulting responses. Because this feedback is grounded in the LLMs' actual responses and reasoning, the process helps align the prompt with a specific task. We conduct extensive experiments on 24 instruction induction tasks and 21 BIG-bench tasks. The results indicate that PACE improves the relative performance of medium- and low-quality human-written prompts by up to 98\%, making them comparable in performance to high-quality human-written prompts. PACE is also notably effective for prompt generation.
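The actor-critic editing loop described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `pace_edit` function, the `LLM` callable interface, the prompt templates, the fixed number of rounds, and the use of expected outputs by the critic are all assumptions made here for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function mapping an input string to a completion.
LLM = Callable[[str], str]


def pace_edit(
    prompt: str,
    examples: List[Tuple[str, str]],
    llm: LLM,
    rounds: int = 3,
) -> str:
    """Refine `prompt` from actor responses and critic feedback (illustrative only)."""
    for _ in range(rounds):
        critiques = []
        for task_input, target in examples:
            # Actor: execute the current prompt on one task input.
            response = llm(f"{prompt}\n\nInput: {task_input}\nOutput:")
            # Critic: critique the response (template and use of `target` are assumed).
            critique = llm(
                "Critique how well the response follows the prompt.\n"
                f"Prompt: {prompt}\nInput: {task_input}\n"
                f"Response: {response}\nExpected: {target}"
            )
            critiques.append(critique)
        # Edit step: rewrite the prompt in light of the collected critiques.
        feedback = "\n".join(f"- {c}" for c in critiques)
        prompt = llm(
            "Rewrite the prompt to address the feedback. "
            "Return only the new prompt.\n"
            f"Prompt: {prompt}\nFeedback:\n{feedback}"
        ).strip()
    return prompt
```

Any text-completion function can be passed as `llm`, for example a thin wrapper around a hosted model. In this sketch a single callable plays the actor, the critic, and the editor, mirroring the dual-role idea in the abstract; how the actual method selects examples or decides when to stop is not specified here.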