Large language models (LLMs) have shown remarkable potential across a wide range of tasks when conditioned on prompts. However, differences in the quality of human-written prompts lead to substantial discrepancies in LLM performance, and improving prompts usually requires considerable human effort and expertise. To address this, this paper proposes Prompt with Actor-Critic Editing (PACE), which enables automatic prompt editing for LLMs. Drawing inspiration from the actor-critic algorithm in reinforcement learning, PACE casts LLMs in the dual roles of actor and critic and treats the prompt as a kind of policy. PACE refines the prompt using feedback from both the actors, which execute the prompt, and the critics, which critique the resulting responses. Because this feedback is grounded in the LLMs' actual responses and reasoning, the process helps align the prompt more closely with a specific task. We conduct extensive experiments on 24 instruction induction tasks and 21 BIG-bench tasks. Experimental results indicate that PACE improves the relative performance of medium- and low-quality human-written prompts by up to 98\%, bringing them to a level comparable with high-quality human-written prompts. Moreover, PACE is also notably effective for prompt generation.
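The editing loop summarized above (actors execute the prompt, critics judge the responses, and the prompt is revised from that feedback) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `actor_critic_edit`, the separate `editor` role, the `OK` marker, the `rounds` budget, and the early stop when no critic complains are all assumptions made for illustration. Each role is a plain callable that would wrap an LLM call in practice; the `toy_*` stand-ins are hypothetical and deterministic so the sketch runs without any model.

```python
from typing import Callable, List, Tuple

# Hypothetical role signatures; in practice each would wrap an LLM call.
Actor = Callable[[str, str], str]             # (prompt, task_input) -> response
Critic = Callable[[str, str, str, str], str]  # (prompt, task_input, response, expected) -> critique
Editor = Callable[[str, List[str]], str]      # (prompt, critiques) -> revised prompt

OK = "OK"  # hypothetical marker a critic returns when it has no complaint


def actor_critic_edit(
    prompt: str,
    examples: List[Tuple[str, str]],
    actor: Actor,
    critic: Critic,
    editor: Editor,
    rounds: int = 3,
) -> str:
    """Iteratively refine `prompt` (treated as the policy) from actor responses and critic feedback."""
    for _ in range(rounds):
        critiques = []
        for task_input, expected in examples:
            response = actor(prompt, task_input)  # the actor performs the prompt
            critiques.append(critic(prompt, task_input, response, expected))  # the critic judges the response
        if all(c == OK for c in critiques):
            break  # no feedback left to act on, so stop editing early
        prompt = editor(prompt, critiques)  # revise the prompt using the collected feedback
    return prompt


# Toy, deterministic stand-ins (hypothetical) so the sketch runs without any model.
def toy_actor(prompt: str, task_input: str) -> str:
    return task_input.upper() if "uppercase" in prompt.lower() else task_input


def toy_critic(prompt: str, task_input: str, response: str, expected: str) -> str:
    return OK if response == expected else "The response should be in uppercase."


def toy_editor(prompt: str, critiques: List[str]) -> str:
    return prompt + " Convert the input to uppercase."


if __name__ == "__main__":
    examples = [("hello", "HELLO"), ("pace", "PACE")]
    print(actor_critic_edit("Repeat the input.", examples, toy_actor, toy_critic, toy_editor))
    # prints: Repeat the input. Convert the input to uppercase.
```

Keeping the actor, critic, and editor as interchangeable callables means the same loop could, under these assumptions, be driven by one LLM playing every role or by different models per role.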