PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization

Highly effective, task-specific prompts are often heavily engineered by experts to integrate detailed instructions and domain insights, based on a deep understanding of both the instincts of large language models (LLMs) and the intricacies of the target task. However, automating the generation of such expert-level prompts remains elusive. Existing prompt optimization methods tend to overlook the depth of domain knowledge and struggle to efficiently explore the vast space of expert-level prompts. To address this, we present PromptAgent, an optimization method that autonomously crafts prompts equivalent in quality to those handcrafted by experts. At its core, PromptAgent views prompt optimization as a strategic planning problem and employs a principled planning algorithm, rooted in Monte Carlo tree search, to strategically navigate the expert-level prompt space. Inspired by human-like trial-and-error exploration, PromptAgent induces precise expert-level insights and in-depth instructions by reflecting on model errors and generating constructive error feedback. This framework allows the agent to iteratively examine intermediate prompts (states), refine them based on error feedback (actions), simulate future rewards, and search for high-reward paths leading to expert prompts. We apply PromptAgent to 12 tasks spanning three practical domains: BIG-Bench Hard (BBH), as well as domain-specific and general NLP tasks, showing that it significantly outperforms strong Chain-of-Thought and recent prompt optimization baselines. Extensive analyses emphasize its capability to craft expert-level, detailed, and domain-insightful prompts with great efficiency and generalizability.
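The state, action, and reward framing above can be illustrated with a toy Monte Carlo tree search over prompts. This is a minimal sketch under stated assumptions, not the PromptAgent implementation: `reward`, `error_feedback`, `apply_feedback`, and `TARGET_HINTS` are hypothetical stand-ins for what would be LLM calls and a held-out evaluation set, and the simulation step is reduced to scoring the newly expanded prompt.

```python
import math
from dataclasses import dataclass, field

# Hypothetical stand-in for the "domain insights" a good prompt should contain.
TARGET_HINTS = ("define key terms", "show reasoning steps", "state the answer format")

def reward(prompt: str) -> float:
    """Toy reward (assumption): fraction of target hints present in the prompt."""
    return sum(h in prompt for h in TARGET_HINTS) / len(TARGET_HINTS)

def error_feedback(prompt: str) -> list:
    """Toy reflection (assumption): each missing hint is one candidate action."""
    return [h for h in TARGET_HINTS if h not in prompt]

def apply_feedback(prompt: str, feedback: str) -> str:
    """Toy transition (assumption): fold the feedback into the prompt."""
    return f"{prompt} Also, {feedback}."

@dataclass
class Node:
    prompt: str                       # state: an intermediate prompt
    parent: "Node | None" = None
    children: list = field(default_factory=list)
    visits: int = 0
    value: float = 0.0                # sum of rewards backed up through this node

    def uct(self, c: float = 1.4) -> float:
        """Upper confidence bound: average reward plus an exploration bonus."""
        if self.visits == 0:
            return math.inf           # always try unvisited children first
        explore = math.sqrt(math.log(self.parent.visits) / self.visits)
        return self.value / self.visits + c * explore

def mcts(root_prompt: str, iterations: int = 20) -> str:
    root = Node(root_prompt)
    best = root
    for _ in range(iterations):
        # Selection: descend by UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct)
        # Expansion: one child per piece of error feedback (action).
        for fb in error_feedback(node.prompt):
            node.children.append(Node(apply_feedback(node.prompt, fb), parent=node))
        if node.children:
            node = node.children[0]
        # Simulation (simplified): score the current prompt directly.
        r = reward(node.prompt)
        if r > reward(best.prompt):
            best = node
        # Back-propagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return best.prompt

if __name__ == "__main__":
    print(mcts("Answer the question."))
```

In this toy setting the search converges quickly because every action strictly improves the reward; with real model feedback, rewards are noisy and the exploration constant `c` and iteration budget would matter far more.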
