PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization

Highly effective, task-specific prompts are often heavily engineered by experts to integrate detailed instructions and domain insights based on a deep understanding of both the instincts of large language models (LLMs) and the intricacies of the target task. However, automating the generation of such expert-level prompts remains elusive. Existing prompt optimization methods tend to overlook the depth of domain knowledge and struggle to efficiently explore the vast space of expert-level prompts. To address this, we present PromptAgent, an optimization method that autonomously crafts prompts equivalent in quality to those handcrafted by experts. At its core, PromptAgent views prompt optimization as a strategic planning problem and employs a principled planning algorithm, rooted in Monte Carlo tree search, to strategically navigate the expert-level prompt space. Inspired by human-like trial-and-error exploration, PromptAgent induces precise expert-level insights and in-depth instructions by reflecting on model errors and generating constructive error feedback. This framework allows the agent to iteratively examine intermediate prompts (states), refine them based on error feedback (actions), simulate future rewards, and search for high-reward paths leading to expert prompts. We apply PromptAgent to 12 tasks spanning three practical domains: BIG-Bench Hard (BBH), domain-specific tasks, and general NLP tasks, showing that it significantly outperforms strong Chain-of-Thought and recent prompt optimization baselines. Extensive analyses emphasize its capability to craft expert-level, detailed, and domain-insightful prompts with great efficiency and generalizability.
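
To make the state/action/reward framing concrete, the following is a minimal sketch of Monte Carlo tree search over prompts. It is not PromptAgent's implementation. The feedback pool `FEEDBACK`, its weights, the helpers `actions`, `step`, and `reward`, and the UCT constant are all hypothetical stand-ins: in a real system the actions would come from an LLM reflecting on model errors, and the reward from task accuracy on held-out examples.

```python
import math
import random

# Hypothetical feedback pool (an assumption for illustration): each action
# appends one instruction; the weight is a made-up effect on task accuracy.
FEEDBACK = {
    "show each step": 0.3,
    "state the units": 0.2,
    "be brief": -0.1,
    "check the final answer": 0.4,
}

def actions(prompt):
    """Toy 'error feedback': instructions not yet present in the prompt."""
    return [h for h in FEEDBACK if h not in prompt]

def step(prompt, hint):
    """Apply an action: refine the prompt (state) with one instruction."""
    return f"{prompt}; {hint}"

def reward(prompt):
    """Toy reward standing in for accuracy on held-out examples."""
    return 0.1 + sum(w for h, w in FEEDBACK.items() if h in prompt)

class Node:
    def __init__(self, prompt, parent=None, depth=0):
        self.prompt, self.parent, self.depth = prompt, parent, depth
        self.children, self.visits, self.total = [], 0, 0.0

    def uct(self, c=1.0):
        # Unvisited children are tried first; otherwise balance the mean
        # reward (exploitation) against an exploration bonus.
        if self.visits == 0:
            return math.inf
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return self.total / self.visits + explore

def mcts(root_prompt, iterations=30, max_depth=2, seed=0):
    rng = random.Random(seed)
    root = Node(root_prompt)
    tree = [root]
    for _ in range(iterations):
        # Selection: follow the highest-UCT child down to a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct)
        # Expansion: create one child per available action.
        if node.depth < max_depth:
            node.children = [Node(step(node.prompt, a), node, node.depth + 1)
                             for a in actions(node.prompt)]
            tree.extend(node.children)
            if node.children:
                node = node.children[0]
        # Simulation: random rollout to max_depth estimates future reward.
        prompt, depth = node.prompt, node.depth
        while depth < max_depth and actions(prompt):
            prompt = step(prompt, rng.choice(actions(prompt)))
            depth += 1
        r = reward(prompt)
        # Backpropagation: update statistics along the path to the root.
        walker = node
        while walker is not None:
            walker.visits += 1
            walker.total += r
            walker = walker.parent
    # Return the highest-reward prompt found anywhere in the tree.
    return max(tree, key=lambda n: reward(n.prompt)).prompt
```

Calling `mcts("Solve the problem")` returns the highest-scoring prompt in the explored tree. In this toy setup the search depth is capped at two refinements, so the best reachable prompt combines the two highest-weight instructions and omits the harmful one.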
