The performance of large language models (LLMs) depends on how they are prompted, with choices spanning both the high-level prompting pattern (e.g., Zero-Shot, CoT, ReAct, ReWOO) and the specific prompt content (instructions and few-shot demonstrations). Manually tuning this combination is tedious, error-prone, and specific to a given LLM and task. This paper therefore proposes AutoPDL, an automated approach to discovering good LLM agent configurations. Our approach frames this as a structured AutoML problem over a combinatorial space of agentic and non-agentic prompting patterns and demonstrations, using successive halving to efficiently navigate this space. We introduce a library implementing common prompting patterns using the PDL prompt programming language. AutoPDL solutions are human-readable, editable, and executable PDL programs that use this library. This approach also enables source-to-source optimization, allowing human-in-the-loop refinement and reuse. Evaluations across three tasks and seven LLMs (ranging from 3B to 70B parameters) show consistent accuracy gains ($9.21\pm15.46$ percentage points), up to 67.5 percentage points, and reveal that selected prompting strategies vary across models and tasks.
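The successive-halving search over candidate configurations can be sketched as follows. This is an illustrative sketch only, not AutoPDL's actual implementation: the candidate names, the `evaluate` callback, and the budget schedule are all hypothetical placeholders standing in for prompting-pattern configurations and task accuracy.

```python
import math
import random

def successive_halving(candidates, evaluate, examples, budget_per_round=16):
    """Evaluate all surviving candidates on a growing sample of examples,
    keeping the best-scoring half each round until one candidate remains.
    `evaluate(candidate, example)` is a hypothetical scoring callback
    returning a number (e.g., 1.0 for a correct answer, 0.0 otherwise)."""
    survivors = list(candidates)
    n = budget_per_round
    while len(survivors) > 1:
        # Evaluate each survivor on a (growing) random sample of examples.
        sample = random.sample(examples, min(n, len(examples)))
        scored = [(sum(evaluate(c, ex) for ex in sample) / len(sample), c)
                  for c in survivors]
        scored.sort(key=lambda t: t[0], reverse=True)
        # Keep the top half; double the budget for the next round.
        survivors = [c for _, c in scored[:math.ceil(len(scored) / 2)]]
        n *= 2
    return survivors[0]

# Toy usage with hypothetical prompting-pattern labels as candidates:
best = successive_halving(
    ["zero-shot", "cot", "react", "rewoo"],
    lambda c, ex: 1.0 if c == "cot" else 0.0,  # stand-in scorer
    list(range(64)))
```

The key property, which the abstract relies on for efficiency, is that weak configurations are discarded after cheap low-budget evaluations, so the full evaluation budget is spent only on the few strongest candidates.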