The performance of large language models (LLMs) depends on how they are prompted, with choices spanning both the high-level prompting pattern (e.g., Zero-Shot, CoT, ReAct, ReWOO) and the specific prompt content (instructions and few-shot demonstrations). Manually tuning this combination is tedious, error-prone, and specific to a given LLM and task. This paper therefore proposes AutoPDL, an automated approach to discovering good LLM agent configurations. Our approach frames this as a structured AutoML problem over a combinatorial space of agentic and non-agentic prompting patterns and demonstrations, using successive halving to efficiently navigate this space. We introduce a library implementing common prompting patterns using the PDL prompt programming language. AutoPDL solutions are human-readable, editable, and executable PDL programs that use this library. This approach also enables source-to-source optimization, allowing human-in-the-loop refinement and reuse. Evaluations across three tasks and seven LLMs (ranging from 3B to 70B parameters) show consistent accuracy gains ($9.21\pm15.46$ percentage points), up to 67.5 percentage points, and reveal that selected prompting strategies vary across models and tasks.
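The successive-halving search over candidate configurations can be sketched as follows. This is an illustrative sketch only, not AutoPDL's actual implementation: the candidate names, the `evaluate` callback, and the budget schedule are all hypothetical placeholders standing in for prompting-pattern configurations and task accuracy.

```python
import math
import random

def successive_halving(candidates, evaluate, examples, budget_per_round=16):
    """Evaluate all surviving candidates on a growing sample of examples,
    keeping the best-scoring half each round until one candidate remains.
    `evaluate(candidate, example)` is a hypothetical scoring callback
    returning a number (e.g., 1.0 for a correct answer, 0.0 otherwise)."""
    survivors = list(candidates)
    n = budget_per_round
    while len(survivors) > 1:
        # Evaluate each survivor on a (growing) random sample of examples.
        sample = random.sample(examples, min(n, len(examples)))
        scored = [(sum(evaluate(c, ex) for ex in sample) / len(sample), c)
                  for c in survivors]
        scored.sort(key=lambda t: t[0], reverse=True)
        # Keep the top half; double the budget for the next round.
        survivors = [c for _, c in scored[:math.ceil(len(scored) / 2)]]
        n *= 2
    return survivors[0]

# Toy usage with hypothetical prompting-pattern labels as candidates:
best = successive_halving(
    ["zero-shot", "cot", "react", "rewoo"],
    lambda c, ex: 1.0 if c == "cot" else 0.0,  # stand-in scorer
    list(range(64)))
```

The key property, which the abstract relies on for efficiency, is that weak configurations are discarded after cheap low-budget evaluations, so the full evaluation budget is spent only on the few strongest candidates.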