Language Model Programs, i.e., sophisticated pipelines of modular language model (LM) calls, are increasingly advancing NLP tasks, but they require crafting prompts that are jointly effective for all modules. We study prompt optimization for LM programs, i.e., how to update these prompts to maximize a downstream metric without access to module-level labels or gradients. To make this tractable, we factorize our problem into optimizing the free-form instructions and few-shot demonstrations of every module and introduce several strategies to craft task-grounded instructions and navigate credit assignment across modules. Our strategies include (i) program- and data-aware techniques for proposing effective instructions, (ii) a stochastic mini-batch evaluation function for learning a surrogate model of our objective, and (iii) a meta-optimization procedure in which we refine how LMs construct proposals over time. Using these insights, we develop MIPRO, a novel optimizer that outperforms baselines on five of six diverse LM programs using a best-in-class open-source model (Llama-3-8B), by up to 12.9% accuracy. We will release our new optimizers and benchmark in DSPy at https://github.com/stanfordnlp/dspy
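The stochastic mini-batch evaluation and surrogate model described in (ii) can be illustrated with a minimal sketch. This is not the MIPRO implementation: the running-mean surrogate, the epsilon-greedy candidate selection, and all names (`optimize_prompts`, `minibatch_score`, `program`) are simplifying assumptions standing in for the paper's Bayesian surrogate over prompt configurations.

```python
import random

def minibatch_score(program, candidate, batch):
    """Score a candidate prompt configuration on a small random batch."""
    return sum(program(candidate, x) == y for x, y in batch) / len(batch)

def optimize_prompts(program, candidates, trainset, trials=20, batch_size=4, seed=0):
    """Search over prompt candidates using cheap mini-batch evaluations.

    A running mean per candidate acts as a toy surrogate for the true
    (expensive, full-dataset) objective; an epsilon-greedy rule balances
    exploring untried candidates against exploiting the current best.
    """
    rng = random.Random(seed)
    totals = {c: 0.0 for c in candidates}  # sum of observed mini-batch scores
    counts = {c: 0 for c in candidates}    # number of evaluations per candidate

    def surrogate(c):
        # Untried candidates score +inf so each gets evaluated at least once.
        return totals[c] / counts[c] if counts[c] else float("inf")

    for _ in range(trials):
        if rng.random() < 0.3:
            cand = rng.choice(candidates)          # explore
        else:
            cand = max(candidates, key=surrogate)  # exploit surrogate estimate
        batch = rng.sample(trainset, batch_size)
        totals[cand] += minibatch_score(program, cand, batch)
        counts[cand] += 1

    # Return the candidate with the best surrogate estimate after all trials.
    return max(candidates, key=lambda c: totals[c] / counts[c] if counts[c] else 0.0)
```

With a toy "program" whose behavior depends on the chosen instruction, the optimizer converges on the candidate that maximizes accuracy even though each trial sees only a small batch of the training set.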