DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, Christopher Potts

The ML community is rapidly exploring techniques for prompting language models (LMs) and for stacking them into pipelines that solve complex tasks. Unfortunately, existing LM pipelines are typically implemented using hard-coded "prompt templates", i.e. lengthy strings discovered via trial and error. Toward a more systematic approach for developing and optimizing LM pipelines, we introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs, i.e. imperative computational graphs where LMs are invoked through declarative modules. DSPy modules are parameterized, meaning they can learn (by creating and collecting demonstrations) how to apply compositions of prompting, finetuning, augmentation, and reasoning techniques. We design a compiler that will optimize any DSPy pipeline to maximize a given metric. We conduct two case studies, showing that succinct DSPy programs can express and optimize sophisticated LM pipelines that reason about math word problems, tackle multi-hop retrieval, answer complex questions, and control agent loops. Within minutes of compiling, a few lines of DSPy allow GPT-3.5 and llama2-13b-chat to self-bootstrap pipelines that outperform standard few-shot prompting (generally by over 25% and 65%, respectively) and pipelines with expert-created demonstrations (by up to 5-46% and 16-40%, respectively). On top of that, DSPy programs compiled to open and relatively small LMs like 770M-parameter T5 and llama2-13b-chat are competitive with approaches that rely on expert-written prompt chains for proprietary GPT-3.5. DSPy is available at https://github.com/stanfordnlp/dspy
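
To make the idea of a parameterized module and a metric-driven compiler concrete, the following is a minimal, self-contained sketch in plain Python. It is not DSPy's API: the names ToyPredict, compile_module, and toy_lm are hypothetical and introduced only for illustration, and the "language model" is a deterministic stub that adds two integers and fails on operands above 99. Under those assumptions, it shows a module that declares a text transformation ("question -> answer"), keeps its demonstrations as a learnable parameter, and is "compiled" by running it on training inputs and keeping only the traces that satisfy the metric.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stand-in for a language model: prompt string in, completion out.
LM = Callable[[str], str]


@dataclass
class ToyPredict:
    """Toy declarative module (illustrative only, not DSPy's own class).

    It declares what to transform via a signature such as "question -> answer"
    and stores demonstrations as its learnable parameter.
    """
    signature: str
    demos: List[Tuple[str, str]] = field(default_factory=list)

    def prompt(self, x: str) -> str:
        # Build the prompt from the signature and the collected demonstrations.
        in_name, out_name = [s.strip() for s in self.signature.split("->")]
        blocks = [f"{in_name}: {q}\n{out_name}: {a}" for q, a in self.demos]
        blocks.append(f"{in_name}: {x}\n{out_name}:")
        return "\n\n".join(blocks)

    def __call__(self, lm: LM, x: str) -> str:
        return lm(self.prompt(x)).strip()


def compile_module(module: ToyPredict, lm: LM,
                   trainset: List[Tuple[str, str]],
                   metric: Callable[[str, str], bool],
                   max_demos: int = 2) -> ToyPredict:
    """Assumed bootstrapping loop: keep only traces that pass the metric."""
    compiled = ToyPredict(module.signature, list(module.demos))
    for x, gold in trainset:
        if len(compiled.demos) >= max_demos:
            break
        pred = module(lm, x)
        if metric(gold, pred):
            compiled.demos.append((x, pred))
    return compiled


def toy_lm(prompt: str) -> str:
    """Stub LM: answers the final "a+b" question, but fails on operands above 99."""
    last_block = prompt.split("\n\n")[-1]
    question = last_block.split("\n")[0].split(": ", 1)[1]
    a, b = (int(t) for t in question.split("+"))
    return str(a + b) if max(a, b) <= 99 else "unknown"


def exact_match(gold: str, pred: str) -> bool:
    return gold == pred


trainset = [("2+3", "5"), ("500+1", "501"), ("10+7", "17")]
compiled = compile_module(ToyPredict("question -> answer"), toy_lm,
                          trainset, exact_match)
print(compiled.demos)
print(compiled.prompt("4+4"))
```

In this sketch the second training example is dropped because the stub's output fails the metric, so the compiled module carries only the two self-generated demonstrations that passed. A real compiler in the sense of the abstract would search over far richer strategies (prompting, finetuning, augmentation, reasoning); this toy covers only the demonstration-collection step.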
