A Declarative System for Optimizing AI Workloads

Modern AI models provide the key to a long-standing dream: processing analytical queries about almost any kind of data. Until recently, it was difficult and expensive to extract facts from company documents, data from scientific papers, or insights from image and video corpora. Today's models can accomplish these tasks with high accuracy. However, a programmer who wants to answer a substantive AI-powered query must orchestrate large numbers of models, prompts, and data operations. Even for a single query, the programmer has to make a vast number of decisions, such as the choice of model, the right inference method, the most cost-effective inference hardware, and the ideal prompt design. The optimal set of decisions can change as the query changes and as the rapidly evolving technical landscape shifts. In this paper we present Palimpzest, a system that enables anyone to process AI-powered analytical queries simply by defining them in a declarative language. The system uses its cost optimization framework -- which explores the search space of AI models, prompting techniques, and related foundation model optimizations -- to implement the query with the best trade-offs between runtime, financial cost, and output data quality. We describe the workload of AI-powered analytics tasks, the optimization methods that Palimpzest uses, and the prototype system itself. We evaluate Palimpzest on tasks in Legal Discovery, Real Estate Search, and Medical Schema Matching. We show that even our simple prototype offers a range of appealing plans, including one that is 3.3x faster and 2.9x cheaper than the baseline method while offering better data quality. With parallelism enabled, Palimpzest can produce plans with up to a 90.3x speedup at 9.1x lower cost relative to a single-threaded GPT-4 baseline, while obtaining an F1-score within 83.5% of the baseline. None of these plans requires additional work from the user.
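
To make the three-way trade-off between runtime, financial cost, and output quality concrete, the following is a minimal sketch of how an optimizer might choose among candidate plans. It is not Palimpzest's implementation or API: the `Plan` class, the `pareto_frontier` and `choose_plan` functions, the "cheapest plan above a quality floor" policy, and every number in `PLANS` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A hypothetical physical plan with estimated metrics (not a Palimpzest type)."""
    name: str
    runtime_s: float   # estimated runtime in seconds (lower is better)
    cost_usd: float    # estimated financial cost in dollars (lower is better)
    quality: float     # estimated output quality in [0, 1] (higher is better)


def dominates(a: Plan, b: Plan) -> bool:
    """Return True if plan a is no worse than b on every metric and better on one."""
    no_worse = (a.runtime_s <= b.runtime_s
                and a.cost_usd <= b.cost_usd
                and a.quality >= b.quality)
    strictly_better = (a.runtime_s < b.runtime_s
                       or a.cost_usd < b.cost_usd
                       or a.quality > b.quality)
    return no_worse and strictly_better


def pareto_frontier(plans: list[Plan]) -> list[Plan]:
    """Keep only the plans that no other plan dominates."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


def choose_plan(plans: list[Plan], min_quality: float) -> Plan:
    """Assumed policy: cheapest non-dominated plan meeting a quality floor."""
    candidates = [p for p in pareto_frontier(plans) if p.quality >= min_quality]
    if not candidates:
        raise ValueError("no plan satisfies the quality constraint")
    # Break cost ties by preferring the faster plan.
    return min(candidates, key=lambda p: (p.cost_usd, p.runtime_s))


# Made-up estimates for illustration only; these are not results from the paper.
PLANS = [
    Plan("large-model", runtime_s=100.0, cost_usd=10.0, quality=0.90),
    Plan("small-model", runtime_s=20.0, cost_usd=1.0, quality=0.70),
    Plan("mixed", runtime_s=40.0, cost_usd=3.0, quality=0.85),
    Plan("wasteful", runtime_s=120.0, cost_usd=12.0, quality=0.80),
]

if __name__ == "__main__":
    for plan in pareto_frontier(PLANS):
        print("frontier:", plan.name)
    print("chosen:", choose_plan(PLANS, min_quality=0.8).name)
```

In this sketch the user states only the goal (a quality floor) and the selection logic picks the plan. Changing the policy, for example to minimize runtime instead of cost, changes only the `key` function, which is the kind of decision a declarative system can make on the user's behalf.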
