Abacus: A Cost-Based Optimizer for Semantic Operator Systems

LLMs enable an exciting new class of data processing applications over large collections of unstructured documents. Several new programming frameworks let developers build these applications by composing semantic operators: a declarative set of AI-powered data transformations with natural language specifications. These include LLM-powered maps, filters, joins, and similar operators, used for document processing tasks such as information extraction and summarization. While systems of semantic operators have achieved strong performance on benchmarks, they can be difficult to optimize. An optimizer for this setting must determine how to physically implement each semantic operator in a way that optimizes the system globally. Existing optimizers are limited in the number of optimizations they can apply, and most (if not all) cannot optimize system quality, cost, or latency subject to constraints on the other dimensions. In this paper we present Abacus, an extensible, cost-based optimizer that searches for the best implementation of a semantic operator system given a (possibly constrained) optimization objective. Abacus estimates operator performance by leveraging a minimal set of validation examples, prior beliefs about operator performance, and/or an LLM judge. We evaluate Abacus on document processing workloads in the biomedical and legal domains (BioDEX; CUAD) and on multi-modal question answering (MMQA). We demonstrate that, on average, systems optimized by Abacus achieve 6.7%-39.4% better quality and are 10.8x cheaper and 3.4x faster than the next best system.
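To make the idea of a constrained optimization objective concrete, the following is a minimal Python sketch of choosing one physical implementation per semantic operator so that estimated quality is maximized subject to optional cost and latency budgets. It is an illustration only, not Abacus's search algorithm, which the abstract does not specify: the names `PhysicalOp`, `plan_metrics`, `best_plan`, and `CANDIDATES` are hypothetical, the per-operator estimates are made-up numbers, the search is brute-force enumeration, and the plan-level aggregation (quality multiplies, cost and latency add) is a simplifying assumption.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class PhysicalOp:
    """One hypothetical physical implementation of a semantic operator."""
    name: str
    quality: float   # estimated quality in [0, 1] (made-up value)
    cost: float      # estimated dollars per run (made-up value)
    latency: float   # estimated seconds per run (made-up value)


def plan_metrics(plan: Sequence[PhysicalOp]) -> Tuple[float, float, float]:
    """Aggregate per-operator estimates into plan-level estimates.

    Assumption for this sketch: quality multiplies across operators,
    while cost and latency add up (a sequential pipeline).
    """
    quality = 1.0
    for op in plan:
        quality *= op.quality
    cost = sum(op.cost for op in plan)
    latency = sum(op.latency for op in plan)
    return quality, cost, latency


def best_plan(
    candidates: Sequence[Sequence[PhysicalOp]],
    max_cost: Optional[float] = None,
    max_latency: Optional[float] = None,
) -> Optional[Tuple[PhysicalOp, ...]]:
    """Return the highest-quality plan that satisfies the constraints.

    Brute-force enumeration over every combination of implementations;
    returns None when no plan fits within the budgets.
    """
    best, best_quality = None, -1.0
    for plan in product(*candidates):
        quality, cost, latency = plan_metrics(plan)
        if max_cost is not None and cost > max_cost:
            continue
        if max_latency is not None and latency > max_latency:
            continue
        if quality > best_quality:
            best, best_quality = plan, quality
    return best


# Two semantic operators (a filter, then a map), each with two
# hypothetical implementations: a large model and a small model.
CANDIDATES = [
    [PhysicalOp("filter-large", 0.95, 0.50, 2.0),
     PhysicalOp("filter-small", 0.80, 0.05, 0.5)],
    [PhysicalOp("map-large", 0.90, 1.00, 4.0),
     PhysicalOp("map-small", 0.70, 0.10, 1.0)],
]

if __name__ == "__main__":
    for budget in (None, 0.7, 0.1):
        plan = best_plan(CANDIDATES, max_cost=budget)
        names = [op.name for op in plan] if plan else None
        print(f"max_cost={budget}: {names}")
```

With these made-up estimates, the unconstrained search picks the large model for both operators, a cost budget of 0.7 keeps the large filter but falls back to the small map, and a budget of 0.1 admits no plan at all. Enumeration grows exponentially with the number of operators, so a practical optimizer would need a more selective search strategy than this sketch.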
