Task Cascades for Efficient Unstructured Data Processing

Modern database systems let users query or process unstructured text or document columns with LLM-powered functions. A user expresses an operation in natural language (e.g., "identify if this review mentions billing issues"), and the system executes it on each document, row by row. One way to reduce cost on a batch of documents is the model cascade framework: a cheap proxy model processes each document, and only uncertain cases are escalated to a more accurate, more expensive oracle. However, model cascades miss important optimization opportunities. Often only part of a document is needed to answer a query. In addition, related but simpler operations (e.g., "is the review sentiment negative?", "does the review mention money?") can be handled by cheap models more effectively than the original operation while remaining correlated with it. We introduce the task cascades framework, which generalizes model cascades by varying not just the model but also the document portion and the operation at each stage. Our framework uses an LLM agent to generate simplified, decomposed, or otherwise related operations and to select the most relevant document portions, constructing hundreds of candidate tasks from which it assembles a task cascade. We show, via a reduction from Minimum Sum Set Cover, that optimal cascade selection is intractable, but our iterative approach constructs effective cascades. We also provide an extension that offers statistical accuracy guarantees: the resulting cascade meets a user-defined accuracy target (with respect to the oracle) up to a bounded failure probability. Across eight real-world document processing tasks at a 90% target accuracy, task cascades reduce end-to-end cost by an average of 36% compared to model cascades, at production scale.
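To make the idea concrete, the following is a minimal sketch of how a task cascade might be executed on one document. It is an illustration under stated assumptions, not the implementation described above: the `Task` fields, the `run_cascade` function, the per-character cost model, the use of a document prefix as the "document portion", the single confidence threshold per task, and the keyword-based stand-ins for the proxy and oracle models are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A model call returns (label, confidence). Real systems would call an LLM;
# the toy functions below are hypothetical stand-ins.
Prediction = Tuple[bool, float]


@dataclass
class Task:
    name: str
    operation: str                            # original or surrogate prompt
    model: Callable[[str, str], Prediction]   # (operation, text) -> prediction
    doc_fraction: float                       # leading fraction of the document to read
    threshold: float                          # minimum confidence to accept the answer
    cost_per_char: float                      # assumed cost model


def run_cascade(doc: str, tasks: List[Task], oracle: Task) -> Tuple[bool, float, str]:
    """Try each cheap task in order; fall back to the oracle on the full document."""
    total_cost = 0.0
    for task in tasks:
        fragment = doc[: max(1, int(len(doc) * task.doc_fraction))]
        total_cost += task.cost_per_char * len(fragment)
        label, confidence = task.model(task.operation, fragment)
        if confidence >= task.threshold:
            return label, total_cost, task.name   # confident: stop early
    total_cost += oracle.cost_per_char * len(doc)
    label, _ = oracle.model(oracle.operation, doc)
    return label, total_cost, oracle.name


def toy_proxy(operation: str, text: str) -> Prediction:
    # Hypothetical cheap model: confident only when several money words appear.
    hits = sum(word in text.lower() for word in ("charged", "refund", "invoice"))
    return (True, 0.95) if hits >= 2 else (hits > 0, 0.6)


def toy_oracle(operation: str, text: str) -> Prediction:
    # Hypothetical expensive model, treated as ground truth.
    lowered = text.lower()
    return ("billing" in lowered or "charged" in lowered), 1.0


tasks = [
    # Surrogate operation on the first quarter of the document.
    Task("money_prefix", "does the review mention money?", toy_proxy, 0.25, 0.9, 0.001),
    # Original operation on the full document, still with the cheap model.
    Task("proxy_full", "does the review mention billing issues?", toy_proxy, 1.0, 0.9, 0.001),
]
oracle = Task("oracle", "does the review mention billing issues?", toy_oracle, 1.0, 0.0, 0.01)

doc_easy = "I was charged twice and my refund never arrived. " + "Otherwise fine. " * 20
doc_hard = "Great app, love the design. " * 10 + "Billing page is confusing."

easy_result = run_cascade(doc_easy, tasks, oracle)
hard_result = run_cascade(doc_hard, tasks, oracle)
print(easy_result[2], hard_result[2])
```

In this sketch the first document is resolved by the surrogate task after reading only its prefix, while the second falls through every cheap task and is escalated to the oracle. A plain model cascade corresponds to the special case where every task uses the original operation and `doc_fraction = 1.0`.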
