LLM-augmented data systems enable semantic querying over structured and unstructured data, but executing queries with LLM-powered operators introduces a fundamental runtime-accuracy trade-off. In this paper, we present Stretto, a new execution engine that provides end-to-end query guarantees while navigating this trade-off efficiently and holistically. To this end, Stretto formulates query planning as a constrained optimization problem and uses a gradient-based optimizer to jointly select operator implementations and allocate error budgets across pipelines. Moreover, to enable fine-grained execution choices, Stretto introduces a novel use of KV-caching to realize a spectrum of physical operators, turning a sparse design space into a dense continuum of runtime-accuracy trade-offs. Experiments show that Stretto outperforms state-of-the-art systems while consistently meeting quality guarantees.
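To make the planning idea more concrete, the following toy sketch shows one way a gradient-based optimizer could pick operator implementations under a shared error budget. It is an illustration under stated assumptions, not Stretto's actual formulation: the function names (`softmax`, `plan`), the softmax relaxation of the discrete choice, the quadratic penalty, the additive error model, and all numbers are hypothetical and do not come from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def plan(stages, error_budget, penalty=1000.0, lr=0.05, steps=500):
    """Choose one implementation per stage (hypothetical toy planner).

    stages: list of candidate lists; each candidate is (runtime, error).
    Assumption: per-stage errors add up to the end-to-end error.
    Returns the index of the selected candidate for each stage.
    """
    # One logit per candidate; softmax(logits) is a relaxed, soft selection.
    logits = [[0.0] * len(cands) for cands in stages]
    for _ in range(steps):
        probs = [softmax(l) for l in logits]
        # Expected end-to-end error of the relaxed plan.
        exp_err = sum(
            p * c[1]
            for ps, cands in zip(probs, stages)
            for p, c in zip(ps, cands)
        )
        # Only a violation of the budget is penalized.
        excess = max(0.0, exp_err - error_budget)
        for s, (ps, cands) in enumerate(zip(probs, stages)):
            # d(loss)/d(p_k): runtime plus gradient of penalty * excess**2.
            g = [c[0] + 2.0 * penalty * excess * c[1] for c in cands]
            mean_g = sum(p * gk for p, gk in zip(ps, g))
            for k in range(len(cands)):
                # Chain rule through the softmax: p_k * (g_k - E[g]).
                logits[s][k] -= lr * ps[k] * (g[k] - mean_g)
    # Round the relaxed solution to a discrete plan.
    return [max(range(len(l)), key=l.__getitem__) for l in logits]


# Two pipeline stages with made-up (runtime, error) candidates.
example_stages = [
    [(1.0, 0.20), (4.0, 0.05), (10.0, 0.01)],
    [(2.0, 0.15), (8.0, 0.02)],
]
```

Calling `plan(example_stages, error_budget=1.0)` leaves the constraint inactive, so the sketch simply converges to the fastest candidate in each stage. With a tighter budget the penalty term shifts probability mass toward slower, more accurate candidates. Rounding a relaxed solution does not by itself guarantee that the discrete plan meets the budget, so a real system would need a final feasibility check.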