Modern data warehouses extend SQL with semantic operators that invoke large language models on each qualifying row, but the per-row inference cost is prohibitive at scale. Model cascades reduce this cost by routing most rows through a fast proxy model and delegating uncertain cases to an expensive oracle. Existing frameworks, however, require global dataset access and optimize a single quality metric, limiting their applicability in distributed systems where data is partitioned across independent workers. We present two adaptive cascade algorithms designed for streaming, per-partition execution in which each worker processes its partition independently without inter-worker communication. SUPG-IT extends the SUPG statistical framework to streaming execution with iterative threshold refinement and joint precision-recall guarantees. GAMCAL replaces user-specified quality targets with a learned calibration model: a Generalized Additive Model maps proxy scores to calibrated probabilities with uncertainty quantification, enabling direct optimization of a cost-quality tradeoff through a single parameter. Experiments on six datasets in a production semantic SQL engine show that both algorithms achieve F1 > 0.95 on every dataset. GAMCAL achieves higher F1 per oracle call at cost-sensitive operating points, while SUPG-IT reaches a higher quality ceiling with formal guarantees on precision and recall.
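To make the routing idea concrete, the following is a minimal sketch of a generic two-threshold cascade. It is not SUPG-IT or GAMCAL: the fixed thresholds `low` and `high`, the function name `cascade_filter`, and the toy proxy and oracle callables are all assumptions introduced here for illustration, whereas the algorithms described above adapt their decision boundaries per partition.

```python
from typing import Callable, List, Sequence, Tuple


def cascade_filter(
    rows: Sequence[str],
    proxy_score: Callable[[str], float],
    oracle_label: Callable[[str], bool],
    low: float = 0.2,   # assumed fixed lower threshold (hypothetical)
    high: float = 0.8,  # assumed fixed upper threshold (hypothetical)
) -> Tuple[List[bool], int]:
    """Label rows with a cheap proxy and defer uncertain scores to the oracle.

    Returns the predicted labels and the number of oracle calls made.
    """
    labels: List[bool] = []
    oracle_calls = 0
    for row in rows:
        score = proxy_score(row)
        if score >= high:
            # Proxy is confident the row qualifies.
            labels.append(True)
        elif score <= low:
            # Proxy is confident the row does not qualify.
            labels.append(False)
        else:
            # Uncertain region: pay for the expensive oracle.
            labels.append(oracle_label(row))
            oracle_calls += 1
    return labels, oracle_calls


# Toy stand-ins for the proxy model and the oracle (illustrative only).
toy_scores = {"a": 0.95, "b": 0.05, "c": 0.5}
toy_truth = {"a": True, "b": False, "c": True}

labels, calls = cascade_filter(
    list(toy_scores),
    proxy_score=lambda r: toy_scores[r],
    oracle_label=lambda r: toy_truth[r],
)
```

In this toy run only row `"c"` falls between the thresholds, so it is the only row sent to the oracle. Widening the gap between `low` and `high` trades more oracle calls for higher expected quality, which is the cost-quality tradeoff the abstract refers to.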