Database workloads increasingly interleave artificial intelligence (AI) and machine learning (ML) pipelines and model inference with data processing, yielding hybrid SQL+AI/ML queries that mix relational operators with expensive, opaque AI/ML operators, often expressed as UDFs. These workloads are challenging to optimize: ML operators behave like black boxes; data-dependent effects such as sparsity, selectivity, and cardinality can dominate runtime; domain experts often rely on practical heuristics that are difficult to express in monolithic optimizers; and AI/ML operators introduce numerous co-optimization opportunities, such as factorization, pushdown, ML-to-SQL conversion, and linear-algebra-to-relational-algebra rewrites, which significantly enlarge the search space of equivalent execution plans. At the same time, research prototypes for SQL+ML optimization are difficult to evaluate fairly because they are typically developed on different platforms and evaluated with different queries. We present OptBench, an interactive workbench for building and benchmarking query optimizers for hybrid SQL+AI/ML queries in a transparent, apples-to-apples manner. OptBench runs all optimizers on a unified DuckDB backend and exposes an interactive web interface that lets users (i) construct query optimizers by leveraging and extending abstracted logical-plan rewrite actions, (ii) benchmark and compare different optimizer implementations over a suite of diverse queries while recording decision traces and latency, and (iii) visualize the logical plans produced by different optimizers side by side. The system enables practitioners and researchers to prototype optimizer ideas, inspect plan transformations, and quantitatively compare optimizer designs on multimodal inference queries within a single workbench.
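To make the notion of an abstracted logical-plan rewrite action concrete, the following is a minimal, hypothetical sketch (not OptBench's actual API) of one such action: pushing a cheap relational filter below an expensive ML inference operator so that the model scores fewer rows. All class and function names here are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any

# Minimal logical-plan nodes; names are illustrative, not OptBench's API.
@dataclass
class Scan:
    table: str

@dataclass
class Filter:
    pred: str      # a cheap relational predicate, e.g. "price < 100"
    child: Any

@dataclass
class MLInfer:
    model: str     # an expensive, opaque ML UDF
    child: Any

def push_filter_below_ml(plan: Any) -> Any:
    """Rewrite action: Filter(MLInfer(x)) -> MLInfer(Filter(x)).

    Valid only when the predicate references base columns (not the
    model's output), in which case the rewrite preserves the result
    while reducing the number of rows the model must score.
    """
    if isinstance(plan, Filter) and isinstance(plan.child, MLInfer):
        ml = plan.child
        return MLInfer(ml.model, Filter(plan.pred, ml.child))
    return plan  # rewrite does not apply; return the plan unchanged

# Before: the sentiment model scores every row, then the filter prunes.
before = Filter("price < 100", MLInfer("sentiment", Scan("reviews")))
# After: the cheap filter prunes first, then the model scores the remainder.
after = push_filter_below_ml(before)
```

An optimizer built in this style is a policy for choosing which such action to fire next; recording those choices yields the decision traces the workbench compares across optimizer implementations.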