Relational database management systems (RDBMSes) can process general-purpose queries, but they often perform worse than custom-built solutions for specific queries. For example, consider a group-by query over a few known groups (e.g., grouping by country). While an RDBMS would likely use a hash map to do the grouping, a faster method could hard-code the expected groups into the query executor. Such workload-specific techniques, which we call query accelerators, are not widely used in practice because the isolated performance gains (a speedup on a single specific query) do not justify the engineering effort (optimizer and engine changes, potential bugs). We propose Tailwind: an external query planner that brings accelerators into any RDBMS that supports data import/export. Users define their accelerators using abstract logical plans (ALPs): a new, mostly declarative abstraction over relational operators built on regular tree expressions. ALPs allow Tailwind to automatically build customized neural network models that estimate when using a particular accelerator is beneficial. At runtime, Tailwind sits atop an RDBMS and transparently rewrites queries to run across one or more accelerators when doing so is predicted to be beneficial, falling back to the underlying RDBMS otherwise. On Redshift and DuckDB with a library of four diverse accelerators, Tailwind accelerates TPC-H queries by 1.38x on average (up to 29x).
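To make the group-by example concrete, the following is a minimal sketch of the two strategies, not Tailwind's implementation. The group set `KNOWN_COUNTRIES`, the function names, and the `(country, amount)` row layout are all hypothetical. The sketch also decides to fall back by inspecting the data, whereas Tailwind makes that choice with a learned model before execution. It shows the structural difference only; in a compiled executor the fixed accumulator slots are what avoid hashing, and no speedup should be expected from Python itself.

```python
from collections import defaultdict

# Hypothetical fixed set of groups the accelerator is specialized for.
KNOWN_COUNTRIES = ("DE", "FR", "US")


def groupby_sum_hash(rows):
    """General-purpose path: hash-map aggregation over arbitrary keys."""
    totals = defaultdict(float)
    for country, amount in rows:
        totals[country] += amount
    return dict(totals)


def groupby_sum_hardcoded(rows):
    """Specialized path: one accumulator slot per expected group.

    Returns None when an unexpected group appears, so the caller can
    fall back to the general-purpose path.
    """
    sums = [0.0, 0.0, 0.0]
    seen = [False, False, False]
    for country, amount in rows:
        if country == "DE":
            slot = 0
        elif country == "FR":
            slot = 1
        elif country == "US":
            slot = 2
        else:
            return None  # group not covered by this accelerator
        sums[slot] += amount
        seen[slot] = True
    # Report only groups that occurred, matching the hash-map result.
    return {
        country: total
        for country, total, was_seen in zip(KNOWN_COUNTRIES, sums, seen)
        if was_seen
    }


def groupby_sum(rows):
    """Try the specialized path first; otherwise use the general one."""
    result = groupby_sum_hardcoded(rows)
    return result if result is not None else groupby_sum_hash(rows)
```

Calling `groupby_sum` on rows that contain only the three expected countries takes the specialized path, while any other country code routes the whole query to the hash-map path, so both paths return the same answer for the same input.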