Performance-critical industrial applications, including large-scale program, network, and distributed-system analyses, increasingly rely on recursive queries for data analysis. Yet traditional query optimization techniques based on relational algebra do not scale well to recursive query processing because query evaluation is iterative: relation cardinalities can change unpredictably during a single query execution. To avoid error-prone cardinality estimation, adaptive query processing techniques use runtime information to guide query optimization, but these systems are not tailored to the specific needs of recursive query processing. In this paper, we introduce Adaptive Metaprogramming, a technique that uses principled metaprogramming to shift recursive query optimization and code generation from compile time to runtime. This enables optimization before query execution begins and re-optimization after it has begun. We present a custom join-ordering optimization that can be applied at multiple stages of query compilation and execution. Using Carac, a custom Datalog engine, we evaluate the optimization potential of Adaptive Metaprogramming and show that the execution time of unoptimized recursive queries can be improved by three orders of magnitude, and that of hand-optimized queries by 6x.
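The central idea, that join order should be chosen from live cardinalities at each step of an iterative recursive evaluation rather than fixed before execution, can be illustrated with a toy semi-naive Datalog evaluator. The sketch below is a minimal illustration and is not Carac's implementation. It makes several assumptions: the function names (`order_atoms`, `eval_body`, `transitive_closure`) are hypothetical, and the greedy "smallest connected relation first" heuristic is a stand-in for the paper's custom join-ordering optimization. It also uses plain nested-loop joins, so it only shows that the chosen order changes as cardinalities change, not the speedup that reordering yields in a real engine.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[int, ...]
Atom = Tuple[str, Tuple[str, ...]]  # (relation name, variable names)


def order_atoms(atoms: List[Atom], rels: Dict[str, Set[Row]]) -> List[Atom]:
    """Hypothetical greedy heuristic: start with the smallest relation, then
    repeatedly take the smallest remaining atom sharing a bound variable
    (falling back to the smallest overall to allow a cross product)."""
    remaining = sorted(atoms, key=lambda a: len(rels[a[0]]))  # stable sort
    ordered = [remaining.pop(0)]
    bound = set(ordered[0][1])
    while remaining:
        connected = [a for a in remaining if bound & set(a[1])]
        nxt = (connected or remaining)[0]  # lists stay sorted by size
        remaining.remove(nxt)
        ordered.append(nxt)
        bound |= set(nxt[1])
    return ordered


def eval_body(order: List[Atom], rels: Dict[str, Set[Row]],
              head_vars: Tuple[str, ...]) -> Set[Row]:
    """Evaluate a conjunctive rule body left to right in the given order
    using nested loops over variable bindings."""
    bindings: List[Dict[str, int]] = [{}]
    for name, vars_ in order:
        extended = []
        for b in bindings:
            for row in rels[name]:
                ext = dict(b)
                # Keep the row only if it agrees with variables already bound.
                if all(ext.setdefault(v, val) == val for v, val in zip(vars_, row)):
                    extended.append(ext)
        bindings = extended
    return {tuple(b[v] for v in head_vars) for b in bindings}


def transitive_closure(edges: Set[Row]) -> Tuple[Set[Row], List[List[str]]]:
    """Semi-naive evaluation of path(x, z) :- path(x, y), edge(y, z).
    The join order is re-chosen from current cardinalities each iteration."""
    rels = {"edge": set(edges), "path": set(edges), "delta": set(edges)}
    body: List[Atom] = [("delta", ("x", "y")), ("edge", ("y", "z"))]
    trace: List[List[str]] = []
    while rels["delta"]:
        order = order_atoms(body, rels)  # runtime (re-)optimization point
        trace.append([name for name, _ in order])
        new = eval_body(order, rels, ("x", "z")) - rels["path"]
        rels["path"] |= new
        rels["delta"] = new  # only newly derived facts drive the next round
    return rels["path"], trace
```

For the 9-edge graph `{(0,1),(0,2),(0,3),(1,4),(2,4),(3,4),(4,5),(4,6),(4,7)}`, the first iteration derives 10 new facts. The delta relation therefore briefly becomes larger than `edge`, and the sketch flips the join order for one iteration before flipping it back. The resulting trace is `[["delta","edge"], ["edge","delta"], ["delta","edge"]]`, and the evaluation derives 22 `path` facts in total. A compile-time optimizer working from static estimates would have to commit to one order for all three iterations.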