Trees can accelerate queries that search or aggregate values over large collections. They achieve this by storing metadata that enables quick pruning (or inclusion) of subtrees when predicates on that metadata can prove that none (or all) of the data in a subtree affect the query result. Existing systems implement this pruning logic manually for each query predicate and data structure. We generalize and mechanize this class of optimization. Our method derives conditions for when subtrees can be pruned (or included wholesale), expressed in terms of the metadata available at each node. We efficiently generate these conditions using symbolic interval analysis, extended with new rules to handle geometric predicates (e.g., intersection, containment). Additionally, our compiler fuses compound queries (e.g., reductions on filters) into a single tree traversal. These techniques enable the automatic derivation of generalized single-index and dual-index tree joins that support a wide class of join predicates beyond standard equality and range predicates. The generated traversals match the behavior of expert-written code that implements query-specific traversals, and can asymptotically outperform the linear scans and nested-loop joins that existing systems fall back to when hand-written cases do not apply.
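To make the prune-or-include idea concrete, the following is a minimal hand-written sketch of the kind of traversal described above. It is not the compiler's output or the authors' implementation. It assumes a one-dimensional binary tree whose nodes store hypothetical metadata fields `lo`, `hi`, and `total` (the minimum, maximum, and sum of the values in the subtree), and it answers a compound query (a sum over a range filter) in a single traversal. The names `Node`, `build`, and `range_sum` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    lo: float                              # metadata: smallest value in this subtree
    hi: float                              # metadata: largest value in this subtree
    total: float                           # metadata: sum of all values in this subtree
    values: Optional[List[float]] = None   # payload, present only at leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(values: List[float], leaf_size: int = 2) -> Node:
    """Build a balanced tree over the sorted values, computing metadata bottom-up."""
    vals = sorted(values)
    if not vals:
        raise ValueError("cannot build a tree over an empty collection")
    if len(vals) <= leaf_size:
        return Node(vals[0], vals[-1], sum(vals), values=vals)
    mid = len(vals) // 2
    left = build(vals[:mid], leaf_size)
    right = build(vals[mid:], leaf_size)
    return Node(left.lo, right.hi, left.total + right.total, left=left, right=right)


def range_sum(node: Node, a: float, b: float) -> float:
    """Sum of all values v with a <= v <= b, fused into one traversal."""
    # Prune: the node's interval [lo, hi] is disjoint from [a, b],
    # so no value in this subtree can satisfy the filter.
    if node.hi < a or node.lo > b:
        return 0
    # Include wholesale: [lo, hi] lies inside [a, b], so every value
    # satisfies the filter and the stored sum can be used directly.
    if a <= node.lo and node.hi <= b:
        return node.total
    # Undecided at a leaf: fall back to checking each value.
    if node.values is not None:
        return sum(v for v in node.values if a <= v <= b)
    # Undecided at an interior node: recurse into both children.
    return range_sum(node.left, a, b) + range_sum(node.right, a, b)


tree = build(list(range(1, 17)))
print(range_sum(tree, 4, 11))
```

In this sketch the two early returns are written by hand for one predicate and one aggregate. The approach described above aims to derive such conditions automatically from the query predicate and the metadata available at each node.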