Avoiding Materialisation for Guarded Aggregate Queries

Database systems are often confronted with queries that join many tables but ultimately output only a comparatively small amount of aggregate information. Despite all advances in query optimisation, intermediate results that explode in size relative to a much smaller final result still challenge modern relational database management systems (DBMSs). In this work, we propose integrating optimisation techniques into relational DBMSs that minimise, and often entirely eliminate, the need to materialise join results for aggregate queries, provided that the queries satisfy certain conditions. Apart from novel logical optimisations aimed at practicability, we also provide new, natural physical operators that combine joins and counting in order to reduce the size of intermediate results. We experimentally validate the efficacy of our optimisations through their implementation in Spark SQL, and we note that they are naturally applicable in any relational DBMS. Our experiments show consistent, significant speed-ups, often by a factor of 2 or more, for analytical and graph queries. At the same time, we observe no performance degradation, even on queries which, from a theoretical point of view, are least amenable to the proposed optimisations.
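To make the idea of combining joins with counting concrete, the following is a minimal Python sketch of counting the rows of a chain join R(a, b) JOIN S(b, c) JOIN T(c, d) without ever building the joined rows. It is an illustration only, not the physical operators proposed in the paper or their Spark SQL implementation. The relation layout, the function names, and the restriction to a three-table chain under bag semantics are all assumptions made for this example.

```python
from collections import Counter


def count_chain_join_naive(r, s, t):
    """Materialise R(a,b) JOIN S(b,c) JOIN T(c,d), then count the rows."""
    rows = [
        (a, b, c, d)
        for (a, b) in r
        for (b2, c) in s if b == b2
        for (c2, d) in t if c == c2
    ]
    return len(rows)


def count_chain_join_propagated(r, s, t):
    """Compute the same count while storing only per-key counters.

    Hypothetical helper: counts are pushed from T towards R, so no
    joined tuple is ever constructed.
    """
    # Number of T rows matching each value of c.
    t_by_c = Counter(c for (c, _d) in t)

    # Number of (S, T) combinations reachable from each value of b.
    st_by_b = Counter()
    for (b, c) in s:
        if t_by_c[c]:
            st_by_b[b] += t_by_c[c]

    # Each R row contributes as many join results as its b value reaches.
    return sum(st_by_b[b] for (_a, b) in r)


# Small illustrative relations (made-up data).
R = [(1, 10), (2, 10), (3, 20)]
S = [(10, 100), (10, 101), (20, 100), (30, 102)]
T = [(100, "x"), (100, "y"), (101, "z")]

naive_count = count_chain_join_naive(R, S, T)
propagated_count = count_chain_join_propagated(R, S, T)
```

In this sketch the naive version builds every joined row before counting, whereas the propagated version keeps one counter entry per distinct join key, which is the kind of saving in intermediate result size that the abstract describes.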

