Database systems are often confronted with queries that join many tables but ultimately output only a comparatively small amount of aggregate information. Despite all advances in query optimisation, the explosion of intermediate results, in contrast to a much smaller final result, remains a challenge for modern relational database management systems (DBMSs). In this work, we propose integrating optimisation techniques into relational DBMSs that minimise, and often entirely eliminate, the need to materialise join results for aggregate queries, provided that the queries satisfy certain conditions. Apart from novel logical optimisations aimed at practicability, we also provide new, natural physical operators that combine joins and counting in order to reduce the size of intermediate results. We experimentally validate the efficacy of our optimisations through their implementation in Spark SQL, but we note that they are naturally applicable in any relational DBMS. Our experiments show consistent, significant speed-ups, often by a factor of 2 or more, for analytical and graph queries. At the same time, we observe no performance degradation, even on queries which, from a theoretical point of view, are least amenable to the proposed optimisations.
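To make the gap between intermediate and final result size concrete, the following minimal sketch counts the rows of a three-way chain join in two ways: by materialising the join, and by passing per-key multiplicities between relations. It illustrates the general idea of combining joins with counting and is not the logical or physical operators proposed in this work. The relation shapes `R(a, b)`, `S(b, c)`, `T(c, d)` and both function names are assumptions chosen for illustration.

```python
from collections import Counter


def count_naive(r, s, t):
    """Materialise R(a,b) JOIN S(b,c) JOIN T(c,d), then count the rows."""
    rs = [(a, b, c) for (a, b) in r for (b2, c) in s if b == b2]
    rst = [(a, b, c, d) for (a, b, c) in rs for (c2, d) in t if c == c2]
    return len(rst)


def count_by_propagation(r, s, t):
    """Count the same join without ever building a joined row."""
    # Number of T tuples for each value of c.
    t_per_c = Counter(c for (c, _d) in t)
    # For each value of b, the number of (S, T) combinations reachable via S.
    st_per_b = Counter()
    for (b, c) in s:
        st_per_b[b] += t_per_c[c]
    # Each R tuple contributes the count attached to its b value.
    return sum(st_per_b[b] for (_a, b) in r)


# Hypothetical toy data, chosen only for illustration.
R = [(1, 10), (2, 10), (3, 20)]
S = [(10, 100), (10, 200), (20, 100)]
T = [(100, "x"), (100, "y"), (200, "z")]

print(count_naive(R, S, T), count_by_propagation(R, S, T))
```

Both functions return the same count, but the second keeps only one counter entry per distinct join-key value, whereas the first builds every joined row before counting.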