Optimising queries with many joins is known to be a hard problem. Intermediate results can explode in size even when the final result is much smaller, which poses a serious challenge to modern database management systems (DBMSs). This is particularly glaring in the case of analytical queries that join many tables but ultimately output only comparatively small aggregate information. Graph database systems face analogous problems when processing analytical queries with aggregates on top of complex path queries. In this work, we propose novel optimisation techniques, at both the logical and the physical level, that allow us to avoid the materialisation of join results for certain types of aggregate queries. The key to these optimisations is the notion of guardedness, by which we impose restrictions on the occurrence of attributes in GROUP BY clauses and in aggregate expressions. The efficacy of our optimisations is validated through their implementation in Spark SQL and an extensive empirical evaluation on various standard benchmarks.
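To make the gap between intermediate and final result sizes concrete, the following toy sketch counts the tuples of a three-way chain join R(a, b), S(b, c), T(c, d) in two ways: by building every joined tuple, and by propagating per-key counts from T back to R. It is only an illustration of why an aggregate can sometimes be computed without materialising the join; it is not the optimisation proposed in this work, and the function names and relation layout are hypothetical.

```python
from collections import Counter


def count_by_materialising(r, s, t):
    """Build every tuple of the chain join R(a,b), S(b,c), T(c,d), then count."""
    joined = [
        (a, b, c, d)
        for (a, b) in r
        for (b2, c) in s if b2 == b
        for (c2, d) in t if c2 == c
    ]
    return len(joined)


def count_by_propagation(r, s, t):
    """Compute the same COUNT(*) without ever building a joined tuple."""
    # Number of T-tuples for each join value c.
    t_per_c = Counter(c for (c, _d) in t)
    # For each join value b: number of matching (S, T) combinations.
    st_per_b = Counter()
    for (b, c) in s:
        st_per_b[b] += t_per_c[c]
    # Each R-tuple contributes as many results as its b value has matches.
    return sum(st_per_b[b] for (_a, b) in r)
```

On inputs where each join value has many partners, the first function builds a list whose length is the full join size, whereas the second only keeps one counter entry per distinct join value.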