Analytical tools often require real-time responses for highly concurrent parameterized workloads. A common solution is to answer queries using materialized subexpressions, thereby reducing processing at runtime. However, because queries are still processed individually, concurrent outstanding computations accumulate and increase response times. By contrast, shared execution mitigates the effect of concurrency and improves scalability by exploiting overlapping work between queries, but it does so using heavyweight shared operators that result in high response times. Thus, on their own, neither reuse nor work sharing provides real-time responses for large batches. Furthermore, naively combining the two approaches is ineffective and can degrade performance due to increased filtering costs, reduced marginal benefits, and lower reusability. In this work, we present ParCuR, a framework that harmonizes reuse with work sharing. ParCuR adapts reuse to work sharing in four aspects: i) to reduce filtering costs, it builds access methods on materialized results, ii) to resolve the conflict between benefits from work sharing and materialization, it introduces a sharing-aware materialization policy, iii) to incorporate reuse into sharing-aware optimization, it introduces a two-phase optimization strategy, and iv) to improve reusability and to avoid performance cliffs when queries are partially covered, especially during workload shifts, it combines partial reuse with data clustering based on historical batches. ParCuR outperforms a state-of-the-art work-sharing database by 6.4x and 2x in the SSB and TPC-H benchmarks, respectively.
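As a rough illustration of the partial-reuse idea in aspect iv), the sketch below answers a range-sum query over data clustered into key-range partitions. A partition that the query fully covers and that has a stored result is answered from that result; any other overlapping partition is scanned from base data, so a partially covered query loses only part of the benefit. This is a simplified toy, not ParCuR's actual design: the `Partition` class, the `answer_sum` function, and the per-partition sum used as the "materialized result" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical key-range partition: keys k satisfy lo <= k < hi."""
    lo: int
    hi: int
    rows: list  # list of (key, value) tuples


def answer_sum(partitions, materialized, q_lo, q_hi):
    """Sum the values whose key is in [q_lo, q_hi).

    `materialized` maps a partition index to that partition's precomputed
    total (an assumed stand-in for a materialized subexpression).
    Returns (total, number_of_partitions_answered_by_reuse).
    """
    total = 0
    reused = 0
    for idx, part in enumerate(partitions):
        # Skip partitions that do not overlap the query range at all.
        if part.hi <= q_lo or part.lo >= q_hi:
            continue
        fully_covered = q_lo <= part.lo and part.hi <= q_hi
        if fully_covered and idx in materialized:
            # Reuse: the stored total is exactly this partition's contribution.
            total += materialized[idx]
            reused += 1
        else:
            # Fallback: scan base data and filter by the query predicate.
            total += sum(v for k, v in part.rows if q_lo <= k < q_hi)
    return total, reused


# Toy data: three partitions; only the first two have stored totals.
parts = [
    Partition(0, 10, [(1, 5), (7, 3)]),
    Partition(10, 20, [(12, 4), (18, 6)]),
    Partition(20, 30, [(21, 1), (29, 2)]),
]
mat = {0: 8, 1: 10}

print(answer_sum(parts, mat, 0, 25))  # (19, 2)
```

In this toy, the query over [0, 25) reuses two stored totals and scans only the third partition, rather than falling back to a full scan because one partition lacks a materialized result.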