Data dependency-based query optimization techniques can considerably improve database system performance: we apply three such optimization techniques to five database management systems (DBMSs) and observe throughput improvements between 5 % and 33 %. We address two key challenges to achieve these results: (i) efficiently identifying and extracting relevant dependencies from the data, and (ii) making use of the dependencies through SQL rewrites or as transformation rules in the optimizer. First, the schema does not provide all relevant dependencies. We present a workload-driven dependency discovery approach to find additional dependencies within milliseconds. Second, the throughput improvement of a state-of-the-art DBMS is 13 % using only SQL rewrites, but 20 % when we integrate dependency-based optimization into the optimizer and execution engine, e.g., by employing dependency propagation and subquery handling. Using all relevant dependencies, the runtime of four standard benchmarks improves by up to 10 % compared to using only primary and foreign keys, and up to 22 % compared to not using dependencies. The dependency discovery overhead amortizes after a single workload execution.
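To make the idea of workload-driven dependency validation concrete, the following is a minimal sketch, not the approach presented in this work. It assumes that candidates come from a query's GROUP BY columns and that the table fits in memory as a list of dictionaries. The function names `holds_ucc`, `holds_fd`, and `reduce_group_by` and the sample columns are hypothetical. The sketch checks a unique column combination (UCC) and a functional dependency (FD) directly on the data, then uses a validated FD to drop a redundant grouping column.

```python
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]


def holds_ucc(rows: Sequence[Row], columns: Sequence[str]) -> bool:
    """Return True if no two rows share the same values in `columns`."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


def holds_fd(rows: Sequence[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if equal `lhs` values always imply equal `rhs` values."""
    mapping = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for a key and returns it.
        if mapping.setdefault(key, value) != value:
            return False
    return True


def reduce_group_by(rows: Sequence[Row], group_by: Sequence[str]) -> List[str]:
    """Drop grouping columns that the remaining grouping columns determine."""
    kept = list(group_by)
    for col in group_by:
        others = [c for c in kept if c != col]
        if others and holds_fd(rows, others, [col]):
            kept = others
    return kept


# Hypothetical sample data: orders joined with their customers.
rows = [
    {"c_id": 1, "c_name": "Ann", "o_id": 10},
    {"c_id": 1, "c_name": "Ann", "o_id": 11},
    {"c_id": 2, "c_name": "Bob", "o_id": 12},
    {"c_id": 3, "c_name": "Ann", "o_id": 13},
]

print(holds_ucc(rows, ["o_id"]))                  # o_id is unique
print(holds_fd(rows, ["c_id"], ["c_name"]))       # c_id determines c_name
print(reduce_group_by(rows, ["c_id", "c_name"]))  # c_name is redundant
```

Under these assumptions, a query grouping by `c_id, c_name` could be rewritten to group by `c_id` alone, because `c_id` determines `c_name` in the current data. A dependency validated on the data, rather than declared in the schema, holds only for the data as checked and would need revalidation after updates.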