High-performance OLAP database technology has emerged in response to the growing demand for massive-scale data analysis. To achieve higher performance, many DBMSs adopt sophisticated designs, including SIMD operators, parallel execution, and dynamic pipeline modification. However, such advanced OLAP query execution mechanisms still lack targeted Query Performance Prediction (QPP) methods, because most existing methods target conventional tree-shaped query plans and static serial executors. To address this problem, in this paper we propose MERLIN, a multi-stage query performance prediction method for high-performance OLAP DBMSs. MERLIN first establishes a resource cost model for each physical operator. It then constructs a directed acyclic graph (DAG) that consists of a data-flow tree backbone and the resource competition relationships among concurrent operators. After a graph attention network (GAT) with an extra attention mechanism calibrates the operator costs, MERLIN extracts the cost vector tree and summarizes it with a temporal convolutional network (TCN), which yields the final query performance prediction. Experimental results demonstrate that MERLIN achieves higher performance prediction accuracy than existing methods.
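To make the stages concrete, here is a minimal, untrained sketch of such a pipeline. It is not the authors' implementation, and every name in it (`Operator`, `build_dag`, `gat_layer`, `tcn`, `predict_latency`) is hypothetical. Several assumptions are made here. Each operator carries a small resource cost vector (for example CPU, memory, I/O). Operators are listed in post-order, and a hypothetical `concurrent_group` id marks which ones run at the same time. Competition edges are oriented from lower to higher index so the graph stays acyclic. The cost vector tree is linearized in post-order before the TCN. Only one standard single-head GAT layer is shown, so MERLIN's extra attention mechanism is not reproduced, and all weights are random rather than learned.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Operator:
    name: str
    cost: list                                     # hypothetical resource cost vector, e.g. [cpu, mem, io]
    children: list = field(default_factory=list)   # indices of data-flow inputs
    concurrent_group: int = 0                      # hypothetical id of operators running concurrently


def build_dag(ops):
    """Directed edges (src, dst). Assumes post-order (children before parents),
    so every edge goes from a lower to a higher index and no cycle can form."""
    edges = set()
    for i, op in enumerate(ops):
        for c in op.children:
            edges.add((c, i))                      # data-flow backbone: child -> parent
    for i in range(len(ops)):
        for j in range(i + 1, len(ops)):
            if ops[i].concurrent_group == ops[j].concurrent_group:
                edges.add((i, j))                  # resource competition between concurrent operators
    return edges


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def gat_layer(feats, edges, w, a):
    """One single-head graph attention layer in the standard GAT form (untrained)."""
    proj = [matvec(w, h) for h in feats]
    out = []
    for i in range(len(feats)):
        nbrs = [i] + [src for (src, dst) in edges if dst == i]   # self-loop plus in-neighbours
        scores = []
        for j in nbrs:
            e = sum(ak * xk for ak, xk in zip(a, proj[i] + proj[j]))
            scores.append(e if e > 0 else 0.2 * e)              # LeakyReLU
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]                # numerically stable softmax
        z = sum(exps)
        out.append([sum(exps[k] / z * proj[j][dim] for k, j in enumerate(nbrs))
                    for dim in range(len(proj[i]))])
    return out


def tcn(seq, kernels, dilations):
    """Causal dilated 1-D convolutions with ReLU; kernels[layer][k] is a d x d matrix."""
    x = seq
    for layer, dil in zip(kernels, dilations):
        y = []
        for t in range(len(x)):
            acc = [0.0] * len(x[0])
            for k, kmat in enumerate(layer):
                if t - k * dil >= 0:                            # zero padding keeps it causal
                    acc = [p + q for p, q in zip(acc, matvec(kmat, x[t - k * dil]))]
            y.append([max(0.0, v) for v in acc])
        x = y
    return x[-1]                                                # last step summarizes the sequence


def predict_latency(ops, seed=0):
    rng = random.Random(seed)
    d = len(ops[0].cost)
    edges = build_dag(ops)
    w = rand_matrix(d, d, rng)
    a = [rng.uniform(-0.5, 0.5) for _ in range(2 * d)]
    calibrated = gat_layer([op.cost for op in ops], edges, w, a)   # calibrated cost vectors
    kernels = [[rand_matrix(d, d, rng) for _ in range(2)] for _ in range(2)]
    summary = tcn(calibrated, kernels, dilations=[1, 2])           # post-order sequence -> summary
    readout = [rng.uniform(0.0, 1.0) for _ in range(d)]
    return sum(r * s for r, s in zip(readout, summary))            # non-negative scalar estimate


ops = [
    Operator("scan_a", [1.0, 0.2, 0.8], concurrent_group=0),
    Operator("scan_b", [0.9, 0.1, 0.7], concurrent_group=0),
    Operator("hash_join", [2.0, 1.5, 0.1], children=[0, 1], concurrent_group=1),
    Operator("agg", [0.5, 0.3, 0.0], children=[2], concurrent_group=1),
]
print(predict_latency(ops))
```

In this toy plan the two scans share a concurrency group, so the graph gets a competition edge between them in addition to the tree edges into `hash_join` and `agg`. A trained model would learn `w`, `a`, the TCN kernels, and the readout from measured query latencies.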