Query cost estimation is a classical task in database management. Recently, researchers have applied AI-driven models to query cost estimation to achieve high accuracy. However, two defects in feature design lead to poor accuracy-time efficiency in cost estimation. On the one hand, existing works encode only the query plan and data statistics while ignoring other important variables, such as storage structure, hardware, and database knobs, which also have a significant impact on query cost. On the other hand, because of their straightforward encoding design, existing works suffer a heavy representation learning burden on ineffective input dimensions. To address these two problems, we propose QCFE, an efficient feature engineering approach for query cost estimation. Specifically, we design a novel feature, called the feature snapshot, to efficiently integrate the influences of the ignored variables. Further, we propose a difference-propagation feature reduction method for query cost estimation to filter out useless features. Experimental results demonstrate that QCFE can largely improve the time-accuracy efficiency on extensive benchmarks.