Learning representations of query plans plays a pivotal role in machine learning-based query optimizers for database management systems. To this end, several model architectures have been proposed in the literature to convert tree-structured query plans into representations that downstream machine learning models can learn from. However, existing research rarely compares or analyzes the query plan representation capabilities of these tree models, or their direct impact on the performance of the overall optimizer. To address this gap, we perform a comparative study of how different state-of-the-art tree models affect the optimizer's cost estimation and plan selection performance on relatively complex workloads. Additionally, we explore the use of graph neural networks (GNNs) for the query plan representation task. We propose a novel tree model that combines a directed GNN with Gated Recurrent Units (GRUs), and we demonstrate experimentally that it significantly improves cost estimation and achieves strong plan selection performance relative to the state-of-the-art tree models.
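To make the idea of combining directed message passing with a GRU more concrete, the following is a minimal sketch and not the architecture proposed in this work. It assumes that messages flow only from child operators to their parent, that child states are aggregated by summation, and that a bias-free GRU cell with random, untrained weights updates each node. The names `PlanNode`, `GRUCell`, and `encode`, as well as the three-dimensional operator features, are hypothetical and chosen for illustration only.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """One operator in a query plan tree (hypothetical structure)."""
    features: list
    children: list = field(default_factory=list)


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Bias-free GRU cell with random weights (assumption: untrained)."""

    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.hid_dim = hid_dim
        # W_* act on the node features, U_* act on the incoming hidden state.
        self.Wz, self.Uz = mat(hid_dim, in_dim), mat(hid_dim, hid_dim)
        self.Wr, self.Ur = mat(hid_dim, in_dim), mat(hid_dim, hid_dim)
        self.Wh, self.Uh = mat(hid_dim, in_dim), mat(hid_dim, hid_dim)

    def __call__(self, x, h):
        # Update gate z and reset gate r.
        z = [sigmoid(a + b)
             for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b)
             for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        # Candidate state computed from the reset-gated hidden state.
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b)
                for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        # Interpolate between the incoming state and the candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def encode(node, cell):
    """Return the hidden state of `node` after bottom-up message passing."""
    # Edges are directed child -> parent, so children are encoded first.
    agg = [0.0] * cell.hid_dim
    for child in node.children:
        h_child = encode(child, cell)
        agg = [a + c for a, c in zip(agg, h_child)]  # sum aggregation
    # The aggregated child state plays the role of the GRU's previous state.
    return cell(node.features, agg)


# Toy plan: a join over two scans; feature values are made up.
scan_a = PlanNode([1.0, 0.0, 0.2])
scan_b = PlanNode([1.0, 0.0, 0.7])
join = PlanNode([0.0, 1.0, 0.5], [scan_a, scan_b])

cell = GRUCell(in_dim=3, hid_dim=4)
embedding = encode(join, cell)  # root state serves as the plan representation
```

In this sketch the root's hidden state would be passed to a downstream head, for example a regressor for cost estimation. Summation is only one possible aggregator; it makes the encoding insensitive to the order of a node's children, which may or may not be desirable for operators such as joins.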