Reqo: A Comprehensive Learning-Based Cost Model for Robust and Explainable Query Optimization

Although machine learning (ML) has shown promise for improving query optimization by generating and selecting more efficient plans, ensuring the robustness of learning-based cost models (LCMs) remains challenging. Current LCMs lack explainability, which undermines user trust and limits the ability to derive insights from their cost predictions to improve plan quality. Accurately converting tree-structured query plans into representations via tree models is also essential, as omitting any details may degrade the performance of the downstream cost model. Additionally, the inherent uncertainty in cost estimation leads to inaccurate predictions, resulting in suboptimal plan selection. To address these challenges, we introduce Reqo, a Robust and Explainable Query Optimization cost model that comprehensively enhances the three main stages of query optimization: plan generation, plan representation, and plan selection. Reqo integrates three innovations: (1) the first explainability technique for LCMs, which quantifies subgraph contributions and produces plan generation hints to improve candidate plan quality; (2) a novel tree model based on Bidirectional Graph Neural Networks (Bi-GNNs) with a Gated Recurrent Unit (GRU) aggregator, which captures both node-level and structural information and thereby strengthens plan representation; and (3) an uncertainty-aware learning-to-rank cost estimator that adaptively integrates cost estimates with their uncertainties to make plan selection more robust. Extensive experiments demonstrate that Reqo outperforms state-of-the-art approaches across all three stages.
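The abstract does not specify how Reqo combines cost estimates with their uncertainties. As a rough illustration of why uncertainty matters for plan selection, the following is a minimal, hypothetical Python sketch of one common approach, not Reqo's method. It assumes the cost model outputs a mean cost and a standard deviation for each candidate plan, and that costs are approximately Gaussian. The names `CandidatePlan`, `risk_adjusted_score`, `prob_cheaper`, `select_plan`, and `risk_weight` are illustrative only. Reqo's adaptive integration and its learning-to-rank training are not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class CandidatePlan:
    name: str           # hypothetical plan identifier
    pred_cost: float    # predicted (mean) cost from a cost model
    uncertainty: float  # predicted standard deviation of that cost (assumed Gaussian)


def risk_adjusted_score(plan: CandidatePlan, risk_weight: float) -> float:
    """Lower is better: add a penalty proportional to the estimate's uncertainty.

    risk_weight is a hypothetical fixed knob; Reqo's actual integration is adaptive.
    """
    return plan.pred_cost + risk_weight * plan.uncertainty


def prob_cheaper(a: CandidatePlan, b: CandidatePlan) -> float:
    """P(cost(a) < cost(b)) for independent Gaussian cost estimates."""
    diff = b.pred_cost - a.pred_cost
    sd = math.sqrt(a.uncertainty ** 2 + b.uncertainty ** 2)
    if sd == 0.0:
        # Deterministic estimates: compare the means directly.
        return 1.0 if diff > 0 else (0.0 if diff < 0 else 0.5)
    # Standard normal CDF evaluated at diff / sd, expressed via math.erf.
    return 0.5 * (1.0 + math.erf(diff / (sd * math.sqrt(2.0))))


def select_plan(plans: list[CandidatePlan], risk_weight: float = 1.0) -> CandidatePlan:
    """Pick the candidate with the lowest risk-adjusted score."""
    if not plans:
        raise ValueError("no candidate plans")
    return min(plans, key=lambda p: risk_adjusted_score(p, risk_weight))


plans = [
    CandidatePlan("hash_join", pred_cost=100.0, uncertainty=40.0),
    CandidatePlan("nested_loop", pred_cost=110.0, uncertainty=5.0),
]
print(select_plan(plans, risk_weight=0.0).name)  # hash_join (lowest mean)
print(select_plan(plans, risk_weight=1.0).name)  # nested_loop (140 vs 115)
```

With `risk_weight=0.0` the selection ignores uncertainty and picks the plan with the lowest mean estimate. With a positive weight it prefers the slightly more expensive but far more certain plan. `prob_cheaper` shows the pairwise, ranking-style view of the same trade-off: the plan with the lower mean is only modestly more likely to be cheaper once both variances are taken into account.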
