Query-driven learned estimators are accurate, flexible, and lightweight alternatives to traditional estimators in query optimization. However, existing query-driven approaches struggle with the out-of-distribution (OOD) problem, in which the test workload distribution differs from the training workload distribution, leading to performance degradation. In this paper, we present CardOOD, a general learning framework for constructing query-driven cardinality estimators that are robust to the OOD problem. Our framework focuses on offline training algorithms that build one-off models from a static workload, suitable for model initialization and periodic retraining. In CardOOD, we extend classical transfer and robust learning techniques to train query-driven cardinality estimators; the resulting algorithms fall into three categories: representation learning, data manipulation, and new learning strategies. Because these learning techniques were originally evaluated on computer vision tasks, we also propose a new learning algorithm that exploits a property specific to cardinality estimation. This algorithm, which belongs to the new learning strategies category, models the partial order constraint of cardinalities through a self-supervised learning task. Comprehensive experimental studies demonstrate that the algorithms of CardOOD mitigate the OOD problem to varying extents. We further integrate CardOOD into PostgreSQL, showcasing its practical utility in query optimization.
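The partial order constraint mentioned above can be illustrated with a minimal sketch. It assumes conjunctive range queries, where a query whose ranges lie inside another query's ranges can never return more rows, and it uses a hinge penalty on pairs generated without any true cardinality labels. The names `Query`, `shrink`, `contains`, `order_penalty`, and both toy estimators are hypothetical and chosen for illustration; this is not CardOOD's actual algorithm or loss.

```python
from typing import Callable, Dict, Tuple

# Assumption: a query is a conjunction of closed ranges, column -> (lo, hi).
Query = Dict[str, Tuple[float, float]]


def shrink(query: Query, column: str, fraction: float) -> Query:
    """Return a copy of `query` whose range on `column` is narrowed around its midpoint."""
    lo, hi = query[column]
    mid = (lo + hi) / 2.0
    half = (hi - lo) * fraction / 2.0
    narrowed = dict(query)
    narrowed[column] = (mid - half, mid + half)
    return narrowed


def contains(outer: Query, inner: Query) -> bool:
    """True if every row matching `inner` must also match `outer`."""
    for col, (outer_lo, outer_hi) in outer.items():
        if col not in inner:
            return False  # inner leaves this column unconstrained
        inner_lo, inner_hi = inner[col]
        if inner_lo < outer_lo or inner_hi > outer_hi:
            return False
    return True


def order_penalty(estimate: Callable[[Query], float], outer: Query, inner: Query) -> float:
    """Hinge penalty: positive only when the contained query gets the larger estimate."""
    if not contains(outer, inner):
        raise ValueError("inner must be contained in outer")
    return max(0.0, estimate(inner) - estimate(outer))


def volume_estimate(query: Query) -> float:
    """Toy estimator that respects the partial order: product of range widths."""
    result = 1.0
    for lo, hi in query.values():
        result *= hi - lo
    return result


def inverted_estimate(query: Query) -> float:
    """Toy estimator that violates the partial order on purpose."""
    return -volume_estimate(query)


base = {"age": (20.0, 60.0), "price": (0.0, 100.0)}
narrow = shrink(base, "age", 0.5)  # self-generated pair, no labels needed

print(order_penalty(volume_estimate, base, narrow))    # consistent estimator
print(order_penalty(inverted_estimate, base, narrow))  # inconsistent estimator
```

In a training loop, a penalty of this kind could be added to the supervised loss over many generated pairs, so that the model is discouraged from assigning a larger cardinality to a more restrictive query even on workloads it has not seen.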