In recent years, machine learning-based cardinality estimation methods have been replacing traditional methods. This shift is expected to benefit one of the most important applications of cardinality estimation, the query optimizer, and thereby speed up query processing. However, no existing method estimates cardinalities precisely when a relational schema consists of many tables with strong correlations between tables or attributes. This paper shows that multiple density estimators can be combined to effectively handle cardinality estimation for data with large, complex schemas and strong correlations. We propose Scardina, a new join cardinality estimation method that uses multiple partitioned models based on the schema structure.