In recent years, several information-theoretic upper bounds have been introduced on the output size and evaluation cost of database join queries. These bounds vary in power depending on both the type of statistics available on the input relations and the query plans that they support. This has motivated the search for algorithms that compute the output of a join query in time bounded by the corresponding information-theoretic bound. In this paper, we describe "PANDA", an algorithm that takes a Shannon inequality underlying the bound and translates each proof step into an algorithmic step corresponding to a database operation. PANDA computes a full join query in time given by the largest output size, and computes a Boolean query in time given by the submodular width. The version presented here is a significant simplification of the original version in [ANS17].
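As a hedged illustration of what "a Shannon inequality that underlies the bound" can look like, consider the triangle query $Q(A,B,C) = R(A,B) \wedge S(B,C) \wedge T(A,C)$. This query and the relation names $R$, $S$, $T$ are assumptions chosen for the example and do not come from the text above. Let $h$ be the entropy function of the uniform distribution over the output of $Q$, so that $h(ABC) = \log |Q|$, $h(AB) \le \log |R|$, $h(BC) \le \log |S|$, and $h(AC) \le \log |T|$. Two applications of submodularity then yield the well-known size bound:

```latex
\begin{align*}
h(AB) + h(BC) &\ge h(ABC) + h(B)
  && \text{(submodularity)} \\
h(B) + h(AC) &\ge h(ABC) + h(\emptyset) = h(ABC)
  && \text{(submodularity, } h(\emptyset) = 0\text{)} \\
\Longrightarrow\quad h(AB) + h(BC) + h(AC) &\ge 2\, h(ABC)
  && \text{(sum of the two steps)} \\
\Longrightarrow\quad \log |Q| &\le \tfrac{1}{2}\bigl(\log |R| + \log |S| + \log |T|\bigr)
  && \text{i.e. } |Q| \le \sqrt{|R|\,|S|\,|T|}
\end{align*}
```

Each line of such a derivation is a proof step of the kind the abstract refers to. How each step maps to a concrete database operation is specified in the paper itself and is not reproduced in this sketch.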