In recent years, several information-theoretic upper bounds have been introduced on the output size and evaluation cost of database join queries. These bounds vary in their power depending on both the type of statistics on input relations and the query plans that they support. This motivated the search for algorithms that can compute the output of a join query in time bounded by the corresponding information-theoretic bound. In this paper, we describe PANDA, an algorithm that takes a Shannon inequality that underlies the bound and translates each proof step into an algorithmic step corresponding to some database operation. PANDA computes answers to a conjunctive query in time given by the submodular width plus the output size of the query. The version in this paper represents a significant simplification of the original version [ANS, PODS'17].