In recent years, several information-theoretic upper bounds have been introduced on the output size and evaluation cost of database join queries. These bounds vary in power depending on both the type of statistics available on the input relations and the query plans that they support. This has motivated the search for algorithms that compute the output of a join query in time bounded by the corresponding information-theoretic bound. In this paper, we describe PANDA, an algorithm that takes the Shannon inequality underlying the bound and translates each step of its proof into an algorithmic step corresponding to some database operation. PANDA computes the answers to a conjunctive query in time given by the submodular width of the query plus its output size. The version in this paper represents a significant simplification of the original version [ANS, PODS'17].
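As a small illustration of the kind of bound the abstract refers to (this is not PANDA itself), consider the triangle query Q(x, y, z) :- R(x, y), S(y, z), T(x, z). The Shannon inequality h(xy) + h(yz) + h(xz) >= 2 h(xyz) implies |Q| <= sqrt(|R| |S| |T|), which is the AGM bound for this query. The sketch below evaluates the query with a naive hash join and checks the bound on a toy instance; the function names and the data are hypothetical and chosen only for illustration.

```python
from math import sqrt


def triangle_join(R, S, T):
    """Return all (x, y, z) with (x, y) in R, (y, z) in S and (x, z) in T.

    Naive hash-join evaluation, used only to count the output;
    it is not worst-case optimal and is not the PANDA algorithm.
    """
    # Index S by its first attribute y so matching z values are found quickly.
    s_by_y = {}
    for y, z in S:
        s_by_y.setdefault(y, []).append(z)
    t_set = set(T)
    return {
        (x, y, z)
        for x, y in R
        for z in s_by_y.get(y, [])
        if (x, z) in t_set
    }


def agm_bound_triangle(R, S, T):
    """Output-size bound sqrt(|R| * |S| * |T|) for the triangle query."""
    return sqrt(len(R) * len(S) * len(T))


# Toy instance (assumed for illustration): each relation is {0, 1} x {0, 1}.
pairs = {(a, b) for a in (0, 1) for b in (0, 1)}
R, S, T = pairs, pairs, pairs

output = triangle_join(R, S, T)
bound = agm_bound_triangle(R, S, T)
print(len(output), bound)
```

On this instance the bound is tight: every one of the 8 possible triples is an answer, and sqrt(4 * 4 * 4) = 8. On skewed or sparse inputs the output is typically much smaller than the bound.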