We present efficient algorithms for Quantile Join Queries, abbreviated as %JQ. A %JQ asks for the answer at a specified relative position (e.g., 50% for the median) under some ordering over the answers to a Join Query (JQ). Our goal is to avoid materializing the set of all join answers, and to achieve quasilinear time in the size of the database, regardless of the total number of answers. A recent dichotomy result rules out the existence of such an algorithm for a general family of queries and orders. Specifically, for acyclic JQs without self-joins, the problem becomes intractable for ordering by sum whenever we join more than two relations (and these joins are not trivial intersections). Moreover, even for basic ranking functions beyond sum, such as min or max over different attributes, it is so far not known whether there is any nontrivial tractable %JQ. In this work, we develop a new approach to solving %JQ. Our solution uses two subroutines: the first selects what we call a "pivot answer"; the second partitions the space of query answers according to this pivot, and continues searching in one partition, which is represented as a new %JQ over a new database. For pivot selection, we develop an algorithm that works for a large class of ranking functions that are appropriately monotone. The second subroutine requires a customized construction for the specific ranking function at hand. We show the benefit and generality of our approach by using it to establish several new complexity results. First, we prove the tractability of min and max for all acyclic JQs, thereby resolving the above question. Second, we extend the previous %JQ dichotomy for sum to all partial sums. Third, we handle the intractable cases of sum by devising a deterministic approximation scheme that applies to every acyclic JQ.
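To make the problem statement concrete, the following Python sketch answers a %JQ for one small case: a binary join R(a, b) ⋈ S(b, c) ranked by max(a, c). It is an illustration under stated assumptions, not the algorithm of this work: it replaces pivot selection with a plain binary search over candidate weights, and it returns only the weight of the requested answer. What it shares with the approach described above is the idea of trimming the database to one side of a threshold and counting the remaining answers without materializing them. The function names, the tuple layout, and the zero-based index convention `floor(phi * total)` are all hypothetical choices made for this example.

```python
from collections import Counter
from math import floor


def count_answers_at_most(R, S, lam):
    """Count answers of R(a, b) JOIN S(b, c) with max(a, c) <= lam,
    without materializing them."""
    # max(a, c) <= lam holds exactly when a <= lam and c <= lam,
    # so each relation can be trimmed independently.
    r_counts = Counter(b for a, b in R if a <= lam)
    s_counts = Counter(b for b, c in S if c <= lam)
    # Answers sharing a join value b form a cross product, so sizes multiply.
    return sum(n * s_counts[b] for b, n in r_counts.items())


def quantile_max_weight(R, S, phi):
    """Weight max(a, c) of the answer at relative position phi in [0, 1).

    Assumption for this sketch: the requested answer has zero-based
    index floor(phi * total) in the weight-sorted list of answers.
    """
    # Every answer weight is some a-value or some c-value.
    candidates = sorted({a for a, _ in R} | {c for _, c in S})
    total = count_answers_at_most(R, S, candidates[-1]) if candidates else 0
    if total == 0:
        raise ValueError("the join has no answers")
    k = floor(phi * total)
    # Find the smallest candidate weight with more than k answers at or below it.
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if count_answers_at_most(R, S, candidates[mid]) > k:
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]


# Hypothetical toy database: five join answers with weights 2, 4, 5, 5, 6.
R = [(1, "x"), (5, "x"), (3, "y")]
S = [("x", 2), ("x", 4), ("y", 6)]
median_weight = quantile_max_weight(R, S, 0.5)
```

Each counting pass is linear in the database size, and the binary search performs logarithmically many passes after one sort, so this toy case runs in quasilinear time regardless of how many answers the join has.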