Direct Access for Answers to Conjunctive Queries with Aggregation

We study the fine-grained complexity of conjunctive queries with grouping and aggregation. For common aggregate functions (e.g., min, max, count, sum), such a query can be phrased as an ordinary conjunctive query over a database annotated with a suitable commutative semiring. We investigate when such queries can be evaluated by constructing, in loglinear time, a data structure that provides logarithmic-time direct access to the answers ordered by a given lexicographic order. This task is nontrivial since the number of answers might be larger than loglinear in the size of the input, so the data structure needs to provide a compact representation of the space of answers. In the absence of aggregation and annotation, past research established a sufficient tractability condition on queries and orders. For queries without self-joins, this condition is not just sufficient but also necessary (under conventional lower-bound assumptions in fine-grained complexity). We show that all past results continue to hold for annotated databases, assuming that the annotation itself does not participate in the lexicographic order. However, past algorithms do not apply to count-distinct aggregation, which has no efficient representation as a commutative semiring; for this aggregation, we establish the corresponding tractability condition. We then show how the complexity of the problem changes when the aggregate and annotation values are included in the order. Finally, we study the impact of two special cases: having all relations but one annotated by the multiplicative identity (one), as happens when we translate aggregate queries into semiring annotations; and having a semiring with an idempotent addition, as is the case for min, max, and count-distinct over a logarithmic-size domain.
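
To make the semiring formulation and direct access concrete, the following Python sketch handles a single hypothetical query, Q(a, b, d, SUM(w)) :- R(a, b), S(b, c, w), T(b, d), over the (+, ×) semiring of numbers. As in the translation of aggregate queries into annotations, R and T are annotated with one and only S carries a nontrivial annotation (the weight w). The class name `DirectAccess` and the relations are illustrative assumptions, not the paper's algorithm; the sketch only shows the familiar counting-and-binary-search idea for a lexicographic order (a, b, d) that follows the join structure. Preprocessing sorts and builds prefix counts in O(n log n) time, and each access is one binary search in O(log n) time, so the answers (whose number can be quadratic in the input size) are never materialized. The annotation does not participate in the order.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import accumulate


class DirectAccess:
    """Hypothetical sketch: direct access to the answers of
        Q(a, b, d, SUM(w)) :- R(a, b), S(b, c, w), T(b, d)
    in lexicographic order (a, b, d), over the (+, *) semiring of numbers.
    R and T are annotated with 1; S(b, c) is annotated with the weight w."""

    def __init__(self, R, S, T):
        # Aggregate away the existential variable c: s_sum[b] = sum of w over S(b, c, w).
        self.s_sum = defaultdict(int)
        for b, _c, w in S:
            self.s_sum[b] += w
        # For each join value b, the distinct d-values of T in sorted order.
        t_by_b = defaultdict(set)
        for b, d in T:
            t_by_b[b].add(d)
        self.t_sorted = {b: sorted(ds) for b, ds in t_by_b.items()}
        # Keep the R-tuples whose b joins with both S and T, deduplicated and sorted by (a, b).
        self.r_sorted = sorted(
            {(a, b) for a, b in R if b in self.s_sum and b in self.t_sorted}
        )
        # R-tuple (a, b) owns a contiguous block of |T_b| answers; prefix[i] is the
        # number of answers in the blocks of r_sorted[0..i].
        self.prefix = list(accumulate(len(self.t_sorted[b]) for _a, b in self.r_sorted))

    def __len__(self):
        return self.prefix[-1] if self.prefix else 0

    def access(self, k):
        """Return the k-th answer (0-based) as ((a, b, d), aggregate) in O(log n) time."""
        if not 0 <= k < len(self):
            raise IndexError(k)
        i = bisect_right(self.prefix, k)  # first block whose cumulative count exceeds k
        start = self.prefix[i - 1] if i else 0
        a, b = self.r_sorted[i]
        d = self.t_sorted[b][k - start]
        # Annotation of the answer: 1 (from R) * s_sum[b] * 1 (from T).
        return (a, b, d), self.s_sum[b]
```

For example, with R = {(1, x), (1, y), (2, x)}, S = {(x, 10, 3), (x, 11, 4), (y, 12, 5)}, and T = {(x, p), (x, q), (y, r)}, there are five answers, and `access(2)` returns `((1, 'y', 'r'), 5)` without enumerating the first two.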

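The idempotent case can be sketched in the same spirit. Assuming the aggregated attribute ranges over a small domain {0, ..., m-1} (for instance, m = O(log n), so that a set of values fits in a machine word), one natural encoding represents a set as an m-bit integer, with union (`|`) as addition, intersection (`&`) as multiplication, the empty set as zero, and the full set as one. This is a commutative semiring whose addition is idempotent. The encoding, the class `BitsetSemiring`, the function `count_distinct`, and the query Q(a, COUNT(DISTINCT c)) :- R(a, b), S(b, c) are hypothetical illustrations, not necessarily the construction used in the paper. Only S carries a nontrivial annotation (the singleton {c}); R is annotated with one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BitsetSemiring:
    """Hypothetical encoding: subsets of {0, ..., m-1} as m-bit integers, forming the
    commutative semiring (union, intersection, empty set, full set).
    Addition is idempotent: add(x, x) == x."""
    m: int

    @property
    def zero(self):
        return 0  # empty set

    @property
    def one(self):
        return (1 << self.m) - 1  # full set {0, ..., m-1}

    def add(self, x, y):
        return x | y  # union

    def mul(self, x, y):
        return x & y  # intersection


def count_distinct(R, S, m):
    """Hypothetical query Q(a, COUNT(DISTINCT c)) :- R(a, b), S(b, c), assuming c in range(m).
    R-tuples are annotated with one; S(b, c) is annotated with the singleton {c}."""
    K = BitsetSemiring(m)
    # Sum (union) the S-annotations per join value b, aggregating away c.
    s_ann = defaultdict(lambda: K.zero)
    for b, c in S:
        if not 0 <= c < m:
            raise ValueError(f"value {c} is outside the assumed domain range({m})")
        s_ann[b] = K.add(s_ann[b], 1 << c)
    # Each group a receives the sum over b of one (from R) times the S-annotation of b.
    ans = defaultdict(lambda: K.zero)
    for a, b in R:
        if b in s_ann:
            ans[a] = K.add(ans[a], K.mul(K.one, s_ann[b]))
    # The count-distinct value is the size of the resulting set (number of set bits).
    return {a: bin(mask).count("1") for a, mask in sorted(ans.items())}
```

Because `x | x == x`, two join paths that reach the same value c contribute it only once, which is exactly the deduplication count-distinct requires. The encoding needs m bits per annotation, which is why it stays compact only for a small domain.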