Direct Access for Answers to Conjunctive Queries with Aggregation

We study the fine-grained complexity of conjunctive queries with grouping and aggregation. For common aggregate functions (e.g., min, max, count, sum), such a query can be phrased as an ordinary conjunctive query over a database annotated with a suitable commutative semiring. We investigate the ability to evaluate such queries by constructing, in loglinear time, a data structure that provides logarithmic-time direct access to the answers ordered by a given lexicographic order. This task is nontrivial since the number of answers might be larger than loglinear in the size of the input, so the data structure needs to provide a compact representation of the space of answers. In the absence of aggregation and annotation, past research established a sufficient tractability condition on queries and orders. For queries without self-joins, this condition is not just sufficient, but also necessary (under conventional lower-bound assumptions in fine-grained complexity). We show that all past results continue to hold for annotated databases, assuming that the annotation itself does not participate in the lexicographic order. Yet, past algorithms do not apply to the count-distinct aggregation, which has no efficient representation as a commutative semiring; for this aggregation, we establish the corresponding tractability condition. We then show how the complexity of the problem changes when we include the aggregate and annotation value in the order. We also study the impact of having all relations but one annotated by the multiplicative identity (one), as happens when we translate aggregate queries into semiring annotations, and of having a semiring with an idempotent addition, as is the case for min, max, and count-distinct over a logarithmic-size domain.
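To make the semiring translation concrete, the following is a minimal illustrative sketch, not the data structure studied here. It assumes a hypothetical query Q(x) :- R(x, y), S(y, z) that groups by x and aggregates over z. All names (Semiring, evaluate, annotate, R_TUPLES, S_TUPLES) are invented for illustration. The sketch materializes every answer with a naive join, which is exactly what a compact direct-access structure is meant to avoid.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """A commutative semiring (hypothetical helper type for this sketch)."""
    zero: Any                       # additive identity
    one: Any                        # multiplicative identity
    add: Callable[[Any, Any], Any]  # combines alternative derivations
    mul: Callable[[Any, Any], Any]  # combines joined tuples


# (N, +, *) expresses sum and count; (min, +) expresses min.
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
TROPICAL_MIN = Semiring(float("inf"), 0, min, lambda a, b: a + b)


def evaluate(sr, R, S):
    """Naively evaluate Q(x) :- R(x, y), S(y, z) over annotated relations.

    R and S map tuples to annotations. The annotation of an answer x is the
    semiring sum, over all matching (y, z), of the product of annotations.
    """
    out = {}
    for (x, y), a in R.items():
        for (y2, _z), b in S.items():
            if y == y2:
                out[x] = sr.add(out.get(x, sr.zero), sr.mul(a, b))
    return out


# Assumed toy data, chosen only for illustration.
R_TUPLES = [("a", 1), ("a", 2), ("b", 2)]
S_TUPLES = [(1, 10), (2, 5), (2, 7)]


def annotate(sr, weight_of_s):
    """Annotate R with the multiplicative identity and S with a weight."""
    R = {t: sr.one for t in R_TUPLES}
    S = {t: weight_of_s(t) for t in S_TUPLES}
    return R, S


sum_z = evaluate(COUNTING, *annotate(COUNTING, lambda t: t[1]))
count = evaluate(COUNTING, *annotate(COUNTING, lambda t: 1))
min_z = evaluate(TROPICAL_MIN, *annotate(TROPICAL_MIN, lambda t: t[1]))

# Direct access by materializing and sorting; index k is the k-th answer
# in lexicographic order of x (the annotation is not part of the order).
answers = sorted(sum_z.items())
```

Only S carries non-trivial annotations here; R is annotated by the multiplicative identity, matching the case in which all relations but one are annotated by one. Indexing `answers[k]` gives direct access only after full materialization, so this sketch does not meet the loglinear preprocessing bound when the answer set is large.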
