Output-Optimal Algorithms for Join-Aggregate Queries

The classic Yannakakis framework, proposed in 1981, remains the state-of-the-art approach for acyclic join-aggregate queries defined over commutative semi-rings. Its time complexity is known to be $O(N + \OUT)$ for any free-connex join-aggregate query, where $N$ is the input size of the database and $\OUT$ is the output size of the query result; this is already output-optimal. For the remaining class of acyclic but non-free-connex queries, however, only a general upper bound of $O(N \cdot \OUT)$ is known. We first show a lower bound of $\Omega\left(N \cdot \OUT^{1- \frac{1}{\outw}} + \OUT\right)$ for computing an acyclic join-aggregate query by {\em semi-ring algorithms}, where $\outw$ is the {\em out-width} of the input query. For example, $\outw =2$ for the chain matrix multiplication query, and $\outw=k$ for the star matrix multiplication query with $k$ relations. We then give a tighter analysis of the Yannakakis framework and show that it is already output-optimal on the class of {\em aggregate-hierarchical} queries. For the large remaining class of non-aggregate-hierarchical queries, such as the chain matrix multiplication query, the Yannakakis framework indeed requires $\Theta(N \cdot \OUT)$ time. Finally, we explore a hybrid version of the Yannakakis framework and present an output-optimal algorithm that computes any acyclic join-aggregate query in $\O\left(N\cdot \OUT^{1-\frac{1}{\outw}} + \OUT\right)$ time, matching the out-width-dependent lower bound up to a poly-logarithmic factor. To the best of our knowledge, this is the first polynomial improvement for computing acyclic join-aggregate queries since 1981.
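To make the problem setting concrete, the following minimal sketch evaluates the simplest matrix multiplication query, assumed here to have the form $\sum_{B} R_1(A,B) \otimes R_2(B,C)$ grouped by $(A,C)$, over a semi-ring supplied by the caller. It is a naive join-then-aggregate baseline for illustration only; it is neither the Yannakakis framework nor the hybrid algorithm described above. The function name `matmul_query`, the dictionary encoding of annotated relations, and the `add`/`mul` parameters are all hypothetical choices made for this example.

```python
from collections import defaultdict
import operator


def matmul_query(r1, r2, add=operator.add, mul=operator.mul):
    """Evaluate sum over b of R1(a, b) * R2(b, c), grouped by (a, c).

    r1 maps (a, b) -> annotation and r2 maps (b, c) -> annotation.
    `add` and `mul` are the semi-ring operations (hypothetical parameters;
    the defaults give ordinary sum-product).
    """
    # Index R2 by the join attribute b so each R1 tuple finds its partners directly.
    r2_by_b = defaultdict(list)
    for (b, c), w in r2.items():
        r2_by_b[b].append((c, w))

    out = {}
    for (a, b), v in r1.items():
        for c, w in r2_by_b.get(b, ()):
            prod = mul(v, w)  # semi-ring multiplication along the join
            key = (a, c)
            # Semi-ring addition aggregates away the join attribute b.
            out[key] = add(out[key], prod) if key in out else prod
    return out


# Toy annotated relations (made-up data for illustration).
r1 = {(1, "x"): 2, (1, "y"): 3, (2, "x"): 4}
r2 = {("x", 10): 5, ("y", 10): 7, ("y", 20): 1}

sum_product = matmul_query(r1, r2)
min_plus = matmul_query(r1, r2, add=min, mul=operator.add)
```

Only `add` and `mul` are ever applied to annotations, which is the restriction that the term semi-ring algorithm refers to. Swapping in `min` and `+` turns the same code into a shortest-path-style query without any other change.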
