Output-Optimal Algorithms for Join-Aggregate Queries

The classic Yannakakis framework, proposed in 1981, remains the state-of-the-art approach for acyclic join-aggregate queries defined over commutative semi-rings. Its time complexity is known to be $O(N + \OUT)$ for any free-connex join-aggregate query, where $N$ is the input size of the database and $\OUT$ is the output size of the query result; this is already output-optimal. For the remaining class of acyclic but non-free-connex queries, however, only a general upper bound of $O(N \cdot \OUT)$ on the time complexity of the Yannakakis framework is known. We first show a lower bound of $\Omega\left(N \cdot \OUT^{1- \frac{1}{\outw}} + \OUT\right)$ for computing an acyclic join-aggregate query by {\em semi-ring algorithms}, where $\outw$ is the {\em out-width} of the input query. For example, $\outw =2$ for the chain matrix multiplication query, and $\outw=k$ for the star matrix multiplication query with $k$ relations. We then give a tighter analysis of the Yannakakis framework and show that it is already output-optimal on the class of {\em aggregate-hierarchical} queries. However, for the large remaining class of non-aggregate-hierarchical queries, such as the chain matrix multiplication query, the Yannakakis framework indeed requires $\Theta(N \cdot \OUT)$ time. We next explore a hybrid version of the Yannakakis framework and present an output-optimal algorithm that computes any acyclic join-aggregate query within $\O\left(N\cdot \OUT^{1-\frac{1}{\outw}} + \OUT\right)$ time, matching the out-width-dependent lower bound up to a poly-logarithmic factor. To the best of our knowledge, this is the first polynomial improvement for computing acyclic join-aggregate queries since 1981.
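To make the terminology concrete, the following is a minimal Python sketch of a join-aggregate query over a commutative semi-ring, using a two-relation matrix multiplication query $Q(A, C) = \bigoplus_{B} R_1(A, B) \otimes R_2(B, C)$ as an assumed example. It is a plain hash-join evaluation for illustration only, not the Yannakakis framework or the hybrid algorithm described above. The names `join_aggregate` and `bound_expression` are hypothetical, and `bound_expression` simply evaluates $N \cdot \OUT^{1-\frac{1}{\outw}} + \OUT$ while ignoring constants and poly-logarithmic factors.

```python
from collections import defaultdict
import operator


def join_aggregate(r1, r2, add, mul):
    """Evaluate Q(a, c) = (+)_b r1[(a, b)] (x) r2[(b, c)].

    r1 and r2 map tuples to semi-ring annotations; `add` and `mul` are the
    semi-ring's two operations. Illustrative hash join, not the paper's method.
    """
    # Index r2 by the join attribute b.
    by_b = defaultdict(list)
    for (b, c), w in r2.items():
        by_b[b].append((c, w))

    out = {}
    for (a, b), w1 in r1.items():
        for c, w2 in by_b.get(b, ()):
            prod = mul(w1, w2)
            key = (a, c)
            # Aggregate away b with the semi-ring addition.
            out[key] = add(out[key], prod) if key in out else prod
    return out


def bound_expression(n, out, outw):
    """Evaluate N * OUT^(1 - 1/outw) + OUT (constants and polylogs ignored)."""
    return n * out ** (1 - 1 / outw) + out


# Toy annotated relations (made-up data).
r1 = {(1, 1): 2, (1, 2): 3, (2, 1): 4}   # R1(A, B)
r2 = {(1, 1): 5, (2, 1): 6, (2, 2): 7}   # R2(B, C)

# (+, *) semi-ring: ordinary sparse matrix multiplication.
sum_product = join_aggregate(r1, r2, operator.add, operator.mul)

# (min, +) semi-ring: shortest two-hop path weights.
min_plus = join_aggregate(r1, r2, min, operator.add)
```

The same function serves both semi-rings because only `add` and `mul` change, which is the sense in which semi-ring algorithms treat annotations abstractly. As a purely arithmetic illustration, with $N = 100$, $\OUT = 16$ and $\outw = 2$, `bound_expression(100, 16, 2)` evaluates to 416, compared with $N \cdot \OUT = 1600$.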
