In the semi-streaming model for processing massive graphs, an algorithm makes multiple passes over the edges of a given $n$-vertex graph and is tasked with computing the solution to a problem using $O(n \cdot \text{polylog}(n))$ space. Semi-streaming algorithms for Maximal Independent Set (MIS) that run in $O(\log\log{n})$ passes have been known for almost a decade, yet the best known lower bounds rule out only single-pass algorithms. We close this large gap by proving that the current algorithms are optimal: any semi-streaming algorithm that finds an MIS with constant probability of success requires $\Omega(\log\log{n})$ passes. This settles the pass complexity of this fundamental problem in the semi-streaming model and constitutes one of the first optimal multi-pass lower bounds in this model. We establish our result by proving an optimal round versus communication tradeoff for the (multi-party) communication complexity of MIS. The key ingredient of this result is a new technique, called hierarchical embedding, for performing round elimination: we show how to pack many small, hard $(r-1)$-round instances of the problem into a single $r$-round instance in a way that forces any $r$-round protocol to effectively solve all of these $(r-1)$-round instances as well. These embeddings are obtained via a novel application of results from extremal graph theory -- in particular, dense graphs with many disjoint unique shortest paths -- together with a newly designed graph product, and are analyzed via information-theoretic tools such as direct-sum and message compression arguments.