A New Information Complexity Measure for Multi-pass Streaming with Applications

We introduce a new notion of information complexity for multi-pass streaming problems and use it to resolve several important questions in data streams. In the coin problem, one sees a stream of $n$ i.i.d. uniform bits and would like to compute the majority with constant advantage. We show that any constant-pass algorithm must use $\Omega(\log n)$ bits of memory, significantly extending an earlier $\Omega(\log n)$ bit lower bound for single-pass algorithms of Braverman-Garg-Woodruff (FOCS, 2020). This also gives the first $\Omega(\log n)$ bit lower bound for the problem of approximating a counter up to a constant factor in worst-case turnstile streams for more than one pass. In the needle problem, one either sees a stream of $n$ i.i.d. uniform samples from a domain $[t]$, or there is a randomly chosen needle $\alpha \in [t]$ such that each item independently equals $\alpha$ with probability $p$ and is otherwise uniformly random in $[t]$. The problem of distinguishing these two cases is central to understanding the space complexity of the frequency moment estimation problem in random order streams. We show tight multi-pass space bounds for this problem for every $p < 1/\sqrt{n \log^3 n}$, resolving an open question of Lovett and Zhang (FOCS, 2023); even for $1$-pass algorithms our bounds are new. To show optimality, we improve both the lower and upper bounds from existing results. Our information complexity framework significantly extends the toolkit for proving multi-pass streaming lower bounds, and we give a wide range of additional streaming applications of our lower bound techniques, including multi-pass lower bounds for $\ell_p$-norm estimation, $\ell_p$-point query and heavy hitters, and compressed sensing problems.
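To make the two input distributions concrete, here is a minimal Python sketch of the coin and needle streams as described above. The function names (`coin_stream`, `majority_with_counter`, `needle_stream`) and the `planted` flag are hypothetical and chosen only for illustration. The counter baseline is the straightforward exact-counting approach, which stores a running count in $O(\log n)$ bits; it is not taken from the paper and says nothing about how the lower bounds are proved.

```python
import random


def coin_stream(n, rng):
    """Coin problem input: a stream of n i.i.d. uniform bits."""
    return [rng.randrange(2) for _ in range(n)]


def majority_with_counter(stream):
    """Single-pass baseline (an illustrative assumption, not the paper's method).

    Keeps an exact count of ones and of items seen, which takes
    O(log n) bits of state, and returns 1 if ones form a strict majority.
    """
    ones = 0
    length = 0
    for bit in stream:
        ones += bit
        length += 1
    return 1 if 2 * ones > length else 0


def needle_stream(n, t, p, rng, planted):
    """Needle problem input over the domain [t] = {1, ..., t}.

    planted=False: n i.i.d. uniform samples from [t].
    planted=True:  a uniformly random needle alpha is drawn once; each item
                   independently equals alpha with probability p and is
                   otherwise uniform in [t].
    """
    if not planted:
        return [rng.randint(1, t) for _ in range(n)]
    alpha = rng.randint(1, t)
    return [alpha if rng.random() < p else rng.randint(1, t) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    bits = coin_stream(1000, rng)
    print("majority bit:", majority_with_counter(bits))
    uniform = needle_stream(1000, 50, 0.0, rng, planted=False)
    with_needle = needle_stream(1000, 50, 0.1, rng, planted=True)
    print("distinct items:", len(set(uniform)), len(set(with_needle)))
```

The distinguishing task in the needle problem is to decide, from the stream alone, which of the two `planted` settings produced it; the sketch only generates inputs and does not attempt that task.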
