We study the streaming complexity of $k$-counter approximate counting. In this problem, we are given an input string in $[k]^n$ and must approximate the number of occurrences of each $j\in[k]$ in the string. Typically we require an additive error of at most $\frac{n}{3(k-1)}$ for each $j\in[k]$, and we are mostly interested in the regime $n\gg k$. We prove that the deterministic, worst-case $k$-counter approximate counting problem requires $\Omega(k\log(n/k))$ bits of space in the streaming model; no non-trivial lower bound was previously known. In contrast, the trivial algorithm that counts the occurrences of each $j\in[k]$ uses $O(k\log n)$ bits of space. Our main proof technique is the analysis of a novel potential function. Our lower bound for $k$-counter approximate counting also implies the optimality of some other streaming algorithms. For example, we show that the celebrated Misra-Gries algorithm for heavy hitters [MG82] achieves optimal space usage.
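For readers unfamiliar with the Misra-Gries algorithm [MG82] mentioned above, the following is a minimal sketch of its textbook form, not the construction or analysis used in this work. The function name `misra_gries`, the parameter `num_counters`, and the example stream are illustrative assumptions. With $m$ counters on a stream of length $n$, the standard guarantee is that each estimate undercounts the true frequency by at most $n/(m+1)$; how $m$ should be chosen to meet the $\frac{n}{3(k-1)}$ error target is not specified here.

```python
from collections import Counter


def misra_gries(stream, num_counters):
    """Textbook Misra-Gries summary (illustrative sketch).

    Keeps at most `num_counters` (item, count) pairs. Each returned count
    is a lower bound on the item's true frequency.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: increment its counter.
            counters[item] += 1
        elif len(counters) < num_counters:
            # A free counter is available: start tracking the item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop zeros.
            # The incoming item is discarded as part of this step.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical example stream over the alphabet {1, 2, 3}.
stream = [1, 1, 2, 1, 3, 1, 2, 1]
estimates = misra_gries(stream, num_counters=2)
exact = Counter(stream)

for j in sorted(exact):
    print(j, "exact:", exact[j], "estimate:", estimates.get(j, 0))
```

Each counter stores a value of at most $n$, so $m$ counters fit in $O(m\log n)$ bits plus the space for item labels; this is the kind of space usage that the lower bound is compared against.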