We study the space complexity of estimating the diameter of a subset of points in an arbitrary metric space in the dynamic (turnstile) streaming model. The input is given as a stream of updates to a frequency vector $x \in \mathbb{Z}_{\geq 0}^n$, where the support of $x$ defines a multiset of points in a fixed metric space $M = ([n], \mathsf{d})$. The goal is to estimate the diameter of this multiset, defined as $\max\{\mathsf{d}(i,j) : x_i, x_j > 0\}$, to a specified approximation factor while using as little space as possible. In insertion-only streams, a simple $O(\log n)$-space algorithm achieves a 2-approximation. In sharp contrast to this, we show that in the dynamic streaming model, any algorithm achieving a constant-factor approximation to diameter requires polynomial space. Specifically, we prove that a $c$-approximation to the diameter requires $n^{\Omega(1/c)}$ space. Our lower bound relies on two conceptual contributions: (1) a new connection between dynamic streaming algorithms and linear sketches for {\em scale-invariant} functions, a class that includes diameter estimation, and (2) a connection between linear sketches for diameter and the {\em minrank} of graphs, a notion previously studied in index coding. We complement our lower bound with a nearly matching upper bound, which gives a $c$-approximation to the diameter in general metrics using $n^{O(1/c)}$ space.
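The insertion-only baseline mentioned above can be illustrated with a short sketch. The abstract does not spell out the algorithm, so this is one standard way to get a 2-approximation, not necessarily the authors' construction: keep the first point of the stream as an anchor and track the largest distance from it to any later point. By the triangle inequality, the true diameter lies between that value and twice that value. The function name `approx_diameter_insertion_only` and the `dist` callback are hypothetical names introduced here for illustration.

```python
from typing import Callable, Hashable, Iterable


def approx_diameter_insertion_only(
    stream: Iterable[Hashable],
    dist: Callable[[Hashable, Hashable], float],
) -> float:
    """Return r with r <= diameter <= 2 * r for an insertion-only stream.

    Assumption: `dist` is a metric on the points (it satisfies the
    triangle inequality). State kept: one point and one number.
    """
    anchor = None      # first point seen; never changes afterwards
    have_anchor = False
    best = 0.0         # largest distance from the anchor seen so far
    for p in stream:
        if not have_anchor:
            anchor, have_anchor = p, True
        else:
            best = max(best, dist(anchor, p))
    return best


# Usage: points on the integer line with d(i, j) = |i - j|.
points = [5, 0, 10]
estimate = approx_diameter_insertion_only(points, lambda a, b: abs(a - b))
true_diameter = max(abs(a - b) for a in points for b in points)
```

The sketch depends on the anchor never leaving the set. In a dynamic stream the anchor itself may be deleted, which is one informal way to see why this approach does not carry over to the turnstile setting.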