We consider the problem of estimating the $f$-moment $\sum_{v\in [n]} (f(\mathbf{x}(v))-f(0))$ of a dynamic vector $\mathbf{x}\in \mathbb{G}^n$ over an abelian group $(\mathbb{G},+)$, where $\|f\|_\infty$ is bounded. We propose a simple sketch and a new estimation framework based on the \emph{Fourier transform} of $f$. Specifically, we decompose $f$ into a linear combination of homomorphisms $f_1,f_2,\ldots$ from $(\mathbb{G},+)$ to $(\mathbb{C},\times)$, estimate the $f_k$-moment for each $f_k$, and combine these estimates to obtain an estimate of the $f$-moment. Our estimators are asymptotically unbiased and have variance asymptotic to $\|\mathbf{x}\|_0^2 (\|f\|_\infty^2 m^{-1} + \|\hat{f}\|_1^2 m^{-2})$, where the sketch occupies $O(m\log n\log|\mathbb{G}|)$ bits. When $\mathbb{G}=\mathbb{Z}$, the problem can also be solved using off-the-shelf $\ell_0$-samplers in $O(m\log^2 n)$ bits of space, but this approach does not obviously generalize to finite groups. As a concrete benchmark, we extend the singleton-detector-based sampler of Ganguly, Garofalakis, and Rastogi to work over $\mathbb{G}$ using $O(m\log n\log|\mathbb{G}|\log(m\log n))$ bits. We give experimental evidence that the Fourier-based estimation framework is significantly more accurate than sampling-based approaches at the same memory footprint.
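The decomposition step can be illustrated on a small example. The sketch below is not the paper's streaming sketch or estimator: it assumes the finite cyclic group $\mathbb{G}=\mathbb{Z}_q$, whose homomorphisms into $(\mathbb{C},\times)$ are the characters $\chi_k(x)=e^{2\pi i kx/q}$, and it computes every character moment exactly rather than estimating it. All function names are hypothetical. It only checks the identity that the framework relies on, namely that the $f$-moment equals $\sum_k \hat{f}(k)\sum_{v}(\chi_k(\mathbf{x}(v))-1)$, which holds because $\chi_k(0)=1$.

```python
import cmath

def fourier_coefficients(f, q):
    """Return c[k] such that f(x) = sum_k c[k] * exp(2*pi*i*k*x/q) on Z_q."""
    return [
        sum(f(x) * cmath.exp(-2j * cmath.pi * k * x / q) for x in range(q)) / q
        for k in range(q)
    ]

def character_moment(vec, k, q):
    """Exact chi_k-moment: sum over coordinates of chi_k(x(v)) - chi_k(0)."""
    return sum(cmath.exp(2j * cmath.pi * k * xv / q) - 1 for xv in vec)

def f_moment_direct(f, vec):
    """The f-moment computed straight from its definition."""
    return sum(f(xv) - f(0) for xv in vec)

def f_moment_via_characters(f, vec, q):
    """The f-moment reassembled from the character moments."""
    coeffs = fourier_coefficients(f, q)
    return sum(c * character_moment(vec, k, q) for k, c in enumerate(coeffs))

# Assumed toy instance: q = 7 and f the indicator of nonzero entries,
# so the f-moment is the support size ||x||_0.
q = 7
f = lambda x: float(x % q != 0)
vec = [0, 3, 0, 5, 1, 0, 6, 2]

direct = f_moment_direct(f, vec)
via_chars = f_moment_via_characters(f, vec, q)
print(direct, abs(via_chars - direct) < 1e-9)
```

In the setting described above, each exact character moment would be replaced by an estimate obtained from the sketch. The coefficients $\hat{f}(k)$ then weight those estimates, which is consistent with $\|\hat{f}\|_1$ appearing in the variance bound.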