Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming

We revisit Nisan's classical pseudorandom generator (PRG) for space-bounded computation (STOC 1990) and its applications in streaming algorithms. We describe a new generator, HashPRG, that can be thought of as a symmetric version of Nisan's generator over larger alphabets. Our generator allows a trade-off between seed length and the time needed to compute a given block of the generator's output. HashPRG can be used to obtain derandomizations with much better update time and \emph{without sacrificing space} for a large number of data stream algorithms, such as $F_p$ estimation in the parameter regimes $p > 2$ and $0 < p < 2$, and CountSketch with the tight estimation guarantees analyzed by Minton and Price (SODA 2014), whose analysis assumed access to a random oracle. We also show that a recent analysis of Private CountSketch can be derandomized using our techniques.

For a $d$-dimensional vector $x$ being updated in a turnstile stream, we show that $\|x\|_{\infty}$ can be estimated up to an additive error of $\varepsilon\|x\|_{2}$ using $O(\varepsilon^{-2}\log(1/\varepsilon)\log d)$ bits of space. Additionally, the update time of this algorithm is $O(\log(1/\varepsilon))$ in the Word RAM model. We show that the space complexity of this algorithm is optimal up to constant factors. However, for vectors $x$ with $\|x\|_{\infty} = \Theta(\|x\|_{2})$, we show that the lower bound can be broken by giving an algorithm that uses $O(\varepsilon^{-2}\log d)$ bits of space and approximates $\|x\|_{\infty}$ up to an additive error of $\varepsilon\|x\|_{2}$. We use our aforementioned derandomization of the CountSketch data structure to obtain this algorithm, and using the time-space trade-off of HashPRG, we show that the update time of this algorithm is also $O(\log(1/\varepsilon))$ in the Word RAM model.
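To make the $\|x\|_{\infty}$ estimation problem concrete, the following is a minimal sketch of a textbook CountSketch maintained over a turnstile stream. It is not the derandomized construction described above and does not implement HashPRG: the hash and sign functions are simple linear maps modulo a prime, seeded from Python's `random` module, and the names `CountSketch`, `linf_estimate`, and the modulus `P` are illustrative assumptions rather than anything taken from the paper.

```python
import random
from statistics import median

# Mersenne prime used as the hash modulus (an illustrative choice).
P = (1 << 61) - 1


class CountSketch:
    """Textbook CountSketch: `rows` independent hash tables of `width` counters."""

    def __init__(self, rows, width, seed=0):
        rng = random.Random(seed)
        self.rows = rows
        self.width = width
        # One (a, b) pair per row for the bucket hash and for the sign hash.
        self.bucket_params = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(rows)]
        self.sign_params = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(rows)]
        self.table = [[0] * width for _ in range(rows)]

    def _bucket(self, r, i):
        a, b = self.bucket_params[r]
        return ((a * i + b) % P) % self.width

    def _sign(self, r, i):
        # Low bit of a linear hash; slightly biased, which is fine for a demo.
        a, b = self.sign_params[r]
        return 1 if ((a * i + b) % P) & 1 else -1

    def update(self, i, delta):
        """Turnstile update x_i += delta (delta may be negative)."""
        for r in range(self.rows):
            self.table[r][self._bucket(r, i)] += self._sign(r, i) * delta

    def estimate(self, i):
        """Median over rows of the signed counter that coordinate i hashes to."""
        return median(
            self._sign(r, i) * self.table[r][self._bucket(r, i)]
            for r in range(self.rows)
        )


def linf_estimate(sketch, d):
    """Estimate ||x||_inf by taking the largest |estimate| over all d coordinates."""
    return max(abs(sketch.estimate(i)) for i in range(d))
```

In the textbook setting one would take `width` on the order of $\varepsilon^{-2}$ and `rows` on the order of $\log d$ to target additive error $\varepsilon\|x\|_{2}$ per coordinate. Note that this sketch spends time proportional to `rows` per update and stores its hash parameters explicitly, so it does not exhibit the update-time or seed-length guarantees claimed for the derandomized algorithm.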
