We study how to verify specific frequency distributions when we observe a stream of $N$ data items drawn from a universe of $n$ distinct items. We introduce the \emph{relative Fr\'echet distance} to compare two frequency functions in a homogeneous manner. We consider two streaming models: insertion-only streams and sliding windows. For a certain class of functions, we present a tester that, given $f$ and a function $g$ defined by the stream, decides with high probability whether $f$ is close to $g$ or far from $g$. If $f$ is uniform, we show an $\Omega(n)$ space lower bound. If $f$ decreases fast enough, we use only $O(\log^2 n\cdot \log\log n)$ space. The analysis relies on the Spacesaving algorithm \cite{MAE2005,Z22} and on sampling the stream.
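For readers unfamiliar with Spacesaving, the following is a minimal Python sketch of the standard Space-Saving update with $k$ counters. It only illustrates the frequency-estimation building block. It is not the authors' tester, and the function name `space_saving` and parameter `k` are hypothetical. For simplicity it uses a plain dictionary with linear-time eviction, whereas the original algorithm uses a stream-summary structure to achieve constant-time updates. Because the counters always sum to $N$ once all $k$ are in use, the smallest counter is at most $N/k$. Each estimate therefore overcounts the true frequency by at most $N/k$.

```python
from typing import Dict, Hashable, Iterable


def space_saving(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Sketch of Space-Saving with k counters (hypothetical helper).

    Returns a map from monitored items to estimated counts. For a
    monitored item x with true count f_x, the estimate c_x satisfies
    f_x <= c_x <= f_x + N/k, where N is the stream length.
    """
    if k < 1:
        raise ValueError("k must be positive")
    counters: Dict[Hashable, int] = {}
    for x in stream:
        if x in counters:
            # Item already monitored: exact increment.
            counters[x] += 1
        elif len(counters) < k:
            # Free counter available: start monitoring x.
            counters[x] = 1
        else:
            # All counters in use: evict an item with minimum count.
            # The new item inherits that minimum plus one, which may
            # overestimate its true count.
            victim = min(counters, key=counters.__getitem__)
            min_count = counters.pop(victim)
            counters[x] = min_count + 1
    return counters
```

For example, `space_saving("aaaaabbbc", k=2)` returns `{'a': 5, 'c': 4}`. The heavy item `a` is counted exactly, while `c` (true count 1) is overestimated by 3, which is within the bound $N/k = 9/2$.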