We study sketching trimmed statistics of a frequency vector, including the $F_p$ moment of the top-$k$ coordinates and of the trimmed-$k$ vector. Despite their natural role in robust analytics, these problems have not previously been studied in any sublinear-space setting. For $p \in [0,2]$, we obtain $\mathrm{poly}(\log n/\varepsilon)$-space algorithms for both tasks when $k$ is moderately large. For general $k$, we identify a sharp structural threshold that characterizes exactly when sublinear space is possible: it is determined by the ratio between $a_k^2$ and $\|x_{-k}\|_2^2/k$. We extend these results to $p > 2$ and present several applications, including algorithms for thresholded $F_p$ estimation and generalized impact indices. Notably, we improve the space bounds of Govindan, Monemizadeh, and Muthukrishnan (PODS 2017) for computing the $h$-index.
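To make the quantities named above concrete, the following is a minimal offline sketch that computes them exactly from a fully stored vector. It is a linear-space reference for the definitions only, not the sublinear-space sketching algorithms of the paper. Several readings here are assumptions, since the abstract does not spell them out: coordinates are ranked by absolute value, the trimmed-$k$ vector drops the $k$ largest and the $k$ smallest coordinates in that ranking, $a_k$ is the $k$-th largest magnitude, and $x_{-k}$ is $x$ with its top $k$ coordinates removed. All function names are hypothetical.

```python
def sorted_magnitudes(x):
    """Absolute values of the coordinates of x, largest first."""
    return sorted((abs(v) for v in x), reverse=True)


def fp_moment(values, p):
    """F_p of nonnegative values; F_0 counts the nonzero entries."""
    if p == 0:
        return float(sum(1 for v in values if v != 0))
    return float(sum(v ** p for v in values))


def top_k_fp(x, k, p):
    """F_p moment of the k largest-magnitude coordinates."""
    return fp_moment(sorted_magnitudes(x)[:k], p)


def trimmed_k_fp(x, k, p):
    """F_p moment after dropping the k largest and k smallest magnitudes
    (assumed definition of the trimmed-k vector)."""
    mags = sorted_magnitudes(x)
    return fp_moment(mags[k:len(mags) - k], p)


def threshold_ratio(x, k):
    """Ratio a_k^2 / (||x_{-k}||_2^2 / k), with a_k the k-th largest
    magnitude and x_{-k} the vector without its top k coordinates
    (assumed notation). Requires 1 <= k <= len(x)."""
    mags = sorted_magnitudes(x)
    a_k = mags[k - 1]
    tail_energy = sum(v * v for v in mags[k:])
    if tail_energy == 0:
        return float("inf")
    return a_k ** 2 / (tail_energy / k)


def h_index(x):
    """Largest h such that at least h coordinates have magnitude >= h."""
    h = 0
    for rank, value in enumerate(sorted_magnitudes(x), start=1):
        if value >= rank:
            h = rank
        else:
            break
    return h
```

For example, with `x = [5, -3, 1, 0, 2, 4]` the magnitudes in decreasing order are 5, 4, 3, 2, 1, 0, so `top_k_fp(x, 2, 2)` sums 25 and 16, and `trimmed_k_fp(x, 2, 1)` sums the two middle magnitudes 3 and 2.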