MinMax sampling is a technique for downsampling a real-valued vector so as to minimize the maximum variance over all of the vector's components. This approach is useful for reducing the amount of data that must be sent over a constrained network link (e.g., across a wide-area network). MinMax can provide unbiased estimates of the vector elements, as well as unbiased estimates of aggregates when vectors from multiple locations are combined. In this work, we propose a biased MinMax estimation scheme, B-MinMax, which trades an increase in estimator bias for a reduction in variance. We prove that when no aggregation is performed, B-MinMax achieves a strictly lower mean squared error (MSE) than the unbiased MinMax estimator. When aggregation is required, B-MinMax is preferable when sample sizes are small or the number of aggregated vectors is limited. Our experiments show that this approach can substantially reduce the MSE of MinMax sampling in many practical settings.
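To make the bias-variance trade-off concrete, the following is a minimal Python sketch built on assumptions made here, not on the paper's construction. Each element is assumed to be kept independently with probability p_i and, if kept, estimated by the Horvitz-Thompson value x_i / p_i. The function `minmax_probs` picks the p_i that minimize the largest per-element variance under a budget of k elements kept in expectation. The function `aggregate_mse` scores a generic shrinkage estimator c * x_i / p_i (c = 1 is the unbiased case) when estimates from n sites are summed. The names `minmax_probs`, `ht_variance`, `aggregate_mse`, and the shrinkage factor c are hypothetical; the shrinkage estimator only illustrates the trade-off and is not B-MinMax itself.

```python
from typing import List, Tuple


def minmax_probs(x: List[float], k: float, iters: int = 200) -> Tuple[List[float], float]:
    """Inclusion probabilities minimizing the maximum per-element variance.

    Hypothetical sketch: assumes independent (Poisson) sampling, a
    Horvitz-Thompson estimate x_i / p_i for kept elements, and an expected
    sample-size budget k > 0. Returns (probabilities, common variance cap).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    nonzero = sum(1 for v in x if v != 0)
    if k >= nonzero:
        # The budget covers every nonzero element: keep them all, zero variance.
        return [1.0 if v != 0 else 0.0 for v in x], 0.0

    def expected_size(cap: float) -> float:
        # Variance x^2 (1 - p) / p <= cap needs p >= x^2 / (x^2 + cap).
        return sum(v * v / (v * v + cap) for v in x)

    # expected_size is decreasing in cap, and expected_size(hi) < k initially.
    lo, hi = 0.0, sum(v * v for v in x) / k
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expected_size(mid) > k:
            lo = mid
        else:
            hi = mid
    return [v * v / (v * v + hi) for v in x], hi


def ht_variance(x_i: float, p_i: float) -> float:
    """Variance of the unbiased estimate x_i / p_i (requires p_i > 0)."""
    return x_i * x_i * (1 - p_i) / p_i


def aggregate_mse(x_i: float, p_i: float, n: int, c: float = 1.0) -> float:
    """MSE of summing n independent estimates c * x_i / p_i of the same value.

    Variance grows linearly in n, but the bias n * (c - 1) * x_i grows
    linearly too, so the squared bias grows quadratically in n.
    """
    variance = n * c * c * ht_variance(x_i, p_i)
    bias = n * (c - 1) * x_i
    return variance + bias * bias


x = [4.0, 3.0, 0.0, 1.0]
p, cap = minmax_probs(x, 2)
print([round(pi, 3) for pi in p], round(cap, 3))
for n in (1, 8, 10):
    print(n, aggregate_mse(1.0, 0.5, n), aggregate_mse(1.0, 0.5, n, c=0.8))
```

Under these assumptions, every nonzero element ends up with the same variance cap, which is what "minimizing the maximum variance" means here. With x_i = 1, p_i = 0.5 (so V = 1), and c = 0.8, the shrunk estimator has MSE 0.68 versus 1.0 at a single site. It loses once n exceeds (1 + c)V / ((1 - c)x_i^2) = 9 sites, which mirrors the abstract's claim that a biased estimator pays off only when few vectors are aggregated.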