Robust statistics aims to compute quantities that represent data even when a fraction of the data may be arbitrarily corrupted. The most fundamental such statistic is the mean, and recent years have seen a flurry of theoretical advances in efficiently estimating the mean of corrupted data in high dimensions. Although several proposed algorithms achieve near-optimal error, they all require a sample size that is large as a function of the dimension. In this paper, we conduct extensive experiments on a range of mean estimation techniques in settings where, because of the high dimensionality, the data size may not meet this requirement.