This paper addresses the following question: given a sample of i.i.d. random variables with finite variance, can one construct an estimator of the unknown mean that performs nearly as well as if the data were normally distributed? One of the most popular estimators achieving this goal is the median of means estimator. However, it is inefficient in the sense that the constants in the resulting bounds are suboptimal. We show that a permutation-invariant modification of the median of means estimator admits deviation guarantees that are sharp up to a $1+o(1)$ factor if the underlying distribution possesses more than $\frac{3+\sqrt{5}}{2}\approx 2.62$ moments and is absolutely continuous with respect to the Lebesgue measure. This result yields potential improvements for a variety of algorithms that rely on the median of means estimator as a building block. At the core of our argument are new deviation inequalities for U-statistics whose order is allowed to grow with the sample size, a result that could be of independent interest.
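To make the two estimators concrete, here is a minimal Python sketch. The function names `median_of_means` and `permutation_invariant_mom` are hypothetical, and the second function encodes an assumption about what "permutation-invariant modification" means: it takes the median of the means over all subsets of a fixed size `m`, which matches the U-statistic structure mentioned above but is not confirmed by the abstract itself. Enumerating all subsets is feasible only for very small samples, so this is an illustration rather than a practical implementation.

```python
import statistics
from itertools import combinations


def median_of_means(sample, k):
    """Classical median of means: split the sample into k disjoint
    consecutive blocks, average each block, return the median of the averages."""
    n = len(sample)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(sample)")
    m = n // k  # block size; any leftover n - k*m points are dropped
    block_means = [statistics.fmean(sample[i * m:(i + 1) * m]) for i in range(k)]
    return statistics.median(block_means)


def permutation_invariant_mom(sample, m):
    """Assumed permutation-invariant variant: median of the means over
    ALL subsets of size m. Cost grows like C(n, m), so small n only."""
    if not 1 <= m <= len(sample):
        raise ValueError("m must satisfy 1 <= m <= len(sample)")
    subset_means = [statistics.fmean(c) for c in combinations(sample, m)]
    return statistics.median(subset_means)


# One outlier (100) among otherwise small values.
sample = [1, 2, 3, 4, 5, 100]
reordered = [100, 1, 2, 3, 4, 5]  # same data, different order

mom_a = median_of_means(sample, 3)              # 3.5
mom_b = median_of_means(reordered, 3)           # 4.5: depends on the ordering
pim_a = permutation_invariant_mom(sample, 2)    # 3.5
pim_b = permutation_invariant_mom(reordered, 2) # 3.5: ordering is irrelevant
```

In this toy example the classical estimator changes when the same data are reordered, because the block partition changes, while the subset-based variant does not. Both stay far from the sample mean of about 19.2, which the single outlier dominates.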