The trimmed mean of $n$ scalar random variables from a distribution $P$ is the variant of the standard sample mean in which the $k$ smallest and $k$ largest values in the sample are discarded, for some parameter $k$. In this paper, we study the finite-sample properties of the trimmed mean as an estimator for the mean of $P$. Assuming finite variance, we prove that the trimmed mean is ``sub-Gaussian'' in the sense of achieving Gaussian-type concentration around the mean. Under slightly stronger assumptions, we show that the left and right tails of the trimmed mean satisfy a strong ratio-type approximation by the corresponding Gaussian tail, even for very small probabilities of the order $e^{-n^c}$ for some $c>0$. In the more challenging setting of weaker moment assumptions and adversarial sample contamination, we prove that the trimmed mean is minimax-optimal up to constants.
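As a concrete illustration of the estimator defined above, here is a minimal Python sketch. The function name `trimmed_mean`, the normalization by $n-2k$ (averaging the retained values), and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
def trimmed_mean(sample, k):
    """Average the sample after discarding the k smallest and k largest values.

    Assumption: the retained n - 2k values are averaged with equal weight.
    """
    n = len(sample)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < n/2")
    ordered = sorted(sample)      # order statistics X_(1) <= ... <= X_(n)
    kept = ordered[k:n - k]       # drop k values from each end
    return sum(kept) / len(kept)


# Hypothetical data: five well-behaved values and one gross outlier.
data = [2.1, 1.9, 2.0, 2.2, 1.8, 50.0]
print("sample mean: ", sum(data) / len(data))
print("trimmed mean:", trimmed_mean(data, 1))
```

With $k=0$ the function reduces to the ordinary sample mean; larger $k$ trades some efficiency for robustness to extreme values.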