Beyond Catoni: Sharper Rates for Heavy-Tailed and Robust Mean Estimation

We study the fundamental problem of estimating the mean of a $d$-dimensional distribution with covariance $\Sigma \preccurlyeq \sigma^2 I_d$ given $n$ samples. When $d = 1$, Catoni \cite{catoni} gave an estimator that, with probability $1 - \delta$, has error $(1+o(1)) \cdot \sigma \sqrt{\frac{2 \log \frac{1}{\delta}}{n}}$, matching the Gaussian error rate. For $d>1$, a natural estimator outputs the center of the minimum enclosing ball of the one-dimensional confidence intervals. It achieves a $1-\delta$ confidence radius of $\sqrt{\frac{2 d}{d+1}} \cdot \sigma \left(\sqrt{\frac{d}{n}} + \sqrt{\frac{2 \log \frac{1}{\delta}}{n}}\right)$, a $\sqrt{\frac{2d}{d+1}}$-factor loss over the Gaussian rate. When the $\sqrt{\frac{d}{n}}$ term dominates by a $\sqrt{\log \frac{1}{\delta}}$ factor, \cite{lee2022optimal-highdim} gave an improved estimator matching the Gaussian rate. This raises a natural question: is the Gaussian rate achievable in general? Or is the $\sqrt{\frac{2 d}{d+1}}$ loss \emph{necessary} when the $\sqrt{\frac{2 \log \frac{1}{\delta}}{n}}$ term dominates? We show that the answer to both questions is \emph{no}: \emph{some} constant-factor loss over the Gaussian rate is necessary, but we construct an estimator that improves on the naive estimator above by a constant factor. We also consider robust estimation, where an adversary may arbitrarily corrupt an $\epsilon$-fraction of the samples. In this setting, we show that the strategy above, which combines one-dimensional estimates and incurs the $\sqrt{\frac{2d}{d+1}}$-factor loss, \emph{is} optimal in the infinite-sample limit.
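The abstract cites Catoni's one-dimensional estimator without describing it. As a minimal sketch, and not the construction from this paper, a Catoni-style M-estimator returns the root $\theta$ of $\sum_i \psi(\alpha (X_i - \theta)) = 0$, where $\psi$ is an influence function that grows only logarithmically. The code below assumes $\sigma$ is known, uses the "widest" influence function $\psi(x) = \log(1 + x + x^2/2)$ for $x \ge 0$ (extended as an odd function), and uses the simplified scale $\alpha = \sqrt{2 \log(1/\delta) / (n \sigma^2)}$; the names `catoni_psi` and `catoni_mean` and this choice of $\alpha$ are illustrative assumptions. Because $\psi$ is increasing, the score is monotone in $\theta$ and bisection finds the root.

```python
import math
from typing import Sequence


def catoni_psi(x: float) -> float:
    """Widest Catoni influence function.

    psi(x) = log(1 + x + x^2/2) for x >= 0 and -log(1 - x + x^2/2) for x < 0.
    It is odd, strictly increasing, and grows only logarithmically.
    """
    if x >= 0:
        return math.log1p(x + 0.5 * x * x)
    return -math.log1p(-x + 0.5 * x * x)


def catoni_mean(samples: Sequence[float], sigma: float, delta: float,
                iters: int = 200) -> float:
    """Catoni-style M-estimate: the root theta of sum_i psi(alpha * (x_i - theta)) = 0.

    ASSUMPTION: sigma is known and alpha = sqrt(2 log(1/delta) / (n sigma^2)) is a
    simplified scale; Catoni's analysis uses a slightly adjusted value.
    """
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples)
    alpha = math.sqrt(2.0 * math.log(1.0 / delta) / (n * sigma ** 2))

    def score(theta: float) -> float:
        # Nonincreasing in theta, because psi is increasing.
        return sum(catoni_psi(alpha * (x - theta)) for x in samples)

    # score(min) >= 0 >= score(max), so a root lies in [min, max].
    lo, hi = min(samples), max(samples)
    for _ in range(iters):  # fixed iteration count avoids float-resolution stalls
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid  # root is to the right of mid
        else:
            hi = mid  # root is at or to the left of mid
    return 0.5 * (lo + hi)
```

For example, on 99 zeros and a single value $1000$ (with $\sigma = 1$, $\delta = 0.01$), the sketch returns a value between $0$ and $1$, whereas the empirical mean is $10$: the logarithmic growth of $\psi$ caps the influence of the outlier.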

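One way to see where the $\sqrt{\frac{2d}{d+1}}$ factor of the naive estimator comes from is Jung's theorem: every compact set in $\mathbb{R}^d$ of diameter $D$ lies in a ball of radius at most $D\sqrt{\frac{d}{2(d+1)}}$. The sketch below is an illustrative reading rather than the paper's exact argument. It assumes that, with probability $1-\delta$, each one-dimensional estimate $m_v$ (a hypothetical name) of $\langle v, \mu \rangle$ satisfies $|m_v - \langle v, \mu\rangle| \le r$ simultaneously for all unit vectors $v$, where $r = \sigma\left(\sqrt{\frac{d}{n}} + \sqrt{\frac{2 \log \frac{1}{\delta}}{n}}\right)$. Let $K = \{x \in \mathbb{R}^d : |\langle v, x\rangle - m_v| \le r \text{ for all unit } v\}$, so that $\mu \in K$, and let $B(c, R)$ be the minimum enclosing ball of $K$. Then
\begin{align*}
x, y \in K,\; v = \tfrac{x-y}{\|x-y\|}
  &\implies \|x - y\| = \langle v, x\rangle - \langle v, y\rangle \le (m_v + r) - (m_v - r) = 2r, \\
\operatorname{diam}(K) \le 2r
  &\implies R \le 2r\sqrt{\tfrac{d}{2(d+1)}} = \sqrt{\tfrac{2d}{d+1}}\, r \quad \text{(Jung's theorem)}, \\
\mu \in K \subseteq B(c, R)
  &\implies \|c - \mu\| \le \sqrt{\tfrac{2d}{d+1}} \cdot \sigma\left(\sqrt{\tfrac{d}{n}} + \sqrt{\tfrac{2 \log \frac{1}{\delta}}{n}}\right).
\end{align*}
For $d = 1$ the factor equals $1$, and it increases toward $\sqrt{2}$ as $d \to \infty$.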