We introduce a new measure of robustness for statistical estimators, which we call \emph{empirical sensitivity}. An estimator $\hat\theta$ has bounded empirical sensitivity if, with high probability over a dataset $X = (X_1, \dots, X_n) \sim \mathcal{D}^{\otimes n}$, for every dataset $Y$ obtained by modifying at most $\eta n$ points of $X$, the estimate $\hat\theta(Y)$ is close to $\hat\theta(X)$. We study bounds on this quantity for the prototypical problem of Gaussian mean estimation. We prove new lower bounds showing that any estimator $\hat\mu$ achieving the optimal $\ell_2$-error bound of $O\left(\sqrt{d/n}\right)$ has empirical sensitivity at least $\Omega\left(\eta + \sqrt{\eta d/n}\right)$. The two terms arise from obstructions on the mean and on the variance of such an estimator, the latter via an Efron-Stein argument. We show that this bound is tight up to logarithmic factors by employing recent results on robust empirical mean estimation.
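Empirical sensitivity is a worst case over all datasets $Y$ that differ from $X$ in at most $\eta n$ points. It therefore cannot be computed by sampling, but any particular modification yields a lower bound. The Python sketch below assumes NumPy and uses a hypothetical helper, `sensitivity_lower_bound`, which does not come from the text. The helper shifts the first $\lfloor \eta n \rfloor$ points of one Gaussian sample by a fixed vector and reports the largest resulting $\ell_2$ change over a few shift sizes. The sample mean and the coordinate-wise median serve only as familiar baselines. Neither is the estimator used to establish tightness.

```python
import numpy as np

def sensitivity_lower_bound(estimator, X, eta, perturbations):
    """Probe how far `estimator` moves when at most eta*n rows of X are modified.

    Empirical sensitivity is a supremum over *every* dataset Y differing from X
    in at most eta*n rows. This hypothetical helper only tries the candidate
    shifts in `perturbations`, so its result is a lower bound on that supremum.
    """
    n = X.shape[0]
    # Number of rows we may modify; the tiny tolerance guards against float round-off.
    k = int(np.floor(eta * n + 1e-9))
    base = estimator(X)
    worst = 0.0
    for delta in perturbations:
        Y = X.copy()
        Y[:k] += delta  # shift the first k rows by the same vector delta
        worst = max(worst, float(np.linalg.norm(estimator(Y) - base)))
    return worst

# Usage: X has n i.i.d. rows drawn from N(0, I_d).
rng = np.random.default_rng(0)
n, d, eta = 1000, 10, 0.05
X = rng.standard_normal((n, d))
deltas = [c * np.ones(d) for c in (1.0, 10.0, 100.0)]

def sample_mean(Z):
    return Z.mean(axis=0)

def coord_median(Z):
    return np.median(Z, axis=0)

mean_sens = sensitivity_lower_bound(sample_mean, X, eta, deltas)
median_sens = sensitivity_lower_bound(coord_median, X, eta, deltas)
# The mean moves by exactly (k/n) * ||delta||, about 15.81 here for c = 100,
# and this grows linearly as c increases; the median shift stays small.
print(mean_sens, median_sens)
```

The sample mean moves by exactly $(k/n)\lVert\delta\rVert$, so its probed sensitivity grows linearly with the shift size and its empirical sensitivity is unbounded. A small value from this probe, as for the median, is only evidence and not a certificate, because other choices of $Y$ may move the estimate further.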