We provide an estimator of the covariance matrix that achieves the optimal rate of convergence (up to constant factors) in the operator norm under two standard notions of data contamination: an adversary may arbitrarily corrupt an $\eta$-fraction of the sample, and the distribution of the remaining data points need only satisfy, for some $p \ge 4$, that the $L_p$ norms of its one-dimensional marginals are equivalent to the corresponding $L_2$ norms. Although only a few moments are required to exist, our estimator achieves the same tail estimates as if the underlying distribution were Gaussian. As part of our analysis, we prove a dimension-free Bai-Yin type theorem in the regime $p > 4$.
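For concreteness, the two contamination assumptions can be written out as follows. This is a sketch in standard notation, not a quotation of the paper's definitions: the sample size $N$, the uncorrupted points $X_1, \dots, X_N$ (i.i.d. copies of a random vector $X \in \mathbb{R}^d$), the observed points $\tilde X_1, \dots, \tilde X_N$, and the equivalence constant $L$ are symbols introduced here for illustration.

$$
\#\{\, i \le N : \tilde X_i \neq X_i \,\} \le \eta N,
\qquad
\bigl(\mathbb{E}\,|\langle X, v\rangle|^{p}\bigr)^{1/p} \le L\,\bigl(\mathbb{E}\,\langle X, v\rangle^{2}\bigr)^{1/2} \quad \text{for all } v \in \mathbb{R}^d .
$$

The first condition says that at most $\eta N$ of the observed points differ from the clean sample; the second is the $L_p$-$L_2$ norm equivalence of the one-dimensional marginals. For example, a centered Gaussian vector satisfies the second condition for every $p$, with $L$ of order $\sqrt{p}$, since each marginal $\langle X, v\rangle$ is a centered normal random variable.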