Robust Mean Estimation Without a Mean: Dimension-Independent Error in Polynomial Time for Symmetric Distributions

In this work, we study the problem of robustly estimating the mean/location parameter of distributions without moment bounds. For a large class of distributions satisfying natural symmetry constraints, we give a sequence of algorithms that efficiently estimate their location without incurring dimension-dependent factors in the error. Concretely, suppose an adversary can arbitrarily corrupt an $\varepsilon$-fraction of the observed samples. For every $k \in \mathbb{N}$, we design an estimator using time and samples $\tilde{O}(d^k)$ such that the dependence of the error on the corruption level $\varepsilon$ is an additive term of $O(\varepsilon^{1-\frac{1}{2k}})$. The dependence on other problem parameters is also nearly optimal. Our class contains products of arbitrary symmetric one-dimensional distributions as well as elliptical distributions, a vast generalization of the Gaussian distribution. Examples include product Cauchy distributions and multivariate $t$-distributions. In particular, even the first moment might not exist. We provide the first efficient algorithms for this class of distributions. Previously, such results were only known under boundedness assumptions on the moments of the distribution and, in particular, are provably impossible in the absence of symmetry [KSS18, CTBJ22]. For the class of distributions we consider, all previous estimators either require exponential time or incur error depending on the dimension. Our algorithms are based on a generalization of the filtering technique [DK22]. We show how this machinery can be combined with Huber-loss-based approaches to work with projections of the noise. Moreover, we show how sum-of-squares proofs can be used to obtain algorithmic guarantees even for distributions without a first moment. We believe that this approach may find other applications in future work.
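
To make the setting concrete, the following minimal Python sketch simulates $\varepsilon$-corrupted samples from a product Cauchy distribution and compares two simple baselines: the empirical mean and the coordinate-wise median. It is not the algorithm of this work (it involves no filtering and no sum-of-squares machinery). It only illustrates why the empirical mean is uninformative when no first moment exists. The coordinate-wise median is used here as a stand-in baseline whose error is known to be able to grow with the dimension under adversarial corruption. All function names, the fixed-outlier adversary, and the parameter values are assumptions made for illustration.

```python
import math
import random
import statistics


def sample_product_cauchy(n, d, location, rng):
    """Draw n points in R^d with independent standard Cauchy noise around `location`."""
    # Inverse-CDF sampling: tan(pi * (U - 1/2)) is standard Cauchy for U uniform on [0, 1).
    return [
        [location[j] + math.tan(math.pi * (rng.random() - 0.5)) for j in range(d)]
        for _ in range(n)
    ]


def corrupt(samples, eps, value, rng):
    """Replace an eps-fraction of the points by a fixed outlier.

    This is a simple, non-adaptive adversary chosen for illustration only;
    the corruption model in the text allows arbitrary replacements.
    """
    n, d = len(samples), len(samples[0])
    corrupted = [row[:] for row in samples]
    for i in rng.sample(range(n), int(eps * n)):
        corrupted[i] = [value] * d
    return corrupted


def coordinatewise_mean(samples):
    """Empirical mean, computed separately in each coordinate."""
    n, d = len(samples), len(samples[0])
    return [sum(row[j] for row in samples) / n for j in range(d)]


def coordinatewise_median(samples):
    """Median of each coordinate (baseline only, not the estimator of this work)."""
    d = len(samples[0])
    return [statistics.median(row[j] for row in samples) for j in range(d)]


def l2_error(estimate, location):
    """Euclidean distance between an estimate and the true location."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(estimate, location)))


if __name__ == "__main__":
    rng = random.Random(0)
    d, n, eps = 5, 2000, 0.1
    location = [1.0] * d
    data = corrupt(sample_product_cauchy(n, d, location, rng), eps, 1e6, rng)
    print("mean error:  ", l2_error(coordinatewise_mean(data), location))
    print("median error:", l2_error(coordinatewise_median(data), location))
```

In this toy setup the empirical mean is dragged far from the true location by the outliers, while the coordinate-wise median stays close. Each coordinate of the median can still be shifted by the corruptions, so its Euclidean error accumulates across coordinates, which is the dimension dependence the estimators described above are designed to avoid.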
