We consider the high-dimensional linear regression model and assume that a fraction of the measurements are altered by an adversary with complete knowledge of the data and the underlying distribution. We are interested in a scenario where the dense additive noise is heavy-tailed while the measurement vectors follow a sub-Gaussian distribution. Within this framework, we establish minimax lower bounds for the performance of an arbitrary estimator that depend on the fraction of corrupted observations as well as on the tail behavior of the additive noise. Moreover, we design a modification of the so-called Square-Root Slope estimator with several desirable features: (a) it is provably robust to adversarial contamination and satisfies performance guarantees in the form of sub-Gaussian deviation inequalities that match the lower error bounds up to logarithmic factors; (b) it is fully adaptive with respect to the unknown sparsity level and the variance of the additive noise; and (c) it is computationally tractable as the solution of a convex optimization problem. To analyze the performance of the proposed estimator, we prove several properties of matrices with sub-Gaussian rows that may be of independent interest.
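For readers unfamiliar with the baseline method, here is a minimal sketch of the standard, non-robust Square-Root Slope estimator. The estimator minimizes ||y - X b||_2 / sqrt(n) + sum_j lambda_j |b|_(j) over b, where |b|_(1) >= ... >= |b|_(p) are the sorted magnitudes and lambda_1 >= ... >= lambda_p >= 0. This is not the robust modification proposed in the paper, since the abstract does not describe that modification. Several pieces are assumptions chosen only for illustration: the weight sequence lambda_j = A * sqrt(log(2p/j) / n), all function names, and the solver. The solver alternates between b and an auxiliary noise level sigma, using proximal-gradient steps with the sorted-l1 proximal operator computed by pool-adjacent-violators.

```python
import math

def slope_weights(p, n, A=1.0):
    # Hypothetical tuning: lambda_j = A * sqrt(log(2p / j) / n), nonincreasing in j.
    return [A * math.sqrt(math.log(2 * p / j) / n) for j in range(1, p + 1)]

def sorted_l1_norm(beta, lam):
    # Sorted-l1 (SLOPE) norm: sum_j lam_j * |beta|_(j), magnitudes in decreasing order.
    mags = sorted((abs(b) for b in beta), reverse=True)
    return sum(l * m for l, m in zip(lam, mags))

def matvec(X, b):
    return [sum(xij * bj for xij, bj in zip(row, b)) for row in X]

def sqrt_slope_objective(X, y, beta, lam):
    # ||y - X beta||_2 / sqrt(n) + sorted-l1 penalty.
    n = len(y)
    r = [yi - fi for yi, fi in zip(y, matvec(X, beta))]
    return math.sqrt(sum(ri * ri for ri in r) / n) + sorted_l1_norm(beta, lam)

def prox_sorted_l1(v, lam):
    # Proximal operator of the sorted-l1 norm: sort |v| decreasingly, subtract lam,
    # project onto nonincreasing sequences (pool adjacent violators), clip at 0.
    p = len(v)
    order = sorted(range(p), key=lambda i: abs(v[i]), reverse=True)
    w = [abs(v[i]) - lam[k] for k, i in enumerate(order)]
    blocks = []  # each block stores [sum, count]
    for x in w:
        blocks.append([x, 1])
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] <= blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([max(s / c, 0.0)] * c)
    out = [0.0] * p
    for k, i in enumerate(order):
        out[i] = math.copysign(fitted[k], v[i])  # restore original order and signs
    return out

def sqrt_slope_fit(X, y, lam, n_outer=50, n_inner=200):
    # Alternating minimization of the jointly convex surrogate
    #   ||y - X b||^2 / (2 n sigma) + sigma / 2 + sorted_l1(b),
    # whose minimum over sigma equals the Square-Root Slope objective.
    n, p = len(X), len(X[0])
    step = n / sum(x * x for row in X for x in row)  # 1/L with L <= ||X||_F^2 / n
    beta = [0.0] * p
    sigma = math.sqrt(sum(yi * yi for yi in y) / n)
    for _ in range(n_outer):
        for _ in range(n_inner):
            # Proximal-gradient step on ||y - X b||^2 / (2n) + sigma * sorted_l1(b).
            r = [yi - fi for yi, fi in zip(y, matvec(X, beta))]
            grad = [-sum(X[i][j] * r[i] for i in range(n)) / n for j in range(p)]
            v = [b - step * g for b, g in zip(beta, grad)]
            beta = prox_sorted_l1(v, [step * sigma * l for l in lam])
        # Closed-form sigma update: sigma = ||y - X b||_2 / sqrt(n).
        r = [yi - fi for yi, fi in zip(y, matvec(X, beta))]
        sigma = max(math.sqrt(sum(ri * ri for ri in r) / n), 1e-12)
    return beta
```

The alternation works because minimizing ||y - X b||^2 / (2 n sigma) + sigma / 2 over sigma > 0 returns exactly ||y - X b||_2 / sqrt(n). The surrogate therefore shares its minimizers in b with the Square-Root Slope objective. This also illustrates why the weights lambda_j need not be scaled by the unknown noise level.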