We study outlier-robust, sparse estimation of linear regression coefficients when the covariate vectors are sampled from an $\mathfrak{L}$-subGaussian distribution and the noise is sampled from a heavy-tailed distribution. In addition, both the covariate vectors and the noise are contaminated by adversarial outliers. We treat two cases: the covariance matrix of the covariates is either known or unknown. In the known case in particular, our estimator attains a nearly information-theoretically optimal error bound, which is sharper than those of earlier studies of similar settings. The analysis of our estimator relies heavily on generic chaining to derive sharp error bounds.