High-dimensional datasets are frequently subject to contamination by outliers and heavy-tailed noise, which can severely bias standard regularized estimators like the Lasso. While Maximum Mean Discrepancy (MMD) has recently been introduced as a "universal" framework for robust regression, its application to high-dimensional Generalized Linear Models (GLMs) remains largely unexplored, particularly regarding variable selection. In this paper, we propose a penalized MMD framework for robust estimation and feature selection in GLMs. We introduce an $\ell_1$-penalized MMD objective and develop two versions of the estimator: a full $O(n^2)$ version and a computationally efficient $O(n)$ approximation. To solve the resulting non-convex optimization problem, we employ an algorithm based on the Alternating Direction Method of Multipliers (ADMM) combined with AdaGrad. Through extensive simulation studies involving Gaussian linear regression and binary logistic regression, we demonstrate that our proposed methods significantly outperform classical penalized GLMs and existing robust benchmarks. Our approach shows particular strength in handling high-leverage points and heavy-tailed error distributions, where traditional methods often fail.
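To make the objective concrete, the following is a minimal sketch of the full $O(n^2)$ version described above: a V-statistic estimate of the squared MMD between observed responses and responses simulated from a Gaussian linear model, plus an $\ell_1$ penalty. All names, the Gaussian kernel choice, the fixed bandwidth, and the reparameterized noise draws are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    # k(a, b) = exp(-(a - b)^2 / (2 * bandwidth^2)) for scalar responses;
    # returns the full |a| x |b| kernel matrix (the O(n^2) cost).
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * bandwidth ** 2))

def penalized_mmd_objective(beta, X, y, lam, noise, bandwidth=1.0):
    """V-statistic estimate of MMD^2 between the observed responses y and
    responses simulated from the Gaussian linear model N(X beta, sigma^2),
    plus an l1 penalty lam * ||beta||_1.  `noise` holds fixed N(0, sigma^2)
    draws (reparameterization), so the objective is deterministic in beta."""
    y_model = X @ beta + noise                 # simulated responses under the model
    k_yy = gaussian_kernel(y, y)               # observed  vs observed
    k_ym = gaussian_kernel(y, y_model)         # observed  vs simulated
    k_mm = gaussian_kernel(y_model, y_model)   # simulated vs simulated
    mmd2 = k_yy.mean() - 2.0 * k_ym.mean() + k_mm.mean()  # >= 0 for V-statistics
    return mmd2 + lam * np.abs(beta).sum()

# Toy usage: a sparse truth should score better than the all-zero vector.
rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = 2.0                            # sparse signal: 3 active features
y = X @ beta_true + rng.standard_normal(n)
noise = rng.standard_normal(n)                 # shared draws across evaluations
val_true = penalized_mmd_objective(beta_true, X, y, lam=0.01, noise=noise)
val_zero = penalized_mmd_objective(np.zeros(p), X, y, lam=0.01, noise=noise)
```

Because the kernel means above use all $n^2$ pairs, one evaluation costs $O(n^2)$; the paper's $O(n)$ approximation replaces these full matrices with a linear-time estimator, and the non-smooth $\ell_1$ term is what motivates the ADMM-based solver.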