This paper investigates the phenomenon of benign overfitting in binary classification problems with heavy-tailed input distributions. We extend the analysis of maximum margin classifiers to $\alpha$-sub-exponential distributions, where $\alpha \in (0,2]$, generalizing previous work that focused on sub-Gaussian inputs. Our main result provides generalization error bounds for linear classifiers trained by gradient descent on unregularized logistic loss in this heavy-tailed setting. We prove that, under suitable conditions on the dimensionality $p$ and the signal magnitude $\|\mu\|$, the misclassification error of the maximum margin classifier asymptotically approaches the noise level. This work advances the understanding of benign overfitting in broader distributional settings and demonstrates that the phenomenon persists even with heavier-tailed inputs than previously studied.
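The setting described above can be illustrated with a small simulation. The sketch below is not the paper's construction: all parameters ($n$, $p$, $\|\mu\|$, step size, iteration count) are arbitrary toy choices, and Laplace noise is used as one example of an $\alpha$-sub-exponential input (with $\alpha = 1$). It runs plain gradient descent on the unregularized logistic loss in an overparameterized regime ($p \gg n$), where the iterates are known to align with the maximum margin direction, and then measures train and test error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy parameters (illustrative only, not the paper's regime)
n, p = 50, 500                 # n samples in p >> n dimensions (overparameterized)
mu = np.zeros(p)
mu[0] = 4.0                    # mean direction with magnitude ||mu|| = 4

# Heavy-tailed inputs: x = y * mu + Laplace noise
# (Laplace is alpha-sub-exponential with alpha = 1)
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * mu[None, :] + rng.laplace(scale=1.0, size=(n, p))

# Gradient descent on the unregularized logistic loss
# loss(w) = mean_i log(1 + exp(-y_i <x_i, w>))
w = np.zeros(p)
lr = 0.1
for _ in range(2000):
    margins = y * (X @ w)
    # d/dw = -mean_i y_i x_i * sigmoid(-margin_i)
    grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= lr * grad

train_err = np.mean(y * (X @ w) <= 0)

# Fresh test set from the same heavy-tailed distribution
y_te = rng.choice([-1.0, 1.0], size=2000)
X_te = y_te[:, None] * mu[None, :] + rng.laplace(scale=1.0, size=(2000, p))
test_err = np.mean(y_te * (X_te @ w) <= 0)
print(f"train error: {train_err:.3f}, test error: {test_err:.3f}")
```

Because $p \gg n$, the training data are linearly separable almost surely, so gradient descent drives the training error to zero (interpolation); the point of the benign-overfitting analysis is that the test error can nonetheless remain close to the noise level when $\|\mu\|$ is large enough relative to $p$ and $n$.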