Modern machine learning classifiers often exhibit vanishing classification error on the training set. They achieve this by learning nonlinear representations of the inputs that map the data into linearly separable classes. Motivated by these phenomena, we revisit high-dimensional maximum margin classification for linearly separable data. We consider a stylized setting in which data $(y_i,{\boldsymbol x}_i)$, $i\le n$, are i.i.d. with ${\boldsymbol x}_i\sim\mathsf{N}({\boldsymbol 0},{\boldsymbol \Sigma})$ a $p$-dimensional Gaussian feature vector, and $y_i \in\{+1,-1\}$ a label whose distribution depends on a linear combination of the covariates $\langle {\boldsymbol \theta}_*,{\boldsymbol x}_i \rangle$. While the Gaussian model might appear extremely simplistic, universality arguments can be used to show that the results derived in this setting also apply to the output of certain nonlinear featurization maps. We consider the proportional asymptotics $n,p\to\infty$ with $p/n\to \psi$, and derive exact expressions for the limiting generalization error. We use this theory to derive two results of independent interest: $(i)$ Sufficient conditions on $({\boldsymbol \Sigma},{\boldsymbol \theta}_*)$ for `benign overfitting' that parallel previously derived conditions in the case of linear regression; $(ii)$ An asymptotically exact expression for the generalization error when max-margin classification is used in conjunction with feature vectors produced by random one-layer neural networks.
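For concreteness, the setting described above can be written out as follows. This is only a sketch under stated assumptions: the link function $f$, the unit-norm normalization of the classifier, and the symbols $\hat{\boldsymbol \theta}$, $\mathrm{Err}$, and $(y_{\rm new},{\boldsymbol x}_{\rm new})$ are illustrative notation that the abstract does not fix.

$$
\mathbb{P}\big(y_i=+1 \,\big|\, {\boldsymbol x}_i\big) = f\big(\langle {\boldsymbol \theta}_*,{\boldsymbol x}_i \rangle\big),
\qquad
\hat{\boldsymbol \theta} \in \arg\max_{\|{\boldsymbol \theta}\|_2 = 1}\; \min_{i\le n}\; y_i \langle {\boldsymbol \theta},{\boldsymbol x}_i \rangle ,
$$

$$
\mathrm{Err}(\hat{\boldsymbol \theta}) = \mathbb{P}\big(y_{\rm new}\,\langle \hat{\boldsymbol \theta},{\boldsymbol x}_{\rm new} \rangle \le 0\big).
$$

In this notation, the data are linearly separable exactly when the maximized margin $\min_{i\le n} y_i \langle \hat{\boldsymbol \theta},{\boldsymbol x}_i \rangle$ is strictly positive, and the generalization error is the probability of misclassifying a fresh sample $(y_{\rm new},{\boldsymbol x}_{\rm new})$ drawn from the same distribution, in the limit $n,p\to\infty$ with $p/n\to\psi$.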