The support vector machine (SVM) is a supervised learning algorithm that finds a maximum-margin linear classifier, often after mapping the data to a high-dimensional feature space via the kernel trick. Recent work has demonstrated that in certain sufficiently overparameterized settings, the SVM decision function coincides exactly with the minimum-norm label interpolant. This phenomenon of support vector proliferation (SVP) is especially interesting because it allows us to understand SVM performance by leveraging recent analyses of harmless interpolation in linear and kernel models. However, previous work on SVP has made restrictive assumptions on the data/feature distribution and spectrum. In this paper, we present a new and flexible analysis framework for proving SVP in an arbitrary reproducing kernel Hilbert space with a flexible class of generative models for the labels. We present conditions for SVP for features in the families of general bounded orthonormal systems (e.g., Fourier features) and independent sub-Gaussian features. In both cases, we show that SVP occurs in many interesting settings not covered by prior work, and we leverage these results to prove novel generalization results for kernel SVM classification.
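To make the SVP statement concrete, the following is a minimal numerical sketch, not the analysis framework of the paper. It relies on a standard KKT argument: the minimum-norm interpolant has dual coefficients alpha solving K alpha = y, and it coincides with the hard-margin SVM solution with every training point a support vector when y_i * alpha_i > 0 for all i. The function names, the linear kernel K = X X^T, and the isotropic Gaussian features in the demo are illustrative assumptions, not taken from the source.

```python
import numpy as np


def interpolant_dual_coefficients(K, y):
    """Solve K @ alpha = y, the dual form of the minimum-norm label interpolant.

    Assumes the Gram matrix K is invertible (typical when features outnumber samples).
    """
    return np.linalg.solve(K, y)


def svp_holds(X, y):
    """Return True if every training point is a support vector.

    Uses a linear kernel K = X X^T (an assumption for this sketch). The check
    y_i * alpha_i > 0 for all i is the KKT condition under which the
    minimum-norm interpolant is also the hard-margin SVM solution.
    """
    K = X @ X.T
    alpha = interpolant_dual_coefficients(K, y)
    return bool(np.all(y * alpha > 0))


if __name__ == "__main__":
    # Hypothetical heavily overparameterized setting: d much larger than n.
    rng = np.random.default_rng(0)
    n, d = 20, 20000
    X = rng.standard_normal((n, d))
    y = rng.choice([-1.0, 1.0], size=n)
    print("SVP holds:", svp_holds(X, y))
```

With orthonormal rows (K equal to the identity) the check passes trivially, since alpha = y. With two nearly collinear points in low dimension it can fail, because one point then lies strictly beyond the margin and is not a support vector.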