In this work, we investigate the margin-maximization bias exhibited by gradient-based algorithms when classifying linearly separable data. We present an in-depth analysis of the specific properties of the velocity field associated with (normalized) gradients, focusing on their role in margin maximization. Inspired by this analysis, we propose a novel algorithm called Progressive Rescaling Gradient Descent (PRGD) and show that PRGD can maximize the margin at an {\em exponential rate}. This stands in stark contrast to all existing algorithms, which maximize the margin at a slow {\em polynomial rate}. Specifically, we identify mild conditions on the data distribution under which existing algorithms such as gradient descent (GD) and normalized gradient descent (NGD) {\em provably fail} to maximize the margin efficiently. To validate our theoretical findings, we present both synthetic and real-world experiments. Notably, PRGD also shows promise in enhancing generalization performance when applied to linearly non-separable datasets and deep neural networks.
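To make the notion of margin maximization concrete, the following is a minimal sketch of GD and NGD on a toy separable problem. It is not PRGD, whose update rule is not given in this abstract, and it is not one of the paper's experiments. Everything in it is an assumption made for illustration: the two-point dataset `V`, the exponential loss, the form of NGD (gradient divided by the loss value), the step size, and the iteration count. The quantity tracked is the normalized margin $\gamma(w) = \min_i y_i \langle w, x_i \rangle / \|w\|_2$, which for this dataset is at most 1.

```python
import math

# Toy separable data (hypothetical, not from the paper). Labels are folded in,
# so each row is v_i = y_i * x_i. The max-margin direction is (1, 0), margin 1.
V = [(1.0, 2.0), (1.0, -1.0)]


def normalized_margin(w):
    """gamma(w) = min_i <w, v_i> / ||w||_2."""
    norm = math.hypot(w[0], w[1])
    return min(w[0] * v[0] + w[1] * v[1] for v in V) / norm


def gd_step(w, lr):
    """One GD step on the exponential loss L(w) = (1/n) sum_i exp(-<w, v_i>)."""
    coeffs = [math.exp(-(w[0] * v[0] + w[1] * v[1])) / len(V) for v in V]
    return tuple(
        w[k] + lr * sum(c * v[k] for c, v in zip(coeffs, V)) for k in range(2)
    )


def ngd_step(w, lr):
    """One step along -grad L(w) / L(w) (assumed form of NGD).

    This ratio is a softmax-weighted average of the v_i; the max-shift keeps
    the exponentials from underflowing as ||w|| grows.
    """
    z = [-(w[0] * v[0] + w[1] * v[1]) for v in V]
    shift = max(z)
    e = [math.exp(zi - shift) for zi in z]
    total = sum(e)
    return tuple(
        w[k] + lr * sum(ei / total * v[k] for ei, v in zip(e, V)) for k in range(2)
    )


def run(step, steps=2000, lr=0.1):
    """Run `steps` updates from w = 0 and return the final normalized margin."""
    w = (0.0, 0.0)
    for _ in range(steps):
        w = step(w, lr)
    return normalized_margin(w)


gd_margin = run(gd_step)
ngd_margin = run(ngd_step)
print(f"GD  margin: {gd_margin:.4f}")
print(f"NGD margin: {ngd_margin:.4f}")
```

On this toy problem both methods approach the maximal margin of 1 only gradually, with NGD ahead of GD after the same number of steps. The sketch says nothing about PRGD itself or about the data conditions analyzed in the paper.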