In this work, we investigate the margin-maximization bias exhibited by gradient-based algorithms when classifying linearly separable data. We present an in-depth analysis of the specific properties of the velocity field associated with (normalized) gradients, focusing on their role in margin maximization. Inspired by this analysis, we propose a novel algorithm called Progressive Rescaling Gradient Descent (PRGD) and show that PRGD can maximize the margin at an {\em exponential rate}. This stands in stark contrast to all existing algorithms, which maximize the margin at a slow {\em polynomial rate}. Specifically, we identify mild conditions on the data distribution under which existing algorithms such as gradient descent (GD) and normalized gradient descent (NGD) {\em provably fail} to maximize the margin efficiently. To validate our theoretical findings, we present experiments on both synthetic and real-world data. Notably, PRGD also shows promise in enhancing generalization performance when applied to linearly non-separable datasets and deep neural networks.