AdaBoost sequentially fits so-called weak learners to minimize an exponential loss, which penalizes misclassified data points more severely than other loss functions such as cross-entropy. Paradoxically, AdaBoost generalizes well in practice as the number of weak learners grows. In the present work, we introduce Penalized Exponential Loss (PENEX), a new formulation of the multi-class exponential loss that is theoretically grounded and, in contrast to the existing formulation, amenable to optimization via first-order methods. We demonstrate both empirically and theoretically that PENEX implicitly maximizes the margins of data points. We also show that gradient increments on PENEX implicitly parameterize weak learners in the boosting framework. Across computer vision and language tasks, we show that PENEX exhibits a regularizing effect that is often superior to that of established methods at similar computational cost. Our results highlight PENEX's potential as an AdaBoost-inspired alternative for the effective training and fine-tuning of deep neural networks.
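To illustrate the kind of objective referred to above, the following is a minimal PyTorch sketch of a multi-class exponential-style loss with an explicit penalty term that can be minimized with ordinary gradient descent. The SAMME-style class coding, the quadratic logit penalty, and the `penalty` weight are assumptions made for this sketch; it is not the PENEX formulation introduced in the paper.

```python
import torch

def penalized_multiclass_exp_loss(logits, targets, penalty=0.1):
    """Illustrative penalized multi-class exponential-style loss.

    Hypothetical sketch only: the SAMME-style class coding and the
    quadratic logit penalty are assumptions for illustration, not the
    paper's PENEX definition.
    """
    num_classes = logits.shape[1]
    # SAMME-style coding: +1 for the true class, -1/(K-1) for the rest.
    coding = torch.full_like(logits, -1.0 / (num_classes - 1))
    coding.scatter_(1, targets.unsqueeze(1), 1.0)
    # Exponential loss on the per-example margin <y, f(x)> / K.
    margin = (coding * logits).sum(dim=1) / num_classes
    exp_term = torch.exp(-margin)
    # Explicit penalty keeping logits bounded under first-order
    # optimization (illustrative regularizer).
    reg = penalty * logits.pow(2).sum(dim=1)
    return (exp_term + reg).mean()

# Usage with any differentiable classifier producing logits:
# loss = penalized_multiclass_exp_loss(model(x), y_true)
# loss.backward()  # ordinary gradient-based training step
```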