In this paper, we introduce a novel large-margin discriminative loss function for deep learning. This loss boosts the discriminative power of neural networks, which is characterized by intra-class compactness and inter-class separability. On the one hand, intra-class compactness is enforced by keeping samples of the same class close to each other. On the other hand, inter-class separability is boosted by a margin loss that enforces a minimum distance between each class and its closest decision boundary. Every term in our loss has an explicit meaning, giving a direct view of the learned feature space. We mathematically analyze the relation between the compactness and margin terms, which provides guidelines on how the hyper-parameters affect the learned features. We also analyze properties of the gradient of the loss with respect to the parameters of the neural network. Based on this analysis, we design a strategy called partial momentum updating that achieves both stability and consistency during training. Furthermore, we investigate the generalization error to gain further theoretical insight. In our experiments, our loss function consistently improves test accuracy over the standard softmax loss.
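The abstract does not give the exact formulas, so the following is only a minimal NumPy sketch of the two ingredients it describes, under stated assumptions. Compactness is assumed to be the mean squared distance between pairs of same-class features. The margin term is assumed to be a hinge on the signed distance from each feature to its closest decision boundary of a linear classifier with weights `W` and biases `b`. The names `compactness_term`, `margin_term`, `discriminative_loss`, `lam` and `margin` are hypothetical, and the paper's actual loss may differ.

```python
import numpy as np


def compactness_term(features, labels):
    """Hypothetical compactness: mean squared distance between same-class pairs."""
    features, labels = np.asarray(features, float), np.asarray(labels)
    total, count = 0.0, 0
    for c in np.unique(labels):
        f = features[labels == c]
        n = len(f)
        if n < 2:
            continue  # a single sample has no pairs
        diffs = f[:, None, :] - f[None, :, :]  # (n, n, d) pairwise differences
        sq = np.sum(diffs ** 2, axis=-1)       # (n, n) squared distances, zero diagonal
        total += sq.sum() / (n * (n - 1))      # average over ordered pairs i != j
        count += 1
    return total / max(count, 1)               # average over classes


def margin_term(features, labels, W, b, margin=1.0):
    """Hypothetical margin loss: hinge on distance to the closest linear boundary."""
    features, labels = np.asarray(features, float), np.asarray(labels)
    W, b = np.asarray(W, float), np.asarray(b, float)
    scores = features @ W.T + b                # (N, K) class scores
    N, K = scores.shape
    loss = 0.0
    for i in range(N):
        y = labels[i]
        dists = []
        for j in range(K):
            if j == y:
                continue
            # Signed distance from x_i to the boundary between classes y and j:
            # ((w_y - w_j) . x + b_y - b_j) / ||w_y - w_j||; positive when x_i is on y's side.
            dists.append((scores[i, y] - scores[i, j]) / np.linalg.norm(W[y] - W[j]))
        loss += max(0.0, margin - min(dists))  # penalize if closest boundary is within margin
    return loss / N


def discriminative_loss(features, labels, W, b, lam=1.0, margin=1.0):
    """Hypothetical combination: weighted compactness plus margin hinge."""
    return lam * compactness_term(features, labels) + margin_term(features, labels, W, b, margin)
```

For example, with `W = [[1, 0], [-1, 0]]`, `b = [0, 0]` and two class-0 features at `(0.5, 0)` and `(1.5, 0)`, the compactness term is 1.0 (squared distance between the pair), and only the first sample falls inside the unit margin (distance 0.5), giving a margin term of 0.25. The weight `lam` trades off compactness against the margin, which is the kind of hyper-parameter interaction the abstract says the paper analyzes.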