Soft targets combined with the cross-entropy loss have been shown to improve the generalization performance of deep neural networks on supervised classification tasks. The standard cross-entropy loss, however, assumes that the data are categorically distributed, which may often not be the case in practice. In contrast, InfoNCE does not rely on such an explicit assumption but instead implicitly estimates the true conditional distribution through negative sampling. Unfortunately, in its standard formulation it cannot be combined with soft targets, which hinders its use with sophisticated training strategies. In this paper, we address this limitation by proposing a principled loss function that is compatible with probabilistic targets. Our new soft target InfoNCE loss is conceptually simple, efficient to compute, and can be derived within the framework of noise contrastive estimation. Using a toy example, we demonstrate shortcomings of the categorical distribution assumption of cross-entropy and discuss the implications of sampling from soft distributions. We observe that soft target InfoNCE performs on par with strong soft target cross-entropy baselines and outperforms hard target NLL and InfoNCE losses on popular benchmarks, including ImageNet. Finally, we provide a simple implementation of our loss, geared towards supervised classification and fully compatible with deep classification models trained with cross-entropy.
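
As a point of reference for the baseline losses named above, the following is a minimal sketch in plain Python, not the implementation provided with the paper. It shows soft target cross-entropy and the standard hard target InfoNCE. The function names, the use of label smoothing to build the soft target, and the example scores are assumptions made for illustration. The proposed soft target InfoNCE loss itself is not reproduced, since its formula is not stated here.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def smooth_labels(label, num_classes, eps=0.1):
    """Build a soft target by label smoothing (one common choice, assumed here)."""
    off = eps / num_classes
    return [off + (1.0 - eps) * (1.0 if k == label else 0.0)
            for k in range(num_classes)]


def soft_target_cross_entropy(logits, target):
    """Cross-entropy between a probabilistic target and softmax(logits)."""
    logp = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(target, logp))


def info_nce(pos_score, neg_scores):
    """Standard hard target InfoNCE: pick the positive among sampled negatives."""
    return -log_softmax([pos_score] + list(neg_scores))[0]


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]  # hypothetical class scores
    hard = soft_target_cross_entropy(logits, [1.0, 0.0, 0.0])
    soft = soft_target_cross_entropy(logits, smooth_labels(0, 3))
    nce = info_nce(logits[0], logits[1:])
    print(f"hard target NLL: {hard:.4f}")
    print(f"soft target CE:  {soft:.4f}")
    print(f"InfoNCE:         {nce:.4f}")
```

With a one-hot target, the soft target cross-entropy reduces to the usual NLL. When the negatives are exactly the remaining classes, the hard target InfoNCE in this sketch gives the same value, which illustrates why the two losses can share a classification model. In practice InfoNCE scores only a sampled subset of negatives.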