Soft targets combined with the cross-entropy loss have been shown to improve the generalization performance of deep neural networks on supervised classification tasks. The standard cross-entropy loss, however, assumes the data to be categorically distributed, which is often not the case in practice. In contrast, InfoNCE does not rely on such an explicit assumption but instead implicitly estimates the true conditional distribution through negative sampling. Unfortunately, in its standard formulation it cannot be combined with soft targets, which hinders its use with sophisticated training strategies. In this paper, we address this limitation by proposing a loss function that is compatible with probabilistic targets. Our new soft target InfoNCE loss is conceptually simple, efficient to compute, and can be motivated through the framework of noise contrastive estimation. Using a toy example, we demonstrate shortcomings of the categorical distribution assumption of cross-entropy and discuss implications of sampling from soft distributions. We observe that soft target InfoNCE performs on par with strong soft target cross-entropy baselines and outperforms hard target NLL and InfoNCE losses on popular benchmarks, including ImageNet. Finally, we provide a simple implementation of our loss, geared towards supervised classification and fully compatible with deep classification models trained with cross-entropy.
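To make the baselines named above concrete, the following is a minimal sketch of soft target cross-entropy and of standard hard target InfoNCE. It does not reproduce the soft target InfoNCE loss proposed in the paper, whose exact form is not given in this abstract. The function names, the use of label smoothing as the source of soft targets, the smoothing value `eps=0.1`, and the example logits are all assumptions made for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def smooth_labels(label, num_classes, eps=0.1):
    """Label smoothing, one common way to build a soft target (eps is an assumed value)."""
    off = eps / num_classes
    return [off + (1.0 - eps) * (1.0 if k == label else 0.0)
            for k in range(num_classes)]

def soft_target_cross_entropy(logits, target):
    """-sum_k target[k] * log softmax(logits)[k]; target is any probability vector."""
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))

def hard_target_info_nce(pos_score, neg_scores):
    """Standard InfoNCE: the positive score competes only against sampled negatives."""
    return -log_softmax([pos_score] + list(neg_scores))[0]

# Illustrative scores for a 3-class problem whose true class is 0.
logits = [2.0, 0.5, -1.0]
hard = soft_target_cross_entropy(logits, [1.0, 0.0, 0.0])  # one-hot target: plain NLL
soft = soft_target_cross_entropy(logits, smooth_labels(0, 3))
nce = hard_target_info_nce(logits[0], [logits[2]])  # a single sampled negative
print(hard, soft, nce)
```

With a one-hot target the soft target cross-entropy reduces to the hard target NLL, and when every other class is used as a negative the InfoNCE sketch coincides with that NLL. The difference between the two families lies in which classes enter the normalizer: all of them for cross-entropy, only the sampled negatives for InfoNCE.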