Metric learning methods for image retrieval have attracted extensive interest. Many metric learning loss functions focus on learning a correct ranking of training samples, but they overfit strongly to semantically inconsistent labels and require a large amount of data. To address these shortcomings, we propose a new metric learning method, called contextual loss, which optimizes contextual similarity in addition to cosine similarity. Our contextual loss implicitly enforces semantic consistency among neighbors while converging to the correct ranking. We empirically show that the proposed loss is more robust to label noise and is less prone to overfitting even when a large portion of the training data is withheld. Extensive experiments demonstrate that our method achieves a new state of the art across four image retrieval benchmarks and multiple evaluation settings. Code is available at: https://github.com/Chris210634/metric-learning-using-contextual-similarity
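To make the distinction between cosine similarity and contextual similarity concrete, the sketch below compares the two on a toy set of embeddings. The abstract does not define contextual similarity, so this example assumes one common reading: the Jaccard overlap between the k-nearest-neighbor sets of two samples. The function names, the choice of k, and the toy data are all hypothetical. This is not the authors' implementation, and the hard neighbor sets used here are not differentiable, so a trainable loss would need some relaxation that is not shown.

```python
import numpy as np


def cosine_similarity_matrix(emb):
    """Pairwise cosine similarity between the rows of `emb`."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T


def knn_indicator(sim, k):
    """Boolean matrix whose row i marks the k most similar samples to i.

    Each sample counts as its own neighbor, since self-similarity is maximal.
    """
    n = sim.shape[0]
    top_k = np.argsort(-sim, axis=1)[:, :k]
    indicator = np.zeros((n, n), dtype=bool)
    indicator[np.arange(n)[:, None], top_k] = True
    return indicator


def contextual_similarity(sim, k):
    """Assumed definition: Jaccard overlap of k-nearest-neighbor sets."""
    ind = knn_indicator(sim, k).astype(float)
    intersection = ind @ ind.T          # |N(i) ∩ N(j)|
    union = 2.0 * k - intersection      # every neighbor set has exactly k members
    return intersection / union


# Toy data: two tight pairs of 2-D embeddings.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
cos = cosine_similarity_matrix(emb)
ctx = contextual_similarity(cos, k=2)
print(np.round(cos, 3))
print(ctx)
```

In this toy case, samples 0 and 1 share the same neighbor set and so have contextual similarity 1, while samples 0 and 2 share no neighbors and have contextual similarity 0, even though their cosine similarities are merely high and low rather than exactly 1 and 0.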