There is extensive interest in metric learning methods for image retrieval. Many metric learning loss functions focus on learning a correct ranking of training samples, but they overfit strongly to semantically inconsistent labels and require large amounts of data. To address these shortcomings, we propose a new metric learning method, called contextual loss, which optimizes contextual similarity in addition to cosine similarity. Our contextual loss implicitly enforces semantic consistency among neighbors while converging to the correct ranking. We show empirically that the proposed loss is more robust to label noise and less prone to overfitting, even when a large portion of the training data is withheld. Extensive experiments demonstrate that our method achieves new state-of-the-art results across four image retrieval benchmarks and multiple evaluation settings. Code is available at: https://github.com/Chris210634/metric-learning-using-contextual-similarity