When positive and negative samples are imbalanced, hard negative mining has been shown to help models learn subtler differences between positive and negative samples, improving recognition performance. However, an overly strict mining strategy risks introducing false negative samples. Mining also distorts the difficulty distribution of samples in the real dataset, which may cause the model to overfit to the difficult samples. In this paper, we therefore investigate how to trade off the difficulty of the mined samples so as to obtain and exploit high-quality negative samples, and we address the problem through both the loss function and the training strategy. The proposed balance loss incorporates a self-supervised approach into the loss function to provide an effective discriminant for the quality of negative samples, and uses a dynamic gradient modulation strategy to adjust gradients more finely for samples of different difficulties. The proposed annealing training strategy then constrains the difficulty of the samples drawn by negative sample mining, supplying the loss function with data of different difficulty distributions, and trains the model on samples of decreasing difficulty. Extensive experiments show that our new descriptors outperform previous state-of-the-art descriptors on patch validation, matching, and retrieval tasks.
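To make the idea of constraining mining difficulty concrete, here is a minimal NumPy sketch of in-batch negative mining with a difficulty schedule. It is an illustration under stated assumptions, not the method of the paper: the abstract does not specify how difficulty is constrained, so the rank-based selection (`mine_negatives`), the linear schedule (`annealed_rank`), and all function names are hypothetical. The standard triplet margin loss stands in for the proposed balance loss, whose form is not given here.

```python
import numpy as np


def pairwise_distances(anchors, positives):
    """Euclidean distance between every anchor and every positive descriptor."""
    diff = anchors[:, None, :] - positives[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def annealed_rank(step, total_steps, max_rank):
    """Hypothetical linear schedule (an assumption, not from the paper):
    rank 0 (hardest negative) at the first step, max_rank (easier) at the last."""
    frac = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return int(round(frac * max_rank))


def mine_negatives(dist, rank):
    """For each anchor i, return the index of its rank-th closest
    non-matching descriptor (rank 0 is the hardest negative)."""
    n = dist.shape[0]
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)    # diagonal entries are the matching pairs
    order = np.argsort(masked, axis=1)  # ascending distance: hardest first
    rank = min(rank, n - 2)             # each row has n - 1 valid negatives
    return order[:, rank]


def triplet_margin_loss(dist, neg_idx, margin=1.0):
    """Standard triplet margin loss, used only as a stand-in for the balance loss."""
    pos = np.diag(dist)
    neg = dist[np.arange(dist.shape[0]), neg_idx]
    return np.maximum(0.0, margin + pos - neg).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchors = rng.normal(size=(8, 16))
    positives = anchors + 0.1 * rng.normal(size=(8, 16))  # noisy matching patches
    dist = pairwise_distances(anchors, positives)
    for step in range(5):
        rank = annealed_rank(step, total_steps=5, max_rank=3)
        loss = triplet_margin_loss(dist, mine_negatives(dist, rank))
        print(f"step={step} rank={rank} loss={loss:.4f}")
```

Selecting by rank rather than always taking the closest negative is one simple way to avoid the very hardest candidates, which are the ones most likely to be false negatives. Increasing the rank over training matches the abstract's description of training on samples of decreasing difficulty.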