The performance of neural networks in content-based image retrieval (CBIR) is highly influenced by the choice of loss (objective) function. Most objective functions for neural models fall into one of two categories: metric learning and statistical learning. Metric learning approaches require a pair mining strategy that is often inefficient. Statistical learning approaches do not generate highly compact features, because they optimize the features only indirectly. To this end, we propose a novel repeller-attractor loss that belongs to the metric learning paradigm, yet directly optimizes for the L2 metric without the need to generate pairs. Our loss comprises three components. The leading objective ensures that the learned features are attracted to their designated learnable class anchors. The second component regulates the anchors and forces them to be separated by a margin, while the third ensures that the anchors do not collapse to zero. Furthermore, we develop a more efficient two-stage retrieval system that uses the learned class anchors in the first stage of the retrieval process, eliminating the need to compare the query with every image in the database. We select four datasets (CIFAR-100, Food-101, SVHN, and Tiny ImageNet) and evaluate the proposed objective on the CBIR task in both few-shot and full-set training settings, using both convolutional and transformer architectures. Compared to existing objective functions, our empirical evidence shows that the proposed objective generates superior and more consistent results.
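The abstract does not specify the exact form of the three loss terms or of the two-stage search. The following NumPy sketch only illustrates the general idea under assumed choices, so it should not be read as the authors' implementation. The attraction term is assumed to be a squared L2 distance to the class anchor. The repeller term is assumed to be a squared hinge on pairwise anchor distances, controlled by a hypothetical `margin`. The anti-collapse term is assumed to be a squared hinge that keeps each anchor norm above a hypothetical `min_norm`. For the search, database embeddings are assumed to be indexed offline by their nearest anchor. All function and parameter names are illustrative.

```python
import numpy as np

# ASSUMPTION: the three terms below are illustrative forms, not the paper's equations.

def attractor_loss(features, labels, anchors):
    """Mean squared L2 distance between each feature and the anchor of its class."""
    diffs = features - anchors[labels]
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

def repeller_loss(anchors, margin):
    """Squared hinge on every pair of distinct anchors closer than `margin` (needs >= 2 anchors)."""
    dists = np.linalg.norm(anchors[:, None, :] - anchors[None, :, :], axis=-1)
    i, j = np.triu_indices(len(anchors), k=1)  # each unordered pair once
    return float(np.mean(np.maximum(0.0, margin - dists[i, j]) ** 2))

def min_norm_loss(anchors, min_norm):
    """Squared hinge that penalizes anchors whose L2 norm drops below `min_norm`."""
    norms = np.linalg.norm(anchors, axis=1)
    return float(np.mean(np.maximum(0.0, min_norm - norms) ** 2))

def repeller_attractor_loss(features, labels, anchors,
                            margin=2.0, min_norm=1.0, w_rep=1.0, w_norm=1.0):
    """Hypothetical weighted sum of the three components (weights are assumptions)."""
    return (attractor_loss(features, labels, anchors)
            + w_rep * repeller_loss(anchors, margin)
            + w_norm * min_norm_loss(anchors, min_norm))

# ASSUMPTION: two-stage search where each database item is indexed by its nearest anchor.

def build_index(db_features, anchors):
    """Offline step: assign every database embedding to its nearest class anchor."""
    d = np.linalg.norm(db_features[:, None, :] - anchors[None, :, :], axis=-1)
    return np.argmin(d, axis=1)

def two_stage_search(query, anchors, db_features, db_assign, k_anchors=1, top_n=5):
    """Stage 1 compares the query with the anchors only; stage 2 ranks
    the database items assigned to the k nearest anchors by L2 distance."""
    anchor_d = np.linalg.norm(anchors - query, axis=1)
    nearest = np.argsort(anchor_d)[:k_anchors]
    candidates = np.flatnonzero(np.isin(db_assign, nearest))
    cand_d = np.linalg.norm(db_features[candidates] - query, axis=1)
    return candidates[np.argsort(cand_d)[:top_n]]

if __name__ == "__main__":
    anchors = np.array([[5.0, 0.0], [0.0, 5.0], [-5.0, 0.0]])
    labels = np.array([0, 1, 2, 0])
    features = anchors[labels] + 0.1  # features slightly offset from their anchors
    print("loss:", repeller_attractor_loss(features, labels, anchors))

    db = np.array([[4.8, 0.1], [5.3, -0.2], [0.1, 4.9], [-5.1, 0.0], [0.2, 5.2]])
    assign = build_index(db, anchors)
    print("retrieved:", two_stage_search(np.array([5.0, 0.05]), anchors, db, assign))
```

In this sketch, the first stage costs one distance computation per class anchor, and the second stage costs one per candidate. This reflects the efficiency argument in the abstract. Raising the illustrative `k_anchors` parameter trades speed for recall when a query falls near a class boundary. In actual training, the anchors would be learnable parameters optimized jointly with the network in an autodiff framework. This NumPy version only evaluates the loss.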