Sketch-Based Image Retrieval (SBIR) is a crucial task in multimedia retrieval, where the goal is to retrieve a set of images that match a given sketch query. Researchers have already proposed several well-performing solutions for this task, but most focus on enhancing the embeddings through approaches such as triplet loss, quadruplet loss, data augmentation, and edge extraction. In this work, we tackle the problem from several angles. We start by examining the quality of the training data and show some of its limitations. We then introduce Relative Triplet Loss (RTL), an adapted triplet loss that overcomes those limitations by weighting the loss based on anchor similarity. Through a series of experiments, we demonstrate that replacing a triplet loss with RTL outperforms the previous state of the art without any data augmentation. In addition, we demonstrate why batch normalization is better suited to SBIR embeddings than l2-normalization and show that it significantly improves the performance of our models. We further investigate the model capacity required for the photo and sketch domains and demonstrate that the photo encoder requires a higher capacity than the sketch encoder, which validates the hypothesis formulated in [34]. Finally, we propose a straightforward approach, based on knowledge distillation, to efficiently train small models such as ShuffleNetv2 [22] with only a marginal loss of accuracy. The same approach applied to larger models enabled us to outperform previous state-of-the-art results and achieve a recall of 62.38% at k = 1 on The Sketchy Database [30].
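To make the contrast between l2-normalization and batch normalization of embeddings concrete, the following is a minimal sketch and not the authors' implementation. It assumes plain Python lists as embeddings, omits the learnable scale and shift that batch normalization layers usually carry, and uses hypothetical function names (`l2_normalize`, `batch_normalize`) chosen for illustration only.

```python
import math


def l2_normalize(batch):
    """Scale each embedding (row) to unit Euclidean length."""
    out = []
    for row in batch:
        # Guard against an all-zero row; assumption made for this sketch.
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        out.append([x / norm for x in row])
    return out


def batch_normalize(batch, eps=1e-5):
    """Standardize each dimension (column) across the batch.

    Simplified: no learnable scale/shift and no running statistics.
    """
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[d] for row in batch) / n for d in range(dims)]
    variances = [
        sum((row[d] - means[d]) ** 2 for row in batch) / n for d in range(dims)
    ]
    return [
        [(row[d] - means[d]) / math.sqrt(variances[d] + eps) for d in range(dims)]
        for row in batch
    ]


# Toy batch of three 2-D embeddings (illustrative values, not real data).
embeddings = [[3.0, 4.0], [6.0, 8.0], [1.0, 0.0]]
unit = l2_normalize(embeddings)
standardized = batch_normalize(embeddings)
```

In this toy batch, the first two embeddings point in the same direction, so l2-normalization maps them to the same unit vector and discards their difference in magnitude. Batch normalization instead rescales each dimension using statistics of the whole batch, so the two embeddings remain distinct.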