Visual similarities discovery (VSD) is an important task with broad e-commerce applications. Given an image of a certain object, the goal of VSD is to retrieve images of different objects with high perceptual visual similarity. Although VSD has been widely studied, proposed methods are often evaluated on a proxy identification-retrieval task, which measures a model's ability to retrieve different images of the same object. We posit that evaluating VSD methods on identification tasks is limited, and that faithful evaluation must rely on expert annotations. In this paper, we introduce the first large-scale fashion visual similarity benchmark dataset, consisting of more than 110K expert-annotated image pairs. Besides this major contribution, we share insights from the challenges we faced while curating this dataset. Based on these insights, we propose a novel and efficient labeling procedure that can be applied to any dataset. Our analysis examines the limitations and inductive biases of this procedure, and based on these findings, we propose metrics to mitigate those limitations. Though our primary focus lies on visual similarity, the methodologies we present have broader applications for discovering and evaluating perceptual similarity across various domains.