We introduce Semantic Recall, a novel metric that assesses the quality of approximate nearest neighbor search algorithms by considering only semantically relevant objects that are theoretically retrievable via exact nearest neighbor search. Unlike traditional recall, semantic recall does not penalize an algorithm for failing to retrieve objects that are semantically irrelevant to the query, even if those objects are among the query's nearest neighbors. We demonstrate that semantic recall is particularly useful for assessing retrieval quality on queries that have few relevant results among their nearest neighbors, a scenario we find to be common in embedding datasets. Additionally, we introduce Tolerant Recall, a proxy metric that approximates semantic recall when semantically relevant objects cannot be identified. We empirically show that our metrics are more effective indicators of retrieval quality than traditional recall, and that optimizing search algorithms for these metrics can lead to improved cost-quality tradeoffs.
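To make the contrast with traditional recall concrete, the following is a minimal sketch of one reading of the description above. It assumes relevance labels are available as a set of object IDs. The function names, the argument names, and the convention of returning 1.0 when a query has no relevant exact neighbors are illustrative assumptions, not definitions taken from the paper. Tolerant recall is not sketched because the abstract does not define it.

```python
def traditional_recall(retrieved, exact_neighbors):
    """Fraction of the exact nearest neighbors that the ANN search returned."""
    exact = set(exact_neighbors)
    if not exact:
        return 1.0  # assumption: an empty target set is not penalized
    return len(exact & set(retrieved)) / len(exact)


def semantic_recall(retrieved, exact_neighbors, relevant):
    """Fraction of the *relevant* exact nearest neighbors that were returned.

    Exact neighbors that are not semantically relevant are dropped from the
    target set, so missing them does not lower the score.
    """
    targets = set(exact_neighbors) & set(relevant)
    if not targets:
        return 1.0  # assumption: no relevant exact neighbors, nothing to miss
    return len(targets & set(retrieved)) / len(targets)


# Hypothetical query: 5 exact neighbors, only two of which (1 and 2) are relevant.
exact_neighbors = [1, 2, 3, 4, 5]
relevant = {1, 2, 9}           # 9 is relevant but not an exact neighbor
retrieved = [1, 2, 6, 7, 8]    # ANN result: finds both relevant neighbors

print(traditional_recall(retrieved, exact_neighbors))         # 2 of 5 exact neighbors
print(semantic_recall(retrieved, exact_neighbors, relevant))  # 2 of 2 relevant exact neighbors
```

In this toy query the ANN result misses three exact neighbors, all of them irrelevant, so traditional recall is low while semantic recall is perfect. Object 9 is relevant but lies outside the exact neighbor set, so it is excluded from the target set, matching the "theoretically retrievable via exact nearest neighbor search" condition.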