We define and investigate the problem of $\textit{c-approximate window search}$: approximate nearest neighbor search in which each point in the dataset has a numeric label, and the goal is to find each query's nearest neighbors among the points whose labels fall within an arbitrary query-specified range. Many semantic search problems are natural examples of this problem, such as image and document search with timestamp filters, or product search with cost filters. We propose and theoretically analyze a modular tree-based framework for transforming an index that solves the traditional $c$-approximate nearest neighbor problem into a data structure that solves window search. On standard nearest neighbor benchmark datasets equipped with random label values, adversarially constructed embeddings, and image search embeddings with real timestamps, we obtain up to a $75\times$ speedup over existing solutions at the same level of recall.
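To make the idea of a tree-based transformation concrete, the following is a minimal sketch of one way such a structure could look. It is not the construction analyzed in this work. The class names `BruteForceIndex` and `WindowSearchTree`, the binary branching, and the `leaf_size` parameter are all hypothetical choices made for illustration. Points are sorted by label, every tree node holds an index over a contiguous run of the sorted points, and a query window is answered by querying only the nodes that the window fully covers. An exact brute-force scan stands in for the underlying nearest neighbor index.

```python
import bisect
import heapq
import math


class BruteForceIndex:
    """Hypothetical stand-in for any nearest neighbor index (exact here)."""

    def __init__(self, items):
        # items: list of (point_id, vector) pairs
        self.items = items

    def query(self, q, k):
        # Return up to k (distance, point_id) pairs, closest first.
        return heapq.nsmallest(k, ((math.dist(q, v), pid) for pid, v in self.items))


class WindowSearchTree:
    """Illustrative binary tree of indexes over label-sorted points."""

    def __init__(self, vectors, labels, leaf_size=2):
        order = sorted(range(len(vectors)), key=lambda i: labels[i])
        self.sorted_labels = [labels[i] for i in order]
        self.sorted_items = [(i, vectors[i]) for i in order]
        self.leaf_size = leaf_size
        self.root = self._build(0, len(order))

    def _build(self, lo, hi):
        # Each node indexes the half-open slice [lo, hi) of the sorted points.
        node = {
            "lo": lo,
            "hi": hi,
            "index": BruteForceIndex(self.sorted_items[lo:hi]),
            "children": [],
        }
        if hi - lo > self.leaf_size:
            mid = (lo + hi) // 2
            node["children"] = [self._build(lo, mid), self._build(mid, hi)]
        return node

    def query(self, q, k, label_min, label_max):
        # Translate the label window into a range of sorted positions.
        lo = bisect.bisect_left(self.sorted_labels, label_min)
        hi = bisect.bisect_right(self.sorted_labels, label_max)
        candidates = []
        self._collect(self.root, lo, hi, q, k, candidates)
        return [pid for _, pid in heapq.nsmallest(k, candidates)]

    def _collect(self, node, lo, hi, q, k, out):
        if hi <= node["lo"] or node["hi"] <= lo:
            return  # node lies entirely outside the window
        if lo <= node["lo"] and node["hi"] <= hi:
            out.extend(node["index"].query(q, k))  # fully covered: use its index
            return
        if not node["children"]:
            # Partially covered leaf: scan only the in-window points.
            start, end = max(lo, node["lo"]), min(hi, node["hi"])
            out.extend((math.dist(q, v), pid) for pid, v in self.sorted_items[start:end])
            return
        for child in node["children"]:
            self._collect(child, lo, hi, q, k, out)


# Toy usage: 2-D points whose labels play the role of timestamps.
vectors = [(0, 0), (1, 0), (5, 5), (0.1, 0), (9, 9), (0, 0.2)]
labels = [10, 20, 30, 40, 50, 60]
tree = WindowSearchTree(vectors, labels)
nearest_in_window = tree.query((0, 0), k=1, label_min=15, label_max=45)
```

In this sketch each point is stored in one index per tree level, so memory grows by a logarithmic factor in exchange for touching only a logarithmic number of indexes per query. Replacing `BruteForceIndex` with an approximate index would make the merged result approximate as well.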