Hybrid queries combining high-dimensional vector similarity with structured attribute filtering have garnered significant attention in both academia and industry. A critical instance of this paradigm is filtered Approximate k-Nearest Neighbor (AKNN) search, where similarity queries over embeddings (e.g., of images or text) are combined with attribute constraints such as labels or numerical ranges. Although such queries are essential for expressive retrieval, optimizing them remains challenging because the combined filters make search cost highly variable. In this paper, we propose a novel cost estimation framework, E2E, for filtered AKNN search and demonstrate its utility in a downstream optimization task, specifically early termination. Unlike existing approaches, our model explicitly captures the correlation between the query vector distribution and attribute-value selectivity, yielding significantly higher estimation accuracy. By using these estimates to refine search termination conditions, we achieve substantial performance gains. Experimental results on real-world datasets show that our approach improves retrieval efficiency by 2x to 3x over state-of-the-art baselines while maintaining high search accuracy.
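To make the setting concrete, the following is a minimal, generic sketch of filtered AKNN search over a proximity graph. It is not the E2E model: it is a plain best-first traversal that only admits nodes whose label passes the filter and that stops early once a distance-computation budget is spent. The `budget` argument is a hypothetical stand-in for a per-query cost estimate; the function name `filtered_search`, the toy chain graph, and the string labels are illustrative assumptions.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_search(
    graph: Dict[int, List[int]],
    vectors: Dict[int, Vector],
    labels: Dict[int, str],
    query: Vector,
    predicate: Callable[[str], bool],
    k: int,
    entry: int,
    budget: int,
) -> Tuple[List[Tuple[float, int]], int]:
    """Hypothetical filtered best-first graph search.

    Returns up to k (distance, node) pairs whose label satisfies `predicate`,
    sorted by distance, plus the number of distance computations used.
    `budget` caps distance computations (a stand-in for an estimated cost).
    """
    visited = {entry}
    d0 = l2(query, vectors[entry])
    evals = 1
    frontier = [(d0, entry)]  # min-heap of candidates to expand
    results: List[Tuple[float, int]] = []  # max-heap via negated distances
    if predicate(labels[entry]):
        results.append((-d0, entry))

    while frontier and evals < budget:
        d, node = heapq.heappop(frontier)
        # Usual stop rule: nearest unexpanded node is farther than the
        # current k-th filtered result.
        if len(results) == k and d > -results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited or evals >= budget:
                continue  # skip seen nodes; stop computing once budget is hit
            visited.add(nb)
            dn = l2(query, vectors[nb])
            evals += 1
            # Non-matching nodes are still traversed as stepping stones.
            heapq.heappush(frontier, (dn, nb))
            if predicate(labels[nb]):
                if len(results) < k:
                    heapq.heappush(results, (-dn, nb))
                elif dn < -results[0][0]:
                    heapq.heapreplace(results, (-dn, nb))

    return sorted((-nd, n) for nd, n in results), evals


if __name__ == "__main__":
    # Toy data: 10 points on a line, chained as a graph, labelled even/odd.
    vecs = {i: [float(i)] for i in range(10)}
    g = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
    labs = {i: "even" if i % 2 == 0 else "odd" for i in range(10)}
    for b in (3, 100):
        res, used = filtered_search(
            g, vecs, labs, [7.2], lambda s: s == "even", k=2, entry=0, budget=b
        )
        print(b, [n for _, n in res], used)
```

With a generous budget the traversal reaches the true filtered neighbors; with a tight budget it terminates early and returns farther matches. This trade-off is why an accurate per-query cost estimate matters when setting termination conditions.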