Filtered Approximate Nearest Neighbor Search in Vector Databases: System Design and Performance Analysis

Retrieval-Augmented Generation (RAG) applications increasingly rely on Filtered Approximate Nearest Neighbor Search (FANNS) to combine semantic retrieval with metadata constraints. Although algorithmic innovations for FANNS have been proposed, it remains poorly understood how generic filtering strategies perform inside vector databases. In this work, we organize filtering strategies into a systematic taxonomy and evaluate their integration into FAISS, Milvus, and pgvector. To support robust benchmarking, we introduce a new relational dataset, \textit{MoReVec}, consisting of two tables with 768-dimensional text embeddings and a rich schema of metadata attributes. We further propose the \textit{Global-Local Selectivity (GLS)} correlation metric to quantify the relationship between filters and query vectors. Our experiments reveal that algorithmic adaptations within the engine often outweigh raw index performance. Specifically, we find that: (1) \textit{Milvus} achieves superior recall stability through hybrid approximate/exact execution; (2) \textit{pgvector}'s cost-based query optimizer frequently selects suboptimal execution plans, favoring approximate index scans even when exact sequential scans would yield perfect recall at comparable latency; and (3) partition-based indexes (IVFFlat) outperform graph-based indexes (HNSW) for low-selectivity queries. To facilitate this analysis, we extend the widely used \textit{ANN-Benchmarks} suite to support filtered vector search and make the extension available online. Finally, we synthesize our findings into a set of practical guidelines for selecting index types and configuring query optimizers for hybrid search workloads.
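
As a rough illustration of why the choice of filtering strategy affects recall, the sketch below contrasts two commonly described generic strategies, pre-filtering and post-filtering, using exact brute-force search in NumPy. It is an assumption-laden toy and not the taxonomy, the benchmark code, or the engine behavior studied in this work: the function names, the `overfetch` parameter, and the reading of selectivity as the fraction of rows passing the filter are all hypothetical choices made for the example.

```python
import numpy as np


def pre_filter_search(vectors, mask, query, k):
    """Apply the metadata filter first, then rank only the surviving rows."""
    candidates = np.flatnonzero(mask)  # row ids that pass the filter
    dists = np.linalg.norm(vectors[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:k]]


def post_filter_search(vectors, mask, query, k, overfetch=4):
    """Rank all rows first, keep the k * overfetch nearest, then apply the filter.

    `overfetch` is a hypothetical knob: a small value can return fewer than
    k results when only a small fraction of rows pass the filter.
    """
    dists = np.linalg.norm(vectors - query, axis=1)
    nearest = np.argsort(dists)[: k * overfetch]
    return nearest[mask[nearest]][:k]


def recall(found, truth):
    """Fraction of the true filtered neighbors that were returned."""
    return len(set(found.tolist()) & set(truth.tolist())) / max(len(truth), 1)


# Toy data: 2000 random vectors, with a filter that about 5% of rows pass.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(2000, 32))
mask = rng.random(2000) < 0.05
query = rng.normal(size=32)

truth = pre_filter_search(vectors, mask, query, k=10)  # exact filtered top-k
approx = post_filter_search(vectors, mask, query, k=10, overfetch=4)
print("post-filter recall:", recall(approx, truth))
```

In this toy setting, pre-filtering is exact by construction, while post-filtering can only return rows that appear among the `k * overfetch` globally nearest vectors, so its recall depends on how many of those rows happen to pass the filter.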