Vector databases increasingly enforce role-based access control, in which each top-k approximate nearest neighbor query must return only vectors that the querying role is authorized to access. Two extremes bracket the design space. A single global index built over all vectors avoids duplication but wastes search effort on unauthorized vectors and degrades recall. In contrast, an oracle index, built for each query role over exactly the vectors that role may access, searches only authorized vectors but duplicates every vector shared between roles or queries. We present Veda and its efficient variant EffVeda, two indexing strategies built on an access-aware lattice to address access control in vector databases. Both methods first partition the dataset into disjoint data blocks by role combination. They then exploit the structure of the access-aware lattice to apply copy and merge operations that group co-accessed blocks within a user-specified storage budget. Large nodes in the lattice are then indexed with HNSW, while small nodes are retained for linear scan. To facilitate query processing on the lattice, our methods construct a query plan that selects, for each role, the minimal set of nodes covering all of that role's authorized data. At query time, coordinated search first queries pure (authorized-only) nodes to populate a global top-k heap. It then uses the distance of the current k-th candidate in the heap as a bound to prune exploration of impure nodes. Evaluations show that our methods deliver higher throughput at high recall while closely tracking the storage budget.
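The abstract describes the pipeline only at a high level, so the following Python sketch illustrates three of its steps under simplifying assumptions; it is not Veda's or EffVeda's implementation. Blocks are keyed by the exact set of roles that may read them. The lattice is reduced to a flat list of hypothetical nodes, each a `frozenset` of block keys, so a copied block can appear in several nodes. A greedy set cover stands in for the query planner, and linear scans replace HNSW. The centroid-and-radius bound used to skip impure nodes is our own substitution for the graph-level pruning the abstract mentions. All function names (`partition`, `query_plan`, `coordinated_search`) are illustrative.

```python
import heapq
import math
from collections import defaultdict


def partition(vector_roles):
    """Split vector ids into disjoint blocks keyed by their exact set of roles."""
    blocks = defaultdict(list)
    for vid, roles in vector_roles.items():
        blocks[frozenset(roles)].append(vid)
    return dict(blocks)


def query_plan(role, nodes):
    """Pick nodes until every block visible to `role` is covered.

    Each hypothetical node is a frozenset of block keys. Greedy set cover is a
    stand-in heuristic; it only approximates a minimal cover.
    """
    needed = {b for node in nodes for b in node if role in b}
    plan = []
    while needed:
        best = max(nodes, key=lambda n: len(n & needed))
        plan.append(best)
        needed -= best
    return plan


def coordinated_search(query, k, role, plan, blocks, vectors):
    """Top-k over authorized vectors: pure nodes first, then pruned impure nodes."""
    heap = []  # max-heap via negated distances: entries are (-distance, vid)
    seen = set()  # a copied block may appear in several nodes

    def offer(vid):
        d = math.dist(query, vectors[vid])
        if len(heap) < k:
            heapq.heappush(heap, (-d, vid))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, vid))

    def kth_distance():
        return -heap[0][0] if len(heap) == k else math.inf

    def is_pure(node):
        return all(role in b for b in node)

    # sorted() is stable and False < True, so pure nodes are searched first.
    for node in sorted(plan, key=lambda n: not is_pure(n)):
        if not is_pure(node):
            # Ball bound (our substitution for HNSW-level pruning): no vector in
            # the node can be closer than dist(query, center) - radius.
            # Computed per query for brevity; a real index would precompute it.
            ids = [v for b in node for v in blocks[b]]
            center = [sum(c) / len(ids) for c in zip(*(vectors[v] for v in ids))]
            radius = max(math.dist(center, vectors[v]) for v in ids)
            if math.dist(query, center) - radius >= kth_distance():
                continue  # skip the whole impure node
        for b in node:
            if role not in b:
                continue  # never admit unauthorized vectors
            for vid in blocks[b]:
                if vid not in seen:
                    seen.add(vid)
                    offer(vid)
    # Descending on -distance means ascending on distance.
    return [vid for _, vid in sorted(heap, reverse=True)]
```

For example, if role `"a"` may read blocks `{a}` and `{a, b}`, and block `{a, b}` is merged with block `{b}` into one node, the plan must touch that impure node. The search then scores only vectors from `{a, b}`, and skips the node entirely once the k-th distance from the pure node is already smaller than the ball bound. Searching pure nodes first matters because an early, tight k-th distance makes this bound more selective.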