Vector databases increasingly enforce role-based access control: each top-k approximate nearest neighbor query must return only vectors the querying role is authorized to access. Two extremes bracket the design space. A single global index avoids data duplication but wastes search effort on unauthorized vectors and degrades recall, while an oracle index, built over all vectors authorized for the querying roles, searches only authorized vectors but stores a separate copy of every vector shared between roles or queries. We present Veda and its efficient variant EffVeda, two indexing strategies built on an access-aware lattice to address access control in vector databases. Both methods first partition the dataset into disjoint data blocks by role combination, then leverage the structure of the access-aware lattice to apply copy and merge operations that group co-accessed blocks under a user-specified storage budget. Large nodes in the lattice are then indexed with HNSW, while small nodes are left unindexed and searched by linear scan. For each role, our methods construct a query plan that selects the minimal set of nodes covering the role's authorized data. At query time, coordinated search first queries pure (authorized-only) nodes to populate a global top-k heap. The resulting distance bound then prunes exploration on impure nodes, avoiding the inflated search that independent per-index execution would require.
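As an illustration of the first step, partitioning by role combination can be sketched as grouping vectors by the exact set of roles allowed to read them, so that every vector lands in exactly one block. This is a minimal sketch under assumed inputs, not the authors' implementation: the function names, the `vector_roles` mapping, and the role names are all hypothetical.

```python
from collections import defaultdict


def partition_by_role_combination(vector_roles):
    """Group vector ids into disjoint blocks keyed by their exact role set.

    vector_roles: dict mapping vector id -> iterable of roles allowed to read it
    (a hypothetical input format chosen for this sketch).
    """
    blocks = defaultdict(list)
    for vec_id, roles in vector_roles.items():
        # Two vectors share a block only if their role sets are identical.
        blocks[frozenset(roles)].append(vec_id)
    return dict(blocks)


def authorized_blocks(blocks, role):
    """Return the keys of all blocks that the given role may read."""
    return [key for key in blocks if role in key]


# Hypothetical toy data: vector id -> roles allowed to read it.
vector_roles = {
    0: {"admin"},
    1: {"admin", "analyst"},
    2: {"admin", "analyst"},
    3: {"analyst"},
    4: {"admin", "analyst", "guest"},
}

blocks = partition_by_role_combination(vector_roles)
analyst_keys = authorized_blocks(blocks, "analyst")
```

Because the blocks are disjoint, a role's authorized data is the union of the blocks whose key contains that role. In this toy data, `analyst` reads three of the four blocks, and no vector is stored twice.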
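The coordinated search idea can be sketched as a two-phase scan over a role's query plan. The abstract does not specify how the distance bound is applied inside an HNSW index, so this sketch substitutes a simpler technique: every node is scanned by brute force, and an impure node is skipped entirely when a centroid-and-radius lower bound shows it cannot improve the current top-k. The node layout, `make_node`, and `coordinated_search` are hypothetical, and the nodes in a plan are assumed to be disjoint.

```python
import heapq
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def make_node(entries):
    """Build a node from (vec_id, vector, allowed_roles) tuples.

    The centroid and radius are an assumption of this sketch; they give a
    cheap lower bound on the distance from a query to any vector in the node.
    """
    dim = len(entries[0][1])
    centroid = [sum(v[i] for _, v, _ in entries) / len(entries) for i in range(dim)]
    radius = max(l2(centroid, v) for _, v, _ in entries)
    return {"entries": entries, "centroid": centroid, "radius": radius}


def coordinated_search(query, k, role, pure_nodes, impure_nodes):
    heap = []  # max-heap on distance, stored as (-distance, vec_id)

    def offer(dist, vec_id):
        if len(heap) < k:
            heapq.heappush(heap, (-dist, vec_id))
        elif dist < -heap[0][0]:
            heapq.heapreplace(heap, (-dist, vec_id))

    # Phase 1: pure nodes hold only authorized vectors, so no filtering is needed.
    for node in pure_nodes:
        for vec_id, vec, _ in node["entries"]:
            offer(l2(query, vec), vec_id)

    # Phase 2: impure nodes are visited only if they could beat the current bound.
    visited = 0
    for node in impure_nodes:
        bound = -heap[0][0] if len(heap) == k else math.inf
        lower = l2(query, node["centroid"]) - node["radius"]
        if lower >= bound:
            continue  # no vector in this node can enter the top-k
        visited += 1
        for vec_id, vec, allowed in node["entries"]:
            if role in allowed:  # filter out unauthorized vectors
                offer(l2(query, vec), vec_id)

    results = sorted((-neg, vec_id) for neg, vec_id in heap)
    return results, visited


# Hypothetical toy plan for the role "analyst".
pure = [make_node([(0, [0.0, 0.0], {"analyst"}),
                   (1, [1.0, 0.0], {"analyst"}),
                   (2, [0.0, 1.0], {"analyst"})])]
near_impure = make_node([(3, [0.1, 0.1], {"analyst", "admin"}),
                         (4, [0.2, 0.0], {"admin"})])
far_impure = make_node([(5, [10.0, 10.0], {"analyst", "admin"}),
                        (6, [11.0, 10.0], {"admin"})])

results, visited = coordinated_search(
    [0.0, 0.0], 2, "analyst", pure, [near_impure, far_impure]
)
```

In this toy run, the pure node fills the heap first, the nearby impure node is scanned and contributes vector 3, and the distant impure node is skipped without any per-vector distance computation. Searching each node independently for its own top-k would have had no shared bound to exploit.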