Vector joins, which find all pairs of query and data vectors whose distance is below a given threshold, are fundamental to the modern vector and vector-relational database systems that power multimodal retrieval and semantic analytics. The existing state-of-the-art approach exploits work sharing among similar queries but still suffers from redundant index traversals and excessive distance computations. We propose a unified framework for efficient approximate vector joins that (1) introduces soft work sharing to reuse traversal results beyond the join results of previous queries, (2) builds a merged index over both query and data vectors to further speed up graph exploration, and (3) improves robustness to out-of-distribution queries through an adaptive hybrid search strategy. Experiments on eight datasets demonstrate substantial improvements in the efficiency-recall trade-off over the state of the art.
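To make the join definition concrete, the following is a minimal sketch of an exact, brute-force threshold join. It is not the proposed framework, which is approximate and index-based; it only illustrates the output an approximate method tries to recover, and hence what recall is measured against. The function name `threshold_join`, the choice of Euclidean distance, and the sample vectors are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def threshold_join(
    queries: Sequence[Vector],
    data: Sequence[Vector],
    threshold: float,
) -> List[Tuple[int, int]]:
    """Return all (query_index, data_index) pairs with distance < threshold.

    Exhaustive baseline: computes every query-data distance, so the cost is
    proportional to len(queries) * len(data). Euclidean distance is an
    assumption here; the definition works with any distance function.
    """
    pairs = []
    for qi, q in enumerate(queries):
        for di, d in enumerate(data):
            if math.dist(q, d) < threshold:
                pairs.append((qi, di))
    return pairs


# Hypothetical toy input: two query vectors and three data vectors in 2-D.
queries = [(0.0, 0.0), (5.0, 5.0)]
data = [(0.0, 1.0), (3.0, 4.0), (5.0, 6.0)]

result = threshold_join(queries, data, threshold=1.5)
print(result)  # [(0, 0), (1, 2)]
```

Every query is compared against every data vector here, which is the cost that index-based approximate joins aim to avoid by sharing traversal work across similar queries.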