We study the problem of $\textit{vector set search}$ with $\textit{vector set queries}$. This task is analogous to traditional near-neighbor search, except that both the query and each element in the collection are $\textit{sets}$ of vectors. We identify this problem as a core subroutine for semantic search applications and find that existing solutions are unacceptably slow. To this end, we present a new approximate search algorithm, DESSERT (${\bf D}$ESSERT ${\bf E}$fficiently ${\bf S}$earches ${\bf S}$ets of ${\bf E}$mbeddings via ${\bf R}$etrieval ${\bf T}$ables). DESSERT is a general tool with strong theoretical guarantees and excellent empirical performance. When we integrate DESSERT into ColBERT, a state-of-the-art semantic search model, we find a 2-5x speedup on the MS MARCO and LoTTE retrieval benchmarks with minimal loss in recall, underscoring the effectiveness and practical applicability of our proposal.