Cross-encoder models, which jointly encode and score a query-item pair, are typically prohibitively expensive for k-nearest-neighbor (k-NN) search. Consequently, k-NN search is performed not with the cross-encoder alone but with a heuristic retrieve-and-rerank approach, in which candidates are first retrieved with BM25 or a dual-encoder and then re-ranked by the cross-encoder. Recent work proposes ANNCUR (Yadav et al., 2022), which uses CUR matrix factorization to produce an embedding space for efficient vector-based search that directly approximates the cross-encoder without the need for dual-encoders. ANNCUR defines this shared query-item embedding space by scoring the test query against anchor items that are sampled uniformly at random. While this minimizes the average approximation error over all items, the approximation error on the top-k items remains unsuitably high and leads to poor recall of top-k (and especially top-1) items. Increasing the number of anchor items is a straightforward way to reduce the approximation error, and hence to improve the k-NN recall of ANNCUR, but it comes at the cost of increased inference latency. In this paper, we propose a new method for adaptively choosing anchor items that minimizes the approximation error for the practically important top-k neighbors of a query with minimal computational overhead. Our proposed method incrementally selects a suitable set of anchor items for a given test query over several rounds, using the anchors chosen in previous rounds to inform the selection of further anchor items. Empirically, our method consistently improves k-NN recall compared with both ANNCUR and the widely used dual-encoder-based retrieve-and-rerank approach.
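
The multi-round anchor selection described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the function names `approx_scores` and `adaptive_anchor_search`, the callback `score_fn` (standing in for an exact cross-encoder call on one item), the precomputed matrix `R` of cross-encoder scores between a set of anchor queries and all items, the equal split of the budget across rounds, the greedy rule that picks the items with the highest current approximate score as new anchors, and the choice to return the best of the exactly scored items.

```python
import numpy as np


def approx_scores(R, anchor_idx, anchor_scores):
    """CUR-style estimate of the test query's scores for all items.

    R             : (n_anchor_queries, n_items) scores computed offline (assumed).
    anchor_idx    : indices of the anchor items scored exactly for this query.
    anchor_scores : exact scores of the test query on those anchor items.
    """
    # Pseudo-inverse of the anchor-item columns; rcond truncates
    # numerically zero singular values when the block is rank deficient.
    U = np.linalg.pinv(R[:, anchor_idx], rcond=1e-8)
    return np.asarray(anchor_scores) @ U @ R


def adaptive_anchor_search(score_fn, R, budget, rounds, k, rng):
    """Select anchors over several rounds and return k item indices.

    Hypothetical selection rule: round 1 samples anchors uniformly at
    random; each later round adds the not-yet-scored items with the
    highest approximate score under the current anchors.
    """
    n_items = R.shape[1]
    per_round = budget // rounds  # any remainder of the budget is unused
    anchors = rng.choice(n_items, size=per_round, replace=False).tolist()
    scores = [score_fn(i) for i in anchors]

    for _ in range(rounds - 1):
        approx = approx_scores(R, anchors, scores)
        approx[anchors] = -np.inf  # never re-select an existing anchor
        new = np.argsort(-approx)[:per_round].tolist()
        anchors.extend(new)
        scores.extend(score_fn(i) for i in new)

    # Every anchor has an exact score, so rank the anchors by it.
    order = np.argsort(-np.asarray(scores))[:k]
    return [anchors[j] for j in order]


if __name__ == "__main__":
    # Synthetic stand-in for a cross-encoder: low-rank dot-product scores.
    rng = np.random.default_rng(0)
    items = rng.normal(size=(500, 8))
    anchor_queries = rng.normal(size=(50, 8))
    R = anchor_queries @ items.T
    test_query = rng.normal(size=8)
    top = adaptive_anchor_search(
        lambda i: float(test_query @ items[i]), R,
        budget=40, rounds=4, k=5, rng=rng,
    )
    print(top)
```

In this sketch the number of exact scoring calls is fixed by `budget`, so spending it over more rounds changes which items are scored rather than how many. This mirrors the trade-off in the text: anchors are concentrated near the likely top-k items instead of being increased in number.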