[Abridged] - Spectral Retrieval is a plug-in re-ranking stage that interpolates between per-token MaxSim and mean-pool retrieval through a multi-scale sinc convolution over token embeddings. In standard dense retrieval each document is represented by one mean-pooled vector; when relevance is localised to a short subspan, the signal is averaged away into noise. Spectral Retrieval reuses per-token embeddings from a late-interaction index and convolves them with a normalised sinc kernel at multiple scales. At scale L=1 the kernel acts as the identity, recovering per-token MaxSim; as L grows it approaches a uniform filter, recovering mean pooling. The maximum cosine over positions and scales yields a score provably no less informative than either endpoint. On a controlled synthetic benchmark with 1,000 documents and planted single-position spikes, mean-pool retrieval sits at chance (Recall@10 ≈ 0.02) regardless of spike strength, while Spectral Retrieval reaches Recall@10 = 1.0 once the planted cosine exceeds the corpus-level token noise floor. On LIMIT-small with a frozen all-mpnet-base-v2 encoder, Spectral Retrieval lifts Recall@10 from 0.33 to 0.90, MRR from 0.22 to 0.79, and strict Success@10 from 0.12 to 0.84, without retraining. The method fits naturally into multi-agent LLM systems, where each agent benefits from a tighter, role-specific retrieval window over a shared corpus.
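The scoring rule described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact kernel, so the sketch assumes weights `sinc((i - j) / L)` between token positions `i` and `j`, normalised so each row sums to one, and the names `sinc_weights`, `cosine_to_rows`, `spectral_score`, and the default scale grid are hypothetical.

```python
import numpy as np


def sinc_weights(num_tokens, scale):
    """Row-normalised sinc smoothing matrix, shape (num_tokens, num_tokens).

    Assumed kernel form: w[i, j] = sinc((i - j) / scale).
    """
    pos = np.arange(num_tokens)
    # np.sinc is the normalised sinc sin(pi x) / (pi x): it equals 1 at x = 0
    # and 0 at every other integer, so scale = 1 gives the identity matrix.
    w = np.sinc((pos[:, None] - pos[None, :]) / scale)
    return w / w.sum(axis=1, keepdims=True)


def cosine_to_rows(query, rows, eps=1e-12):
    """Cosine similarity between one query vector and each row of `rows`."""
    q = query / (np.linalg.norm(query) + eps)
    r = rows / (np.linalg.norm(rows, axis=1, keepdims=True) + eps)
    return r @ q


def spectral_score(query, tokens, scales=(1, 2, 4, 8, 16)):
    """Maximum cosine over token positions and smoothing scales.

    query:  (d,) query embedding
    tokens: (T, d) per-token document embeddings
    """
    best = -np.inf
    for scale in scales:
        # Smooth the token embeddings along the position axis at this scale.
        smoothed = sinc_weights(tokens.shape[0], scale) @ tokens
        best = max(best, float(cosine_to_rows(query, smoothed).max()))
    return best
```

With `scales=(1,)` the smoothing matrix is the identity and the score reduces to the best single-token cosine; with a scale much larger than the document length every row is close to uniform and the score approaches the cosine against the mean-pooled vector. Taking the maximum over a grid that contains both extremes therefore cannot score below either endpoint in this sketch.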