Document retrieval identifies relevant documents but does not provide fine-grained evidence cues, such as the specific spans that make a document relevant. One possible solution is to apply an LLM after retrieval; however, this introduces significant computational overhead and limits practical deployment. We propose FGR-ColBERT, a modification of the ColBERT retrieval model that integrates fine-grained relevance signals distilled from an LLM directly into the retrieval function. Experiments on MS MARCO show that FGR-ColBERT (110M parameters) achieves a token-level F1 of 64.5, exceeding Gemma 2 (27B parameters) at 62.8, despite being approximately 245 times smaller. At the same time, it preserves retrieval effectiveness (99% relative Recall@50) and remains efficient, with latency only about 1.12 times that of the original ColBERT.
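The comparison with Gemma 2 above rests on token-level F1. A common way to define this metric treats the tokens marked as relevant in a passage as a set and scores the predicted set against the gold set. The following is a minimal sketch under that assumption. The helper names `token_f1` and `mean_token_f1` are hypothetical, and the tokenization, the averaging across queries, and the handling of empty sets in the paper's evaluation may differ from what is shown here.

```python
from typing import Iterable, List, Set, Tuple


def token_f1(predicted: Iterable[int], gold: Iterable[int]) -> float:
    """F1 between predicted and gold sets of relevant token positions (assumed definition)."""
    pred: Set[int] = set(predicted)
    ref: Set[int] = set(gold)
    if not pred and not ref:
        # Both empty: counted as perfect agreement (an assumed convention).
        return 1.0
    if not pred or not ref:
        # Exactly one side is empty, so there can be no overlap.
        return 0.0
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_token_f1(pairs: List[Tuple[Iterable[int], Iterable[int]]]) -> float:
    """Macro-average of token F1 over (predicted, gold) pairs, one per query-passage pair."""
    if not pairs:
        return 0.0
    return sum(token_f1(p, g) for p, g in pairs) / len(pairs)


# Toy example: the gold span covers tokens 3..7 and the prediction covers 5..9.
gold_span = range(3, 8)   # tokens 3, 4, 5, 6, 7
pred_span = range(5, 10)  # tokens 5, 6, 7, 8, 9
print(round(token_f1(pred_span, gold_span), 3))  # overlap = 3, P = R = 0.6 -> 0.6
```

In this toy case the score is 0.6, which corresponds to 60.0 on the 0 to 100 scale used for the 64.5 and 62.8 figures. A set-based definition rewards partial overlap with a span instead of requiring exact span boundaries.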