We propose Referral-Augmented Retrieval (RAR), a simple technique that concatenates document indices with referrals, i.e., text from other documents that cite or link to the given document, and yields significant performance gains for zero-shot information retrieval. The key insight behind our method is that referrals provide a more complete, multi-view representation of a document, much as incoming page links in algorithms like PageRank provide a comprehensive idea of a webpage's importance. RAR works with both sparse and dense retrievers and outperforms generative text expansion techniques such as DocT5Query and Query2Doc, with 37% and 21% absolute improvements, respectively, in Recall@10 on ACL paper retrieval, while also eliminating expensive model training and inference. We also analyze different methods for multi-referral aggregation and show that RAR enables up-to-date information retrieval without re-training.
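The core idea of concatenating each document with its referrals before indexing can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names (`augment`, `build_index`, `search`), the toy corpus, and the plain TF-IDF scorer, which stands in for whichever sparse or dense retriever is actually used. Only the concatenation variant is shown; other multi-referral aggregation methods are not covered.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple stand-in)."""
    return text.lower().split()


def augment(documents, referrals):
    """Concatenate each document's text with the text of its referrals."""
    return {
        doc_id: " ".join([text] + referrals.get(doc_id, []))
        for doc_id, text in documents.items()
    }


def build_index(texts):
    """Return per-document term counts and smoothed IDF weights."""
    term_counts = {doc_id: Counter(tokenize(t)) for doc_id, t in texts.items()}
    n_docs = len(term_counts)
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    idf = {term: math.log(1 + n_docs / df) for term, df in doc_freq.items()}
    return term_counts, idf


def score(query, doc_id, index):
    """TF-IDF overlap between the query and one indexed document."""
    term_counts, idf = index
    counts = term_counts[doc_id]
    return sum(counts[t] * idf.get(t, 0.0) for t in tokenize(query))


def search(query, index, k=10):
    """Return the ids of the k highest-scoring documents."""
    term_counts, _ = index
    ranked = sorted(term_counts, key=lambda d: score(query, d, index), reverse=True)
    return ranked[:k]


# Hypothetical toy corpus: the query terms appear only in a referral.
documents = {
    "paper_a": "a study of convolutional networks for image classification",
    "paper_b": "we introduce a pretrained bidirectional transformer encoder",
}
referrals = {
    "paper_b": ["masked language modeling achieves strong results on benchmarks"],
}
query = "masked language modeling"

plain_index = build_index(documents)                     # no referrals
rar_index = build_index(augment(documents, referrals))   # referral-augmented
top_hit = search(query, rar_index, k=1)
```

In this toy setup the query shares no terms with either document's own text, so the plain index gives both documents a score of zero, while the referral-augmented index ranks `paper_b` first. Because referrals are added at indexing time, new citing text can be folded in by rebuilding the index, with no model training involved.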