Recent studies indicate that dense retrieval models perform poorly on the wide variety of retrieval tasks that lack dedicated training data, because different retrieval tasks often entail distinct search intents. To address this challenge, we use instructions to flexibly describe retrieval intents and introduce I3, a unified retrieval system that performs Intent-Introspective retrieval across various tasks, conditioned on Instructions and without any task-specific training. I3 incorporates a pluggable introspector in a parameter-isolated manner: the introspector comprehends the specific retrieval intent by jointly reasoning over the input query and the instruction, and the introspected intent is then integrated into the original retrieval model for intent-aware retrieval. Furthermore, we propose progressively-pruned intent learning, which uses extensive LLM-generated data to train I3 phase by phase and embodies two key designs: progressive structure pruning and drawback extrapolation-based data refinement. Extensive experiments show that on the BEIR benchmark, I3 significantly outperforms baseline methods designed with task-specific retrievers, achieving state-of-the-art zero-shot performance without any task-specific tuning.
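The pluggable, parameter-isolated design described in the abstract can be illustrated with a toy sketch. The abstract does not say how the introspected intent is fused into the retriever, so the additive fusion below is an assumption, and every name here (`base_encode`, `Introspector`, `encode_query`, `gate`) is hypothetical. The hashed bag-of-words encoder merely stands in for a real dense retriever. The sketch shows only one property: the base encoder is never modified, and detaching the introspector recovers the original query embedding exactly.

```python
import hashlib
import math

DIM = 16  # toy embedding size (assumption, chosen for illustration)


def _bucket(token):
    # Deterministic hash so the toy embedding is reproducible across runs.
    return int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM


def base_encode(text):
    """Stand-in for the frozen retriever encoder: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[_bucket(token)] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class Introspector:
    """Hypothetical plug-in module that owns its parameters (a single gate here)."""

    def __init__(self, gate=0.5):
        self.gate = gate  # the only parameter in this toy; the base encoder has none to touch

    def intent(self, query, instruction):
        # Read the query and the instruction jointly, then scale by the gate.
        joint = base_encode(query + " " + instruction)
        return [self.gate * v for v in joint]


def encode_query(query, instruction=None, introspector=None):
    """Return the base query embedding, plus the intent vector when a module is attached."""
    q = base_encode(query)
    if introspector is None or instruction is None:
        return q  # plain retrieval path, identical to the original model
    intent = introspector.intent(query, instruction)
    return [a + b for a, b in zip(q, intent)]  # additive fusion (assumption)


def score(query_vec, doc_vec):
    # Dot-product relevance score, as is common for dense retrievers.
    return sum(a * b for a, b in zip(query_vec, doc_vec))


if __name__ == "__main__":
    query = "effects of caffeine"
    instruction = "Retrieve scientific evidence that supports or refutes the claim"
    doc = base_encode("caffeine evidence from a scientific study")
    print(score(encode_query(query), doc))
    print(score(encode_query(query, instruction, Introspector()), doc))
```

Because the intent signal enters only through `encode_query`, the same document index can serve both the plain and the instruction-conditioned paths in this sketch.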