This paper studies the problem of adapting information retrieval to unseen tasks. Existing work generates synthetic queries from domain-specific documents to jointly train the retriever. However, the conventional query generator assumes the query is a question, thus failing to accommodate general search intents. A more lenient approach incorporates task-adaptive elements, such as few-shot learning with a 137B LLM. In this paper, we challenge the trend of equating query with question, and instead conceptualize the query generation task as a "compilation" of a high-level intent into a task-adaptive query. Specifically, we propose EGG, a query generator that better adapts to the wide range of search intents expressed in the BeIR benchmark. Our method outperforms baselines and existing models on four tasks with underexplored intents, while utilizing a query generator 47 times smaller than the previous state-of-the-art. Our findings reveal that instructing the LM with explicit search intent is a key aspect of modeling an effective query generator.
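The notion of "compiling" an explicit search intent into a task-adaptive query can be illustrated with a minimal sketch of intent-conditioned prompting. All names below (`INTENTS`, `build_prompt`) and the intent wordings are illustrative assumptions, not the paper's actual prompts or implementation:

```python
# Hypothetical sketch: compile an explicit search intent plus a document
# into a generation prompt, rather than always asking for a question.
# Intent descriptions here are invented examples, not EGG's real prompts.

INTENTS = {
    "counter_argument": "Generate a counter-argument for the passage.",
    "claim_verification": "Generate a claim the passage supports or refutes.",
}

def build_prompt(intent_key: str, document: str) -> str:
    """Prefix the document with an explicit intent instruction, so the
    query generator is told what kind of query to produce."""
    instruction = INTENTS[intent_key]
    return f"{instruction}\nPassage: {document}\nQuery:"

prompt = build_prompt("claim_verification", "Vitamin D deficiency is linked to ...")
```

Under this framing, swapping the intent string is all that is needed to retarget the same generator to a new task, instead of retraining a question-only generator.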