Learning to search is the task of building artificial agents that learn to autonomously use a search box to find information. So far, it has been shown that current language models can learn symbolic query reformulation policies in combination with traditional term-based retrieval, but that they fall short of outperforming neural retrievers. We extend the previous learning-to-search setup to a hybrid environment, which accepts discrete query refinement operations after a first-pass retrieval step via a dual encoder. Experiments on the BEIR benchmark show that search agents, trained via behavioral cloning, outperform the underlying search system based on a combined dual-encoder retriever and cross-encoder reranker. Furthermore, we find that simple heuristic Hybrid Retrieval Environments (HRE) can improve baseline performance by several nDCG points. The search agent based on HRE (HARE) matches state-of-the-art performance, balanced in both zero-shot and in-domain evaluations, via interpretable actions, and at twice the speed.
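To make the hybrid setup concrete, the following is a minimal toy sketch of an environment that runs a first-pass retrieval and then accepts discrete refinement operations. It is not the authors' implementation. Everything in it is an assumption made for illustration: the corpus, the `+term` / `-term` operation syntax, the `HybridEnv` class, and the scoring functions. In particular, a bag-of-words cosine stands in for both the dual encoder and the cross-encoder reranker, and a term-overlap count stands in for term-based retrieval.

```python
import math
from collections import Counter

# Hypothetical four-document corpus, for illustration only.
CORPUS = {
    "d1": "dual encoder retrieval with dense vectors",
    "d2": "bm25 is a term based sparse retrieval function",
    "d3": "cross encoder reranker scores query document pairs",
    "d4": "cooking pasta with tomato sauce",
}


def tokens(text):
    return text.lower().split()


def cosine(a, b):
    # Bag-of-words cosine: a toy stand-in for neural similarity scores.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def first_pass(query, k=2):
    # Stand-in for the dual-encoder first-pass retrieval step.
    ranked = sorted(CORPUS, key=lambda d: cosine(query, CORPUS[d]), reverse=True)
    return ranked[:k]


def term_search(query, required, excluded, k=2):
    # Stand-in for term-based retrieval: count overlapping terms,
    # keeping only documents that satisfy the required/excluded filters.
    q = set(tokens(query)) | required
    hits = []
    for doc_id, text in CORPUS.items():
        toks = set(tokens(text))
        if required <= toks and not (excluded & toks):
            hits.append((len(q & toks), doc_id))
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in hits[:k]]


class HybridEnv:
    """Toy hybrid environment: first-pass retrieval, then discrete refinements."""

    def __init__(self, query, k=2):
        self.query = query
        self.k = k
        self.required = set()
        self.excluded = set()
        self.results = first_pass(query, k)

    def step(self, op):
        # Assumed operation syntax: "+term" requires a term, "-term" excludes it.
        sign, term = op[0], op[1:].lower()
        if sign == "+":
            self.required.add(term)
        elif sign == "-":
            self.excluded.add(term)
        else:
            raise ValueError(f"unknown operation: {op!r}")
        new = term_search(self.query, self.required, self.excluded, self.k)
        # Merge current and new results, dropping duplicates and excluded docs.
        pool = [
            d
            for d in dict.fromkeys(self.results + new)
            if not (self.excluded & set(tokens(CORPUS[d])))
        ]
        # Rerank the merged pool against the refined query (reranker stand-in).
        refined = " ".join([self.query, *sorted(self.required)])
        pool.sort(key=lambda d: cosine(refined, CORPUS[d]), reverse=True)
        self.results = pool[: self.k]
        return self.results


if __name__ == "__main__":
    env = HybridEnv("retrieval encoder")
    print(env.results)
    print(env.step("+bm25"))
    print(env.step("-dense"))
```

In this sketch an agent would choose the next operation after observing `env.results`. Each operation is a readable edit to the query state, which is the sense in which such actions are interpretable.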