In this paper, we demonstrate how Large Language Models (LLMs) can learn to use an off-the-shelf information retrieval (IR) system only when additional context is needed to answer a given question. Given the performance of IR systems, the optimal strategy for question answering does not always entail external information retrieval; rather, it often involves leveraging the parametric memory of the LLM itself. Prior research has identified this phenomenon in the PopQA dataset, where the most popular questions are answered effectively from the LLM's parametric memory, while less popular ones require an IR system. Building on this observation, we propose a tailored training approach for LLMs that leverages existing open-domain question answering datasets. In this approach, LLMs are trained to generate a special token, <RET>, when they do not know the answer to a question. Our evaluation of the Adaptive Retrieval LLM (Adapt-LLM) on the PopQA dataset shows improvements over the same LLM under three configurations: (i) retrieving information for all questions, (ii) always using the parametric memory of the LLM, and (iii) using a popularity threshold to decide when to use a retriever. Our analysis shows that Adapt-LLM generates the <RET> token when it determines that it does not know how to answer a question, indicating the need for IR, and that it achieves notably high accuracy when it chooses to rely only on its parametric memory.
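The inference behaviour described above (answer from parametric memory unless the model emits <RET>, in which case retrieve and answer again) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `adaptive_answer`, the `generate` and `retrieve` callables, the prompt templates, and the toy stand-ins are all hypothetical; only the <RET> token itself comes from the text.

```python
from typing import Callable, Tuple

RET_TOKEN = "<RET>"  # special token named in the abstract


def adaptive_answer(
    question: str,
    generate: Callable[[str], str],   # hypothetical: prompt -> model output
    retrieve: Callable[[str], str],   # hypothetical: question -> context passage
) -> Tuple[str, bool]:
    """Return (answer, used_retrieval) for a single question."""
    # First pass: the model sees only the question (assumed prompt format).
    first = generate(f"Question: {question}\nAnswer:").strip()
    if RET_TOKEN not in first:
        # The model answered directly from its parametric memory.
        return first, False

    # The model emitted <RET>: fetch context and ask again with it.
    context = retrieve(question)
    second = generate(
        f"Context: {context}\nQuestion: {question}\nAnswer:"
    ).strip()
    return second, True


# Toy stand-ins for a fine-tuned LLM and an IR system, for illustration only.
def toy_generate(prompt: str) -> str:
    if prompt.startswith("Context:"):
        return "Canberra"      # answers once context is supplied
    if "France" in prompt:
        return "Paris"         # a "popular" question the model knows
    return RET_TOKEN           # otherwise signals that retrieval is needed


def toy_retrieve(question: str) -> str:
    return "Canberra is the capital of Australia."


if __name__ == "__main__":
    for q in ("What is the capital of France?",
              "What is the capital of Australia?"):
        print(q, adaptive_answer(q, toy_generate, toy_retrieve))
```

In this sketch the retriever is called only on the second branch, so questions the model can already answer incur no retrieval cost; a real system would replace the toy callables with a fine-tuned LLM and an actual IR backend.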