Large language models (LLMs) have been significantly improved by instruction fine-tuning, but they still lack transparency and the ability to use up-to-date knowledge and information. In this work, we propose search-augmented instruction learning (SAIL), which grounds language generation and instruction following on complex search results produced by in-house and external search engines. Starting from an instruction-tuning corpus, we collect search results for each training case from different search APIs and domains, and construct a new search-grounded training set of \textit{(instruction, grounding information, response)} triplets. We then fine-tune the LLaMA-7B model on the constructed training set. Because the collected results contain irrelevant and conflicting passages, the model must learn to ground its output on trustworthy search results, filter out distracting passages, and generate the target response. This search-result denoising process entails explicit selection of trustworthy information and multi-hop reasoning, since the retrieved passages may be informative without containing the answer the instruction asks for. Experiments show that the fine-tuned SAIL-7B model has strong instruction-following ability and performs significantly better on transparency-sensitive tasks, including open-ended question answering and fact checking.
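To make the \textit{(instruction, grounding information, response)} triplet concrete, the following is a minimal sketch of how one search-grounded training example might be represented and flattened into a fine-tuning prompt. The class name `SearchGroundedExample`, the function `build_prompt`, the `max_passages` parameter, the prompt layout, and the sample passages are all assumptions made for illustration; the abstract does not specify the actual format used to train SAIL-7B.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchGroundedExample:
    """One (instruction, grounding information, response) triplet.

    Hypothetical container: the field names are illustrative only.
    """
    instruction: str
    grounding: List[str]  # retrieved passages, possibly noisy or conflicting
    response: str         # target response from the instruction-tuning corpus


def build_prompt(example: SearchGroundedExample, max_passages: int = 3) -> str:
    """Flatten a triplet into a text prompt (assumed layout, not the paper's).

    Passages are kept in retrieval order and are NOT filtered here: in the
    setup described above, the model itself must learn to ignore
    distracting passages.
    """
    passages = example.grounding[:max_passages]
    lines = [f"Search result ({i}): {p}" for i, p in enumerate(passages, start=1)]
    lines.append(f"Instruction: {example.instruction}")
    lines.append("Response:")
    return "\n".join(lines)


# Made-up example data, for illustration only.
example = SearchGroundedExample(
    instruction="Who wrote Hamlet?",
    grounding=[
        "Hamlet is a tragedy written by William Shakespeare.",
        "The hamlet of Little Snoring lies in Norfolk, England.",
    ],
    response="William Shakespeare wrote Hamlet.",
)
prompt = build_prompt(example)
```

In this sketch the second passage is a deliberate distractor: the prompt keeps it, and the paired `response` field supplies the supervision from which a fine-tuned model would have to learn which passage to rely on.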