Modern Large Language Models (LLMs) are capable of following long and complex instructions that enable a diverse range of user tasks. However, despite Information Retrieval (IR) models adopting LLMs as the backbone of their architectures, nearly all of them still take only queries as input, with no instructions. For the handful of recent models that do take instructions, it is unclear how they use them. We introduce FollowIR, a dataset that contains a rigorous instruction-evaluation benchmark as well as a training set for helping IR models learn to better follow real-world instructions. FollowIR builds on the long history of the TREC conferences: just as TREC provides human annotators with instructions (also known as narratives) to determine document relevance, so should IR models be able to understand and decide relevance based on these detailed instructions. Our evaluation benchmark starts with three deeply judged TREC collections and alters the annotator instructions, re-annotating the relevant documents. This process lets us measure how well IR models follow instructions through a new pairwise evaluation framework. Our results indicate that existing retrieval models fail to use instructions correctly, treating them as sources of basic keywords and struggling to understand long-form information. However, we show that it is possible for IR models to learn to follow complex instructions: our new FollowIR-7B model shows significant improvements (over 13%) after fine-tuning on our training set.
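To make the pairwise evaluation idea concrete, here is a minimal Python sketch of one possible check, not the paper's actual metric: a model that truly follows instructions should demote a document once an altered instruction marks it non-relevant. The function names (`pairwise_score`, `rank_of`) and the toy document IDs are hypothetical, introduced only for illustration.

```python
# Hypothetical sketch of a pairwise instruction-following check (an
# assumption for illustration, not the paper's exact evaluation metric).

def rank_of(doc_id: str, ranking: list[str]) -> int:
    """1-indexed rank of doc_id in a ranked list of document IDs."""
    return ranking.index(doc_id) + 1

def pairwise_score(ranking_orig: list[str],
                   ranking_altered: list[str],
                   flipped_docs: list[str]) -> float:
    """Fraction of documents (relevant under the original instruction but
    not under the altered one) that the model ranks lower after the change.
    Higher is better: 1.0 means every flipped document was demoted."""
    demoted = sum(
        rank_of(d, ranking_altered) > rank_of(d, ranking_orig)
        for d in flipped_docs
    )
    return demoted / len(flipped_docs)

# Toy example: doc "d2" is no longer relevant under the altered instruction.
orig    = ["d2", "d1", "d3"]   # ranking under the original instruction
altered = ["d1", "d3", "d2"]   # ranking under the altered instruction
print(pairwise_score(orig, altered, ["d2"]))  # 1.0: d2 was correctly demoted
```

Comparing each document's rank across the two instruction variants, rather than scoring either ranking in isolation, isolates instruction-following ability from raw retrieval quality.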