Instruction-following capabilities in large language models (LLMs) have progressed significantly, enabling more complex user interactions through detailed prompts. However, retrieval systems have not matched these advances; most still rely on traditional lexical and semantic matching techniques that fail to fully capture user intent. Recent efforts have introduced instruction-aware retrieval models, but these focus primarily on intrinsic content relevance and neglect the importance of customized preferences for broader document-level attributes. This study evaluates the instruction-following capabilities of various retrieval models beyond content relevance, including LLM-based dense retrieval and reranking models. We develop InfoSearch, a novel retrieval evaluation benchmark spanning six document-level attributes: Audience, Keyword, Format, Language, Length, and Source, and introduce two novel metrics -- Strict Instruction Compliance Ratio (SICR) and Weighted Instruction Sensitivity Evaluation (WISE) -- to accurately assess the models' responsiveness to instructions. Our findings reveal that while reranking models generally surpass retrieval models in instruction following, they still struggle with certain attributes. Moreover, although instruction fine-tuning and increased model size lead to better performance, most models fall short of comprehensive instruction compliance as assessed by our benchmark.