Despite the critical need to align search targets with users' intentions, retrievers often prioritize only query information without delving into the users' intended search context. Enhancing the capability of retrievers to understand the intentions and preferences of users, akin to language model instructions, has the potential to yield more aligned search targets. Prior studies restrict the application of instructions in information retrieval to a task description format, neglecting the broader context of diverse and evolving search scenarios. Furthermore, the prevailing evaluation benchmarks are not explicitly tailored to assess instruction-following ability, which hinders progress in this field. In response to these limitations, we propose a novel benchmark, INSTRUCTIR, specifically designed to evaluate instruction-following ability in information retrieval tasks. Our approach focuses on user-aligned instructions tailored to each query instance, reflecting the diverse characteristics inherent in real-world search scenarios. Through experimental analysis, we observe that retrievers fine-tuned to follow task-style instructions, such as INSTRUCTOR, can underperform compared to their non-instruction-tuned counterparts. This underscores potential overfitting issues inherent in constructing retrievers trained on existing instruction-aware retrieval datasets.