Exploratory searches are characterized by under-specified goals and evolving query intents. In such scenarios, retrieval models that can capture user-specified nuances in query intent and adapt results accordingly are desirable -- instruction-following retrieval models promise exactly this capability. In this work, we evaluate instructed retrievers on the prevalent yet under-explored application of aspect-conditional seed-guided exploration using an expert-annotated test collection. We evaluate both recent LLMs fine-tuned for instructed retrieval and general-purpose LLMs prompted for ranking with the highly performant Pairwise Ranking Prompting. We find that the best instructed retrievers improve ranking relevance over instruction-agnostic approaches. However, we also find that instruction-following performance, which is crucial to the user experience of interacting with these models, does not mirror the ranking relevance improvements: models are often insensitive to instructions or respond to them counter-intuitively. Our results indicate that while users may benefit from current instructed retrievers over instruction-agnostic models, they may not benefit from them in long-running exploratory sessions that require greater sensitivity to instructions.