Conversational information seeking (CIS) is a prominent area in information retrieval (IR) that focuses on developing interactive knowledge assistants. These systems must accurately understand the user's information need within the conversational context and retrieve the relevant information. To this end, existing approaches model the user's information need by generating a single query rewrite or a single query representation in the embedding space. However, for complex questions a single query rewrite or representation is often insufficient: answering them requires the system to reason over multiple passages. In this work, we propose a generate-then-retrieve approach to improve passage retrieval performance for complex user queries. In this approach, we use large language models (LLMs) to (i) generate an initial answer to the user's information need by reasoning over the conversational context, and (ii) ground this answer in the collection. Our experiments show that the proposed approach significantly improves retrieval performance on the TREC iKAT 23, TREC CAsT 20, and TREC CAsT 22 datasets under various setups. We also show that grounding the LLM's answer requires more than one searchable query: an average of three queries outperforms human rewrites.
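The generate-then-retrieve pipeline described above can be sketched in a few steps: draft an answer with an LLM, split it into several searchable queries, retrieve for each query, and fuse the results. The sketch below is illustrative only: the LLM call is stubbed with a fixed string, the term-overlap scorer stands in for a real retriever such as BM25, and all function names and the toy collection are assumptions, not the paper's implementation.

```python
import re
from collections import Counter

def generate_answer(conversation: str) -> str:
    # Hypothetical LLM call: in practice, prompt an LLM to reason over the
    # conversation and draft an initial answer. Stubbed with a fixed string here.
    return ("Aspirin reduces fever by inhibiting prostaglandin synthesis. "
            "It also thins the blood, lowering heart-attack risk. "
            "High doses can irritate the stomach lining.")

def answer_to_queries(answer: str, max_queries: int = 3) -> list[str]:
    # Ground the answer via sentence-level searchable queries; the abstract
    # reports that around three queries on average outperform human rewrites.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return sentences[:max_queries]

def score(query: str, passage: str) -> int:
    # Toy lexical-overlap scorer standing in for a real ranker (e.g. BM25).
    q_terms = Counter(re.findall(r"\w+", query.lower()))
    p_terms = set(re.findall(r"\w+", passage.lower()))
    return sum(count for term, count in q_terms.items() if term in p_terms)

def retrieve(queries: list[str], collection: dict[str, str], k: int = 2) -> list[str]:
    # Retrieve per query and fuse by summed score across queries.
    totals: Counter = Counter()
    for q in queries:
        for pid, passage in collection.items():
            totals[pid] += score(q, passage)
    return [pid for pid, _ in totals.most_common(k)]
```

A usage pass would call `answer_to_queries(generate_answer(conversation))` and feed the resulting queries to `retrieve` over the passage collection; in the paper's setting the scorer and collection would be a standard retriever over the TREC corpora rather than this toy overlap count.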