Conversational information seeking (CIS) systems aim to model the user's information need within the conversational context and retrieve the relevant information. One major approach to modeling the conversational context is to rewrite the user utterance so that it expresses the information need as a self-contained query. Recent work has shown the benefit of expanding the rewritten utterance with relevant terms. In this work, we hypothesize that breaking down the information in an utterance into multi-aspect rewritten queries can lead to more effective retrieval. The benefit is most evident for complex utterances that require gathering evidence from various information sources, where a single query rewrite or query representation cannot capture the complexity of the utterance. To test this hypothesis, we conduct extensive experiments on five widely used CIS datasets, in which we leverage LLMs to generate multi-aspect queries that represent the information need of each utterance through multiple query rewrites. We show that, for most of the utterances, the same retrieval model performs better with more than one rewritten query, by 85% in terms of nDCG@3. We further propose a multi-aspect query generation and retrieval framework, called MQ4CS. Our extensive experiments show that MQ4CS outperforms state-of-the-art query rewriting methods. We make our code and our new dataset of generated multi-aspect queries publicly available.