Conversational search aims to retrieve passages containing essential information, but its queries depend heavily on the preceding dialogue context. Reformulating conversational queries into standalone forms is therefore essential for making effective use of off-the-shelf retrievers. Previous methods for conversational query reformulation frequently depend on human-annotated gold rewrites. However, these manually crafted queries often yield sub-optimal retrieval performance and are costly to collect. To address these challenges, we propose Iterative Conversational Query Reformulation (IterCQR), a method that performs query reformulation without relying on human oracle rewrites. IterCQR iteratively trains the query reformulation (QR) model by directly using signals from information retrieval (IR) as a reward. IterCQR achieves state-of-the-art performance on two datasets, demonstrating its effectiveness with both sparse and dense retrievers. Notably, IterCQR is robust in domain-shift, low-resource, and topic-shift scenarios.
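To make the idea of using an IR signal as a reward concrete, the following is a minimal sketch and not the paper's implementation. It assumes that the reward for a candidate rewrite is its similarity to a known relevant passage, and it substitutes a toy bag-of-words cosine for a real sparse or dense retriever score. The names `bow_cosine` and `select_pseudo_label`, and the example queries and passage, are hypothetical.

```python
import math
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Toy retrieval signal (assumption): bag-of-words cosine similarity."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def select_pseudo_label(candidates, relevant_passage, reward=bow_cosine):
    """Score each candidate rewrite with the reward and return the best one.

    In an iterative setup (assumed here), the winning rewrite could serve
    as a training target for the next round, replacing a human rewrite.
    """
    scored = [(reward(c, relevant_passage), c) for c in candidates]
    best_reward, best = max(scored, key=lambda pair: pair[0])
    return best, best_reward


# Hypothetical candidates that a QR model might produce for one turn.
candidates = [
    "when was it released",
    "when was the album Thriller released",
    "when was Thriller by Michael Jackson released",
]
relevant_passage = "Thriller is an album by Michael Jackson released in 1982"

best, reward = select_pseudo_label(candidates, relevant_passage)
print(best, round(reward, 3))
```

In this toy setting the most self-contained rewrite receives the highest reward because it shares the most terms with the relevant passage. Swapping `reward` for a score from an actual retriever is the natural next step under the same assumptions.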