Conversational search allows a user to interact with a search system over multiple turns. In this setting, each query depends strongly on the conversation context. An effective way to improve retrieval is to expand the current query with historical queries. However, not all previous queries are related to the current query or useful for expanding it. In this paper, we propose a new method to select the historical queries that are useful for the current query. To cope with the lack of labeled training data, we use a pseudo-labeling approach that annotates historical queries as useful or not based on their impact on the retrieval results. The pseudo-labeled data are then used to train a selection model. We further propose a multi-task learning framework that jointly trains the selector and the retriever during fine-tuning, which mitigates the possible inconsistency between the pseudo labels and the retriever as it changes during fine-tuning. Extensive experiments on four conversational search datasets demonstrate the effectiveness and broad applicability of our method compared with several strong baselines.
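The pseudo-labeling idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the lexical scorer `overlap_score`, the use of reciprocal rank as the retrieval metric, the "strictly improves" labeling rule, and all function names are assumptions chosen to keep the example self-contained. A historical query is labeled useful (1) if prepending it to the current query improves the rank of the known relevant document, and not useful (0) otherwise.

```python
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    # Whitespace tokenization on lowercased text (toy assumption).
    return text.lower().split()


def overlap_score(query: str, doc: str) -> float:
    # Hypothetical stand-in for a real retriever's scoring function:
    # counts query tokens that also occur in the document.
    doc_tokens = set(tokenize(doc))
    return float(sum(1 for tok in tokenize(query) if tok in doc_tokens))


def reciprocal_rank(query: str, docs: Sequence[str], gold_idx: int) -> float:
    # Reciprocal rank of the gold document; ties are broken against
    # the gold document so that a tie never counts as a success.
    scores = [overlap_score(query, d) for d in docs]
    rank = 1 + sum(
        1 for i, s in enumerate(scores)
        if i != gold_idx and s >= scores[gold_idx]
    )
    return 1.0 / rank


def pseudo_label(current: str, history: Sequence[str],
                 docs: Sequence[str], gold_idx: int) -> List[int]:
    # Label each historical query by its impact on retrieval:
    # 1 if expanding the current query with it improves the metric, else 0.
    base = reciprocal_rank(current, docs, gold_idx)
    labels = []
    for past in history:
        expanded = f"{past} {current}"
        improved = reciprocal_rank(expanded, docs, gold_idx) > base
        labels.append(1 if improved else 0)
    return labels


docs = [
    "the eiffel tower is 330 metres tall",   # gold passage (index 0)
    "mount everest is 8849 metres tall",
    "paris has many good restaurants",
]
history = [
    "tell me about the eiffel tower",
    "what are good restaurants in paris",
]
current = "how tall is it"

labels = pseudo_label(current, history, docs, gold_idx=0)
print(labels)  # [1, 0]
```

In this toy conversation, the first historical query resolves what "it" refers to and lifts the gold passage to rank 1, while the second pulls in an off-topic passage and pushes the gold passage down, so only the first is labeled useful. Labels of this kind would then serve as training targets for a selection model.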