Query rewriting plays a vital role in enhancing conversational search by transforming context-dependent user queries into standalone forms. Existing approaches primarily leverage human-rewritten queries as labels to train query rewriting models. However, human rewrites may lack sufficient information for optimal retrieval performance. To overcome this limitation, we propose utilizing large language models (LLMs) as query rewriters, enabling the generation of informative query rewrites through well-designed instructions. We define four essential properties for well-formed rewrites and incorporate all of them into the instruction. In addition, we introduce the role of rewrite editors for LLMs when initial query rewrites are available, forming a ``rewrite-then-edit'' process. Furthermore, we propose distilling the rewriting capabilities of LLMs into smaller models to reduce rewriting latency. Our experimental evaluation on the QReCC dataset demonstrates that informative query rewrites can yield substantially improved retrieval performance compared to human rewrites, especially with sparse retrievers.
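As an illustration of the ``rewrite-then-edit'' process described above, the following is a minimal sketch rather than the authors' implementation. The instruction strings, the prompt layout, and the names `build_prompt` and `rewrite_then_edit` are hypothetical, since the abstract does not give the actual prompts. The `llm` argument is assumed to be any callable that maps a prompt string to a completion string. When no initial rewrite is supplied, the sketch assumes the LLM itself produces one before editing it.

```python
from typing import Callable, List, Optional

# Hypothetical instruction texts; the source does not state the real prompts.
REWRITE_INSTRUCTION = "Rewrite the last user query into a standalone, informative query."
EDIT_INSTRUCTION = "Edit the initial rewrite so that it is a standalone, informative query."


def build_prompt(history: List[str], query: str,
                 initial_rewrite: Optional[str] = None) -> str:
    """Assemble a plain-text prompt for either the rewriter or the editor role."""
    editing = initial_rewrite is not None
    lines = [EDIT_INSTRUCTION if editing else REWRITE_INSTRUCTION, "Context:"]
    lines += [f"- {turn}" for turn in history]
    lines.append(f"Query: {query}")
    if editing:
        lines.append(f"Initial rewrite: {initial_rewrite}")
    lines.append("Rewrite:")
    return "\n".join(lines)


def rewrite_then_edit(llm: Callable[[str], str], history: List[str], query: str,
                      initial_rewrite: Optional[str] = None) -> str:
    """Edit an available initial rewrite; if none is given, produce one first."""
    if initial_rewrite is None:
        # Assumption: fall back to the LLM acting as the rewriter.
        initial_rewrite = llm(build_prompt(history, query)).strip()
    # The LLM then acts as the rewrite editor on the initial rewrite.
    return llm(build_prompt(history, query, initial_rewrite)).strip()
```

The initial rewrite could equally come from a smaller rewriting model, in which case it would be passed in through `initial_rewrite` and the LLM would be called only once, as the editor.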