Query expansion plays a crucial role in information retrieval by bridging the semantic gap between queries and documents to improve matching performance. This paper introduces LLM-QE, a novel approach that leverages Large Language Models (LLMs) to generate document-based query expansions, thereby enhancing dense retrieval models. Unlike traditional methods, LLM-QE designs both rank-based and answer-based rewards and uses these reward models to optimize LLMs, aligning them with the ranking preferences of both retrievers and LLMs and thus mitigating LLM hallucination during query expansion. Our experiments on the zero-shot dense retrieval model Contriever demonstrate the effectiveness of LLM-QE, achieving an improvement of over 8%. Furthermore, by incorporating answer-based reward modeling, LLM-QE generates more relevant and precise document-related information, rather than simply producing redundant tokens to maximize rank-based rewards. Notably, LLM-QE also improves the training of dense retrievers, yielding an improvement of more than 5% after fine-tuning. All code is available at https://github.com/NEUIR/LLM-QE.