Recent advances in large language models (LLMs) have led to a surge of interest in query augmentation for information retrieval (IR). Two main approaches have emerged. The first prompts LLMs to generate answers or pseudo-documents that serve as new queries, relying purely on the model's parametric knowledge or contextual information. The second applies reinforcement learning (RL) to fine-tune LLMs for query rewriting, directly optimizing retrieval metrics. Although each approach has its own advantages and limitations, the two have not been compared under consistent experimental conditions. In this work, we present the first systematic comparison of prompting-based and RL-based query augmentation across diverse benchmarks, including evidence-seeking, ad hoc, and tool retrieval. Our key finding is that simple, training-free query augmentation often performs on par with, or even surpasses, more expensive RL-based counterparts, especially when powerful LLMs are used. Motivated by this finding, we introduce a novel hybrid method, On-policy Pseudo-document Query Expansion (OPQE), in which, instead of rewriting the query, the LLM policy learns to generate a pseudo-document that maximizes retrieval performance, thus merging the flexibility and generative structure of prompting with the targeted optimization of RL. We show that OPQE outperforms both standalone prompting and RL-based rewriting, demonstrating that a synergistic approach yields the best results. Our implementation is publicly available to facilitate reproducibility.
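To make the OPQE idea concrete, the following is a minimal sketch of an on-policy pseudo-document expansion training loop. It is not the released implementation; the helper names (`sample_pseudo_doc`, `retrieve`) are hypothetical placeholders, and a REINFORCE-style update with a mean-reward baseline stands in for whatever RL algorithm the full method uses. The key structural point it illustrates is that the policy's generated pseudo-document, not a rewritten query, is sent to the frozen retriever, and the retrieval metric on labeled relevant documents serves directly as the reward.

```python
# Sketch of an on-policy pseudo-document query expansion loop (assumed helper
# names, illustrative REINFORCE update). The policy LLM emits a pseudo-document
# per query, the reward is a retrieval metric against labeled relevant docs,
# and the policy is updated to maximize that reward.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Rollout:
    query: str
    pseudo_doc: str   # sampled from the policy LLM
    log_prob: float   # sum of token log-probs under the policy
    reward: float     # retrieval metric, e.g. recall@k or nDCG@k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of labeled relevant documents found in the top-k results."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / max(len(relevant), 1)


def collect_rollouts(
    queries: list[str],
    relevant: dict[str, set[str]],
    sample_pseudo_doc: Callable[[str], tuple[str, float]],  # -> (text, log_prob)
    retrieve: Callable[[str], list[str]],                    # -> ranked doc ids
) -> list[Rollout]:
    """On-policy data collection: the generated pseudo-document (not a
    rewritten query) is what gets sent to the frozen retriever."""
    rollouts = []
    for q in queries:
        doc, logp = sample_pseudo_doc(q)
        ranked = retrieve(doc)
        rollouts.append(Rollout(q, doc, logp, recall_at_k(ranked, relevant[q])))
    return rollouts


def reinforce_loss(rollouts: list[Rollout]) -> float:
    """REINFORCE objective with a mean-reward baseline; a production setup
    would typically use a PPO/GRPO-style update instead."""
    baseline = sum(r.reward for r in rollouts) / len(rollouts)
    return -sum((r.reward - baseline) * r.log_prob for r in rollouts) / len(rollouts)
```

In this sketch, the retriever is treated as a black box and only the generation policy is trained, which is what lets the method keep the generative structure of prompting-based pseudo-document expansion while adding RL's targeted optimization of the retrieval metric.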