Recent advances in large language models (LLMs) have led to a surge of interest in query augmentation for information retrieval (IR). Two main approaches have emerged. The first prompts LLMs to generate answers or pseudo-documents that serve as new queries, relying purely on the model's parametric knowledge or contextual information. The second applies reinforcement learning (RL) to fine-tune LLMs for query rewriting, directly optimizing retrieval metrics. Although each approach has its own advantages and limitations, the two have not been compared under consistent experimental conditions. In this work, we present the first systematic comparison of prompting-based and RL-based query augmentation across diverse benchmarks, including evidence-seeking, ad hoc, and tool retrieval. Our key finding is that simple, training-free query augmentation often performs on par with, or even surpasses, more expensive RL-based counterparts, especially when using powerful LLMs. Motivated by this discovery, we introduce a novel hybrid method, On-policy Pseudo-document Query Expansion (OPQE), in which the LLM policy, instead of rewriting the query, learns to generate a pseudo-document that maximizes retrieval performance, thus merging the flexibility and generative structure of prompting with the targeted optimization of RL. We show that OPQE outperforms both standalone prompting and RL-based rewriting, demonstrating that a synergistic approach yields the best results. Our implementation is made available to facilitate reproducibility.
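The training-free variant of the first approach can be sketched as follows: prompt an LLM for a pseudo-document that plausibly answers the query, then feed that pseudo-document (concatenated with the original query) to an off-the-shelf retriever. This is a minimal illustrative sketch only, not the paper's implementation; the `llm` and `retriever` callables, the prompt wording, and the toy scoring function are all hypothetical stand-ins.

```python
def expand_query(query: str, llm) -> str:
    """Prompt the LLM to write a pseudo-document answering the query."""
    prompt = (
        "Write a short passage that plausibly answers the question.\n"
        f"Question: {query}\nPassage:"
    )
    return llm(prompt)

def retrieve_with_expansion(query: str, llm, retriever, k: int = 10):
    """Use the generated pseudo-document (plus the original query)
    as the retrieval query; no fine-tuning is required."""
    pseudo_doc = expand_query(query, llm)
    augmented = f"{query} {pseudo_doc}"
    return retriever(augmented, k)

# Toy stand-ins so the sketch runs end to end; a real system would
# plug in an LLM client and a dense or sparse retriever here.
def toy_llm(prompt: str) -> str:
    return "Paris is the capital of France."

def toy_retriever(text: str, k: int):
    corpus = ["Paris is the capital of France.", "Berlin is in Germany."]
    # Rank documents by naive token overlap with the augmented query.
    query_tokens = set(text.lower().split())
    scored = sorted(
        corpus,
        key=lambda d: len(set(d.lower().split()) & query_tokens),
        reverse=True,
    )
    return scored[:k]

results = retrieve_with_expansion("What is the capital of France?", toy_llm, toy_retriever)
```

The RL-based alternative replaces the fixed prompt with a fine-tuned policy whose reward is the downstream retrieval metric; OPQE keeps the pseudo-document output format of the sketch above but trains the generating policy with such a reward.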