Generative retrieval (GR) differs from the traditional index-then-retrieve pipeline by storing relevance signals in model parameters and generating retrieval cues directly from the query, but it can be brittle out of domain and expensive to scale. We introduce QueStER (QUEry SpecificaTion for gEnerative Keyword-Based Retrieval), which bridges GR and query reformulation by learning to generate explicit keyword-based search specifications. Given a user query, a lightweight LLM produces a keyword query that is executed by a standard retriever (BM25), combining the generalization benefits of generative query rewriting with the efficiency and scalability of lexical indexing. We train the rewriting policy with reinforcement learning. Across in- and out-of-domain evaluations, QueStER consistently improves over BM25 and is competitive with neural IR baselines, while maintaining strong efficiency.
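The pipeline described above can be sketched in a few lines: a rewriting policy maps the user query to an explicit keyword specification, which a standard lexical retriever then scores against the corpus. This is a minimal illustration, not the paper's implementation: `rewrite_query` is a hypothetical stand-in for the trained LLM policy (here a hard-coded lookup), and the BM25 scorer is a compact from-scratch version rather than a production index.

```python
import math
from collections import Counter

def rewrite_query(query: str) -> str:
    # Hypothetical stand-in for the lightweight LLM rewriting policy:
    # in QueStER this would be a model trained with RL to emit an
    # explicit keyword specification. Hard-coded here for illustration.
    rewrites = {
        "how do i fix a flat bicycle tire":
            "flat tire repair bicycle patch inner tube",
    }
    return rewrites.get(query.lower(), query)

def bm25_scores(keyword_query: str, corpus: list[str],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    # Score every document in `corpus` against the keyword query
    # with the standard Okapi BM25 formula.
    docs = [doc.lower().split() for doc in corpus]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    df = Counter()                      # document frequency per term
    for d in docs:
        for term in set(d):
            df[term] += 1
    scores = []
    for d in docs:
        tf = Counter(d)                 # term frequency in this document
        score = 0.0
        for term in keyword_query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

corpus = [
    "patch the inner tube to repair a flat bicycle tire",
    "chocolate cake recipe with butter",
    "train schedules for the city metro",
]
keywords = rewrite_query("How do I fix a flat bicycle tire")
scores = bm25_scores(keywords, corpus)
```

The division of labor mirrors the abstract: generation handles query understanding, while retrieval stays lexical, so the corpus index never needs to live in model parameters.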