Pseudo-relevance feedback (PRF) is a classical approach to lexical mismatch that enriches the query with terms drawn from first-pass retrieval. Recent work on generative-relevance feedback (GRF) shows that query expansion models using text generated by large language models can improve sparse retrieval without depending on first-pass retrieval effectiveness. This work extends GRF to the dense and learned sparse retrieval paradigms, with experiments over six standard document ranking benchmarks. We find that GRF improves over comparable PRF techniques by around 10% on both precision-oriented and recall-oriented measures. Nonetheless, query analysis shows that GRF and PRF have contrasting benefits: GRF provides external context not present in first-pass retrieval, whereas PRF grounds the query in the information contained within the target corpus. We therefore propose combining generative and pseudo-relevance feedback ranking signals to achieve the benefits of both feedback classes, which significantly increases recall over PRF methods in 95% of experiments.
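As an illustration of what combining the two ranking signals could look like, the following minimal sketch fuses a PRF ranking and a GRF ranking with weighted reciprocal rank fusion. The abstract does not specify the fusion method, so the choice of reciprocal rank fusion, the function name `fuse_rankings`, and the parameters `weight_grf` and `k` are all assumptions made for this example, not the authors' method.

```python
from collections import defaultdict


def fuse_rankings(prf_run, grf_run, weight_grf=0.5, k=60):
    """Fuse two ranked lists of document ids (best first).

    Hypothetical helper: uses weighted reciprocal rank fusion, which is an
    assumption of this sketch and not taken from the abstract.
    """
    scores = defaultdict(float)
    weighted_runs = ((1.0 - weight_grf, prf_run), (weight_grf, grf_run))
    for weight, run in weighted_runs:
        for rank, doc_id in enumerate(run, start=1):
            # Each run contributes more for documents it ranks higher.
            scores[doc_id] += weight / (k + rank)
    # Sort by fused score (descending); break ties by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy rankings with made-up document ids.
prf_run = ["d1", "d2", "d3"]  # grounded in the target corpus
grf_run = ["d2", "d4", "d1"]  # driven by generated external context
print(fuse_rankings(prf_run, grf_run))  # ['d2', 'd1', 'd4', 'd3']
```

Documents retrieved by both runs (`d1`, `d2`) rise to the top, while documents found by only one run (`d3`, `d4`) are still kept, which is one way a fused ranking could gain recall over either signal alone.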