Sparse and dense pseudo-relevance feedback (PRF) approaches perform poorly on challenging queries due to low precision in first-pass retrieval. However, recent neural language models (NLMs) can move relevant documents to the top ranks, even when the re-ranking pool contains few of them. We first address the problem of poor pseudo-relevance feedback by simply applying re-ranking before query expansion and then re-executing the expanded query. We find that this change alone can improve the retrieval effectiveness of sparse and dense PRF approaches by 5-8%. Going further, we propose a new expansion model, Latent Entity Expansion (LEE), a fine-grained word- and entity-based relevance model that incorporates localized features. Finally, we add an "adaptive" component to the retrieval process, which iteratively refines the re-ranking pool during scoring using the expansion model, i.e., we "re-rank - expand - repeat". Using LEE, we achieve (to our knowledge) the best NDCG, MAP and R@1000 results on the TREC Robust 2004 and CODEC ad hoc document datasets, demonstrating a significant advancement in expansion effectiveness.
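The "re-rank - expand - repeat" loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (`retrieve`, `expand`, `make_toy_reranker`, `rerank_expand_repeat`), their parameters, and the toy corpus are hypothetical, and simple term-frequency scoring stands in for the first-pass retriever, the neural re-ranker, and the expansion model. It does not model LEE's entity-based or localized features.

```python
from collections import Counter
from typing import Callable, Dict, List

Query = Dict[str, float]  # term -> weight


def retrieve(query: Query, corpus: Dict[str, str], k: int) -> List[str]:
    """Toy first-pass retrieval: weighted term-frequency overlap (stand-in for BM25)."""
    scores = {}
    for doc_id, text in corpus.items():
        tf = Counter(text.lower().split())
        score = sum(weight * tf[term] for term, weight in query.items())
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


def expand(query: Query, feedback_docs: List[str], corpus: Dict[str, str],
           n_terms: int, mix: float = 0.5) -> Query:
    """Toy expansion: add the most frequent feedback terms to the original query."""
    tf = Counter()
    for doc_id in feedback_docs:
        tf.update(corpus[doc_id].lower().split())
    total = sum(tf.values()) or 1
    expanded = dict(query)
    for term, count in tf.most_common(n_terms):
        expanded[term] = expanded.get(term, 0.0) + mix * count / total
    return expanded


def make_toy_reranker(corpus: Dict[str, str]) -> Callable[[Query, List[str]], List[str]]:
    """Placeholder re-ranker; a real system would score (query, document) pairs with an NLM."""
    def rerank(query: Query, pool: List[str]) -> List[str]:
        def score(doc_id: str) -> int:
            tokens = set(corpus[doc_id].lower().split())
            return sum(1 for term in query if term in tokens)
        return sorted(pool, key=lambda d: (-score(d), d))
    return rerank


def rerank_expand_repeat(query: Query, corpus: Dict[str, str],
                         rerank: Callable[[Query, List[str]], List[str]],
                         rounds: int = 2, pool_size: int = 100,
                         fb_docs: int = 3, n_terms: int = 10) -> List[str]:
    """Re-rank the pool, expand from its top documents, retrieve again, and repeat."""
    pool = rerank(query, retrieve(query, corpus, pool_size))
    for _ in range(rounds):
        # Expand from the re-ranked top documents, not from the raw first-pass ranking.
        expanded = expand(query, pool[:fb_docs], corpus, n_terms)
        new_docs = [d for d in retrieve(expanded, corpus, pool_size) if d not in pool]
        # Newly found documents join the pool and are re-ranked against the original query.
        pool = rerank(query, pool + new_docs)
    return pool


corpus = {
    "d1": "solar panels convert sunlight into electricity",
    "d2": "photovoltaic cells convert sunlight using silicon",
    "d3": "football match results",
    "d4": "silicon photovoltaic modules efficiency",
}
query = {"solar": 1.0, "panels": 1.0}
result = rerank_expand_repeat(query, corpus, make_toy_reranker(corpus))
print(result)
```

In this toy corpus, `d2` shares no term with the original query and only enters the pool after the first expansion, and `d4` is only reachable through terms contributed by `d2` in the second round. This is the kind of iterative pool refinement the adaptive component is meant to provide, although the scoring here is far simpler than anything the paper evaluates.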