Query expansion has been widely used to improve the search results of first-stage retrievers, yet its influence on second-stage, cross-encoder rankers remains under-explored. Recent work by Weller et al. [44] shows that current expansion techniques benefit weaker models such as DPR and BM25 but harm stronger rankers such as MonoT5. In this paper, we re-examine this conclusion and ask: can query expansion improve the generalization of strong cross-encoder rankers? To answer this question, we first apply popular query expansion methods to state-of-the-art cross-encoder rankers and confirm that their zero-shot performance deteriorates. From these experiments we identify two steps that are vital for cross-encoders: high-quality keyword generation and minimally disruptive query modification. We then show that the generalization of a strong neural ranker can be improved through prompt engineering and by fusing the ranking results of the expanded queries. Specifically, we first prompt an instruction-following language model to generate keywords through a reasoning chain. We then combine the ranking results of the expanded queries dynamically, using self-consistency and reciprocal rank weighting. Experiments on BEIR and TREC Deep Learning 2019/2020 show that following these steps improves the nDCG@10 scores of both MonoT5 and RankT5, pointing to a direction for applying query expansion to strong cross-encoder rankers.
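As a rough illustration of the fusion step, the sketch below combines the rankings produced for several expanded queries using standard reciprocal rank fusion. The abstract does not give the exact weighting scheme, so this is a stand-in under stated assumptions rather than the method itself: the function name `reciprocal_rank_fusion`, the smoothing constant `k=60`, and the toy document ids are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of document ids into one list.

    rankings: list of lists, one per expanded query (hypothetical input shape).
    k: smoothing constant; 60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more weight to documents it ranks higher.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy rankings, as if a cross-encoder had re-ranked the candidates
# once per expanded query.
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

Documents that rank near the top for most expanded queries rise in the fused list, so a single poorly generated expansion has limited effect on the final ordering.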