We carried out a reproducibility study of InPars, a method for unsupervised training of neural rankers (Bonifacio et al., 2022). As a by-product, we developed InPars-light, a simple yet effective modification of InPars. Unlike InPars, InPars-light uses ranking models that are 7x-100x smaller and relies only on the freely available language model BLOOM, which, as we found, produced more accurate rankers than the proprietary GPT-3 model. On all five English retrieval collections used in the original InPars study, we obtained substantial (7%-30%) and statistically significant improvements over BM25 (in nDCG and MRR) using only a 30M-parameter six-layer MiniLM-30M ranker and a single three-shot prompt. In contrast, in the InPars study only the 100x larger monoT5-3B model consistently outperformed BM25, whereas their smaller monoT5-220M model (which is still 7x larger than our MiniLM ranker) outperformed BM25 only on MS MARCO and TREC DL 2020. In the same three-shot prompting scenario, our 435M-parameter DeBERTa v3 ranker was on par with the 7x larger monoT5-3B (average gain over BM25 of 1.3 vs. 1.32); in fact, on three out of five datasets, DeBERTa slightly outperformed monoT5-3B. Finally, these results were achieved by re-ranking only 100 candidate documents, compared to the 1000 used by Bonifacio et al. (2022). We believe that InPars-light is the first truly cost-effective prompt-based unsupervised recipe for training and deploying neural ranking models that outperform BM25. Our code and data are publicly available at https://github.com/searchivarius/inpars_light/