InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers

We carried out a reproducibility study of InPars, a method for unsupervised training of neural rankers (Bonifacio et al., 2022). As a by-product, we developed InPars-light, a simple yet effective modification of InPars. Unlike InPars, InPars-light uses 7x-100x smaller ranking models and only the freely available language model BLOOM, which, as we found, produced more accurate rankers than the proprietary GPT-3 model. On all five English retrieval collections used in the original InPars study, we obtained substantial (7%-30%) and statistically significant improvements over BM25 (in nDCG and MRR) using only a 30M-parameter, six-layer MiniLM-30M ranker and a single three-shot prompt. In contrast, in the InPars study only the 100x larger monoT5-3B model consistently outperformed BM25, whereas their smaller monoT5-220M model (still 7x larger than our MiniLM ranker) outperformed BM25 only on MS MARCO and TREC DL 2020. In the same three-shot prompting scenario, our 435M-parameter DeBERTA v3 ranker was on par with the 7x larger monoT5-3B (average gain over BM25 of 1.3 vs. 1.32); in fact, on three out of five datasets, DeBERTA slightly outperformed monoT5-3B. Finally, these results were achieved by re-ranking only 100 candidate documents, compared to the 1000 used by Bonifacio et al. (2022). We believe that InPars-light is the first truly cost-effective prompt-based unsupervised recipe for training and deploying neural ranking models that outperform BM25. Our code and data are publicly available at https://github.com/searchivarius/inpars_light/
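
To make the evaluation setup concrete, the following is a minimal sketch of re-ranking the top-k first-stage candidates and expressing effectiveness as a gain over BM25. It is an illustration, not the code from our repository: the function names, the toy scores, and the toy relevance labels are hypothetical, the gain is assumed to be the ratio of the re-ranker's metric to the BM25 metric, and the nDCG variant shown (exponential gain, log2 discount) is only one common definition.

```python
import math
from typing import Callable, Dict, List, Set


def rerank_top_k(candidates: List[str], score: Callable[[str], float], k: int = 100) -> List[str]:
    """Re-order only the first k first-stage candidates by ranker score; keep the tail as-is."""
    head, tail = candidates[:k], candidates[k:]
    return sorted(head, key=score, reverse=True) + tail


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """Reciprocal rank of the first relevant document (0.0 if none is retrieved)."""
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def ndcg_at_k(ranked: List[str], grades: Dict[str, int], k: int = 10) -> float:
    """nDCG@k with exponential gain and log2 discount (an assumed, common variant)."""
    dcg = sum((2 ** grades.get(doc_id, 0) - 1) / math.log2(i + 2)
              for i, doc_id in enumerate(ranked[:k]))
    ideal = sorted(grades.values(), reverse=True)[:k]
    idcg = sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Toy data (hypothetical): one query, a BM25 ranking, and one relevant document.
bm25_run = ["d3", "d1", "d2"]
grades = {"d2": 1}
toy_scores = {"d1": 0.2, "d2": 0.9, "d3": 0.1}  # stand-in for a neural ranker

reranked_run = rerank_top_k(bm25_run, lambda doc_id: toy_scores[doc_id], k=100)

# Gain over BM25, assumed here to be the ratio of the two metric values.
rr_gain = reciprocal_rank(reranked_run, set(grades)) / reciprocal_rank(bm25_run, set(grades))
ndcg_gain = ndcg_at_k(reranked_run, grades) / ndcg_at_k(bm25_run, grades)
```

In a real evaluation the metrics would be averaged over all queries of a collection before the ratio is taken; the single-query toy case above only shows the mechanics.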
