Researchers have successfully applied large language models (LLMs) such as ChatGPT to reranking in an information retrieval context, but to date, such work has mostly been built on proprietary models hidden behind opaque API endpoints. This approach yields experimental results that are not reproducible and non-deterministic, threatening the veracity of outcomes that build on such shaky foundations. To address this significant shortcoming, we present RankVicuna, the first fully open-source LLM capable of performing high-quality listwise reranking in a zero-shot setting. Experimental results on the TREC 2019 and 2020 Deep Learning Tracks show that we can achieve effectiveness comparable to zero-shot reranking with GPT-3.5 with a much smaller 7B parameter model, although our effectiveness remains slightly behind reranking with GPT-4. We hope our work provides the foundation for future research on reranking with modern LLMs. All the code necessary to reproduce our results is available at https://github.com/castorini/rank_llm.