Listwise rerankers based on large language models (LLMs) are the zero-shot state of the art. However, current work in this direction depends entirely on GPT models, making them a single point of failure for scientific reproducibility. Moreover, this raises the concern that current research findings hold only for GPT models and not for LLMs in general. In this work, we lift this precondition and build, for the first time, effective listwise rerankers without any form of dependency on GPT. Our passage retrieval experiments show that our best listwise reranker surpasses the listwise rerankers based on GPT-3.5 by 13% and achieves 97% of the effectiveness of the ones built on GPT-4. Our results also show that existing training datasets, which were expressly constructed for pointwise ranking, are insufficient for building such listwise rerankers. Instead, high-quality listwise ranking data is required and crucial, calling for further work on building human-annotated listwise data resources.