The effectiveness of multi-stage text retrieval has been solidly demonstrated since before the era of pre-trained language models. However, most existing studies utilize models that predate recent advances in large language models (LLMs). This study seeks to explore potential improvements that state-of-the-art LLMs can bring. We conduct a comprehensive study, fine-tuning the latest LLaMA model both as a dense retriever (RepLLaMA) and as a pointwise reranker (RankLLaMA) for both passage retrieval and document retrieval using the MS MARCO datasets. Our findings demonstrate that the effectiveness of large language models indeed surpasses that of smaller models. Additionally, since LLMs can inherently handle longer contexts, they can represent entire documents holistically, obviating the need for traditional segmenting and pooling strategies. Furthermore, evaluations on BEIR demonstrate that our RepLLaMA-RankLLaMA pipeline exhibits strong zero-shot effectiveness. Model checkpoints from this study are available on HuggingFace.
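To make the two-stage design concrete, the following is a minimal sketch of a retrieve-then-rerank pipeline. It is not the RepLLaMA or RankLLaMA implementation: `embed` is a toy bag-of-words encoder standing in for a dense retriever, `pointwise_score` is a toy term-overlap scorer standing in for a pointwise reranker, and all function names are hypothetical. Only the control flow (independent encoding and top-k selection, followed by per-pair rescoring of the candidates) reflects the pipeline described above.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a dense encoder: L2-normalized bag-of-words vector."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())


def retrieve(query, corpus, k):
    """Stage 1: encode query and documents independently, keep the top-k by similarity."""
    q = embed(query)
    scored = [(doc_id, dot(q, embed(text))) for doc_id, text in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


def pointwise_score(query, text):
    """Toy stand-in for a pointwise reranker: fraction of query terms found in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(text.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def rerank(query, corpus, candidates):
    """Stage 2: score each (query, candidate) pair on its own and sort by that score."""
    return sorted(
        candidates,
        key=lambda doc_id: pointwise_score(query, corpus[doc_id]),
        reverse=True,
    )


def search(query, corpus, k=3):
    """Full pipeline: first-stage retrieval followed by pointwise reranking."""
    return rerank(query, corpus, retrieve(query, corpus, k))
```

In this sketch the first stage scores every document with a cheap similarity over independently computed vectors, while the second stage looks at the query and each candidate jointly, which is why it is applied only to the top-k candidates.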